66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

Comparison of Two Text-based Search Algorithms in an Online Literature Database for Integrative Medicine – First Results

  • Sebastian Unger - Universität Witten/Herdecke, Witten, Germany
  • Thomas Ostermann - Universität Witten/Herdecke, Witten, Germany
  • Christa Raak - Universität Witten/Herdecke, Herdecke, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 65

doi: 10.3205/21gmds045, urn:nbn:de:0183-21gmds0459

Published: 24 September 2021

© 2021 Unger et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see



Introduction: Although the number of scientific publications in the field of integrative medicine is steadily increasing, it is still difficult to get a valid overview of the published evidence. Specialist libraries and bibliographical databases may therefore serve as sources for an evidence base. The openly accessible bibliographical database CAMbase, hosted by Witten/Herdecke University, is one such established database in this field. To keep it alive and secure it against various network attacks, e.g., exploits based on discovered vulnerabilities, its underlying operating system (OS) needs to be upgraded or even replaced regularly.

State of the Art: In 2020, CAMbase, which had been installed on a 32-bit platform until then, was migrated to a newer 64-bit OS. The migration resulted in a variety of errors during search queries and when search results were displayed. This is in line with published experiences, but it reduces usability. As the files were stored in binary format, reworking the program code was not an option.

Concept: The main architecture of CAMbase can be divided into three layers. First, there is the presentation layer, an XML-based (Extensible Markup Language) GUI (Graphical User Interface) that runs on the client side to relieve the server. Second, there is the business layer, which interprets search queries semantically and syntactically and which went far beyond simple stemming methods at the time of development. Finally, the last layer consists of the database and its structure, containing bibliographical data on integrative medicine. A promising solution for keeping the data of CAMbase and still accessing it was to replace the business logic with the open-source platform Solr, which uses a score ranking algorithm for its search queries.
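To illustrate how a replaced business layer could hand a search to Solr, the following minimal sketch builds a request URL for Solr's standard `/select` handler. It is not the CAMbase implementation: the host, the core name `cambase`, and the function name are assumptions made for this example. The `fl=*,score` parameter asks Solr to return its relevance score with each hit; results are ordered by descending score by default.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url: str, core: str, user_query: str, rows: int = 10) -> str:
    """Build a URL for Solr's /select handler (host and core name are assumptions)."""
    params = {
        "q": user_query,   # the search terms entered by the user
        "fl": "*,score",   # return all stored fields plus the relevance score
        "rows": rows,      # number of hits to return
        "wt": "json",      # response format
    }
    return f"{base_url}/solr/{core}/select?{urlencode(params)}"


# Hypothetical local Solr instance with a core named "cambase"
url = build_solr_select_url("http://localhost:8983", "cambase", "homeopathy asthma", rows=5)
print(url)
```

The URL can then be fetched with any HTTP client; only the construction of the request is shown here, so the sketch runs without a Solr server.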

Implementation: To approximate the former semantic-syntactic algorithm, a search query is now interpreted and simplified by a light stemming method. This implementation covers different spellings and yields a wide range of search results. A paired t-test showed significant differences in the number of hits and in search times between the two algorithms: Solr's algorithm returned more results and was also faster, making it the more efficient of the two.
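The abstract does not specify which light stemming rules are used. As a rough illustration of the idea, the sketch below lowercases each query term and strips common English plural endings, in the spirit of a simple plural-stripping "S-stemmer". The rule set and the function names `light_stem` and `simplify_query` are assumptions for this example, not the rules applied in CAMbase.

```python
import re


def light_stem(token: str) -> str:
    """Reduce an English word to a light stem by removing plural endings only."""
    word = token.lower()
    if len(word) <= 3:
        return word  # leave very short words untouched
    if word.endswith("ies") and not word.endswith(("eies", "aies")):
        return word[:-3] + "y"   # therapies -> therapy
    if word.endswith("es") and not word.endswith(("aes", "ees", "oes")):
        return word[:-1]         # diseases -> disease
    if word.endswith("s") and not word.endswith(("us", "ss")):
        return word[:-1]         # herbs -> herb
    return word                  # virus, stress, ... stay unchanged


def simplify_query(query: str) -> list[str]:
    """Split a query into alphabetic tokens and apply the light stemmer to each."""
    tokens = re.findall(r"[^\W\d_]+", query)  # runs of letters, Unicode-aware
    return [light_stem(t) for t in tokens]


print(simplify_query("Herbal Therapies, studies"))
```

Because only plural endings are removed, different surface forms of a term map to the same stem while the risk of conflating unrelated words stays lower than with an aggressive stemmer.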

Lessons learned: In addition to the GUI, which is already comfortable for users, the system's unique database and its modular construction principle could be retained through this modification process. The advantage is that users do not need to adapt to a new interface. The disadvantage is that Solr does not support any semantic analysis, so a plugin has to be written for this purpose.

Conclusions: The integration of Solr into CAMbase changed the results of searches. On the one hand, a broader search means that the user needs longer to go through the results. On the other hand, users might be able to find more related results, which might potentially expand the evidence base in integrative medicine. As CAMbase still includes a large amount of grey literature, it has to be considered a main source of evidence in systematic reviews or meta-analyses.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


