gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

Comparison of Two Text-based Search Algorithms in an Online Literature Database for Integrative Medicine – First Results

Meeting Abstract

  • Sebastian Unger - Universität Witten/Herdecke, Witten, Germany
  • Thomas Ostermann - Universität Witten/Herdecke, Witten, Germany
  • Christa Raak - Universität Witten/Herdecke, Herdecke, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 65

doi: 10.3205/21gmds045, urn:nbn:de:0183-21gmds0459

Published: September 24, 2021

© 2021 Unger et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Although the number of scientific publications in the field of integrative medicine is steadily increasing, it is still difficult to get a valid overview of the published evidence. Specialist libraries and bibliographical databases may therefore serve as sources for an evidence base. The openly accessible bibliographical database CAMbase (available at https://cambase.de), hosted by Witten/Herdecke University, is one such established database in this field. To keep it alive and secure it against various network attacks, e.g., exploits based on discovered vulnerabilities, its underlying operating system (OS) needs to be upgraded or even replaced regularly.

State of the Art: In 2020, CAMbase, which had been installed on a 32-bit platform until then, was migrated to a newer 64-bit OS, resulting in a variety of errors during search queries or when the search results were displayed. This is in accordance with published experiences but leads to a decrease in usability. As files were stored in binary format, reworking the program code was not an option.

Concept: The main architecture of CAMbase can be divided into three layers. First, there is the presentation layer, an XML-based (Extensible Markup Language) GUI (Graphical User Interface) that runs on the client side to relieve the server. Second, there is the business layer, which interprets search queries semantically and syntactically and went far beyond simple stemming methods at the time of development. Finally, the last layer consists of the database and its structure, containing bibliographical data on integrative medicine. A promising solution for keeping and still accessing the data of CAMbase was to replace the business logic with the open-source platform Solr, which uses a score ranking algorithm for its search queries.
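To illustrate how a client layer might hand a search to Solr, the following minimal sketch builds a request URL for Solr's standard /select handler and asks for the relevance score with each hit. The host, the core name "cambase", and the returned field list are assumptions made for illustration; the abstract does not describe the actual CAMbase configuration.

```python
from urllib.parse import urlencode

# Hypothetical values: host and core name are assumptions,
# not taken from the CAMbase deployment.
SOLR_BASE = "http://localhost:8983/solr"
CORE = "cambase"


def build_select_url(query: str, rows: int = 10) -> str:
    """Return a URL for Solr's standard /select request handler."""
    params = {
        "q": query,            # the user's search terms
        "rows": rows,          # maximum number of hits to return
        "fl": "*,score",       # all stored fields plus the relevance score
        "sort": "score desc",  # highest-scoring documents first
    }
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"
```

The parameters q, rows, fl, and sort are common Solr query parameters; sorting by descending score is also Solr's default, so the sort entry only makes the ranking explicit.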

Implementation: In order to approximate the former semantic-syntactic algorithm, a search query is now interpreted and simplified by a light stemming method. This implementation covers different spellings and a wide range of search results. A paired t-test showed significant differences in search hits and search times between the two algorithms: Solr's algorithm returned more search results and was also faster, making it the more efficient of the two.
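The two steps named in this paragraph can be sketched as follows. The suffix rules are hypothetical and cover only a few English plural forms; the abstract does not specify the stemming rules used in CAMbase. The second function computes the paired t statistic from per-query measurements (for example, hit counts of the old and the new algorithm for the same queries); it is a textbook formula, not the authors' evaluation script.

```python
import math
import re
from statistics import mean, stdev


def light_stem(token: str) -> str:
    """Strip a few plural endings (hypothetical, English-only rules)."""
    token = token.lower()
    if token.endswith("ies") and len(token) > 4:
        return token[:-3] + "y"      # "therapies" becomes "therapy"
    if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
        return token[:-1]            # "herbs" becomes "herb"
    return token


def normalize_query(query: str) -> list[str]:
    """Split a query into word tokens and light-stem each one."""
    return [light_stem(t) for t in re.findall(r"\w+", query)]


def paired_t_statistic(old: list[float], new: list[float]) -> float:
    """t = mean(d) / (sd(d) / sqrt(n)) for the paired differences d = new - old."""
    diffs = [b - a for a, b in zip(old, new)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n))
```

The resulting statistic would be compared against a t distribution with n - 1 degrees of freedom to obtain a p-value; that step is omitted here to keep the sketch within the standard library.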

Lessons learned: In addition to the GUI, which is already comfortable for users, the system's unique database and the modular construction principle could be retained by this modification process. The advantage is that users do not need to get used to a new system. The disadvantage is that Solr does not support any semantic analysis, so a plugin has to be written for this purpose.

Conclusions: The integration of Solr into CAMbase changed the search results. On the one hand, a broader search means that users need longer to go through the results. On the other hand, users might find more related results, which might expand the evidence base in integrative medicine. As CAMbase still includes a large amount of grey literature, it has to be considered as a main evidence source in systematic reviews or meta-analyses.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Alhazmi OH, Malaiya YK. Application of Vulnerability Discovery Models to Major Operating Systems. IEEE Trans Rel. 2008;57(1):14–22. DOI: 10.1109/TR.2008.916872
2.
Haake E, Blenkle M, Ellis R, Zillmann H. Nur die ersten Drei zählen! Optimierung der Rankingverfahren über Popularitätsfaktoren bei der Elektronischen Bibliothek Bremen (E-LIB). o-bib. 2015;2(2):33–42. DOI: 10.5282/o-bib/2015H2S33-42
3.
Kumar J. Apache Solr search patterns. Birmingham: Packt Publishing; 2015.
4.
Luburić N, Ivanović D. Comparing Apache Solr and Elasticsearch search servers. In: 6th International Conference on Information Society and Technology (ICIST); 2016. p. 287–291.
5.
Ostermann T, Raak CK, Matthiessen PF, Büssing A, Zillmann H. Linguistic processing and classification of semi structured bibliographic data on complementary medicine. Cancer Inform. 2009;7:159–69. DOI: 10.4137/cin.s1182
6.
Ostermann T, Zillmann H, Raak CK, Buessing A, Matthiessen PF. CAMbase–A XML-based bibliographical database on Complementary and Alternative Medicine (CAM). Biomedical Digital Libraries. 2007;4(1):1–8.
7.
Wressnegger C, Yamaguchi F, Maier A, Rieck K. 64-bit migration vulnerabilities. it - Information Technology. 2017;59(2):73–81.
8.
Zillmann H. Information Retrieval and Search Engines in Full-text Databases. LIBER Quarterly. 2000;10(3):335–41. Available from: https://dspace.library.uu.nl/handle/1874/241217