gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Similarity scoring as a novel approach towards reusing and combining data from existing clinical and health studies

Meeting Abstract

  • Lea Gütebier - Medical Informatics Laboratory, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
  • Angela Dedié - German Center for Diabetes Research (DZD), Head Office at Helmholtz Munich, Neuherberg, Germany
  • Ron Henkel - Medical Informatics Laboratory, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
  • Till Ittermann - Department of SHIP/Clinical-Epidemiological Research, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany; German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany
  • Volkmar Liebscher - Institute of Mathematics and Computer Science, University of Greifswald, Greifswald, Germany
  • Lea Michaelis - Medical Informatics Laboratory, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
  • Alexander Teumer - Department of Psychiatry and Psychotherapy, University Medicine Greifswald, Greifswald, Germany; German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany
  • Marcus Vollmer - Institute of Bioinformatics, University Medicine Greifswald, Greifswald, Germany; German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany
  • Dagmar Waltemath - Medical Informatics Laboratory, University Medicine Greifswald, Greifswald, Germany
  • Marie-Louise Witte - Department of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany; German Centre for Cardiovascular Research (DZHK), partner site Göttingen, Göttingen, Germany
  • Stefan Groß - Department of Internal Medicine B/Cardiology, University Medicine Greifswald, Greifswald, Germany; German Centre for Cardiovascular Research (DZHK), partner site Greifswald, Greifswald, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 564

doi: 10.3205/24gmds015, urn:nbn:de:0183-24gmds0154

Published: September 6, 2024

© 2024 Gütebier et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: There is an urgent need in clinical and epidemiological research to identify suitable studies for pooled data analyses and meta-analyses to achieve generalisable scientific results. However, finding studies of interest (“retrieval”) and determining the best matches among these studies for the search criteria (“ranking”) remains challenging. This work aims to establish a solid foundation for methods that (1) find clinical and epidemiological studies of interest based on a feature set and (2) identify studies that are similar to a given example study.
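The distinction between retrieval and ranking drawn above can be illustrated with a minimal sketch. It is not the authors' method: the `Study` record, the feature keys, and the match-counting rule are hypothetical choices made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    # Hypothetical minimal study record; the feature keys are illustrative only.
    name: str
    features: dict = field(default_factory=dict)


def count_matches(study, criteria):
    """Number of search criteria that the study satisfies exactly."""
    return sum(study.features.get(key) == value for key, value in criteria.items())


def retrieve(studies, criteria):
    """Retrieval: keep every study that satisfies at least one criterion."""
    return [s for s in studies if count_matches(s, criteria) > 0]


def rank(studies, criteria):
    """Ranking: order the retrieved studies by the number of matched criteria."""
    return sorted(studies, key=lambda s: count_matches(s, criteria), reverse=True)


studies = [
    Study("A", {"design": "cohort", "population": "adults"}),
    Study("B", {"design": "rct", "population": "adults"}),
    Study("C", {"design": "rct", "population": "children"}),
]
criteria = {"design": "cohort", "population": "adults"}

hits = rank(retrieve(studies, criteria), criteria)
print([s.name for s in hits])
```

In this toy example, study C is dropped at the retrieval step because it matches no criterion, and study A is ranked ahead of study B because it matches both criteria.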

Methods: In April 2024, we ran a multi-disciplinary workshop with participants from the University Medicine Greifswald, the German Center for Diabetes Research (DZD), and the German Centre for Cardiovascular Research (DZHK). Two groups of experts from epidemiology, medical informatics, biostatistics, and research data management examined publicly available information on three example studies: one group checked the DZHK research platform (https://dzhk.de/en/research/clinical-research/dzhk-studies/) [1], and the other reviewed the structured entries provided at clinicaltrials.gov and EudraCT. Based on this information, each group derived features, which were then discussed and harmonised into a single list of meaningful features. This feature set can be incorporated into a study similarity score. The similarity score will in turn be incorporated into our search index [2] and, together with our study graph, will provide novel functionalities for structured and efficient navigation and retrieval of clinical and health studies.

Results: We identified a concise list of 17 features for a weighted similarity score for clinical and health studies. These features include, for example, study design, eligibility criteria, participant selection criteria, and recorded parameters (such as anthropometric measures or biomarkers) following the definition in the study data dictionaries. Additionally, the list includes features covering descriptive (e.g., study title) and administrative meta-data (e.g., publication repository), which enables us to link study information to external resources.
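As a minimal sketch of how a weighted similarity score over such features might be computed, consider the following. It is an assumption for illustration, not the scoring method of this work: the feature names, the weights, the exact-match rule for single-valued features, and the use of Jaccard overlap for set-valued features are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections; two empty collections count as identical."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity(study_a, study_b, weights):
    """Weighted mean of per-feature similarities, in the range [0, 1].

    Assumed rules (not taken from the abstract): set-valued features are
    compared by Jaccard overlap, single-valued features by exact match,
    and a feature missing in either study contributes 0.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    collections = (set, frozenset, list, tuple)
    score = 0.0
    for feature, weight in weights.items():
        va, vb = study_a.get(feature), study_b.get(feature)
        if va is None or vb is None:
            part = 0.0
        elif isinstance(va, collections) and isinstance(vb, collections):
            part = jaccard(va, vb)
        else:
            part = 1.0 if va == vb else 0.0
        score += weight * part
    return score / total


# Hypothetical weights and study records for illustration only.
weights = {"study_design": 2.0, "recorded_parameters": 1.0, "eligibility_criteria": 1.0}
study_a = {
    "study_design": "cohort",
    "recorded_parameters": {"bmi", "hba1c", "ldl"},
    "eligibility_criteria": {"age>=18"},
}
study_b = {
    "study_design": "cohort",
    "recorded_parameters": {"bmi", "hba1c"},
    "eligibility_criteria": {"age>=18", "no diabetes"},
}

print(round(similarity(study_a, study_b, weights), 3))
```

Normalising by the sum of the weights keeps the score between 0 and 1, so scores remain comparable if the weights are later adjusted.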

We manually tested the applicability of our feature list against our three example studies. The identified features are currently being integrated into our existing graph model for study data. We expect that the insights gained from this extension will enhance the overall design and coverage of the data structures [3].

Conclusion: The presented feature list can be applied to different types of studies, ranging from randomised clinical trials to complex cohort study designs. Although our feature list was primarily developed using data from clinicaltrials.gov and EudraCT, it can be adapted to other platforms, such as the Portal of Medical Data Models [4].

In the next step, we will refine the graph model [3] and implement the accompanying search methods and similarity scores in accordance with the feature list. Furthermore, we will back up these methods with supporting ontologies and identify cross-references to meta-data items, for example from the NFDI4Health Metadata Schema [5] and the DZD Core Data Set [6].

This work marks an initial step towards developing a graph-based similarity search tool for clinical and health studies. This approach is intended to facilitate multi-study analysis projects, improve the efficiency of study retrieval, and promote greater FAIRness (Findability, Accessibility, Interoperability, and Reusability) in health research [7].

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Hoffmann J, Hanß S, Kraus M, Schaller J, Schäfer C, Stahl D, et al. The DZHK research platform: maximisation of scientific value by enabling access to health data and biological samples collected in cardiovascular clinical studies. Clin Res Cardiol. 2023;112(7):923-941. DOI: 10.1007/s00392-023-02177-5
2. Henkel R, Endler L, Peters A, Le Novère N, Waltemath D. Ranked retrieval of computational biology models. BMC Bioinform. 2010;11:1-12. DOI: 10.1186/1471-2105-11-423
3. Gütebier L, Henkel R, Waltemath D. Extending a COVID-19 knowledge graph with study protocols. In: 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). 2022. DocAbstr. 147. DOI: 10.3205/22gmds055
4. Riepenhausen S, Varghese J, Neuhaus P, Storck M, Meidt A, Hegselmann S, et al. Portal of Medical Data Models: Status 2018. Stud Health Technol Inform. 2019;258:239-240. DOI: 10.3233/978-1-61499-959-1-239
5. Abaza H, Shutsko A, Golebiewski M, et al. The NFDI4Health Metadata Schema (V3_3). PUBLISSO Fachrepositorium Lebenswissenschaften; 2023. DOI: 10.4126/FRL01-006472531
6. German Center for Diabetes Research. German Center for Diabetes Research (DZD) – Core Data Set. In: Medical Data Models. 2024. DOI: 10.21961/mdm:45923
7. Inau E, Sack J, Waltemath D, Zeleke A. Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review. J Med Internet Res. 2023;25:e45013. DOI: 10.2196/45013