Similarity scoring as a novel approach towards reusing and combining data from existing clinical and health studies
Published: September 6, 2024
Introduction: There is an urgent need in clinical and epidemiological research to identify suitable studies for pooled data analyses and meta-analyses to achieve generalisable scientific results. However, finding studies of interest (“retrieval”) and determining which of these studies best match the search criteria (“ranking”) remain challenging. This work aims to establish a solid foundation for methods that (1) find clinical and epidemiological studies of interest based on a feature set and (2) identify studies that are similar to a given example study.
Methods: In April 2024, we ran a multi-disciplinary workshop with participants from the University Medicine Greifswald, the German Centre for Diabetes Research (DZD), and the German Centre for Cardiovascular Research (DZHK). Two groups of experts from epidemiology, medical informatics, biostatistics, and research data management examined publicly available information on three example studies: one group checked the DZHK research platform (https://dzhk.de/en/research/clinical-research/dzhk-studies/) [1], and the other reviewed the structured entries provided at clinicaltrials.gov and EudraCT. Based on this information, each group derived features that were then discussed and harmonised into a single list of meaningful features. This feature set can be incorporated into a study similarity score. The similarity score will be incorporated into our search index [2] and, together with our study graph, will provide novel functionalities for structured and efficient navigation and retrieval of clinical and health studies.
Results: We identified a concise list of 17 features for a weighted similarity score for clinical and health studies. These features include, for example, study design, eligibility criteria, participant selection criteria, and recorded parameters (such as anthropometric measures or biomarkers) following the definition in the study data dictionaries. Additionally, the list includes features covering descriptive (e.g., study title) and administrative meta-data (e.g., publication repository), which enables us to link study information to external resources.
We manually tested the applicability of our feature list against our three example studies. The identified features are currently being integrated into our existing graph model for study data. We expect that the insights gained from that extension will enhance the overall design and coverage of the data structures [3].
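To illustrate how a weighted similarity score over such features could work in principle, the following is a minimal Python sketch. It is not the implementation described in this abstract: the three features shown, their weights, the per-feature measures (exact match for study design, Jaccard overlap for set-valued features), and all names (`Study`, `WEIGHTS`, `similarity`, `rank`) are assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Study:
    """Hypothetical study record with three of the features named above."""
    title: str
    design: str
    eligibility: set[str] = field(default_factory=set)
    parameters: set[str] = field(default_factory=set)


# Illustrative weights only; the abstract does not report actual weights.
WEIGHTS = {"design": 0.3, "eligibility": 0.3, "parameters": 0.4}


def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two sets: |a & b| / |a | b|; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def similarity(a: Study, b: Study, weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted mean of per-feature similarities, in the range [0, 1]."""
    scores = {
        "design": 1.0 if a.design == b.design else 0.0,
        "eligibility": jaccard(a.eligibility, b.eligibility),
        "parameters": jaccard(a.parameters, b.parameters),
    }
    total_weight = sum(weights.values())
    return sum(weights[name] * scores[name] for name in weights) / total_weight


def rank(query: Study, candidates: list[Study]) -> list[Study]:
    """Order candidate studies from most to least similar to the query."""
    return sorted(candidates, key=lambda s: similarity(query, s), reverse=True)
```

In this sketch, "ranking" relative to an example study is simply a sort by the score. A real implementation would cover all 17 features and would likely need feature-specific measures, for example for free-text titles or ontology-coded parameters.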
Conclusion: The presented feature list can be applied to different types of studies, ranging from randomised clinical trials to complex cohort study designs. Although our feature list was primarily developed using data from clinicaltrials.gov and EudraCT, it can be adapted to other platforms, such as the Portal of Medical Data Models [4].
In the next step, we will refine the graph model [3] and implement the accompanying search methods and similarity scores in accordance with the feature list. Furthermore, we will back up these methods with supporting ontologies and identify cross-references to meta-data items, for example from the NFDI4Health Metadata Schema [5] and the DZD Core Data Set [6].
This work marks an initial step towards developing a graph-based similarity search tool for clinical and health studies. The approach is intended to facilitate multi-study analysis projects, improve the efficiency of study retrieval, and promote greater FAIRness (Findability, Accessibility, Interoperability, and Reusability) in health research [7].
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Hoffmann J, Hanß S, Kraus M, Schaller J, Schäfer C, Stahl D, et al. The DZHK research platform: maximisation of scientific value by enabling access to health data and biological samples collected in cardiovascular clinical studies. Clin Res Cardiol. 2023;112(7):923-941. DOI: 10.1007/s00392-023-02177-5
2. Henkel R, Endler L, Peters A, Le Novère N, Waltemath D. Ranked retrieval of computational biology models. BMC Bioinform. 2010;11:1-12. DOI: 10.1186/1471-2105-11-423
3. Gütebier L, Henkel R, Waltemath D. Extending a COVID-19 knowledge graph with study protocols. In: 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). 2022. DocAbstr. 147. DOI: 10.3205/22gmds055
4. Riepenhausen S, Varghese J, Neuhaus P, Storck M, Meidt A, Hegselmann S, et al. Portal of Medical Data Models: Status 2018. Stud Health Technol Inform. 2019;258:239-240. DOI: 10.3233/978-1-61499-959-1-239
5. Abaza H, Shutsko A, Golebiewski M, et al. The NFDI4Health Metadata Schema (V3_3). PUBLISSO Fachrepositorium Lebenswissenschaften; 2023. DOI: 10.4126/FRL01-006472531
6. German Center for Diabetes Research. German Center for Diabetes Research (DZD) – Core Data Set. In: Medical Data Models. 2024. DOI: 10.21961/mdm:45923
7. Inau E, Sack J, Waltemath D, Zeleke A. Initiatives, Concepts, and Implementation Practices of the Findable, Accessible, Interoperable, and Reusable Data Principles in Health Data Stewardship: Scoping Review. J Med Internet Res. 2023;25:e45013. DOI: 10.2196/45013