Knowledge graph representation of clinical research studies
Published: September 15, 2023
Introduction: The increasing volume of data generated by clinical research studies makes it challenging to manage and analyze the data and to extract meaningful insights. Comparing results across studies is difficult as well. To address these issues, this project explores knowledge graph (KG) representations as an alternative to relational databases for more effective data comparison and analysis. KGs are a relatively new research area in healthcare, with a limited number of publications so far [1]. However, notable approaches have shown that visualizing temporal structures in KGs yields better results and enables faster record retrieval than traditional relational databases [2].
In medical research, the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) is already established as a standard for representing study data in a comparable way. However, it is constrained by its lack of flexibility and its inability to represent certain types of information [3]. To address this limitation, we propose developing a KG based on the OMOP CDM that extends its capabilities.
The objectives of this work are (1) to develop and evaluate a KG representation of longitudinal clinical research studies, (2) to integrate data from two studies of the German Center for Diabetes Research (DZD) into an existing knowledge graph, and (3) to represent the complex time structures typical of clinical studies in the KG.
Methods: After a requirements analysis that includes a literature review and feedback from study leads, we design a KG model with Neo4j. We start with a KG representation of the OMOP CDM based on the approach of Kang et al. [4] and conclude this step by evaluating its completeness. We then extend the graph data model with features not stored in the OMOP CDM (e.g., temporal structures): we design a metadata model for the timing of events in clinical research studies based on existing standards (CDISC) and implement a data model for visits. The KG will be integrated into DZDconnect [5], a biomedical knowledge graph that links basic research and clinical study data of the DZD with external knowledge. The selected studies have already been mapped to the OMOP CDM within a relational database. Additionally, this project emphasizes the importance of time sequences in long-term studies to ensure comparability of results between participants. By comparing planned interventions (visits) with those that actually occurred, we aim to better identify and manage unplanned events, ultimately leading to more scientifically robust findings.
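The comparison of planned and actual visits described above can be illustrated with a minimal Python sketch. It is not the project's Neo4j model: the class names, the visit windows, the greedy matching rule, and the relationship labels `FULFILLS` and `IS_UNPLANNED` are all assumptions made for illustration, and the dates are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PlannedVisit:
    name: str         # hypothetical visit label from the study schedule
    target_day: int   # planned study day, counted from enrollment
    window_days: int  # tolerated deviation from the target day (+/-)


@dataclass(frozen=True)
class ActualVisit:
    participant_id: str
    visit_date: date


def classify_visits(enrollment, planned, actual):
    """Greedily match actual visits to planned visits (a simplification).

    Returns (matched, missed, unplanned):
      matched   - dict mapping planned visit name to the actual visit
      missed    - names of planned visits without a matching actual visit
      unplanned - actual visits outside every remaining visit window
    """
    matched = {}
    unplanned = []
    for visit in sorted(actual, key=lambda v: v.visit_date):
        study_day = (visit.visit_date - enrollment).days
        candidates = [
            p for p in planned
            if p.name not in matched
            and abs(study_day - p.target_day) <= p.window_days
        ]
        if candidates:
            closest = min(candidates, key=lambda p: abs(study_day - p.target_day))
            matched[closest.name] = visit
        else:
            unplanned.append(visit)
    missed = [p.name for p in planned if p.name not in matched]
    return matched, missed, unplanned


def to_edges(matched, unplanned):
    """Express the result as (source, relationship, target) triples."""
    edges = [
        (f"visit:{v.visit_date.isoformat()}", "FULFILLS", f"planned:{name}")
        for name, v in matched.items()
    ]
    edges += [
        (f"visit:{v.visit_date.isoformat()}", "IS_UNPLANNED",
         f"participant:{v.participant_id}")
        for v in unplanned
    ]
    return edges


# Invented example schedule and visits for one participant.
enrollment = date(2023, 1, 10)
planned = [
    PlannedVisit("Baseline", target_day=0, window_days=3),
    PlannedVisit("Month 6", target_day=180, window_days=14),
    PlannedVisit("Month 12", target_day=365, window_days=14),
]
actual = [
    ActualVisit("P1", date(2023, 1, 11)),
    ActualVisit("P1", date(2023, 4, 2)),
    ActualVisit("P1", date(2023, 7, 15)),
]

matched, missed, unplanned = classify_visits(enrollment, planned, actual)
edges = to_edges(matched, unplanned)
```

In this example the April visit falls outside every window and is flagged as unplanned, while the "Month 12" visit is reported as missed. In a graph database, such triples would correspond to relationships between visit nodes and nodes of the planned schedule.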
Results and discussion: Through this research project, we expect to obtain a more detailed and better-performing representation of clinical studies and to improve their comparability. This will make it easier to analyze the relationships between planned visits and those that actually took place, leading to a more efficient use of clinical studies to improve the healthcare system. Furthermore, it should become possible to integrate several studies and thereby facilitate cross-study analysis.
Interviews with experts in medical studies indicate that it is crucial to explore the associations between patient visits and seasonal factors (vacation time, Christmas, cold), such as the decrease in visits during the summer season due to vacations. Understanding these relationships is important for optimizing future study planning. By incorporating the temporal component and employing visualizations, it becomes feasible to derive meaningful sub-studies from the data and to compare them effectively with other studies. The research project is currently under development, and the results will be made available in time for presentation at the GMDS congress. Based on the feedback from experts, we see high potential in KG approaches to clinical data, which needs to be investigated further.
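As a rough illustration of the seasonal analysis mentioned above, the following Python sketch counts visits per season. The month-to-season mapping (meteorological seasons of the northern hemisphere) is an assumption, the function name is hypothetical, and the dates are invented and do not come from the DZD studies.

```python
from collections import Counter
from datetime import date

# Assumed mapping: meteorological seasons, northern hemisphere.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def visits_per_season(visit_dates):
    """Count how many visits fall into each season."""
    return Counter(SEASON_BY_MONTH[d.month] for d in visit_dates)


# Invented visit dates, for illustration only.
example_dates = [
    date(2022, 1, 17),
    date(2022, 3, 3),
    date(2022, 3, 29),
    date(2022, 7, 12),
    date(2022, 10, 5),
    date(2022, 11, 21),
    date(2022, 12, 22),
]
season_counts = visits_per_season(example_dates)
```

Comparing such counts between planned and actual visits would be one simple way to spot seasonal dips, for example a lower number of visits in summer.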
Conclusion: Comparing different clinical studies is challenging. This research project aims to show how a KG representation focused on the temporal structure of planned versus actually conducted visits can improve the analysis and comparison of data from different clinical research studies.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Chen Z, Peng B, Ioannidis VN, Li M, Karypis G, Ning X. A knowledge graph of clinical trials (CTKG). Scientific Reports. 2022;12:47241. DOI: 10.1038/s41598-022-08454-z
2. Dasgupta SS, Ray SN, Talukdar P. HyTE: Hyperplane-based Temporally aware Knowledge Graph Embedding. In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. Brussels, Belgium: Association for Computational Linguistics; 2018.
3. Reinecke I, Zoch M, Reich C, Sedlmayr M, Bathelt F. The Usage of OHDSI OMOP - A Scoping Review. Stud Health Technol Inform. 2021 Sep 21;283:95–103.
4. Kang M, Alvarado-Guzman JA, Rasmussen L, Starren JB. Development of a Graph Model for the OMOP Common Data Model [Abstract]. DigitalHub, Galter Health Sciences Library & Learning Center; 2022.
5. Dedié A, Bleimehl T, Täger J, Preusse M, Hrabe de Angelis M, Jarasch A. DZDconnect: mit vernetzten Daten gegen Diabetes. Diabetologe. 2021;17:780-787. DOI: 10.1007/s11428-021-00807-y
