gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Extending a COVID-19 knowledge graph with study protocols

Meeting Abstract


  • Lea Gütebier - Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
  • Ron Henkel - Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany
  • Dagmar Waltemath - Department of Medical Informatics, University Medicine Greifswald, Greifswald, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 147

doi: 10.3205/22gmds055, urn:nbn:de:0183-22gmds0551

Published: 19 August 2022

© 2022 Gütebier et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: A systematic approach for the representation and integration of data is essential for research data recycling and knowledge gain. The integration of data sources in a graph database, especially in health and life sciences, allows for time-efficient data exploration, deduction of semantic similarities, utilization of clustering algorithms, and analysis of data structures.

In previous work, we demonstrated our concept and workflow for integrating COVID-related data into a graph database called CovidGraph [1]. As a next step, we plan to add information on COVID-19 study protocols.

State of the art: Graph databases are designed to represent highly connected, heterogeneous data using attributed nodes and edges [2]. CovidGraph, developed by the HealthECCO organization, is a COVID-19 knowledge graph that can be accessed via different interfaces. The knowledge graph combines textual data, such as publications and patents, with biomedical data, clinical trials, systems biology models, and case statistics, covering SARS-CoV-2 and the coronavirus family. However, access to structured study protocols is still missing. The German Study Hub NFDI4Health COVID-19 [3] has recently been launched to provide more than 750 fully annotated and structured study protocols in accordance with the FAIR guiding principles [4]. We believe that this data set will be a valuable addition to the existing CovidGraph.
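To make the notion of attributed nodes and edges concrete, the following is a minimal in-memory sketch of a property graph. It is not CovidGraph code: the class names, node labels, relationship type, and property keys are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node with a label and free-form attributes."""
    node_id: str
    label: str
    props: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, typed edge that can carry its own attributes."""
    source: str
    target: str
    rel_type: str
    props: dict = field(default_factory=dict)


class PropertyGraph:
    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node):
        self.nodes[node.node_id] = node

    def add_edge(self, edge):
        # Both endpoints must exist before they can be connected.
        if edge.source not in self.nodes or edge.target not in self.nodes:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append(edge)

    def neighbours(self, node_id, rel_type=None):
        """Return nodes reachable over outgoing edges, optionally filtered by type."""
        return [
            self.nodes[e.target]
            for e in self.edges
            if e.source == node_id and (rel_type is None or e.rel_type == rel_type)
        ]


# Hypothetical example data: one publication that mentions one clinical trial.
graph = PropertyGraph()
graph.add_node(Node("paper:1", "Publication", {"title": "Example publication"}))
graph.add_node(Node("trial:1", "ClinicalTrial", {"status": "recruiting"}))
graph.add_edge(Edge("paper:1", "trial:1", "MENTIONS", {"origin": "illustrative"}))

print([n.label for n in graph.neighbours("paper:1")])
```

Heterogeneous domains fit into one structure here because labels and attributes live on the individual nodes and edges instead of in a fixed table schema.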

Concept: We outline ETL processes that map the study protocols onto a graph structure and cross-reference them with other knowledge resources. The ETL processes already implemented in CovidGraph Extract data from a source, Transform it according to specified mapping information, and Load it into a graph database. We will elaborate on these ETL processes in our talk. Further, we analyse the structure of the data already integrated from ClinicalTrials.gov [5], and we propose a mapping of structures from the NFDI4Health study hub onto the CovidGraph structure. As of now, the clinical trials domain in CovidGraph is mainly connected to publications. Because the data from the NFDI4Health study hub is fully annotated, we expect new mappings between the study protocols and biomedical ontologies.
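The three ETL steps can be sketched as follows. This is an illustration under stated assumptions, not the CovidGraph implementation: the record fields (`id`, `title`, `annotations`), the node labels `StudyProtocol` and `OntologyTerm`, the relationship `ANNOTATED_WITH`, and the placeholder term identifiers are hypothetical, and a plain dictionary stands in for the graph database.

```python
import json

# Hypothetical export of two study protocol records (assumed field names).
RAW_EXPORT = json.dumps([
    {"id": "study-001", "title": "Hypothetical cohort study",
     "annotations": ["ONT:0001", "ONT:0002"]},
    {"id": "study-002", "title": "Hypothetical registry",
     "annotations": ["ONT:0002"]},
])


def extract(raw):
    """Extract: parse the source export into Python records."""
    return json.loads(raw)


def transform(records):
    """Transform: map each record onto nodes and edges."""
    nodes, edges = {}, set()
    for rec in records:
        study_key = ("StudyProtocol", rec["id"])
        nodes[study_key] = {"title": rec["title"]}
        for term in rec["annotations"]:
            term_key = ("OntologyTerm", term)
            # A term shared by several studies becomes a single node.
            nodes.setdefault(term_key, {})
            edges.add((study_key, "ANNOTATED_WITH", term_key))
    return nodes, edges


def load(store, nodes, edges):
    """Load: merge nodes and edges into the target store."""
    store["nodes"].update(nodes)
    store["edges"].update(edges)
    return store


store = {"nodes": {}, "edges": set()}
load(store, *transform(extract(RAW_EXPORT)))
print(len(store["nodes"]), len(store["edges"]))  # 4 3
```

Because shared ontology terms collapse into one node, studies that carry the same annotation become connected through it, which is the kind of cross-reference the mapping aims for.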

Implementation: The architecture of the CovidGraph framework simplifies the integration of heterogeneous knowledge domains. The framework is built on a data loading pipeline that defines an ETL process for each domain. To integrate a new data source, the pipeline can be adapted to the needs of the fully annotated data set to be included. We demonstrate this process using the NFDI4Health study protocols as an example. In addition, various interfaces for accessing the knowledge graph are already implemented and can easily be extended, allowing users to query the data intuitively.
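One way to picture a pipeline with one ETL process per domain is a registry of loader functions, sketched below. The registry, the decorator, the domain names, and the example triples are assumptions made for illustration; the actual CovidGraph pipeline may be organised differently.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (source node, relationship, target node)

# Registry mapping a domain name to the function that produces its triples.
DOMAIN_LOADERS: Dict[str, Callable[[], List[Triple]]] = {}


def register_domain(name):
    """Decorator that adds an ETL step for one knowledge domain."""
    def decorator(func):
        DOMAIN_LOADERS[name] = func
        return func
    return decorator


@register_domain("publications")
def load_publications():
    # Hypothetical output of an existing domain loader.
    return [("paper:1", "MENTIONS", "trial:1")]


@register_domain("study_protocols")
def load_study_protocols():
    # Hypothetical new domain, added without touching the other loaders.
    return [("study:1", "ANNOTATED_WITH", "term:1")]


def run_pipeline(domains=None):
    """Run the selected domain loaders and collect all triples."""
    selected = domains if domains is not None else list(DOMAIN_LOADERS)
    graph = []
    for name in selected:
        graph.extend(DOMAIN_LOADERS[name]())
    return graph


print(run_pipeline(["study_protocols"]))
```

In this design, adding a data source means registering one more function, and a single domain can be reloaded without rerunning the others.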

Lessons learned: We gained extensive experience with integrating systems biology models [6] into CovidGraph. Thus, we know how valuable the inclusion of a formerly inaccessible data domain is, and we embrace the possibilities that will be offered by including and mapping the study protocols in CovidGraph. The integration of study-related data will result in a domain-spanning knowledge graph that supports similarity determination and structural overlap comparison of health studies.
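As an illustration of similarity determination between studies, the sketch below compares studies by the overlap of their annotation sets using the Jaccard index. The abstract does not state which similarity measure is intended, so the Jaccard index is an assumption, and the study identifiers and ontology terms are hypothetical placeholders.

```python
def jaccard(a, b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # Convention chosen here for two empty annotation sets.
    return len(a & b) / len(a | b)


# Hypothetical annotation sets, e.g. collected from ANNOTATED_WITH edges.
annotations = {
    "study-001": {"ONT:0001", "ONT:0002", "ONT:0003"},
    "study-002": {"ONT:0002", "ONT:0003"},
    "study-003": {"ONT:0004"},
}


def most_similar(study_id, annotations):
    """Return the other study with the highest annotation overlap and its score."""
    scores = [
        (other, jaccard(annotations[study_id], terms))
        for other, terms in annotations.items()
        if other != study_id
    ]
    return max(scores, key=lambda pair: pair[1])


print(most_similar("study-001", annotations))
```

Here `study-001` and `study-002` share two of three distinct terms, so `study-002` is returned as the closest match.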

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

Gütebier L, Henkel R, Jarasch A, Bleimehl T, Müller S, Munro J, et al. COVIDGraph: Connecting biomedical COVID-19 resources and computational biology models. In: 2nd Workshop on Search, Exploration, and Analysis in Heterogeneous Datastores (SEA-Data 2021). 2021. p. 34–37.
Robinson I, Webber J, Eifrem E. Graph Databases: new opportunities for connected data. O'Reilly Media, Inc.; 2015.
Wilkinson M, Dumontier M, Aalbersberg I, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3(1):1–9.
Schmidt C, Darms J, Shutsko A, Löbe M, Nagrani R, Seifert B, et al. Facilitating Study and Item Level Browsing for Clinical and Epidemiological COVID-19 Studies. Studies in Health Technology and Informatics. 2021;281:794–798.
Zarin D, Tse T, Williams R, Califf R, Ide N. The ClinicalTrials.gov results database - update and key issues. New England Journal of Medicine. 2011;364(9):852–860.
Henkel R, Wolkenhauer O, Waltemath D. Combining computational models, semantic annotations and simulation experiments in a graph database. Database. 2015;2015:1–6.