gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Gecco or not Gecco?

Meeting Abstract

  • Khalid Yusuf - Institute of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  • Miriam Rainers - Institute of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  • Sabine Hanß - Institute of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  • Dagmar Krefting - Institute of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany; Campus Institut Data Science (CIDAS), Georg August-University Goettingen, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 199

doi: 10.3205/22gmds068, urn:nbn:de:0183-22gmds0689

Published: August 19, 2022

© 2022 Yusuf et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: The German Corona Consensus Dataset (GECCO) was designed within the Network University Medicine (NUM) to offer Covid-19 researchers a dataset that takes into account internationally accepted terminologies and interoperable health IT standards [1]. Another NUM project is the National Pandemic Cohort Network (NAPKON), which provides the infrastructure for capturing, storing, and retrieving data on Covid-19 [2]. The three NAPKON cohort studies (SÜP, HAP, POP) have agreed on the interoperability of data items with GECCO, but follow-ups introduce time-dependent values not specified in GECCO, and some variables are captured in greater detail. In this work, we describe the methods and challenges of mapping the cohort data to GECCO.

Methods: Suitable data points from each of the three NAPKON studies, captured in secuTrial, are mapped to the 13 concepts presented in GECCO using an R script [3]. A one-to-one mapping is applied to elements that have only a baseline record, whereas a many-to-many mapping is adopted for elements with multiple time-points or variants. Where necessary, we account for time-bound concepts in GECCO by distinguishing events that started before the Covid-19 infection from those that started after it. GECCO and NAPKON experts are consulted to resolve unclear cases.
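
The two mapping modes and the before/after distinction described above can be illustrated with a minimal Python sketch. It is not the authors' R script [3]. The record layout, the element names (`sex`, `weight_kg`), and all function names are hypothetical, and elements with several time-points are simply kept as separate dated values, which simplifies the many-to-many case. Treating an event that starts on the infection date as "after" is likewise an assumption.

```python
from datetime import date

# Hypothetical source records as (element, visit_date, value) tuples.
RECORDS = [
    ("sex", date(2021, 3, 1), "female"),
    ("weight_kg", date(2021, 3, 1), 70.0),
    ("weight_kg", date(2021, 6, 1), 68.5),
]


def map_one_to_one(records, element):
    """Return the single value of a baseline-only element, or None if absent."""
    values = [value for name, _, value in records if name == element]
    if len(values) > 1:
        raise ValueError(f"{element}: expected one record, found {len(values)}")
    return values[0] if values else None


def map_by_time_point(records, element):
    """Return every value of an element as (visit_date, value), oldest first."""
    matches = [(visit, value) for name, visit, value in records if name == element]
    return sorted(matches, key=lambda pair: pair[0])


def relative_to_infection(event_start, infection_date):
    """Label an event as starting 'before' or 'after' the infection date.

    Assumption: an event starting on the infection date counts as 'after'.
    """
    return "before" if event_start < infection_date else "after"
```

In this sketch, a baseline-only element with more than one record raises an error rather than being silently reduced, so that unclear cases surface for expert review.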

Results: One-to-one mapping is largely possible for the POP study; however, some variables are missing, e.g. inclusion criteria. In SÜP, the mapping of the data points for most GECCO concepts is one-to-one. However, insulin medication, weight, and all elements in the laboratory and vital parameter concepts are mapped in a many-to-many relationship to four values with different time-points in GECCO. For some variables, such as heart failure in GECCO, where there are two variants (acute and chronic), a two-to-one mapping is maintained. The HAP study results are similar to those of SÜP. However, multiple entries in data elements with nominal values in the Therapy concept are either reduced to one (e.g. Dialysis is reduced to one value according to the order of preference “Yes”, “No”, “NA”) or reduced to the number of all possible nominal values where the values are explicitly stated (e.g. respiratory-therapy-type with the nominal values “invasive” and “non-invasive”). A comprehensive list of the mapped variables can be accessed in the GitLab repository of the project [3].
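
The two reduction rules for multiple nominal entries can be sketched as follows. This is an illustrative Python reading of the description above, not the project's R code [3]. The function names are hypothetical, and interpreting the second rule as "keep at most one entry per explicitly stated nominal value" is an assumption.

```python
# Order of preference as given in the text for Dialysis.
PREFERENCE = ("Yes", "No", "NA")


def reduce_by_preference(entries, preference=PREFERENCE):
    """Collapse multiple entries to the first value in the preference order."""
    for candidate in preference:
        if candidate in entries:
            return candidate
    return None  # none of the preferred values was recorded


def reduce_to_distinct(entries, allowed):
    """Keep at most one entry per allowed nominal value, in the allowed order."""
    return [value for value in allowed if value in entries]


# Usage with hypothetical repeated entries for one patient:
dialysis = reduce_by_preference(["No", "NA", "Yes", "No"])
therapy_types = reduce_to_distinct(
    ["invasive", "non-invasive", "invasive"],
    allowed=("invasive", "non-invasive"),
)
```

Under the first rule a single “Yes” outweighs any number of “No” or “NA” entries, so the reduced value records whether the therapy ever occurred, not how often.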

Discussion: Our approach extracts the GECCO dataset from suitable data points captured in the NAPKON studies and stored in secuTrial, which differs from the approach of Cramer et al. [4]. The adoption of many-to-many mapping makes the comparability of NAPKON data with direct GECCO-based entry debatable. Further study is required to evaluate the impact of the mapping process on the quality of the derived data.

Conclusion: The uniformity of the successfully mapped NAPKON elements to the GECCO target shows that the goal of aligning NAPKON with the GECCO standard is on course, although some gaps remain that may become relevant if data are federated from different sources.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Sass J, Bartschke A, Lehne M, Essenwagner A, Rinaldi E, Rudolph S, et al. The German Corona Consensus Dataset (GECCO): a standardized dataset for COVID-19 research in university medicine and beyond. BMC Medical Informatics and Decision Making. 2020;20:341. DOI: 10.1186/s12911-020-01374-w
2. National Pandemic Cohort Network (NAPKON). [accessed 2022 Mar 19]. Available from: https://napkon.de/das-projekt/
3. Yusuf K, Rainers M, Hanß S, Krefting D. Medical Informatics - Public Projects / mi-num-public / NAPKON-to-Gecco-Convert. GitLab; [accessed 2022 Apr 12]. Available from: https://gitlab.gwdg.de/medinfpub/mi-num-public/napkon-to-gecco
4. Cramer S, Schneider P, Dhillon C, Soto-Rey I. Die NAPKON- und GECCO-Datensätze im Vergleich. In: 66th Annual Meeting of the German Society for Medical Informatics, Biometry and Epidemiology e.V. (GMDS). 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 121. DOI: 10.3205/21GMDS026