gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Generating GECCO instance data in HL7 FHIR and openEHR using Synthea

Meeting Abstract

  • Paul Behrend - IT Center for Clinical Research, Universität zu Lübeck, Lübeck, Germany
  • Josef Ingenerf - Institute of Medical Informatics, University of Lübeck, Lübeck, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 200

doi: 10.3205/22gmds069, urn:nbn:de:0183-22gmds0698

Published: August 19, 2022

© 2022 Behrend et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Introduction: The COVID-19 pandemic and the scientific activities it triggered have led various initiatives to produce large quantities of research data. To standardize data collection, the Network of University Medicine (NUM) developed the GECCO dataset for uniformly documenting COVID-19 patients [1]. Several applications and projects [2], [3], [4] already use it to gather and provide relevant data. Their continuous and reliable development creates a growing demand for test data in various formats. Since access to real patient data is limited, a framework (SyntheaGECCO) for flexible data generation was created [5]. This work therefore presents a novel method for creating test data while avoiding privacy issues.

Methods: The patient data generator Synthea serves as the basis for this project. It employs demographic data, health statistics, and clinical practice guidelines to generate realistic data in a US health care context. To transform the data created by Synthea according to the GECCO specification, the terminology server Ontoserver [6] and the RxNorm API of the National Institutes of Health [7] were used. The core of the GECCO specification contains 80 data points and 280 associated response options covering all relevant information from admission to discharge of a COVID-19 case. Representations of GECCO are available in HL7 FHIR R4 and openEHR [1].
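As a rough illustration of how the two terminology services could be addressed, the following minimal sketch only builds request URLs and sends nothing over the network. It assumes the standard FHIR ValueSet/$expand operation with an implicit SNOMED CT "isa" value set and the public RxNav REST path for concept properties; the abstract does not state which endpoints the project actually called. The function names and the server address in the usage note are hypothetical.

```python
from urllib.parse import urlencode

SNOMED_SYSTEM = "http://snomed.info/sct"
# Assumption: public RxNav REST base; not named in the abstract.
RXNAV_BASE = "https://rxnav.nlm.nih.gov/REST"


def snomed_isa_expand_url(fhir_base: str, concept_id: str, count: int = 1000) -> str:
    """Build a ValueSet/$expand request for a concept and its descendants.

    Uses the implicit value set URL form '?fhir_vs=isa/<id>' that FHIR
    defines for SNOMED CT. 'fhir_base' is a hypothetical server address.
    """
    implicit_value_set = f"{SNOMED_SYSTEM}?fhir_vs=isa/{concept_id}"
    query = urlencode({"url": implicit_value_set, "count": count})
    return f"{fhir_base.rstrip('/')}/ValueSet/$expand?{query}"


def rxnorm_properties_url(rxcui: str) -> str:
    """Build an RxNav request for the properties of one RxNorm concept."""
    return f"{RXNAV_BASE}/rxcui/{rxcui}/properties.json"
```

For example, snomed_isa_expand_url("https://tx.example.org/fhir", "840539006") yields a request for the SNOMED CT concept for COVID-19 and everything subsumed by it; the response would still have to be fetched and parsed by the caller.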

Results: The data generated by Synthea in HL7 FHIR R4 was analyzed against the requirements of the GECCO specification. Relevant resource instances were extracted from the FHIR bundles using previously created mappings. These mappings were derived from the structure definitions and templates of the respective GECCO representations in HL7 FHIR and openEHR. With the help of Ontoserver, they were expanded according to the polyhierarchy of the SNOMED CT codes they contain. To use the medication administrations, which are initially coded in RxNorm, the RxNorm API was employed. In this manner, GECCO-conformant representations could be generated from the synthetic data. The resulting GECCO data instances were validated against both FHIR R4 and openEHR. This process used the FHIR-Bridge [8] to examine the equivalence of the output formats, and the profiles for resource validation.
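The extraction step can be pictured with a minimal sketch, assuming a mapping from GECCO data elements to sets of (system, code) pairs. It is not the SyntheaGECCO implementation: the mapping keys, the sample bundle, and the function names are hypothetical, and only the "code" element of each resource is inspected (medication resources, for instance, carry their coding elsewhere).

```python
from typing import Dict, List, Set, Tuple

Coding = Tuple[str, str]  # (system, code)

SNOMED = "http://snomed.info/sct"
LOINC = "http://loinc.org"

# Hypothetical mapping: GECCO element name -> accepted codings.
GECCO_MAPPING: Dict[str, Set[Coding]] = {
    "covid19_diagnosis": {(SNOMED, "840539006")},
    "body_temperature": {(LOINC, "8310-5")},
}

# Hand-written toy bundle, not real Synthea output.
SAMPLE_BUNDLE = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {"resourceType": "Condition", "id": "c1",
                      "code": {"coding": [{"system": SNOMED, "code": "840539006"}]}}},
        {"resource": {"resourceType": "Observation", "id": "o1",
                      "code": {"coding": [{"system": LOINC, "code": "8310-5"}]}}},
        {"resource": {"resourceType": "Patient", "id": "p1"}},
    ],
}


def codings_of(resource: dict) -> Set[Coding]:
    """Collect the (system, code) pairs from a resource's 'code' element."""
    codings = resource.get("code", {}).get("coding", [])
    return {(c.get("system"), c.get("code")) for c in codings}


def extract_by_mapping(bundle: dict,
                       mapping: Dict[str, Set[Coding]]) -> Dict[str, List[dict]]:
    """Group bundle resources by the mapping entry whose codings they match."""
    hits: Dict[str, List[dict]] = {name: [] for name in mapping}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        found = codings_of(resource)
        for name, wanted in mapping.items():
            if found & wanted:  # at least one shared coding
                hits[name].append(resource)
    return hits
```

Calling extract_by_mapping(SAMPLE_BUNDLE, GECCO_MAPPING) groups the Condition under "covid19_diagnosis" and the Observation under "body_temperature", while the Patient resource, which has no "code" element, is skipped. Expanding each code set beforehand with a terminology server would let more specific child concepts match as well.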

Discussion: Using the described process, most of the data elements specified in GECCO could be generated from synthetic data; the exceptions are medical image data and contact data. The validation confirmed that the final data is valid and that both formats provide comparable information content. Thus, although Synthea can generate extensive and detailed patient data for GECCO, some data elements could not be realized. Additionally, the population-based approach to data generation requires large data sets to provide adequate variety in patient histories. As for the standards, HL7 FHIR and openEHR are equally applicable to this use case.

Conclusions: Using real patient data in secondary use cases, such as software development for the health sector, involves many hurdles. Synthetic data based on statistics avoids these hurdles and thus represents a viable alternative. This project shows that existing technologies simulate sufficiently complex processes to generate enough information for specific use cases such as GECCO.

Acknowledgment: The corresponding author was supported by a scholarship of the Friedrich-Wingert-Stiftung.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Sass J, Bartschke A, Lehne M, Essenwanger A, Rinaldi E, Rudolph S, et al. The German Corona Consensus Dataset (GECCO): a standardized dataset for COVID-19 research in university medicine and beyond. BMC Med Inform Decis Mak. 2020;20(1):341. DOI: 10.1186/s12911-020-01374-w
2. Orchestra Cohort. About Orchestra. Orchestra; [Updated 2022-02-18, Accessed 2022-03-24]. Available from: https://orchestra-cohort.eu/
3. Netzwerk Universitätsmedizin. COMPASS Steckbrief. Netzwerk Universitätsmedizin; [Updated 2020-10-04, Accessed 2022-03-24]. Available from: https://num-compass.science/de/compass/steckbrief/
4. Netzwerk Universitätsmedizin. NAPKON - Nationales Pandemie Kohorten Netz. Netzwerk Universitätsmedizin; [Accessed 2022-04-14]. Available from: https://napkon.de/
5. IT Center for Clinical Research - Universität zu Lübeck. Synthea-GECCO. IT Center for Clinical Research; [Accessed 2022-05-24]. Available from: https://github.com/itcr-uni-luebeck/Synthea-Gecco
6. Metke-Jimenez A, Steel J, Hansen D, Lawley M. Ontoserver: a syndicated terminology server. J Biomed Semantics. 2018;9(1):24. DOI: 10.1186/s13326-018-0191-z
7. National Institute of Health. RxNorm API. National Library of Medicine; [Accessed 2022-03-24]. Available from: https://lhncbc.nlm.nih.gov/RxNav/APIs/RxNormAPIs.html
8. EHRbase. FHIR Bridge. GitHub; [Accessed 2022-04-14]. Available from: https://github.com/ehrbase/ehrbase