gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Generating GECCO instance data in HL7 FHIR and openEHR using Synthea

Meeting Abstract


  • Paul Behrend - IT Center for Clinical Research, Universität zu Lübeck, Lübeck, Germany
  • Josef Ingenerf - Institute of Medical Informatics, University of Lübeck, Lübeck, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 200

doi: 10.3205/22gmds069, urn:nbn:de:0183-22gmds0698

Published: 19 August 2022

© 2022 Behrend et al.
This article is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: As a result of the COVID-19 pandemic and the research activity it triggered, various initiatives produce large quantities of research data. To standardize data collection, the Network of University Medicine (NUM) developed the German Corona Consensus Dataset (GECCO) for the uniform documentation of COVID-19 patients [1]. Several applications and projects [2], [3], [4] already use it to collect and provide relevant data. Developing these applications continuously and reliably requires more and more test data in various formats. Because access to real patient data is limited, a framework for flexible data generation (SyntheaGECCO) was created [5]. This work therefore presents a novel method for creating test data that avoids privacy issues.

Methods: The patient data generator Synthea serves as the baseline for this project. It uses demographic data, health statistics, and clinical practice guidelines to generate realistic patient data in a US health care context. To transform the data created by Synthea according to the GECCO specification, the terminology server Ontoserver [6] and the RxNorm API of the National Library of Medicine [7] were used. The core of the GECCO specification comprises 80 data items and 280 associated response options, covering all relevant information from admission to discharge of a COVID-19 case. GECCO profiles are available in both HL7 FHIR R4 and openEHR [1].
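Ontoserver implements the standard FHIR terminology operations. As a result, the subconcepts of a SNOMED CT code can be requested with `ValueSet/$expand` on a SNOMED CT implicit value set. The sketch below is not the authors' implementation, and the server URL is a placeholder assumption. It uses the ECL expression `<< code`, which selects the concept and all of its descendants, and it only builds the request URL and parses a returned expansion without making any network call.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute the FHIR base URL of the Ontoserver in use.
ONTOSERVER_BASE = "https://ontoserver.example.org/fhir"


def expand_url(sctid: str, base: str = ONTOSERVER_BASE) -> str:
    """Build a ValueSet/$expand request URL for the SNOMED CT implicit value set
    containing the concept `sctid` and all of its descendants (ECL '<<')."""
    implicit_vs = f"http://snomed.info/sct?fhir_vs=ecl/<< {sctid}"
    return f"{base}/ValueSet/$expand?" + urlencode({"url": implicit_vs})


def codes_from_expansion(value_set: dict) -> set[str]:
    """Collect every code in a FHIR ValueSet expansion, including nested 'contains' entries."""
    codes: set[str] = set()
    stack = list(value_set.get("expansion", {}).get("contains", []))
    while stack:
        item = stack.pop()
        if "code" in item:  # grouping entries may lack a code
            codes.add(item["code"])
        stack.extend(item.get("contains", []))
    return codes
```

For example, `expand_url("840539006")` targets the SNOMED CT concept "Disease caused by severe acute respiratory syndrome coronavirus 2". The JSON returned by the server, for instance via `json.load(urllib.request.urlopen(url))`, can then be passed to `codes_from_expansion` to obtain the flat code set used for matching.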

Results: The data generated by Synthea in HL7 FHIR R4 were analyzed against the requirements of the GECCO specification. Relevant resource instances were extracted from the FHIR bundles using previously created mappings. These mappings were derived from the structure definitions and templates of the respective GECCO profiles in HL7 FHIR and openEHR. Using Ontoserver, the SNOMED CT codes contained in the mappings were expanded along the SNOMED CT polyhierarchy. To make use of the medication administrations, which Synthea codes in RxNorm, the RxNorm API was employed. In this way, GECCO-conformant representations could be generated from the synthetic data. The resulting GECCO data instances were validated against both FHIR R4 and openEHR. This process used the FHIR Bridge [8] to check the equivalence of the two output formats, and the profiles to validate the resources.
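The following is a minimal sketch of the mapping-based extraction step. It assumes a hypothetical mapping format in which each GECCO target (given an illustrative name) is paired with a FHIR resource type and a set of SNOMED CT codes that has already been expanded via Ontoserver. The sketch only inspects the `code` element, so resources such as MedicationAdministration, which carry `medicationCodeableConcept` instead, would need separate handling. The actual SyntheaGECCO mappings [5] may be structured quite differently.

```python
SNOMED = "http://snomed.info/sct"

# Hypothetical mapping format: GECCO target -> (FHIR resource type, expanded SNOMED CT code set).
# Target names and code sets are illustrative; real sets would come from Ontoserver expansions.
Mapping = dict[str, tuple[str, set[str]]]


def snomed_codes(resource: dict) -> set[str]:
    """Return the SNOMED CT codes found in the resource's `code` CodeableConcept."""
    return {
        coding["code"]
        for coding in resource.get("code", {}).get("coding", [])
        if coding.get("system") == SNOMED and "code" in coding
    }


def extract(bundle: dict, mapping: Mapping) -> dict[str, list[dict]]:
    """Assign each bundle resource to every GECCO target whose resource type
    matches and whose code set shares at least one SNOMED CT code with it."""
    result: dict[str, list[dict]] = {target: [] for target in mapping}
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        codes = snomed_codes(resource)
        for target, (resource_type, code_set) in mapping.items():
            if resource.get("resourceType") == resource_type and codes & code_set:
                result[target].append(resource)
    return result
```

As a usage example, `extract(bundle, {"covid-19-diagnosis": ("Condition", {"840539006"})})` collects every Synthea Condition coded with that SNOMED CT concept. Here the target name `covid-19-diagnosis` is hypothetical. Matching on set intersection means that, once a code set has been expanded, descendants of the mapped concept are picked up automatically.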

Discussion: Using the described process, most of the data elements specified in GECCO could be generated from synthetic data; the exceptions are medical imaging and contact data. The validation confirmed that the final data are valid and that both formats provide comparable information content. Although Synthea can generate extensive and detailed patient data for GECCO, some data elements could not be realized. In addition, the population-based approach to data generation requires large data sets to provide adequate variety in patient histories. HL7 FHIR and openEHR proved equally suitable for this use case.
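A back-of-the-envelope calculation can illustrate why a population-based generator needs large data sets. Suppose, as a simplification that does not reflect Synthea's actual disease modules, that each generated patient independently develops a given condition with probability $p$. Then the probability that at least one of $N$ patients has it is $1 - (1 - p)^N$. Requiring this to reach a target $t$ gives $N \ge \ln(1 - t) / \ln(1 - p)$. The prevalence values below are illustrative only.

```python
import math


def min_patients(prevalence: float, target: float = 0.95) -> int:
    """Smallest N such that P(at least one of N patients has the condition) >= target,
    assuming each patient independently has it with probability `prevalence`."""
    if not (0 < prevalence < 1 and 0 < target < 1):
        raise ValueError("prevalence and target must lie strictly between 0 and 1")
    # 1 - (1 - p)^N >= t  <=>  N >= ln(1 - t) / ln(1 - p)
    return math.ceil(math.log(1 - target) / math.log(1 - prevalence))
```

For a condition with an assumed prevalence of 1 %, `min_patients(0.01)` returns 299 patients just to make a single occurrence 95 % likely. Obtaining varied histories across many rare GECCO data items multiplies this requirement.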

Conclusions: Using real patient data in secondary use cases, such as the development of software for the health sector, involves many hurdles. Synthetic data based on statistics, however, avoid these hurdles and thus represent a viable alternative. This project shows that existing technologies simulate sufficiently complex processes to generate enough information for specific use cases such as GECCO.

Acknowledgment: The corresponding author was supported by a scholarship of the Friedrich-Wingert-Stiftung.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


[1] Sass J, Bartschke A, Lehne M, Essenwanger A, Rinaldi E, Rudolph S, et al. The German Corona Consensus Dataset (GECCO): a standardized dataset for COVID-19 research in university medicine and beyond. BMC Med Inform Decis Mak. 2020;20(1):341. DOI: 10.1186/s12911-020-01374-w
[2] Orchestra Cohort. About Orchestra. Orchestra; [Updated 2022-02-18, Accessed 2022-03-24].
[3] Netzwerk Universitätsmedizin. COMPASS Steckbrief. Netzwerk Universitätsmedizin; [Updated 2020-10-04, Accessed 2022-03-24].
[4] Netzwerk Universitätsmedizin. NAPKON - Nationales Pandemie Kohorten Netz. Netzwerk Universitätsmedizin; [Accessed 2022-04-14].
[5] IT Center for Clinical Research - Universität zu Lübeck. Synthea-GECCO. IT Center for Clinical Research; [Accessed 2022-05-24].
[6] Metke-Jimenez A, Steel J, Hansen D, Lawley M. Ontoserver: a syndicated terminology server. J Biomed Semantics. 2018;9(1):24. DOI: 10.1186/s13326-018-0191-z
[7] National Institutes of Health. RxNorm API. National Library of Medicine; [Accessed 2022-03-24].
[8] EHRbase. FHIR Bridge. GitHub; [Accessed 2022-04-14].