gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Transformation of Synthetic Datasets to the Medical Informatics Initiative Core Dataset: A Pilot Study

Meeting Abstract

  • Rajesh Murali - Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany
  • Detlef Kraska - Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany
  • Thomas Ganslandt - Medical Center for Information and Communication Technology, University Hospital Erlangen, Erlangen, Germany; Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Medical Informatics, Biometrics and Epidemiology, Medical Informatics, Erlangen, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 786

doi: 10.3205/24gmds150, urn:nbn:de:0183-24gmds1502

Published: September 6, 2024

© 2024 Murali et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Introduction: Synthetic data generators like Synthea™ [1] provide simple access to anonymous, medically plausible EHR-like data for research or training purposes. However, the data generated by Synthea™ are not compliant with the Core Dataset (CDS) FHIR specifications of the German Medical Informatics Initiative (MII). Transforming them to MII-compliant FHIR formats would enable their use in the MII, e.g., for testing and teaching purposes. Thus, the goal of this project is to transform Synthea™ FHIR resources to MII-compliant formats. This includes both the mapping of international terminologies to national value sets and the structural adaptation of attributes.

Methods: FHIR profiles generated by Synthea™ were analyzed and compared with MII CDS FHIR profiles to derive the changes to terminologies and structures required to match MII specifications. For procedures, the relevant SNOMED codes were extracted from the Synthea™ disease models, and the SNOMED/OPS301 mapping table provided by Averbis and TriNetX LLC [2] was used. For condition SNOMED codes, the mapping from OHDSI ATHENA [3] to ICD10GM was used. A Python script was implemented to parse FHIR JSON bundles exported from Synthea™, apply the terminology mappings, and add or remove attributes to match MII CDS specifications.
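The transformation step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' script: the mapping entry, the list of dropped attributes, the profile URL, the ICD-10-GM system URI, and the decision to skip Conditions without a mapped code are all assumptions made for illustration.

```python
import copy

SNOMED = "http://snomed.info/sct"
ICD10GM = "http://fhir.de/CodeSystem/bfarm/icd-10-gm"  # assumed system URI

# Hypothetical one-row excerpt of a SNOMED -> ICD10GM table (illustrative only).
SNOMED_TO_ICD10GM = {"44054006": ("E11.90", "Diabetes mellitus, Typ 2")}

# Hypothetical placeholders: attributes to drop and the profile to declare.
DROP_ATTRIBUTES = {"Condition": ["category"]}
PROFILES = {
    "Condition": "https://www.medizininformatik-initiative.de/fhir/core/"
                 "modul-diagnose/StructureDefinition/Diagnose"  # assumed URL
}


def transform_condition(resource):
    """Return a transformed copy of a Condition, or None if no code maps."""
    out = copy.deepcopy(resource)
    for attr in DROP_ATTRIBUTES["Condition"]:
        out.pop(attr, None)  # remove attributes not wanted in the target
    mapped = []
    for coding in out.get("code", {}).get("coding", []):
        if coding.get("system") != SNOMED:
            continue
        target = SNOMED_TO_ICD10GM.get(coding.get("code"))
        if target:
            mapped.append({"system": ICD10GM, "code": target[0], "display": target[1]})
    if not mapped:
        return None  # assumption: unmapped Conditions are skipped
    out["code"]["coding"] = mapped
    out.setdefault("meta", {})["profile"] = [PROFILES["Condition"]]
    return out


def transform_bundle(bundle):
    """Apply the Condition transformation to every entry of a FHIR bundle."""
    entries = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        if resource.get("resourceType") == "Condition":
            resource = transform_condition(resource)
            if resource is None:
                continue
        entries.append({**entry, "resource": resource})
    return {**bundle, "entry": entries}
```

A bundle loaded with `json.load` could be passed to `transform_bundle` and written back with `json.dump`. A full implementation would need one such handler per resource type and would also have to validate the output against the MII CDS profiles.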

Results: Synthetic dataset bundles generated by Synthea™ contained 16 resource types, of which only 7 currently match MII CDS profiles (see Figure 1). These resources (Condition, DiagnosticReport, Encounter, MedicationRequest, Observation, Patient, Procedure) had 28 value sets, of which only 2 were identical to those of the MII CDS. Four value sets were successfully mapped and 24 still have to be mapped. A total of 14 attributes that were not present in the MII CDS were removed, 5 attributes were converted (Patient.gender, Encounter.class, Encounter.status, Encounter.serviceType, Encounter.location), and 3 CDS-specific attributes were added. Out of 791 SNOMED procedure codes extracted from the disease modules, 123 could be mapped, and out of 358 SNOMED condition codes, 312 could be mapped to ICD10GM codes.
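As a rough illustration of the resource-type restriction and the mapping coverage reported above, the following Python sketch keeps only the seven resource types named in the Results and computes a coverage percentage. The function names are hypothetical. The sketch also ignores references from kept resources to dropped ones, which a real pipeline would have to resolve.

```python
# Resource types reported in the abstract as matching MII CDS profiles.
MII_SUPPORTED_TYPES = {
    "Condition", "DiagnosticReport", "Encounter", "MedicationRequest",
    "Observation", "Patient", "Procedure",
}


def filter_supported(bundle):
    """Return a copy of the bundle containing only supported resource types."""
    kept = [
        entry for entry in bundle.get("entry", [])
        if entry.get("resource", {}).get("resourceType") in MII_SUPPORTED_TYPES
    ]
    return {**bundle, "entry": kept}


def coverage_percent(mapped, total):
    """Share of source codes with a target mapping, in percent (one decimal)."""
    return round(100 * mapped / total, 1)


# Procedure codes reported in the abstract: 123 of 791 mapped.
print(coverage_percent(123, 791))  # 15.5
```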

Discussion: We successfully transformed synthetic data from Synthea™ to MII CDS specifications. We leveraged existing mappings from SNOMED to OPS301 and to ICD10GM. The approach was based on adaptation of generated FHIR bundles and did not require changes to the Synthea™ codebase or disease models. Limitations: out of 16 FHIR resource types generated, only 7 are currently available through the MII CDS, so 9 had to be dropped from the dataset. ICD and OPS terminology mappings were incomplete, and no validation by domain experts was carried out. Further analysis of the mapping quality and of the relevance of the unmapped codes is necessary. Even though only a partial mapping could be achieved in this pilot study, the approach required limited effort and could easily be extended to fully cover the resource types currently specified in the MII CDS. Adaptation of Synthea™ to German population statistics [4] was not yet implemented in this pilot study.

Conclusion: A mapping approach to transform synthetic patient datasets to MII CDS specifications could be successfully demonstrated. After completion of the mappings, the approach lowers the barrier to use these datasets for testing and teaching purposes in the MII.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Walonoski J, Kramer M, Nichols J, Quina A, Moesel C, Hall D, et al. Synthea: An approach, method, and software mechanism for generating synthetic patients and the synthetic electronic health care record. J Am Med Inform Assoc. 2018 Mar 1;25(3):230-238. DOI: 10.1093/jamia/ocx079
2. TriNetX Mappings. TriNetX LLC; [cited 2024 Mar 04]. Available from: https://staging.trinetx.com/
3. Athena – OHDSI Vocabularies Repository. [cited 2024 Apr 27]. Available from: http://athena.ohdsi.org
4. DESTATIS Federal Statistical Office. GENESIS. [cited 2024 Apr 27]. Available from: https://www-genesis.destatis.de/genesis/online