gms | German Medical Science

62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

17.09. - 21.09.2017, Oldenburg

CDISC ODM for data integration of clinical real world data – a bridge over troubled water

Meeting Abstract


  • Frank A. Meineke - IMISE/Universität Leipzig, Leipzig, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 282

doi: 10.3205/17gmds154, urn:nbn:de:0183-17gmds1547

Published: 29 August 2017

© 2017 Meineke.
This is an open access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: Real world data (RWD), e.g. data from registries and healthcare data, is of growing importance in medical science. Although collected on a massive scale, its accessibility and usability are restricted for legal, organizational, technical and quality reasons. This is one of the drivers for the German medical informatics initiative and its data integration centers. RWD typically comes in proprietary formats, with CSV being the lowest common denominator. Our mission is to make this data more accessible, usable and archivable by enriching it with metadata, checking its consistency and quality, and transforming it into a standardized format, so that post-processing becomes as efficient, generic and quality-preserving as possible.

Methods: The Operational Data Model (ODM) is an XML format developed by CDISC and originally intended for the transmission and archival of clinical trial data and metadata [1]. Define-XML, the state-of-the-art format for submitting metadata to the FDA, is based on ODM. ODM is not restricted to trial data, however: it also supports modelling audit and administrative data. Large collections of clinical request forms and metadata are based on ODM [2].
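To make the ODM hierarchy concrete, the following minimal sketch builds a tiny ODM document with Python's standard library. It is an illustration only: all OIDs, names and values are invented, the element and attribute names are intended to follow ODM 1.3, and the result has not been validated against the CDISC schema.

```python
import xml.etree.ElementTree as ET

ODM_NS = "http://www.cdisc.org/ns/odm/v1.3"
ET.register_namespace("", ODM_NS)  # serialize ODM as the default namespace


def q(tag):
    """Qualify a tag name with the ODM namespace."""
    return f"{{{ODM_NS}}}{tag}"


def build_minimal_odm():
    """Build one item definition and one matching data point (all OIDs are made up)."""
    odm = ET.Element(q("ODM"), {
        "FileOID": "F.DEMO", "FileType": "Snapshot",
        "ODMVersion": "1.3.2", "CreationDateTime": "2017-09-17T12:00:00",
    })
    # Metadata part: describes what an item means.
    study = ET.SubElement(odm, q("Study"), {"OID": "S.DEMO"})
    mdv = ET.SubElement(study, q("MetaDataVersion"),
                        {"OID": "MDV.1", "Name": "Demo metadata"})
    ET.SubElement(mdv, q("ItemDef"),
                  {"OID": "I.WEIGHT", "Name": "Body weight", "DataType": "float"})
    # Data part: Subject > StudyEvent > Form > ItemGroup > Item.
    clinical = ET.SubElement(odm, q("ClinicalData"),
                             {"StudyOID": "S.DEMO", "MetaDataVersionOID": "MDV.1"})
    subject = ET.SubElement(clinical, q("SubjectData"), {"SubjectKey": "P1"})
    event = ET.SubElement(subject, q("StudyEventData"), {"StudyEventOID": "SE.BASELINE"})
    form = ET.SubElement(event, q("FormData"), {"FormOID": "F.VITALS"})
    group = ET.SubElement(form, q("ItemGroupData"), {"ItemGroupOID": "IG.BODY"})
    ET.SubElement(group, q("ItemData"), {"ItemOID": "I.WEIGHT", "Value": "80.5"})
    return odm
```

Calling `ET.tostring(build_minimal_odm(), encoding="unicode")` yields the XML text. The point of the sketch is that metadata and data travel in the same file and are linked by OIDs.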

Results: Efficient, high-quality generation of ODM from heterogeneous sources requires specific software tools. We have developed software to support upgrading RWD sources to ODM. For generating ODM from complex database systems (older and internally developed clinical data management systems), we developed a high-level ODM library. For transforming less complex, mostly tabular data, we developed a generic importer driven by a map file. The map file is expressive enough to reconstruct the structure (Events/Forms/Groups) of data that was originally structured and then flattened.
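The idea of a map-driven import can be sketched as follows. This is not the importer described above, whose map-file format is not given here. It assumes a hypothetical mapping from flat column names to an Event/Form/Group/Item path and uses it to rebuild the nested ODM data structure from flattened rows. The column names, OIDs and the function name `rows_to_clinical_data` are invented, and the ODM namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical map: flat column -> (StudyEventOID, FormOID, ItemGroupOID, ItemOID)
COLUMN_MAP = {
    "bl_weight": ("SE.BASELINE", "F.VITALS", "IG.BODY", "I.WEIGHT"),
    "bl_height": ("SE.BASELINE", "F.VITALS", "IG.BODY", "I.HEIGHT"),
    "fu_weight": ("SE.FOLLOWUP", "F.VITALS", "IG.BODY", "I.WEIGHT"),
}


def child(parent, tag, key_attr, key):
    """Return the child with the given tag and key attribute, creating it if absent."""
    for element in parent.findall(tag):
        if element.get(key_attr) == key:
            return element
    return ET.SubElement(parent, tag, {key_attr: key})


def rows_to_clinical_data(rows, id_column="patient_id"):
    """Turn flat dict rows into a nested ClinicalData element."""
    clinical = ET.Element("ClinicalData",
                          {"StudyOID": "S.DEMO", "MetaDataVersionOID": "MDV.1"})
    for row in rows:
        subject = child(clinical, "SubjectData", "SubjectKey", row[id_column])
        for column, (event_oid, form_oid, group_oid, item_oid) in COLUMN_MAP.items():
            value = row.get(column)
            if not value:  # skip empty cells instead of emitting empty items
                continue
            event = child(subject, "StudyEventData", "StudyEventOID", event_oid)
            form = child(event, "FormData", "FormOID", form_oid)
            group = child(form, "ItemGroupData", "ItemGroupOID", group_oid)
            ET.SubElement(group, "ItemData", {"ItemOID": item_oid, "Value": value})
    return clinical
```

Here the same item (`I.WEIGHT`) appears under two events, which is exactly the information that is lost when visits are flattened into prefixed columns and that the map restores.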

The data is enriched with metadata such as code lists and labels, measurement units, descriptions, missing values, ignorable items, and grouping (Event/Form/Group) information. Data types and formats are checked (e.g. dates), and organizational data is tagged (patient ID, visit times, repetitions). Metadata is defined in a simple editable spreadsheet, which is suitable for exchange in negotiations with data producers.
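A metadata-driven check of this kind might look like the sketch below. The metadata layout is an assumption: the dictionary `METADATA` stands in for rows of the spreadsheet, and its keys (`type`, `format`, `codelist`, `missing`, `unit`) as well as the item names are hypothetical. Accepting a decimal comma for floats is likewise an assumption about the source data.

```python
from datetime import datetime

# Hypothetical metadata, as it might be read from the spreadsheet.
METADATA = {
    "sex": {"label": "Sex", "type": "text",
            "codelist": {"1": "male", "2": "female"}, "missing": {"9"}},
    "admission": {"label": "Admission date", "type": "date",
                  "format": "%d.%m.%Y", "missing": {""}},
    "weight": {"label": "Body weight", "type": "float",
               "unit": "kg", "missing": {"", "-1"}},
}


def check_value(item, raw):
    """Return (normalized_value, problem).

    Declared missing values give (None, None); invalid values give
    (None, message); valid values give (value, None).
    """
    meta = METADATA[item]
    raw = raw.strip()
    if raw in meta.get("missing", set()):
        return None, None
    if "codelist" in meta and raw not in meta["codelist"]:
        return None, f"{item}: code {raw!r} is not in the code list"
    try:
        if meta["type"] == "date":
            # Normalize to ISO 8601, the date representation used in ODM.
            parsed = datetime.strptime(raw, meta["format"])
            return parsed.date().isoformat(), None
        if meta["type"] == "float":
            # Assumption: sources may use a decimal comma.
            return str(float(raw.replace(",", "."))), None
    except ValueError:
        return None, f"{item}: {raw!r} is not a valid {meta['type']}"
    return raw, None
```

Returning the problem as data rather than raising keeps a batch run going and lets all findings be collected into one quality report for the data producer.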

The ODM post-processing tools can load data into i2b2, create annotated data files (e.g. SPSS, R) and reports, or generate a relational database structure (e.g. for a proprietary research database). The complete pipeline is batch driven and controlled by XML-based project files. All data is streamed, so the ODM file size is not limited by available memory. It takes 30 min to load six years of diagnoses/procedures data (>1 GB XML, 10 million facts) from Leipzig university hospital into i2b2 using a modest two-core virtual machine [3].
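Streaming an ODM file can be illustrated with the standard library's incremental XML parser. The sketch below is not the pipeline's actual code. It assumes the ODM 1.3 namespace and a made-up sample document, yields one tuple per `ItemData` element (a "fact"), and detaches each finished `SubjectData` subtree so that memory use stays roughly bounded by the largest single subject.

```python
import io
import xml.etree.ElementTree as ET

NS = "{http://www.cdisc.org/ns/odm/v1.3}"

# Made-up sample; a real export would be read from a file instead.
SAMPLE = b"""<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileOID="F.DEMO"
     FileType="Snapshot" ODMVersion="1.3.2" CreationDateTime="2017-09-17T12:00:00">
  <ClinicalData StudyOID="S.DEMO" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="P1">
      <StudyEventData StudyEventOID="SE.STAY"><FormData FormOID="F.CODING">
        <ItemGroupData ItemGroupOID="IG.DIAG">
          <ItemData ItemOID="I.ICD10" Value="E66.0"/>
        </ItemGroupData>
      </FormData></StudyEventData>
    </SubjectData>
    <SubjectData SubjectKey="P2">
      <StudyEventData StudyEventOID="SE.STAY"><FormData FormOID="F.CODING">
        <ItemGroupData ItemGroupOID="IG.DIAG">
          <ItemData ItemOID="I.ICD10" Value="I10"/>
          <ItemData ItemOID="I.ICD10" Value="E11.9"/>
        </ItemGroupData>
      </FormData></StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""


def iter_facts(source):
    """Yield (subject, event, form, group, item, value) tuples one subject at a time."""
    clinical = None
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if elem.tag == NS + "ClinicalData":
                clinical = elem  # remember the parent of all subjects
            continue
        if elem.tag != NS + "SubjectData":
            continue
        subject = elem.get("SubjectKey")
        for study_event in elem.findall(NS + "StudyEventData"):
            for form in study_event.findall(NS + "FormData"):
                for group in form.findall(NS + "ItemGroupData"):
                    for item in group.findall(NS + "ItemData"):
                        yield (subject, study_event.get("StudyEventOID"),
                               form.get("FormOID"), group.get("ItemGroupOID"),
                               item.get("ItemOID"), item.get("Value"))
        clinical.remove(elem)  # drop the processed subject from the tree
```

For example, `for fact in iter_facts(io.BytesIO(SAMPLE)): print(fact)` prints three tuples. In a loader, each tuple would be written to the target system in batches rather than collected in a list.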

Discussion: Although standards like CDA (Level 3) or FHIR resources are set to shape future EHR data storage, RWD and statistical analysis are mostly based on aggregated cohort data; we learned from clinical trials that CDISC ODM is a mature and stable standard for this purpose. Consequently, generated ODM metadata could be uploaded to the portal of medical data models [2] to foster a clinical RWD metadata repository. Well managed, ODM may serve as a bridge from clinicians to researchers over the troubled water of heterogeneous data sources and moving targets.

The presented software will be released as open source after the presentation.

Acknowledgements: This work was supported by the Federal Ministry of Education and Research (BMBF), Germany, Integrated Research and Treatment Center Adiposity Diseases FKZ 01EO1001 and i:DSem Integrative Data Semantics in Systems Medicine FKZ 01EO1001.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References:
[1] CDISC. Operational Data Model: Version 1.3.2. 2014.
[2] Dugas M, Neuhaus P, Meidt A, Doods J, Storck M, Bruland P, et al. Portal of medical data models: information infrastructure for medical research and healthcare. Database (Oxford). 2016;2016.
[3] Meineke FA, Stäubert S, Löbe M, Winter A. A comprehensive clinical research database based on CDISC ODM and i2b2. Stud Health Technol Inform. 2014;205:1115-9.