gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

Creation of semantic core datasets in the Portal of Medical Data Models – How to benefit from pooled expertise

Meeting Abstract

  • Cornelia Mertens - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Christian Holz - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Markus Kentgen - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Sarah Riepenhausen - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Philipp Neuhaus - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Stefan Hegselmann - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Alexandra Meidt - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Martin Dugas - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany
  • Julian Varghese - Westfälische Wilhelms-Universität Münster, Institut für Medizinische Informatik, Münster, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 18

doi: 10.3205/19gmds087, urn:nbn:de:0183-19gmds0878

Published: September 6, 2019

© 2019 Mertens et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Documentation in medicine is diverse and often inconsistent. The lack of clinical data harmonization hampers both cross-institutional electronic data exchange and future meta-analyses [1], and it makes it difficult to build the knowledge required for evidence-based documentation practice. Generating disease-specific core data sets that draw on pooled expertise can be a key to more efficient data collection. Additionally, it can save time and resources [2].

Methods: The Portal of Medical Data Models (MDM-Portal) is a metadata registry for creating, analyzing, sharing and reusing medical forms [3]. In the context of two in-house doctoral theses, the development of semantic core data sets in the MDM-Portal was investigated for two disease entities: acute myeloid leukemia (AML) and acute coronary syndrome (ACS). To this end, documentation forms from different sources (e.g. registries, studies, quality assurance, official guidelines and routine documentation) were semantically annotated with Unified Medical Language System (UMLS) codes [4], which draw on various source vocabularies. Subsequently, the annotated forms were analyzed for concept overlaps and the most frequent concepts using the Common Data Elements Generator (CDEGenerator) [5]. The most frequent concepts were then implemented as data elements in the Operational Data Model (ODM), a standard format of the Clinical Data Interchange Standards Consortium (CDISC) [6].
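The frequency and overlap analysis described above can be illustrated with a minimal Python sketch. It is not the CDEGenerator implementation; the function names, the toy forms and the placeholder concept codes (e.g. `C_AGE`) are hypothetical and serve only to show how concept occurrences, form overlap and top-k coverage might be computed once forms have been annotated.

```python
from collections import Counter


def concept_frequencies(forms):
    """Count how often each concept code occurs across all forms."""
    counts = Counter()
    for codes in forms.values():
        counts.update(codes)
    return counts


def forms_per_concept(forms):
    """Count in how many distinct forms each concept appears (overlap)."""
    overlap = Counter()
    for codes in forms.values():
        overlap.update(set(codes))  # each form counts at most once per concept
    return overlap


def top_k_coverage(counts, k):
    """Share of all concept occurrences covered by the k most frequent concepts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


# Hypothetical toy input: form name -> list of placeholder concept codes.
# Real annotations would use UMLS codes instead of these invented labels.
forms = {
    "registry_form": ["C_AGE", "C_SEX", "C_DIAGNOSIS_DATE", "C_ECG"],
    "study_crf": ["C_AGE", "C_SEX", "C_TROPONIN", "C_ECG"],
    "routine_form": ["C_AGE", "C_SEX", "C_BLOOD_PRESSURE"],
}

counts = concept_frequencies(forms)
overlap = forms_per_concept(forms)

print(counts.most_common(2))                 # the two most frequent concepts
print(round(top_k_coverage(counts, 2), 3))   # share of occurrences they cover
print(overlap["C_ECG"])                      # number of forms containing C_ECG
```

In this toy example, concepts that recur in every form rise to the top of the ranking, which is the property a core data set is meant to capture.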

Results: Preliminary results are as follows. For AML, 3265 medical concepts were semantically annotated, of which 1414 were unique. The 50 most frequent unique medical concepts cover 27.0% of all concept occurrences within the collected AML documentation sources. For ACS, 3710 medical concepts were semantically annotated, of which 842 were unique. The 60 most frequent unique medical concepts cover 50% of all concept occurrences in all sources. Core data sets for both disease entities have been implemented in the CDISC Operational Data Model and are available in several other export formats, including HL7 FHIR questionnaires and REDCap [7], [8].
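To give an impression of what a semantically annotated data element might look like in ODM, the following sketch builds a single `ItemDef` fragment with Python's standard library. It is an illustration under assumptions, not the authors' export: the OID, item name and question text are invented, `C0000000` is a placeholder rather than a real UMLS code, and the `Alias` context string `UMLS CUI [1,1]` is assumed here as the annotation convention. A complete ODM file would additionally wrap the item in the `ODM`, `Study` and `MetaDataVersion` elements with the ODM namespace.

```python
import xml.etree.ElementTree as ET


def build_item_def(oid, name, data_type, question, cui):
    """Build one ODM-style ItemDef element carrying a concept code as an Alias."""
    item = ET.Element("ItemDef", {"OID": oid, "Name": name, "DataType": data_type})
    # Question text shown to the person filling in the form
    question_el = ET.SubElement(item, "Question")
    text_el = ET.SubElement(question_el, "TranslatedText", {"xml:lang": "en"})
    text_el.text = question
    # Semantic annotation; the context string is an assumed convention
    ET.SubElement(item, "Alias", {"Context": "UMLS CUI [1,1]", "Name": cui})
    return item


# All values below are hypothetical placeholders.
item = build_item_def(
    oid="I.1",
    name="Example item",
    data_type="text",
    question="Example question text",
    cui="C0000000",
)

print(ET.tostring(item, encoding="unicode"))
```

Keeping the concept code in an `Alias` element rather than in the item name means the human-readable question can differ between forms while the annotated concept stays comparable.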

Discussion: The high number of form items described by relatively few common UMLS codes indicates the existence of a core data set with relevant concepts appearing in many forms, especially for ACS. This suggests that developing disease-specific core data sets is feasible and may be particularly useful for cross-institutional data exchange in clinical routine, quality management and research. Such core data sets can serve as a foundation for harmonized and efficient data collection and secondary use. One challenge in semantic annotation was that many similar UMLS codes exist for a single concept. We addressed this problem by using the most specific codes and by aiming for uniform coding through reuse of codes already assigned in the portal [9]. An important limitation in the creation of core data sets, however, is the lack of transparency and open access to (empty) documentation forms used in real practice [10]. This limitation needs to be addressed jointly in order to improve research processes.
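The difference between the two disease entities can be made concrete from the reported figures alone. The short sketch below derives two simple ratios; it assumes that the number of annotated concepts equals the total number of concept occurrences, which the abstract does not state explicitly, and the dictionary keys and function name are chosen here for illustration.

```python
# Figures as reported in the Results paragraph.
reported = {
    "AML": {"annotated": 3265, "unique": 1414, "top_k": 50, "coverage": 0.27},
    "ACS": {"annotated": 3710, "unique": 842, "top_k": 60, "coverage": 0.50},
}


def summarize(stats):
    """Derive simple ratios from the reported counts."""
    return {
        # share of annotated concepts that are distinct codes
        "unique_ratio": stats["unique"] / stats["annotated"],
        # average number of occurrences per distinct code
        "mean_reuse": stats["annotated"] / stats["unique"],
        # share of distinct codes that make up the top-k list
        "top_k_share_of_unique": stats["top_k"] / stats["unique"],
    }


for disease, stats in reported.items():
    summary = summarize(stats)
    print(disease, {key: round(value, 3) for key, value in summary.items()})
```

Under that assumption, roughly 43% of the annotated AML concepts are distinct codes, compared with about 23% for ACS, so each ACS code is reused about twice as often on average. This is consistent with the observation that the overlap is more pronounced for ACS.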

(Cornelia Mertens, Christian Holz and Markus Kentgen contributed equally to this work.)

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Krumm R, Semjonow A, Tio J, Duhme H, Bürkle T, Haier J, Dugas M, Breil B. The need for harmonized structured documentation and chances of secondary use – Results of a systematic analysis with automated form comparison for prostate and breast cancer. Journal of biomedical informatics. 2014 Oct 1;51:86-99.
2.
Sheehan J, Hirschfeld S, Foster E, Ghitza U, Goetz K, Karpinski J, Lang L, Moser RP, Odenkirchen J, Reeves D, Rubinstein Y. Improving the value of clinical research through the use of Common Data Elements. Clinical Trials. 2016 Dec;13(6):671-6.
3.
Dugas M, Neuhaus P, Meidt A, Doods J, Storck M, Bruland P, Varghese J. Portal of medical data models: information infrastructure for medical research and healthcare. Database (Oxford). 2016 Feb 11;2016. pii: bav121.
4.
UMLS Terminology Services. [Accessed 2019 Jul 15]. Available from: https://uts.nlm.nih.gov/home.html
5.
Varghese J, Fujarski M, Hegselmann S, Neuhaus P, Dugas M. CDEGenerator: an online platform to learn from existing data models to build model registries. Clinical epidemiology. 2018;10:961.
6.
CDISC. Data Exchange Standards. [Accessed 2019 Jul 15]. Available from: https://www.cdisc.org/standards/data-exchange/odm
7.
The Medical Data Models Portal. Common Data Elements for Acute Myeloid Leukemia. [Accessed 2019 Jan 30]. Available from: https://medical-data-models.org/31429
8.
The Medical Data Models Portal. Acute Coronary Syndrome Common Data Elements. [Accessed 2019 Jan 30]. Available from: https://medical-data-models.org/32729
9.
Dugas M, Meidt A, Neuhaus P, Storck M, Varghese J. ODMedit: uniform semantic annotation for data integration in medicine based on a public metadata repository. BMC medical research methodology. 2016 Dec;16(1):65.
10.
Dugas M, Jöckel KH, Friede T, Gefeller O, Kieser M, Marschollek M, Ammenwerth E, Röhrig R, Knaup-Gregori P, Prokosch HU. Memorandum “open metadata”. Methods of information in medicine. 2015;54(04):376-8.