gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Comparison of different approaches answering medical data requests

Meeting Abstract

  • Nikita Meyer - Institut für Medizinische Informatik, Universitätsklinikum Heidelberg, Heidelberg, Germany
  • Maximilian Klass - Institut für Medizinische Informatik, Universitätsklinikum Heidelberg, Heidelberg, Germany
  • Martin Dugas - Institut für Medizinische Informatik, Universitätsklinikum Heidelberg, Heidelberg, Germany
  • Angela Merzweiler - Institut für Medizinische Informatik, Universitätsklinikum Heidelberg, Heidelberg, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 725

doi: 10.3205/24gmds164, urn:nbn:de:0183-24gmds1643

Published: September 6, 2024

© 2024 Meyer et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Introduction: The Medical Data Integration Center (MeDIC) of Heidelberg University Hospital (UKHD) answers requests for medical data from researchers and clinicians by extracting data from several sources. This usually takes several iterations, in which the extraction is refined according to feedback from the requester. In this study, one such request, concerning laboratory data, was answered with three approaches that differ in data source and method. We show and explain the differences between the resulting datasets and discuss limitations and ease of use.

State of the art: The UKHD MeDIC meets demands for data from clinicians by using several data sources and different approaches for extracting, transforming and loading the data (ETL processes). We currently select the combination of data source and ETL process that we consider suitable for addressing a data request. Our assessment is based on experience from previous data requests and the availability of requested data in the source systems.

Concept: The data request concerned the laboratory data of participants in a diabetes study. Only laboratory values ordered by one specific organizational unit (OU) were to be extracted. In addition, if a laboratory order existed on a given day, laboratory values ordered by two other OUs on the same day were also to be extracted. The period of each participant's study participation was a further important filter condition. For this request, we performed the data extraction in three different ways, varying the data source and the ETL process. In addition to data quality, we also considered the implementation effort.
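The filter conditions described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the ETL logic used in the study: the record layout (`LabValue`), the function name `select_lab_values`, and the OU identifiers are hypothetical. It also assumes that "a laboratory order on one day" means an order by the specific OU for the same patient, and that the participation period is an inclusive date range per patient.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class LabValue:
    """Hypothetical, simplified lab record; real source schemata differ."""
    patient_id: str
    ordering_ou: str
    ordered_at: datetime
    analyte: str
    value: float


def select_lab_values(values, participation, primary_ou, secondary_ous):
    """Apply the request's filter conditions (assumed interpretation).

    participation maps patient_id -> (start_date, end_date), both inclusive.
    """
    def in_study(v):
        period = participation.get(v.patient_id)
        if period is None:
            return False
        start, end = period
        return start <= v.ordered_at.date() <= end

    # Keep only values ordered while the patient took part in the study.
    eligible = [v for v in values if in_study(v)]

    # (patient, day) pairs on which the primary OU placed at least one order.
    primary_days = {
        (v.patient_id, v.ordered_at.date())
        for v in eligible
        if v.ordering_ou == primary_ou
    }

    # Primary-OU values always qualify; values from the other OUs qualify
    # only on days with a primary-OU order for the same patient.
    return [
        v for v in eligible
        if v.ordering_ou == primary_ou
        or (v.ordering_ou in secondary_ous
            and (v.patient_id, v.ordered_at.date()) in primary_days)
    ]
```

Computing the set of qualifying patient-days first keeps the rule for the two additional OUs independent of the order in which records arrive from the source.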

Implementation: The first approach extracts data directly from tables of the local clinical information system (CIS) via iterative queries. We used Talend (https://www.talend.com/) to implement these queries, to provide schema information and to post-process the data. The second approach uses data from an internal staging database within the MeDIC. This database stores data originating from HL7 v2 messages for subsequent transformation to openEHR. The third approach formulates a corresponding query in the Archetype Query Language (AQL), the query language for openEHR. Compared with the source system (the laboratory information management system, LIMS) and the staging database, the data in openEHR is the most standardized; it has extensive mappings, e.g. to LOINC, and references to other data from the core dataset of the Medical Informatics Initiative (MII).

Lessons learned: Differences between the datasets were expected, since the data had been transformed, which leads to almost unavoidable data loss and to different schemata. The first approach, iterative queries on CIS tables, was the slowest to build and to run, because the required data is spread across several tables and had to be collected with successive queries. Data from our staging database was collected with less implementation effort but was incomplete: data is available only from a certain date onward, because no HL7 interface existed before that date. In contrast to the CIS, additional information such as LOINC codes is available. With openEHR as the source, data can easily be combined with other items from our openEHR repository. For future laboratory data requests, we will choose the source based on the requested query period and on the need to link the data to other items from the openEHR repository.
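The selection rule stated above could be written down roughly as follows. This is a hedged sketch, not a procedure defined in the study: the function name `choose_source`, its parameters, and the returned labels are hypothetical. It further assumes that the openEHR data, being transformed from the staging database, covers the same period as the staging database, so that only the CIS covers requests reaching back before the HL7 interface existed.

```python
from datetime import date


def choose_source(query_start: date,
                  hl7_interface_start: date,
                  needs_openehr_linkage: bool) -> str:
    """Pick a data source for a lab data request (assumed decision rule)."""
    # Requests reaching back before the HL7 interface existed can only be
    # answered completely from the CIS tables (assumption, see lead-in).
    if query_start < hl7_interface_start:
        return "CIS"
    # Linking to other items in the repository favours openEHR.
    if needs_openehr_linkage:
        return "openEHR"
    # Otherwise the staging database needs the least implementation effort.
    return "staging database"
```

A request whose period starts before the interface date but also needs linkage would fall to the CIS under this rule; in practice such a case might instead be answered by combining sources.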

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.