gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Distributed Data Quality Assessment Across CORD-MI Consortia

Meeting Abstract

  • Kais Tahar - Institute of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany
  • Tamara Martin - Centre for Rare Diseases, University Hospital Tübingen, Tübingen, Germany
  • Yougli Mou - Chair of Computer Science 5, RWTH Aachen University, Aachen, Germany
  • Muhammad Adnan - Cologne Excellence Cluster on Cellular Stress Responses in Aging-Associated Diseases (CECAD), Faculty of Medicine and University Hospital of Cologne, University of Cologne, Köln, Germany
  • Sarah Geihs - Division of IT Management, University Hospital Aachen, Aachen, Germany
  • Holm Graessner - Centre for Rare Diseases, University Hospital Tübingen, Tübingen, Germany
  • Dagmar Krefting - Institute of Medical Informatics, University Medical Center Göttingen, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 103

doi: 10.3205/22gmds116, urn:nbn:de:0183-22gmds1166

Published: August 19, 2022

© 2022 Tahar et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: As a result of digitization, data quality (DQ) is becoming increasingly important, especially in medicine and science [1], [2]. In the research project “Collaboration on Rare Diseases” of the Medical Informatics Initiative (CORD-MI, FKZ 01ZZ1911R), we developed a framework for assessing the quality of data recorded in different health information systems (HISs), in order to improve the quality of rare disease (RD) documentation and to support clinical research.

Methods: Effective management of DQ in CORD-MI requires appropriate metrics and tools to assess the quality of data extracted from different HISs. In the literature, there is currently no consensus or standard framework for assessing DQ [3], [4], [5], [6]; the definitions of indicators reflect the individual requirements of the implemented use cases. For CORD-MI, the following dimensions are required: completeness, plausibility, uniqueness and concordance. From these dimensions, appropriate metrics were derived, such as orphaCoding completeness, orphaCoding plausibility and the uniqueness rate of RD cases [7]. Our data quality framework is implemented as an R package that provides methods for calculating DQ metrics and creating reporting scripts [7]. Using this package, we developed tools for local and cross-institutional analysis of DQ in CORD-MI. To enable harmonized DQ assessments, a FHIR interface was developed and applied to the MII core data set [8] stored in the FHIR servers of heterogeneous HISs. Hence, the application of our methods is not limited to specific local data models or HIS architectures. Moreover, a distributed DQ analysis was implemented using the Personal Health Train (PHT) [9], [10], [11]. We also used the Alpha-ID-SE terminology [12] to detect RD cases with tracer diagnoses [13] and to assess the quality of RD documentation.
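
To make the derived metrics concrete, the following sketch shows how two of them (orphaCoding completeness and the uniqueness rate of RD cases) could be computed over diagnosis records. Note that the actual framework is an R package (dqLib) whose API differs; the field names (`case_id`, `is_rare_disease`, `orpha_code`) are hypothetical and chosen for illustration only.

```python
from collections import Counter

def dq_metrics(diagnoses):
    """Toy illustration (not the dqLib API): completeness and uniqueness
    of Orpha-coded rare disease (RD) diagnosis records."""
    rd_cases = [d for d in diagnoses if d.get("is_rare_disease")]
    if not rd_cases:
        return {"orpha_completeness": 0.0, "uniqueness_rate": 0.0}
    # orphaCoding completeness: share of RD records carrying an Orpha code
    coded = [d for d in rd_cases if d.get("orpha_code")]
    completeness = len(coded) / len(rd_cases)
    # uniqueness rate: share of RD case identifiers occurring exactly once
    counts = Counter(d["case_id"] for d in rd_cases)
    unique = sum(1 for c in counts.values() if c == 1)
    uniqueness = unique / len(counts)
    return {"orpha_completeness": completeness, "uniqueness_rate": uniqueness}
```

For example, three RD records of which two are Orpha-coded yield a completeness of 2/3, and a case identifier appearing twice lowers the uniqueness rate accordingly.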

Results: Our methodology provides a framework that enables harmonized DQ assessment and cross-site reporting. Using PHT, the developed tools were successfully applied to synthetic data located at the University Medical Center Göttingen, University Hospital Aachen and University Hospital Cologne. The analyzed data include items that capture information about patients, cases and diagnoses as specified in [14]. The test results confirm the correctness of the calculated indicators and key numbers. The developed tools and generated reports can be downloaded from GitHub [15].

Discussion: Our approach yields promising results that can be used for local and cross-institutional DQ assessments. Because our framework is model-independent, the references it uses can easily be updated or exchanged. Another advantage is its modular design, which allows users to select the desired key numbers and indicators and to generate user-defined DQ reports. Since only the calculated DQ metrics leave the local HISs, our approach provides useful methods for privacy-preserving DQ assessment.
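
The privacy-preserving principle described above can be sketched as follows: each site computes only aggregate counts locally, and the cross-site report is derived from those aggregates, so no patient-level record ever leaves a local HIS. This is a simplified illustration of the idea, not the actual PHT implementation; all function and field names are hypothetical.

```python
def local_report(site_records):
    """Runs at each site: only aggregate counts leave the local HIS."""
    total = len(site_records)
    coded = sum(1 for r in site_records if r.get("orpha_code"))
    return {"n_cases": total, "n_coded": coded}

def cross_site_completeness(reports):
    """Central aggregation over site-level reports (no patient-level data)."""
    total = sum(r["n_cases"] for r in reports)
    coded = sum(r["n_coded"] for r in reports)
    return coded / total if total else 0.0
```

In a PHT-style setup, the analogue of `local_report` travels to each data source, while the central station only ever sees the returned aggregates.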

Conclusion: The developed framework provides valuable DQ metrics and an interoperable solution for evaluating DQ across the distributed data sources of CORD-MI. This study has demonstrated that our methodology can detect DQ issues such as outliers, ambiguity and implausibility of coded diagnoses, and that it can be applied for cross-site DQ reporting. Our approach can therefore be used for DQ benchmarking to improve the quality of RD documentation.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Rat für Informationsinfrastrukturen (RfII). The Data Quality Challenge – February 2020 [Internet]. [cited 2022 Mar 16]. Available from: https://rfii.de/download/the-data-quality-challenge-february-2020/
2.
Hogan WR, Wagner MM. Accuracy of data in computer-based patient records. J Am Med Inform Assoc. 1997 Oct;4(5).
3.
Ramasamy A, Chowdhury S. Big Data Quality Dimensions: A Systematic Literature Review. Journal of Information Systems and Technology Management. 2020 May 20;17:e202017003.
4.
Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. Journal of the American Medical Informatics Association. 2013 Jan 1;20(1).
5.
Weng C. Clinical data quality: a data life cycle perspective. Biostatistics & Epidemiology. 2020 Jan 1;4(1).
6.
Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Medical Research Methodology. 2021 Apr 2;21(1):63.
7.
Tahar K. dqLib [Internet]. 2021 [cited 2022 May 22]. DOI: 21.11101/0000-0007-E7D8-1
8.
The Medical Informatics Initiative’s core data set [Internet]. [cited 2022 Mar 17]. Available from: https://www.medizininformatik-initiative.de/en/medical-informatics-initiatives-core-data-set
9.
Mou Y, Welten S, Jaberansary M, Ucer Yediel Y, Kirsten T, Decker S, et al. Distributed Skin Lesion Analysis Across Decentralised Data Sources. Stud Health Technol Inform. 2021 May 27;281:352-356.
10.
Beyan O, Choudhury A, van Soest J, Kohlbacher O, Zimmermann L, Stenzhorn H, et al. Distributed Analytics on Sensitive Medical Data: The Personal Health Train. Data Intelligence. 2020 Jan 1;2(1–2).
11.
Welten S, Mou Y, Neumann L, Jaberansary M, Ucer YY, Kirsten T, et al. A Privacy-Preserving Distributed Analytics Platform for Health Care Data. Methods Inf Med. 2022 Jan 17.
12.
BfArM. Alpha-ID-SE [Internet]. [cited 2022 May 23]. Available from: https://www.bfarm.de/EN/Code-systems/Terminologies/Alpha-ID-SE/_node.html
13.
List of Tracer Diagnoses extracted from Alpha-ID-SE Terminology [Internet]. 2022 [cited 2022 May 24]. DOI: 21.11101/0000-0007-F6DF-9
14.
Medical Informatics Initiative - CORD - ImplementationGuide [Internet]. [cited 2022 May 23]. Available from: https://simplifier.net/guide/medicalinformaticsinitiative-cord-implementationguide?version=current
15.
Tahar K. cordDqChecker [Internet]. Medizininformatik Initiative; 2021 [cited 2022 Mar 16]. DOI: 21.11101/0000-0007-EC87-7