gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

A quality management concept for data management in data integration centers

Meeting Abstract

  • Erik Tute - Medizinische Hochschule Hannover, Hannover, Germany
  • Matthias Gietzelt - Medizinische Hochschule Hannover, Hannover, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 134

doi: 10.3205/21gmds121, urn:nbn:de:0183-21gmds1216

Published: September 24, 2021

© 2021 Tute et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Introduction: As part of the Medical Informatics Initiative [1], university hospitals all over Germany are establishing data integration centers (DICs). A main task of a DIC is to integrate data from heterogeneous sources and make it available for reuse, e.g. for medical research, while ensuring data quality and data protection. Data quality is defined here as the suitability of data for a given task. This contribution presents our concept for ensuring the quality of data management processes in a DIC and our initial experience with its application.

State of the Art: As DICs are a new type of infrastructure, there are no established quality management concepts specifically for DICs. We therefore relied on literature from related areas, e.g. software development [2], data warehousing [3], studies and registries [4] and distributed research networks [5].

Concept: The concept defines requirements for the implementation process of data integration pipelines. These cover the aspects to include in documentation, the tools for documentation (Git), the metadata to consider in the integration process (e.g. source system, original data etc.), suggestions for test case implementation, peer review of implementation results, logging, and the use of dedicated environments for development, testing and production. We integrate data into openEHR data repositories. These already enforce compliance with data models, including constraints for valid data [6]. For further analysis, e.g. to expose uncommon but not invalid data or suspicious distributions, the concept uses data quality assessment based on collaboratively governed data quality indicators for different domains (e.g. cardiology or oncology) and perspectives (e.g. checking a data integration pipeline vs. preparing a data analysis with certain statistical methods in mind) [7]. The definition of data quality indicators is explicitly intended to be linked as closely as possible to activities at the consortium level (HiGHmed AG Datenqualität and use cases) and at the national level (MI Initiative Taskforce Metadaten). Source data verification for small amounts of data completes the concept with regard to result quality. Communication processes for data quality issues use Git issues for written documentation and issue handling. Our concept defines responsibilities for issue handling and suggests actively requesting feedback from data users in order to become aware of problems.
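The idea of indicator-based data quality assessment can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use openCQA or openEHR; the names Indicator, missing_rate, outlier_rate and assess, as well as the thresholds and the heart rate values, are assumptions made purely for illustration. The sketch computes two simple indicators on values that would pass a validity constraint but may still be suspicious: the share of missing values and the share of values far from the median.

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Dict, Optional, Sequence

Values = Sequence[Optional[float]]


@dataclass(frozen=True)
class Indicator:
    """Hypothetical data quality indicator: a named measure plus a threshold."""
    name: str
    compute: Callable[[Values], float]
    threshold: float  # largest value still considered unremarkable


def missing_rate(values: Values) -> float:
    """Share of entries that are missing (None)."""
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def outlier_rate(values: Values, k: float = 3.0) -> float:
    """Share of present values farther than k scaled MADs from the median."""
    present = [v for v in values if v is not None]
    if len(present) < 3:
        return 0.0
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return 0.0
    # 1.4826 scales the MAD to be comparable to a standard deviation
    # under a normal distribution.
    limit = k * 1.4826 * mad
    return sum(abs(v - med) > limit for v in present) / len(present)


def assess(values: Values, indicators: Sequence[Indicator]) -> Dict[str, dict]:
    """Evaluate each indicator and flag results above its threshold."""
    report = {}
    for ind in indicators:
        value = ind.compute(values)
        report[ind.name] = {"value": value, "flagged": value > ind.threshold}
    return report


if __name__ == "__main__":
    # Made-up heart rates (beats per minute); 300 is uncommon, not invalid.
    heart_rates = [72, 68, None, 75, 70, 300, 71, 69, None, 74]
    indicators = [
        Indicator("missing_rate", missing_rate, threshold=0.1),
        Indicator("outlier_rate", outlier_rate, threshold=0.05),
    ]
    for name, result in assess(heart_rates, indicators).items():
        print(name, result)
```

Keeping each indicator as a small named unit with its own threshold is one way to let different domains or perspectives maintain their own indicator sets; a flagged result would then be documented and followed up through the issue handling described above.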

Implementation and Lessons Learned: Since quality management for DICs is as new as DICs themselves, the concept is still frequently revised in response to new insights. The written form of our concept consists of a document part curated in SharePoint and parts in a local GitLab instance, depending on the purpose of the contents. Staff who implement data integration pipelines or curate data (data stewards) know the concept, have access to its current version and contribute to its evolution. Tools described in the concept, e.g. Git for documentation or openCQA for data quality assessment, are available and have been tested in first use cases. Links to consortium and national working groups have been initiated. Implementation has now entered the crucial phase of actually living the concept and maintaining compliance over time. Our first experiences of engaging staff to align their work with the concept and to contribute their own experiences to its improvement showed a surprisingly high, and promising, level of awareness and willingness.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Semler SC, Wissing F, Heyder R. German medical informatics initiative. Methods Inf Med. 2018;57:e50–6. DOI: 10.3414/ME18-03-0003
2. Spillner A, Linz T. Basiswissen Softwaretest. 4th ed. Heidelberg: dpunkt Verlag; 2002.
3. Heinrich B, Kaiser M, Klier M. How to measure Data Quality? A Metric-based Approach. In: Proceedings of the 28th International Conference on Information Systems (ICIS). Montreal, Queen's University; 2007.
4. Nonnemacher M, Nasseh D, Stausberg J. Datenqualität in der medizinischen Forschung. Leitlinie zum adaptiven Management von Datenqualität in Kohortenstudien und Registern. 2nd ed. Berlin: MWV Medizinisch Wissenschaftliche Verlagsgesellschaft; 2014. (TMF-Schriftenreihe; 4).
5. Khare R, Utidjian LH, Razzaghi H, Soucek V, Burrows E, Eckrich D. Design and Refinement of a Data Quality Assessment Workflow for a Large Pediatric Research Network. EGEMS (Wash DC). 2019;7:36. DOI: 10.5334/egems.294
6. Tute E, Wulff A, Marschollek M, Gietzelt M. Clinical information model based data quality checks: theory and example. Stud Health Technol Inform. 2019;258:80–4.
7. Tute E, Scheffner I, Marschollek M. A method for interoperable knowledge-based data quality assessment. BMC Med Inform Decis Mak. 2021 Mar 9;21(1):93. DOI: 10.1186/s12911-021-01458-1