gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

Data Quality in Secondary Use: Adopting Concepts from Continuous Risk Management – A Discussion Paper

Meeting Abstract

  • Jonas Bienzeisler - Medical Faculty, RWTH University, Aachen, Germany
  • Lucas Triefenbach - RWTH University, Aachen, Germany
  • Alexander Kombeiz - Institut für medizinische Informatik, RWTH University, Aachen, Germany
  • Myriam Lipprandt - Medical Faculty, RWTH University, Aachen, Germany
  • Raphael W. Majeed - Medical Faculty, RWTH University, Aachen, Germany
  • Hauke Fischer - University of Oldenburg, Oldenburg, Germany
  • Ronny Otto - Universitätsklinikum Magdeburg A.ö.R., Magdeburg, Germany
  • Atinkut Alamirrew Zeleke - Universität Greifswald, Greifswald, Germany
  • Dagmar Waltemath - Universitätsmedizin Greifswald, Greifswald, Germany
  • Rainer Röhrig - Medical Faculty, RWTH University, Aachen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 43

doi: 10.3205/21gmds068, urn:nbn:de:0183-21gmds0685

Published: September 24, 2021

© 2021 Bienzeisler et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at



Introduction: Electronic health records (EHR) document medical treatment. The trustworthiness of the underlying data is one of the prerequisites for successful reuse of data and information [1], [2], [3]. As with all secondary data sources, data quality fluctuates constantly with respect to common quality metrics [4].

Experts suggest a flexible, parallel data quality management approach to handle the ongoing refinement and continuous transformation of EHR data ecosystems. As in conventional quality management, data quality has to be both assured and controlled continuously. Proactive measures such as vigilance systems may reduce the risk of faulty data caused by adverse incidents. While general frameworks for data quality exist, no solution yet provides continuous data quality management that incorporates these basic concepts from both quality management and risk management.

State of the art: Whether data are fit for a given scientific purpose has to be evaluated individually [5], [6]. Weiskopf and Weng [4] concluded that systematic and validated methods are necessary to assess the quality of EHR data that are reused in clinical research. Accordingly, data can be evaluated using seven categories of data quality assessment methods and five common dimensions of data quality, namely Completeness, Correctness, Concordance, Plausibility and Currency. Kahn et al. [6] developed these dimensions into a common framework for evaluating and communicating data quality findings in a predefined manner, using a shared vocabulary that describes taxonomies and models of data quality.
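To make the dimensions more concrete, the following Python sketch computes three of them (Completeness, Plausibility and Currency) as simple proportions over a toy extract. The record structure, the heart-rate field, the plausible range of 20 to 300 beats per minute and the one-year currency window are hypothetical choices for illustration only, not definitions taken from Weiskopf and Weng [4] or Kahn et al. [6]. Correctness and Concordance are left out because they typically require comparison against a reference, such as a gold standard or another data source.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class VitalSignRecord:
    """Hypothetical EHR extract row; the field names are illustrative only."""
    patient_id: str
    heart_rate: Optional[float]   # beats per minute; None if not documented
    recorded_on: Optional[date]   # documentation date; None if missing


def _share(hits: int, total: int) -> float:
    # An empty denominator yields 0.0 so that "no data" never looks like perfect quality.
    return hits / total if total else 0.0


def completeness(records: list[VitalSignRecord]) -> float:
    """Share of records in which a heart rate is documented at all."""
    return _share(sum(r.heart_rate is not None for r in records), len(records))


def plausibility(records: list[VitalSignRecord],
                 low: float = 20.0, high: float = 300.0) -> float:
    """Share of documented heart rates inside an assumed plausible range."""
    values = [r.heart_rate for r in records if r.heart_rate is not None]
    return _share(sum(low <= v <= high for v in values), len(values))


def currency(records: list[VitalSignRecord], today: date,
             max_age_days: int = 365) -> float:
    """Share of dated records that are no older than max_age_days."""
    dates = [r.recorded_on for r in records if r.recorded_on is not None]
    return _share(sum((today - d).days <= max_age_days for d in dates), len(dates))


records = [
    VitalSignRecord("p1", 72.0, date(2021, 9, 1)),
    VitalSignRecord("p2", None, date(2021, 8, 15)),
    VitalSignRecord("p3", 720.0, date(2019, 1, 1)),  # likely a typo for 72
    VitalSignRecord("p4", 64.0, None),
]
print(completeness(records))                       # 0.75
print(plausibility(records))                       # 0.6666666666666666
print(currency(records, today=date(2021, 9, 26)))  # 0.6666666666666666
```

Each metric is reported separately rather than merged into one score, which keeps findings traceable to a single dimension when they are communicated.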

Concept: We believe that continuous data quality management will improve EHR data quality. We propose implementing systematic risk management by adopting the HIT risk management concept of the IMIA Working Group for Health Informatics for Patient Safety [7]. We suggest focusing data quality management on the data-generating medical process, which is observed continuously as part of data quality monitoring. A change in the data-generating process or a critical finding identified by the monitoring system triggers data quality assessment and data quality control. During data quality assessment, the quality of the monitored data is analyzed. If it is insufficient, countermeasures (e.g., source data validation) are required as part of data quality control.

Figure 1 [Fig. 1]
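The trigger logic of the proposed concept can be sketched as a single pass of a monitoring loop. In this hypothetical Python sketch, the function name `quality_cycle`, the reduction of data quality assessment to one score between 0 and 1, and the threshold of 0.95 are assumptions for illustration; the concept itself does not prescribe how assessment results are quantified.

```python
from typing import Callable


def quality_cycle(process_changed: bool,
                  critical_finding: bool,
                  assess: Callable[[], float],
                  threshold: float = 0.95) -> list[str]:
    """Run one pass of a hypothetical monitor -> assess -> control loop.

    Returns the steps that were taken, in order.
    """
    steps = ["monitor"]
    # Assessment is triggered only by a changed data-generating process
    # or by a critical finding of the monitoring system.
    if not (process_changed or critical_finding):
        return steps
    steps.append("assess")
    # `assess` stands in for any analysis of the monitored data that yields
    # a score in [0, 1]; a single scalar score is an assumption of this sketch.
    if assess() < threshold:
        # Quality insufficient: countermeasures such as source data
        # validation are applied as part of data quality control.
        steps.append("control")
    return steps


print(quality_cycle(False, False, assess=lambda: 0.50))  # ['monitor']
print(quality_cycle(True, False, assess=lambda: 0.99))   # ['monitor', 'assess']
print(quality_cycle(False, True, assess=lambda: 0.80))   # ['monitor', 'assess', 'control']
```

Because `assess` is passed in as a callable, the potentially expensive analysis runs only when a trigger fires; calling the function once per monitoring interval mirrors the continuous, iterative character of the concept.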

Lessons learned: Using this basic paradigm, continuous data quality management that incorporates proactive measures can be implemented. Critical findings can be fed back into clinical practice, so that data quality is improved iteratively. We therefore expect the concept to scale well, even to larger research infrastructures.

Conclusion: Most data quality frameworks cannot easily be generalized. One consequence of adopting lessons from risk management is a focus on improving the process and the data quality iteratively and proactively. Interventions need to take place as far upstream as possible in the medical process that generates the EHR data. However, continuous system re-engineering and the additional human resources it requires are costly. It remains to be discussed how much a gain in data quality is worth and what level of data quality is adequate. We intend to base our future data quality management on the presented concept to address these issues.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


[1] Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J, Brookes AJ, Clark T, Crosas M, Dillo I, Dumon O, Edmunds S, Evelo CT, Finkers R, Gonzalez-Beltran A, Gray AJ, Groth P, Goble C, Grethe JS, Heringa J, 't Hoen PA, Hooft R, Kuhn T, Kok R, Kok J, Lusher SJ, Martone ME, Mons A, Packer AL, Persson B, Rocca-Serra P, Roos M, van Schaik R, Sansone SA, Schultes E, Sengstag T, Slater T, Strawn G, Swertz MA, Thompson M, van der Lei J, van Mulligen E, Velterop J, Waagmeester A, Wittenburg P, Wolstencroft K, Zhao J, Mons B. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016 Mar 15;3:160018. DOI: 10.1038/sdata.2016.18
[2] Safran C, Bloomrosen M, Hammond WE, Labkoff S, Markel-Fox S, Tang PC, Detmer DE, Expert Panel. Toward a national framework for the secondary use of health data: an American Medical Informatics Association White Paper. J Am Med Inform Assoc. 2007 Jan-Feb;14(1):1-9. DOI: 10.1197/jamia.M2273
[3] van der Veer SN, de Keizer NF, Ravelli AC, Tenkink S, Jager KJ. Improving quality of care. A systematic review on how medical registries provide information feedback to health care providers. Int J Med Inform. 2010 May;79(5):305-23. DOI: 10.1016/j.ijmedinf.2010.01.011
[4] Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013 Jan 1;20(1):144-51. DOI: 10.1136/amiajnl-2011-000681
[5] Bian J, Lyu T, Loiacono A, Viramontes TM, Lipori G, Guo Y, Wu Y, Prosperi M, George TJ, Harle CA, Shenkman EA, Hogan W. Assessing the practice of data quality evaluation in a national clinical data research network through a systematic scoping review in the era of real-world data. J Am Med Inform Assoc. 2020 Dec 9;27(12):1999-2010. DOI: 10.1093/jamia/ocaa245
[6] Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, Estiri H, Goerg C, Holve E, Johnson SG, Liaw ST, Hamilton-Lopez M, Meeker D, Ong TC, Ryan P, Shang N, Weiskopf NG, Weng C, Zozus MN, Schilling L. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC). 2016 Sep 11;4(1):1244. DOI: 10.13063/2327-9214.1244
[7] Borycki E, Dexheimer JW, Hullin Lucay Cossio C, Gong Y, Jensen S, Kaipio J, Kennebeck S, Kirkendall E, Kushniruk AW, Kuziemsky C, Marcilly R, Röhrig R, Saranto K, Senathirajah Y, Weber J, Takeda H. Methods for Addressing Technology-induced Errors: The Current State. Yearb Med Inform. 2016 Nov 10;(1):30-40. DOI: 10.15265/IY-2016-029