gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

A data quality concept for observational studies

Meeting Abstract

  • Carsten Oliver Schmidt - Universität Greifswald, Greifswald, Germany
  • Cornelia Enzenbach - Universität Leipzig, Leipzig, Germany
  • Jürgen Stausberg - Institut für Medizinische Informatik, Biometrie und Epidemiologie, Universität Duisburg-Essen, Germany
  • Hermann Pohlabeln - Leibniz-Institut für Präventionsforschung und Epidemiologie – BIPS, Bremen, Germany
  • Adrian Richter - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 241

doi: 10.3205/19gmds011, urn:nbn:de:0183-19gmds0112

Published: September 6, 2019

© 2019 Schmidt et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: High data quality is necessary to ensure valid results from health studies. This entails the need for informative and agreed-upon data quality indicators.

Background: Many approaches to addressing data quality exist across and within scientific disciplines [1], [2]. This is partly due to the large heterogeneity of data structures and data collection processes [3]. Even though primary data collections in observational health research share many methodological characteristics, the diversity of implemented data quality assessments is substantial. We present results of a DFG-funded project to elaborate standards and tools for data quality assessments in these studies.

Concept: Our data quality concept focuses on intrinsic data quality, i.e. quality that can be assessed without contextual information (e.g. the specific research question) [2]. Within this focus, four data quality dimensions are distinguished. One dimension relates to the technical usability of data for quality assessment purposes (data integrity); another covers the degree to which data values are present in a data collection (data completeness); and two dimensions relate to data correctness. The first of these is data consistency, the degree to which data values are free of contradictions or violations of conventions; the second is data accuracy, the agreement between study data and expected values or distributions. Within these dimensions, 35 indicators are currently distinguished.
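To make the three content-related dimensions concrete, the following Python sketch computes one illustrative indicator for each on a toy variable. The variable, the admissible range, the expected mean and the indicator definitions are assumptions chosen for illustration only; they are not taken from the 35 indicators of the concept, and the project's own routines are written in R.

```python
from statistics import mean

# Hypothetical toy measurements of systolic blood pressure (mmHg); None = missing.
sbp = [120, 135, None, 118, 400, 127, None, 142]

# Assumed admissible range and expected mean, for illustration only.
ADMISSIBLE = (60, 300)
EXPECTED_MEAN = 130.0


def completeness(values):
    """Completeness: share of observations with a value present."""
    return sum(v is not None for v in values) / len(values)


def inadmissible_share(values, limits):
    """Consistency: share of present values outside the admissible range."""
    low, high = limits
    present = [v for v in values if v is not None]
    return sum(not (low <= v <= high) for v in present) / len(present)


def mean_deviation(values, limits, expected):
    """Accuracy: observed mean of admissible values minus the expected mean."""
    low, high = limits
    admissible = [v for v in values if v is not None and low <= v <= high]
    return mean(admissible) - expected


print("completeness:", completeness(sbp))
print("inadmissible share:", inadmissible_share(sbp, ADMISSIBLE))
print("mean deviation:", mean_deviation(sbp, ADMISSIBLE, EXPECTED_MEAN))
```

Data integrity is left out here because it concerns whether the data can be processed at all, which would be checked before indicators of this kind are computed.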

Implementation: The implementation links the theoretical concept with generic statistical routines, including documentation. For this purpose, a public website is under construction using R Markdown. It provides an overview of the data quality concept and the associated statistical routines, which are written in the programming language R. It also outlines the metadata necessary to compute data quality indicators. Scientists and database managers need to take care to set up such information adequately, preferably within an augmented, centralized data dictionary.
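The role of such metadata can be illustrated with a small Python sketch in which a data dictionary drives the checks generically. The dictionary fields (`type`, `missing_codes`, `limits`), the variable names and the function `check_variable` are hypothetical; the metadata outlined on the project website and the associated R routines may be structured differently.

```python
# Hypothetical data dictionary: one metadata entry per study variable.
DATA_DICTIONARY = {
    "age": {"type": int, "missing_codes": {-99}, "limits": (18, 100)},
    "sex": {"type": str, "missing_codes": {""}, "limits": None},
}

# Toy study data, keyed by variable name.
STUDY_DATA = {
    "age": [34, -99, 17, 58, "n/a"],
    "sex": ["f", "m", "", "f", "m"],
}


def check_variable(values, meta):
    """Count type mismatches, missing codes and limit violations for one variable."""
    result = {"type_mismatch": 0, "missing": 0, "out_of_limits": 0}
    for value in values:
        if not isinstance(value, meta["type"]):
            result["type_mismatch"] += 1      # integrity: not usable as declared
        elif value in meta["missing_codes"]:
            result["missing"] += 1            # completeness: coded as missing
        elif meta["limits"] is not None:
            low, high = meta["limits"]
            if not low <= value <= high:
                result["out_of_limits"] += 1  # consistency: outside admissible limits
    return result


# The same generic routine is applied to every variable described in the dictionary.
report = {name: check_variable(STUDY_DATA[name], meta)
          for name, meta in DATA_DICTIONARY.items()}
for name, counts in report.items():
    print(name, counts)
```

Keeping the limits and missing codes in one central dictionary, rather than in the analysis code, means that a new variable only requires a new metadata entry and no change to the checking routine.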

Conclusion: The data quality concept introduced here provides an accessible and theoretically founded basis for harmonized data quality assessments of primary data collections in observational health studies. Limitations concern its applicability to data of entirely different provenance, e.g. administrative data collections.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Stausberg J, Nasseh D, Nonnemacher M. Measuring data quality: a review of the literature between 2005 and 2013. Studies in Health Technology and Informatics. 2015;210:712-6.
2. Wang RY, Strong DM. Beyond accuracy: What data quality means to data consumers. Journal of Management Information Systems. 1996;12(4):5-33.
3. Keller S, Korkmaz G, Orr M, Schroeder A, Shipp S. The evolution of data quality: Understanding the transdisciplinary origins of data quality concepts and approaches. Annual Review of Statistics and Its Application. 2017;4:85-108.