63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

02. - 06.09.2018, Osnabrück

Which data and data structures are required to implement automated data quality monitoring in observational studies? Experiences from a population based cohort study

Meeting Abstract

  • Adrian Richter - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Birgit Schauer - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Kristin Henselin - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Martin Junge - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Stephan Struckmann - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Elizabeth Sierocinski - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Jörg Henke - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Carsten Oliver Schmidt - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 252

doi: 10.3205/18gmds016, urn:nbn:de:0183-18gmds0162

Published: August 27, 2018

© 2018 Richter et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Accuracy, completeness, and consistency of health data are crucial prerequisites for valid scientific conclusions in health research. Yet approaches to ensure data quality appear to be insufficiently implemented, since uncertain research results are frequently published [1].

Background: One prerequisite for adequate assessment of data quality is metadata for all measurements. REDCap [2] provides a sophisticated approach to metadata-driven quality control. Nonetheless, more information appears necessary to evaluate data quality [1], namely knowledge of the data-generating process, i.e. ambient conditions, methods, and materials. Clear guidance on which metadata and other information to use is lacking [3]. Based on our experience in the Study of Health in Pomerania (SHIP), we provide an overview of one approach to consolidating different types of information into data quality assessments and report results of its implementation.

Concept: Our concept distinguishes between metadata and auxiliary variables. Metadata comprise static attributes that serve different purposes. Some attributes relate to the data quality domains: (i) completeness, e.g. missing codes or jump codes that indicate conditionally missing data, (ii) consistency, e.g. admissible values of categorical variables, and (iii) accuracy, e.g. precision of measurement variables. A second group of attributes guides the selection of appropriate data quality statistics, e.g. via data types or distributional classes. A third group is essential for appropriate reporting and includes labels, measurement units, and graphical information such as colors assigned to examiners.
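
The three groups of metadata attributes can be pictured as one static record per study variable. The following Python sketch is illustrative only: the class `VariableMetadata`, its field names, the function `classify`, the example variables, and the code values 99800 and 99900 are assumptions made for this example and are not taken from the SHIP data dictionary or from SQUARE2.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class VariableMetadata:
    """Static attributes of one study variable (hypothetical layout)."""
    name: str
    data_type: str                                       # guides the choice of statistics
    label: str                                           # reporting
    unit: Optional[str] = None                           # reporting
    missing_codes: FrozenSet[int] = frozenset()          # completeness
    jump_codes: FrozenSet[int] = frozenset()             # completeness (conditional)
    admissible_values: Optional[FrozenSet[int]] = None   # consistency
    decimals: Optional[int] = None                       # accuracy: expected precision


def classify(value, meta: VariableMetadata) -> str:
    """Classify one observed value using only the metadata."""
    if value is None or value in meta.missing_codes:
        return "missing"
    if value in meta.jump_codes:
        return "jump"
    if meta.admissible_values is not None and value not in meta.admissible_values:
        return "inadmissible"
    if meta.decimals is not None and round(value, meta.decimals) != value:
        return "precision"
    return "ok"


# Hypothetical variables and code values, for illustration only.
sbp = VariableMetadata(
    name="sbp", data_type="float", label="Systolic blood pressure", unit="mmHg",
    missing_codes=frozenset({99800}), jump_codes=frozenset({99900}), decimals=0,
)
smoking = VariableMetadata(
    name="smoking", data_type="categorical", label="Smoking status",
    missing_codes=frozenset({99800}), admissible_values=frozenset({0, 1, 2}),
)

sbp_flags = [classify(v, sbp) for v in (120.0, 99800, 99900, 120.5)]
smoking_flags = [classify(v, smoking) for v in (1, 3, None)]
```

Keeping the checks driven by the metadata record, rather than hard-coded per variable, means that a new study variable needs only a new dictionary entry.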

In contrast to metadata attributes, information that may vary across realizations of a single study variable is captured in auxiliary variables. We distinguish two broad groups: (a) variables related to study design and implementation (e.g. devices, examiners, locations) and (b) variables related to the measurement of environmental conditions (e.g. time stamps, temperature, or humidity). Unlike metadata attributes, auxiliary variables form part of the study data.
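
Because auxiliary variables vary from one realization to the next, they can be thought of as extra columns stored next to each measured value. The sketch below shows one possible layout; the class `Measurement`, the helper `auxiliary_view`, all field names, and the example values are hypothetical and do not reflect the actual SHIP database schema.

```python
from dataclasses import asdict, dataclass
from datetime import datetime
from typing import Optional

# Assumed split of the auxiliary fields into the two groups (a) and (b).
DESIGN_FIELDS = ("examiner", "device", "location")
ENVIRONMENT_FIELDS = ("timestamp", "temperature_c", "humidity_pct")


@dataclass(frozen=True)
class Measurement:
    """One realization of a study variable plus its auxiliary variables."""
    participant_id: str
    variable: str
    value: float
    # (a) study design and implementation
    examiner: str
    device: str
    location: str
    # (b) environmental conditions
    timestamp: datetime
    temperature_c: Optional[float] = None
    humidity_pct: Optional[float] = None


def auxiliary_view(m: Measurement) -> dict:
    """Return the auxiliary variables of a record, split by group."""
    row = asdict(m)
    return {
        "design": {k: row[k] for k in DESIGN_FIELDS},
        "environment": {k: row[k] for k in ENVIRONMENT_FIELDS},
    }


# Hypothetical example record.
m = Measurement(
    participant_id="P001", variable="sbp", value=128.0,
    examiner="EX07", device="BP-02", location="Room 3",
    timestamp=datetime(2018, 9, 2, 8, 30),
    temperature_c=21.5, humidity_pct=40.0,
)
view = auxiliary_view(m)
```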

Implementation: We extended the data dictionary of the Study of Health in Pomerania with metadata attributes relevant for quality assessments. The study data of clinical measurements were augmented with auxiliary variables. All data are stored in a PostgreSQL database. Continuous data quality assessments use the web application SQUARE2 [4], which has an R server in the back end to compute different aspects of data quality. Here, auxiliary variables are often examined as potential sources of variation in the measurements.
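
One simple way to examine an auxiliary variable as a potential source of variation is to compare group means and compute the share of total variance that lies between groups (eta squared from a one-way layout). The abstract does not state which statistics SQUARE2 computes, and its back end uses R, so the Python sketch below is only an assumed illustration; the function name `variance_explained` and the example data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained(values, groups):
    """Return (group means, eta squared) for values grouped by an auxiliary variable."""
    if not values or len(values) != len(groups):
        raise ValueError("values and groups must be non-empty and of equal length")
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    grand_mean = fmean(values)
    means = {g: fmean(vs) for g, vs in by_group.items()}
    # Total and between-group sums of squares.
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2
                     for g, vs in by_group.items())
    eta_squared = ss_between / ss_total if ss_total > 0 else 0.0
    return means, eta_squared


# Hypothetical blood pressure readings taken by two examiners.
readings = [118, 122, 120, 130, 134, 132]
examiners = ["A", "A", "A", "B", "B", "B"]
means, eta_sq = variance_explained(readings, examiners)
```

A large between-examiner share in such a summary would not prove an examiner effect on its own, but it indicates where a closer look is worthwhile.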

Lessons Learned: Computing most aspects of data quality requires metadata and auxiliary variables. In SHIP, the full scope of metadata and auxiliary variables was not clear at study start and had to be implemented afterwards. Completing these data required a considerable amount of work but enabled more efficient implementation of data monitoring processes. Auxiliary variables in particular can be captured automatically from system information during data entry. Using these options to automate the entry of auxiliary variables eases the workload of examiners.
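
Automatic capture of auxiliary variables at data entry might look like the following minimal sketch. It assumes that the entry software knows the examiner from the login session and the device from its configuration; the function names `capture_auxiliary` and `record_entry`, the field names, and the example values are hypothetical and do not describe the SHIP data entry system.

```python
import platform
from datetime import datetime, timezone


def capture_auxiliary(examiner, device, clock=None):
    """Collect auxiliary variables from system information, not manual input."""
    now = clock() if clock is not None else datetime.now(timezone.utc)
    return {
        "timestamp": now.isoformat(),
        "workstation": platform.node(),   # host name of the entry computer
        "examiner": examiner,             # assumed to come from the login session
        "device": device,                 # assumed to come from the device config
    }


def record_entry(participant_id, variable, value, examiner, device, clock=None):
    """Store a measured value together with automatically captured auxiliaries."""
    entry = {"participant_id": participant_id, "variable": variable, "value": value}
    entry.update(capture_auxiliary(examiner, device, clock))
    return entry


# Usage with a fixed clock so that the example is reproducible.
fixed_clock = lambda: datetime(2018, 9, 2, 8, 30, tzinfo=timezone.utc)
entry = record_entry("P001", "sbp", 128.0, examiner="EX07", device="BP-02",
                     clock=fixed_clock)
```

The examiner types only the measured value; everything else is filled in by the software, which is where the reduction in workload comes from.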

Conclusion: Metadata are essential but not sufficient to examine data quality. Whereas metadata enable the detection and description of abnormalities in the data such as implausible values, auxiliary variables provide the key information to identify sources of error.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Prinz F, Schlange T, Asadullah K. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery. 2011;10(9):712.
2. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap) - a metadata-driven methodology and workflow process for providing translational research informatics support. Journal of Biomedical Informatics. 2009;42(2):377-81.
3. Chen H, Hailey D, Wang N, Yu P. A review of data quality assessment methods for public health information systems. International Journal of Environmental Research and Public Health. 2014;11(5):5170-207.
4. Schmidt C, Krabbe C, Schössow J, Albers M, Radke D, Henke J. Square2 - A Web Application for Data Monitoring in Epidemiological and Clinical Studies. Studies in Health Technology and Informatics. 2017;235:549-53.