Comparing the attributes of health research data models and standards for conducting data quality assessments
Published: September 6, 2024
Introduction: Many health standards and tools are tailored to capture, describe, and manage data. Across these cases, evaluating data quality (DQ) requires comprehensive metadata describing the expectations and requirements on the data. Creating such metadata is complex and time-consuming; hence, knowing the potential of popular standards and tools for hosting metadata is important. Here, we gauge the ability of their underlying data models (DM) to provide metadata attributes and evaluate their potential for data quality assessments (DQA) according to a formal DQ framework (DQ-OBS) [1].
Methods: We focused on widely used standards and tools in healthcare and research: CDISC Define-XML, HL7® FHIR®, OHDSI OMOP Common Data Model, openEHR, OpenClinica®, and REDCap®. These were selected by experts from the National Research Data Infrastructure for Personal Health Data (NFDI4Health) and the German Medical Informatics Initiative (MII) because of their relevance in the health sciences. As a reference, we chose DQ-OBS because it was developed from consensus-based DQ indicators, organized across the Integrity, Completeness, Consistency, and Accuracy dimensions. We collaborated with clinicians and medical informaticians with expertise in the tools and standards, interoperability, eHealth, and DQ. We held two meetings with all authors, followed by several individual meetings and written exchanges to discuss DM attributes and their mapping to DQ-OBS. Attributes preventing DQ issues during data entry or capture (e.g., in electronic health records or case report forms, without differentiating between them) were included if they could be stored and reused. The match between each DQ-OBS indicator and the DM attributes was systematically checked for plausibility against the corresponding documentation.
Results: Among the DQ dimensions, attributes to compute Completeness and Consistency indicators are well represented across DMs, Integrity is mostly addressed, and Accuracy is poorly targeted. None of the standards or tools include information to handle mismatches between data sets. A few indicators are trivial to compute because no metadata is needed (e.g., crude missingness can be calculated if missing value codes are defined in the data). OpenClinica, REDCap, and FHIR prevent DQ issues during data entry using definitions entered before data collection. The only dedicated software tools for DQA are OHDSI's DataQualityDashboard and REDCap's DQ module. All standards allow defining custom rules; however, these require technical knowledge and are standard-specific, as the DMs are not interoperable. Nevertheless, related but independent open-source software facilitates entering and validating rules (e.g., software from CDISC COSA for Define-XML, ocRuleTool for OpenClinica). Similarly, other software allows extracting and transforming data from the respective DMs into R or Python, which is crucial as the data can then be reused for DQA (e.g., fhircrackr, pyEHR, openehR, or REDCapR).
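The claim that crude missingness needs no extra metadata once missing value codes are known can be illustrated with a minimal, hypothetical Python sketch (the function name, variable, and codes are illustrative and not part of any of the assessed standards or tools):

```python
def crude_missingness(values, missing_codes):
    """Fraction of entries that are absent (None) or flagged by a
    study-specific missing value code -- no further metadata required."""
    if not values:
        return 0.0
    n_missing = sum(1 for v in values if v is None or v in missing_codes)
    return n_missing / len(values)

# Example: -99 and "NA" are assumed study-specific missing value codes.
sbp = [120, 135, None, -99, "NA", 118, 142, -99]
print(crude_missingness(sbp, {-99, "NA"}))  # → 0.5
```

By contrast, indicators in the Integrity or Accuracy dimensions (e.g., checks against admissible ranges or a gold standard) need additional metadata that would have to be hosted in the DM itself.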
Conclusions: All assessed DMs offer possibilities for DQA. However, their main focus is formal checks on data completeness and consistency, while capturing the information relevant for evaluating data integrity and accuracy would require extending the DMs. Developing DMs and tools that include this additional metadata and the DQ results would greatly enhance research transparency. Once such metadata is in place, assessments could be done efficiently and uniformly using generic DQ tools [2]. These developments would improve the potential for harmonized DQA and reproducible research.
Clair Blacketer is an employee of Janssen Research & Development, LLC and holds stock and stock options.
The authors declare that an ethics committee vote is not required.
References
1. Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Medical Research Methodology. 2021 Dec;21:1-5. DOI: 10.1186/s12874-021-01252-7
2. Mariño J, Kasbohm E, Struckmann S, Kapsner LA, Schmidt CO. R packages for data quality assessments and data monitoring: a software scoping review with recommendations for future developments. Applied Sciences. 2022 Apr 22;12(9):4238. DOI: 10.3390/app12094238
