gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Comparing the attributes of health research data models and standards for conducting data quality assessments

Meeting Abstract

  • Joany Mariño - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Adrian Richter - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Elena Salogni - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Clair Blacketer - Janssen Research & Development, LLC, Titusville, United States
  • Caroline Bönisch - Hochschule Stralsund, Stralsund, Germany
  • Christian Draeger - Universität Leipzig, Leipzig, Germany
  • Elisa Kasbohm - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Sophie Klopfenstein - Berlin Institute of Health at Charité, Berlin, Germany
  • Matthias Löbe - Universität Leipzig, Leipzig, Germany
  • Günther Rezniczek - Marien Hospital Herne, Klinikum der Ruhr-Universität Bochum, Herne, Germany
  • Ulrich Sax - Institut für Medizinische Informatik, Universitätsmedizin Göttingen; Campus-Institut für Data Science (CIDAS), Georg-August-Universität Göttingen, Göttingen, Germany
  • Stephan Struckmann - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Carina Nina Vorisek - Berlin Institute of Health at Charité - Universitätsmedizin Berlin, Berlin, Germany
  • Carsten Oliver Schmidt - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 609

doi: 10.3205/24gmds126, urn:nbn:de:0183-24gmds1261

Published: September 6, 2024

© 2024 Mariño et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.



Introduction: Many health standards and tools are tailored to capture, describe, and manage data. Across these cases, evaluating data quality (DQ) requires comprehensive metadata describing the expectations of and requirements for the data. Creating such metadata is complex and time-consuming; hence, knowing the potential of popular standards and tools for hosting metadata is important. Here, we gauge the ability of their underlying data models (DM) to provide metadata attributes and evaluate their potential for data quality assessments (DQA) according to a formal DQ framework (DQ-OBS) [1].

Methods: We focused on widely used standards and tools in healthcare and research: CDISC Define-XML, HL7® FHIR®, OHDSI OMOP Common Data Model, openEHR, OpenClinica®, and REDCap®. These were selected by experts from the National Research Data Infrastructure for Personal Health Data (NFDI4Health) and the German Medical Informatics Initiative (MII) because of their relevance in the health sciences. As a reference, we chose DQ-OBS because it was developed from consensus-based DQ indicators, organized into the dimensions Integrity, Completeness, Consistency, and Accuracy. We collaborated with clinicians and medical informaticians with expertise in the tools and standards, interoperability, eHealth, and DQ. We held two meetings with all authors, followed by several individual meetings and written exchanges, to discuss DM attributes and their mapping to DQ-OBS. Attributes preventing DQ issues during data entry or capture (e.g., in electronic health records or case report forms, without differentiating between them) were included if they could be stored and reused. The match between each DQ-OBS indicator and the DM attributes was systematically checked for plausibility using the corresponding documentation.
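To illustrate what mapping a DM attribute to a DQ-OBS indicator can mean in practice, a minimal sketch follows; the metadata attributes (admissible limits for `age` and `sbp`) and the function name are hypothetical, not taken from any of the assessed standards:

```python
# Hypothetical metadata attributes, as a data model might store them
# alongside the variable definitions.
METADATA = {
    "age": {"min": 0, "max": 120},
    "sbp": {"min": 60, "max": 250},  # systolic blood pressure, mmHg
}

def inadmissible_share(values, variable):
    """Consistency-type indicator: share of values outside the
    admissible limits recorded in the metadata."""
    limits = METADATA[variable]
    outside = [v for v in values if not (limits["min"] <= v <= limits["max"])]
    return len(outside) / len(values)

# One of four age values (130) exceeds the admissible maximum.
print(inadmissible_share([25, 40, 130, 67], "age"))  # 0.25
```

The point of the sketch is that the indicator itself is generic; only the metadata (the limits) must be provided by the DM.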

Results: Among the DQ dimensions, attributes to compute Completeness and Consistency indicators are well represented across DMs, Integrity is mostly addressed, and Accuracy is poorly targeted. None of the standards or tools includes information to handle mismatches between data sets. A few indicators are trivial to compute because no metadata is required (e.g., crude missingness can be calculated if missing codes are defined in the data). OpenClinica, REDCap, and FHIR prevent DQ issues during data entry using definitions entered before data collection. The only dedicated software tools for DQA are OHDSI’s DataQualityDashboard and REDCap’s DQ module. All standards allow defining custom rules; however, these require technical knowledge and are standard-specific, as the DMs are not interoperable. Nevertheless, related but independent open-source software facilitates entering and validating rules (e.g., software from CDISC COSA for Define-XML, ocRuleTool for OpenClinica). Similarly, other software allows extracting and transforming data from the respective DMs into R or Python, which is crucial as the data can then be reused for DQA (e.g., fhircrackr, pyEHR, openehR, or REDCapR).
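The crude-missingness example above can be sketched in a few lines; the missing codes are hypothetical placeholders for whatever codes a study defines:

```python
import pandas as pd

# Hypothetical study-specific missing codes, as they might be
# defined in the data or its codebook.
MISSING_CODES = [-99, -88, "NA"]

def crude_missingness(series: pd.Series, missing_codes=MISSING_CODES) -> float:
    """Share of observations that are either absent (NA/None)
    or recorded with a defined missing code."""
    if len(series) == 0:
        return float("nan")
    n_missing = series.isna().sum() + series.isin(missing_codes).sum()
    return n_missing / len(series)

# Two of five observations are missing: one coded (-99), one absent (None).
data = pd.Series([1, 2, -99, None, 5])
print(crude_missingness(data))  # 0.4
```

Without the defined missing codes, only the absent values could be counted, which is why even this "trivial" indicator benefits from explicit metadata.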

Conclusions: All assessed DMs offer possibilities for DQA. However, their main focus is formal checks on data completeness and consistency, whereas the information needed to evaluate data integrity and accuracy requires extending the DMs. Developing DMs and tools that include this additional metadata and the DQ results would greatly enhance research transparency. Once such metadata is in place, assessments could be performed efficiently and uniformly using generic DQ tools [2]. These developments would improve the potential for harmonized DQA and reproducible research.

Clair Blacketer is an employee of Janssen Research & Development, LLC and holds stock and stock options.

The authors declare that an ethics committee vote is not required.


References

1.
Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Medical Research Methodology. 2021 Dec;21:1-15. DOI: 10.1186/s12874-021-01252-7
2.
Mariño J, Kasbohm E, Struckmann S, Kapsner LA, Schmidt CO. R packages for data quality assessments and data monitoring: a software scoping review with recommendations for future developments. Applied Sciences. 2022 Apr 22;12(9):4238. DOI: 10.3390/app12094238