66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

Using the web-based metadata repository OPAL and the R package dataquieR for quality assessments of the SHIP-COVID study

Meeting Abstract

  • Adrian Richter - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Stephan Struckmann - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Jörg Henke - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Carsten Oliver Schmidt - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 147

doi: 10.3205/21gmds028, urn:nbn:de:0183-21gmds0287

Published: September 24, 2021

© 2021 Richter et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Introduction: In the current COVID-19 pandemic, many epidemiological studies have been implemented at short notice, such as SHIP-COVID within the population-based Study of Health in Pomerania (SHIP). Metadata of the SHIP-COVID survey instruments comply with the FAIR criteria and are publicly available. Soon after the start of SHIP-COVID, data quality assessments of the collected data were required. This work illustrates the application of a data quality assessment tool, including its possibilities and limitations, with reference to a recently proposed data quality framework [1], when the metadata are taken from a central repository, in this case an OPAL database [2]. OPAL is of interest because many networks, such as Maelstrom, euCanSHare, and NFDI4Health, use it to enable cohort browsing.

Methods: Metadata from SHIP-COVID were stored in an OPAL database; the SHIP data management provided the SHIP-COVID study data. The R package dataquieR [3] was used to conduct data quality assessments of the SHIP-COVID study data. For this purpose, an R script was prepared to convert the OPAL representation of the metadata into the format required by dataquieR. The applicability of dataquieR functions related to the data quality dimensions integrity, completeness, consistency, and accuracy was examined.

Results: The metadata obtained from the OPAL repository were downloaded as a spreadsheet containing the variable names with their respective data types and, in a separate sheet, the categories of discrete variables where applicable. Modifying the OPAL representation of the metadata comprised steps to remove empty rows, rename data elements, and assign value labels for categorical data. The OPAL metadata and the R script are publicly available (https://gitlab.com/libreumg/opalxls2dataquier). Data of the ongoing SHIP-COVID study are not yet publicly available.
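The three modification steps can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the authors' R script. The input field names (`name`, `valueType`, `label`, `variable`), the target column names (`VAR_NAMES`, `LABEL`, `DATA_TYPE`, `VALUE_LABELS`), the type mapping, and the `code = label | code = label` notation are assumptions modeled loosely on the OPAL spreadsheet export and on dataquieR conventions; the example variables are made up.

```python
from typing import Dict, List, Optional

# Assumed mapping from OPAL value types to dataquieR-style data types.
OPAL_TO_DQ_TYPE = {
    "integer": "integer",
    "decimal": "float",
    "text": "string",
    "date": "datetime",
    "datetime": "datetime",
}


def convert_metadata(
    variables: List[Dict[str, Optional[str]]],
    categories: List[Dict[str, Optional[str]]],
) -> List[Dict[str, Optional[str]]]:
    """Convert OPAL-like variable and category rows to dataquieR-like rows."""
    # Step 1: remove empty rows (rows without a variable name).
    rows = [v for v in variables if (v.get("name") or "").strip()]

    # Collect "code = label" pairs per variable from the category sheet.
    labels: Dict[str, List[str]] = {}
    for cat in categories:
        var = (cat.get("variable") or "").strip()
        if not var:
            continue
        labels.setdefault(var, []).append(f"{cat['name']} = {cat['label']}")

    converted = []
    for v in rows:
        name = v["name"].strip()
        converted.append({
            # Step 2: rename data elements to the target column names.
            "VAR_NAMES": name,
            "LABEL": (v.get("label") or name).strip(),
            "DATA_TYPE": OPAL_TO_DQ_TYPE.get(v.get("valueType"), "string"),
            # Step 3: assign value labels for categorical variables.
            "VALUE_LABELS": " | ".join(labels[name]) if name in labels else None,
        })
    return converted


# Hypothetical example rows (not taken from SHIP-COVID).
variables = [
    {"name": "sex", "valueType": "integer", "label": "Sex"},
    {"name": "", "valueType": "", "label": ""},  # empty spreadsheet row
    {"name": "age", "valueType": "decimal", "label": "Age in years"},
]
categories = [
    {"variable": "sex", "name": "1", "label": "male"},
    {"variable": "sex", "name": "2", "label": "female"},
]

converted = convert_metadata(variables, categories)
for row in converted:
    print(row)
```

Keeping the category labels in a single string per variable means each variable stays on one metadata row, which matches the one-row-per-variable layout of the variable sheet described above.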

Data quality indicators belonging to the dimensions integrity and completeness were almost completely computable. Regarding consistency, only inadmissible categorical values could be assessed. Assessment of inadmissible numerical values was not possible, since the OPAL repository has no standard to define range specifications. Further, contradictions between data elements cannot be specified in OPAL yet. Indicators of the dimension accuracy were not applicable due to missing process-related metadata.
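To make the consistency result concrete, the following Python sketch shows how a check for inadmissible categorical values can be derived from category metadata alone. It is an illustration under stated assumptions, not the dataquieR implementation: the `code = label | code = label` notation, the function names, and the example data are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple


def parse_value_labels(value_labels: str) -> Set[str]:
    """Return the admissible codes from a '1 = male | 2 = female' string."""
    codes = set()
    for part in value_labels.split("|"):
        code, _, _label = part.partition("=")
        codes.add(code.strip())
    return codes


def inadmissible_categorical(
    values: Iterable[Optional[int]],
    value_labels: str,
    missing_codes: Iterable[int] = (),
) -> List[Tuple[int, int]]:
    """Return (position, value) pairs whose value is not a defined category.

    Empty entries (None) and declared missing codes are skipped, because
    they belong to the completeness dimension rather than to consistency.
    """
    admissible = parse_value_labels(value_labels)
    skipped = {str(m) for m in missing_codes}
    flagged = []
    for index, value in enumerate(values):
        if value is None:
            continue
        code = str(value)
        if code in skipped:
            continue
        if code not in admissible:
            flagged.append((index, value))
    return flagged


# Hypothetical observations: 3 is not a defined category, 99 is a missing code.
observed = [1, 2, 3, None, 99, 2]
flagged = inadmissible_categorical(
    observed, "1 = male | 2 = female", missing_codes=(99,)
)
print(flagged)  # [(2, 3)]
```

The check needs only the list of defined categories, which is why it is computable from the OPAL category sheet, whereas a comparable check for numerical variables would need range limits that the repository does not provide.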

Discussion: The effort required to conduct basic data quality assessments with metadata provided by the current OPAL data model is low. Crucial data quality assessments related to integrity and completeness were applicable. To address consistency and accuracy, the OPAL metadata model requires extensions.

Conclusion: Metadata from the OPAL repository can be used to assess data quality with a tool like dataquieR with little effort. Yet, the OPAL data model needs to be expanded to enable comprehensive data quality assessments. For example, REDCap [4] includes further metadata attributes such as “validation min/max”, and CDISC-Define-XML [5] uses “significant digits” (decimal number); both are relevant for consistency- or accuracy-related analyses. Extending OPAL with further metadata attributes would add important options for using centrally stored metadata beyond cohort browsing and for sharing metadata between studies with similar instruments.
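As a sketch of what such an extension would enable, the Python example below turns a pair of minimum/maximum attributes into a range specification and uses it to flag inadmissible numerical values. The interval notation `[lower;upper]`, the function names, and the example limits and values are assumptions chosen for illustration; they do not describe an existing OPAL attribute or the dataquieR implementation.

```python
from typing import Iterable, List, Optional, Tuple


def build_limits(minimum: Optional[float], maximum: Optional[float]) -> str:
    """Encode min/max attributes as an interval string such as '[0;120]'.

    A missing bound is written as an infinite bound.
    """
    lower = "-Inf" if minimum is None else str(minimum)
    upper = "Inf" if maximum is None else str(maximum)
    return f"[{lower};{upper}]"


def parse_limits(limits: str) -> Tuple[float, float]:
    """Parse an interval string back into numeric lower and upper bounds."""
    lower, upper = limits.strip("[]").split(";")
    return float(lower), float(upper)


def inadmissible_numerical(
    values: Iterable[Optional[float]], limits: str
) -> List[Tuple[int, float]]:
    """Return (position, value) pairs outside the closed interval."""
    lower, upper = parse_limits(limits)
    return [
        (index, value)
        for index, value in enumerate(values)
        if value is not None and not (lower <= value <= upper)
    ]


# Hypothetical limits and observations for an age variable.
age_limits = build_limits(0, 120)
ages = [34, -1, 57.5, None, 130]
out_of_range = inadmissible_numerical(ages, age_limits)
print(age_limits)    # [0;120]
print(out_of_range)  # [(1, -1), (4, 130)]
```

Storing the limits as a single metadata attribute per variable keeps the range specification alongside the variable definition, so that it can be shared between studies together with the rest of the metadata.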

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol. 2021;21(1):63. DOI: 10.1186/s12874-021-01252-7
2. Doiron D, Marcon Y, Fortier I, Burton P, Ferretti V. Software Application Profile: Opal and Mica: open-source software solutions for epidemiological data management, harmonization and dissemination. Int J Epidemiol. 2017;46(5):1372-8. DOI: 10.1093/ije/dyx180
3. Richter A, Schmidt CO, Krüger M, Struckmann S. dataquieR: assessment of data quality in epidemiological research. Journal of Open Source Software. 2021;6(61):3093. DOI: 10.21105/joss.03093
4. Beasley W. REDCapR: Interaction Between R and REDCap. 2020. Available from: https://CRAN.R-project.org/package=REDCapR
5. Jansen L. Accessing the Metadata from Define-XML. Seattle: PharmaSUG; 2018. Available from: https://www.pharmasug.org/proceedings/2018/SS/PharmaSUG-2018-SS11.pdf