From idea to implementation – the conduct of FAIR data quality assessments with R
Published: September 6, 2024
Workshop organizers: Stephan Struckmann, Elena Salogni, Carsten Oliver Schmidt
Invited by: NFDI4Health
Conducting comprehensive data quality assessments can be complex because even small data sets allow for many potential checks. This workshop therefore aims to guide participants through the efficient and reproducible planning, implementation, and interpretation of data quality assessments.
The conceptual basis is a data quality framework for observational studies [1]. A recent update of the R package dataquieR will be used to perform the assessments [2]. To ensure an applicable real-world example, anonymous data from the Study of Health in Pomerania (SHIP) will be assessed. The main data quality aspects to be targeted are completeness (the degree to which expected data values are present), consistency (the degree to which data values are free from convention breaks or contradictions), and accuracy (the degree of agreement between observed and expected distributions and associations).
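To make two of these dimensions concrete, the following base R sketch computes a completeness share per variable and flags two kinds of consistency problems on a toy data set. It is only an illustration of the definitions above, not the dataquieR implementation; the variable names, values, and admissible limits are all made up, and accuracy checks are left out for brevity.

```r
# Toy data; all variable names and values are invented for illustration.
study_data <- data.frame(
  id       = 1:6,
  sex      = c("f", "m", "f", "m", NA, "f"),
  pregnant = c("no", "yes", "yes", "no", "no", NA),
  sbp      = c(118, 135, NA, 310, 124, 129)  # systolic blood pressure, mmHg
)

# Completeness: share of non-missing values per variable.
completeness <- colMeans(!is.na(study_data))

# Consistency (convention break): values outside an admissible range.
# The limits are hypothetical and not taken from any study protocol.
sbp_lower <- 60
sbp_upper <- 260
out_of_range <- !is.na(study_data$sbp) &
  (study_data$sbp < sbp_lower | study_data$sbp > sbp_upper)

# Consistency (contradiction): a combination of values that cannot both hold.
contradiction <- !is.na(study_data$sex) & !is.na(study_data$pregnant) &
  study_data$sex == "m" & study_data$pregnant == "yes"

print(round(completeness, 2))
print(study_data$id[out_of_range])   # ids with out-of-range blood pressure
print(study_data$id[contradiction])  # ids with contradictory entries
```

Missing values are deliberately excluded from the consistency flags, so that a missing measurement counts against completeness only and is not reported twice.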
The workshop will guide participants through three aspects of assessing data quality: (a) an introduction to the data quality concept and metadata setup, (b) a hands-on tutorial on creating data quality reports, and (c) the interpretation of data quality results, followed by a discussion.
First, key aspects of the target framework and the metadata model underlying dataquieR will be introduced. Having metadata available in a machine-readable form may be a key aspect of making data quality assessments FAIRer. Using basic metadata, the participants will generate an initial data quality report. Next, the participants will learn hands-on how to expand the metadata and generate more extensive and focused reports. Finally, we will discuss the interpretation of the results and potential barriers to implementing data quality assessments in participants' studies.
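The role of machine-readable metadata can be sketched in base R: once expectations about each variable are stored in a table, one generic routine can check every variable without variable-specific code. The column names below (`VAR_NAMES`, `LABEL`, `DATA_TYPE`, `HARD_LIMITS`) and the `[lower;upper]` interval notation are modeled loosely on dataquieR's item-level metadata and should be treated as assumptions; the helper functions `parse_limits` and `check_item` are hypothetical, and the data are invented.

```r
# Item-level metadata as a plain table (column names are assumptions modeled
# on dataquieR conventions; variables and limits are invented).
meta_data <- data.frame(
  VAR_NAMES   = c("age", "sbp"),
  LABEL       = c("Age (years)", "Systolic blood pressure (mmHg)"),
  DATA_TYPE   = c("integer", "float"),
  HARD_LIMITS = c("[18;90]", "[60;260]"),
  stringsAsFactors = FALSE
)

study_data <- data.frame(
  age = c(34L, 17L, 61L, NA, 95L),
  sbp = c(121.5, 140, NA, 88, 132)
)

# Hypothetical helper: parse a closed interval "[lower;upper]" into two numbers.
parse_limits <- function(x) {
  as.numeric(strsplit(gsub("\\[|\\]", "", x), ";", fixed = TRUE)[[1]])
}

# Hypothetical helper: run the same checks for the variable in metadata row i.
check_item <- function(i) {
  values <- study_data[[meta_data$VAR_NAMES[i]]]
  limits <- parse_limits(meta_data$HARD_LIMITS[i])
  data.frame(
    variable  = meta_data$VAR_NAMES[i],
    n_missing = sum(is.na(values)),
    n_outside = sum(values < limits[1] | values > limits[2], na.rm = TRUE),
    stringsAsFactors = FALSE
  )
}

# Expanding the metadata table extends the report without changing the code.
summary_table <- do.call(rbind, lapply(seq_len(nrow(meta_data)), check_item))
print(summary_table)
```

Adding a row to `meta_data` is enough to include another variable in the summary, which mirrors the workflow of starting with basic metadata and expanding it step by step. In dataquieR itself, report generation is handled by the package (its documentation names `dq_report2()` as the entry point) and not by hand-written loops like this one.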
Online documentation and sample data will be freely available to the participants. This enables the participants to perform all steps of data quality assessments individually. Additional feedback options will be available through online tools.
A beginner’s level of R is sufficient to participate in the workshop, as almost no programming skills are required even for complex reports. Participants should bring a laptop with a recent web browser and internet access. Alternatively, they can bring a laptop with at least 16 GB of RAM on which the latest version of R, RStudio, the dataquieR package, and all its suggested dependencies are installed.
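For participants choosing the local installation, a short check along the following lines may help confirm the setup beforehand. It is a minimal sketch: `missing_packages` is a hypothetical helper, and the `required` vector lists only dataquieR, so it would need to be extended to cover the suggested dependencies.

```r
# Packages to check; extend this vector as needed.
required <- c("dataquieR")

# Hypothetical helper: return the packages in `pkgs` that cannot be loaded.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

cat("Running", R.version.string, "\n")

to_install <- missing_packages(required)
if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
} else {
  message("All listed packages are available.")
}
```

If dataquieR is reported as missing, `install.packages("dataquieR", dependencies = TRUE)` installs it from CRAN together with its suggested packages.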
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol. 2021;21(1):63.
2. Struckmann S, Mariño J, Kasbohm E, Salogni E, Schmidt CO. dataquieR 2: An updated R package for FAIR data quality assessments in observational studies and electronic health record data [Preprint]. 2024. DOI: 10.5281/zenodo.10722214