gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

From idea to implementation – the conduct of FAIR data quality assessments with R

Meeting Abstract

  • Stephan Struckmann - Universitätsmedizin Greifswald, Greifswald, Germany
  • Elena Salogni - Universitätsmedizin Greifswald, Greifswald, Germany
  • Carsten Oliver Schmidt - Universitätsmedizin Greifswald, Greifswald, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 346

doi: 10.3205/24gmds931, urn:nbn:de:0183-24gmds9313

Published: 6 September 2024

© 2024 Struckmann et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.


Text

Workshop organizers: Stephan Struckmann, Elena Salogni, Carsten Oliver Schmidt

Invited by: NFDI4Health

Conducting comprehensive data quality assessments can be a complex task because of the many potential checks, even for small data sets. This workshop therefore aims to guide participants through the efficient and reproducible planning, implementation, and interpretation of data quality assessments.

The conceptual basis is a data quality framework for observational studies [1]. A recent update of the R package dataquieR will be used to perform the assessments [2]. To ensure an applicable real-world example, anonymous data from the Study of Health in Pomerania (SHIP) will be assessed. The main data quality aspects to be targeted are completeness (the degree to which expected data values are present), consistency (the degree to which data values are free from convention breaks or contradictions), and accuracy (the degree of agreement between observed and expected distributions and associations).
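
To make two of these dimensions concrete, the following is a minimal, language-neutral sketch in Python. It does not use dataquieR and is not the workshop's method. The variable names, limits, and records are invented for illustration, and the metadata layout is a hypothetical simplification. It computes completeness as the share of non-missing values and flags one kind of consistency problem: values outside the admissible range declared in the metadata.

```python
# Hypothetical item-level metadata: admissible limits per variable (assumed values).
METADATA = {
    "age": {"lower": 20, "upper": 90},
    "sbp": {"lower": 60, "upper": 260},
}

# Invented toy records; None marks a missing value.
RECORDS = [
    {"age": 45, "sbp": 120},
    {"age": None, "sbp": 310},
    {"age": 17, "sbp": 135},
    {"age": 63, "sbp": None},
]


def completeness(records, variable):
    """Return the share of records with a non-missing value for the variable."""
    present = sum(1 for r in records if r.get(variable) is not None)
    return present / len(records)


def limit_violations(records, variable, metadata):
    """Count observed values outside the admissible range given in the metadata."""
    limits = metadata[variable]
    observed = [r[variable] for r in records if r.get(variable) is not None]
    return sum(
        1 for v in observed if not limits["lower"] <= v <= limits["upper"]
    )


def summarize(records, metadata):
    """Run both checks for every variable described in the metadata."""
    return {
        variable: {
            "completeness": completeness(records, variable),
            "limit_violations": limit_violations(records, variable, metadata),
        }
        for variable in metadata
    }


if __name__ == "__main__":
    for variable, result in summarize(RECORDS, METADATA).items():
        print(variable, result)
```

Because the checks are driven entirely by the metadata dictionary, adding a variable or tightening a limit requires no change to the checking code, which is the practical benefit of keeping metadata in a machine-readable form.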

The workshop will guide participants through three parts of a data quality assessment: (a) an introduction to the data quality concept and the metadata setup, (b) a hands-on tutorial on creating data quality reports, and (c) the interpretation of data quality results, followed by a discussion.

First, key aspects of the target framework and the metadata model underlying dataquieR will be introduced. Having metadata available in a machine-readable form may be a key aspect of making data quality assessments FAIRer. Using basic metadata, the participants will generate an initial data quality report. Next, the participants will learn hands-on how to expand the metadata and generate more extensive and focused reports. Finally, we will discuss the interpretation of the results and potential barriers to implementing data quality assessments in participants' studies.

Online documentation and sample data will be freely available to the participants. This enables the participants to perform all steps of data quality assessments individually. Additional feedback options will be available through online tools.

A beginner’s level of R is sufficient to participate in the workshop, as almost no programming skills are required even for complex reports. Participants should bring a laptop with a recent web browser and internet access. Alternatively, they can bring a laptop with at least 16 GB of RAM on which the latest version of R, RStudio, and the dataquieR package with all its suggested dependencies are installed.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, et al. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol. 2021;21(1):63.
2. Struckmann S, Mariño J, Kasbohm E, Salogni E, Schmidt CO. dataquieR 2: An updated R package for FAIR data quality assessments in observational studies and electronic health record data [Preprint]. 2024. DOI: 10.5281/zenodo.10722214