gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

Data Quality Assessment in the seventh MII-Projectathon

Meeting Abstract


  • Christian Draeger - Institute for Medical Informatics, Statistics and Epidemiology (IMISE), University of Leipzig, Leipzig, Germany
  • Matthias Löbe - Universität Leipzig, Leipzig, Germany
  • Frank A. Meineke - Universität Leipzig, Leipzig, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 316

doi: 10.3205/23gmds139, urn:nbn:de:0183-23gmds1397

Published: 15 September 2023

© 2023 Draeger et al.
This article is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License (Attribution). License information: http://creativecommons.org/licenses/by/4.0/.



Introduction: The goal of the Medical Informatics Initiative (MII) is to make healthcare data accessible for research. To this end, data is collected and made available at the Data Integration Centers (DICs). Regular Projectathons are conducted to evaluate the progress of this effort. For such secondary uses of healthcare data, considering data quality (DQ) is particularly important: data of sufficient quality for patient care is not automatically suitable, in quality or granularity, for every research question. Data quality must therefore be re-evaluated for each specific research question.

Methods: We use the DQ terminology of Kahn et al. ([1], p. 18) for all collaborative efforts in the MII. It divides DQ into three levels (Conformance, Completeness and Plausibility) and two contexts (Verification and Validation). Each level is further divided into sub-levels with concrete definitions.
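The three levels and two contexts form a 3×2 grid into which every DQI can be filed. The following is a hypothetical Python sketch, not part of the MII scripts: the coverage entries follow the assignment stated in the Methods (FHIR validation against the CDS profiles for Conformance, dataquieR DQIs for Completeness and Plausibility Verification), and the names `KAHN_CELLS`, `COVERED_BY` and `uncovered_cells` are invented for illustration.

```python
from itertools import product

# Kahn et al. levels and assessment contexts, as named in the abstract.
LEVELS = ("Conformance", "Completeness", "Plausibility")
CONTEXTS = ("Verification", "Validation")

# Every (level, context) cell of the framework.
KAHN_CELLS = list(product(LEVELS, CONTEXTS))

# Hypothetical mapping of cells to the checks described in the Methods.
COVERED_BY = {
    ("Conformance", "Verification"): "FHIR validation against CDS profiles",
    ("Conformance", "Validation"): "FHIR validation against CDS profiles",
    ("Completeness", "Verification"): "dataquieR DQIs",
    ("Plausibility", "Verification"): "dataquieR DQIs",
}

def uncovered_cells():
    """Return the (level, context) cells that no check is assigned to."""
    return [cell for cell in KAHN_CELLS if cell not in COVERED_BY]

if __name__ == "__main__":
    for level, context in KAHN_CELLS:
        print(f"{level:<13} {context:<13} {COVERED_BY.get((level, context), '-')}")
    print(uncovered_cells())
```

The two uncovered cells are Completeness Validation and Plausibility Validation, which matches the gap named in the Discussion.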

The MII has defined its own FHIR profiles for data representation, the Core Data Set (CDS). Running FHIR validation against these profiles at the individual DICs fulfils Kahn Conformance Verification and Validation. Starting from the data requirements of the Projectathon use case (the Catalogue of Items), we implemented Data Quality Indicators (DQIs) with dataquieR [2], satisfying Kahn Completeness and Plausibility Verification. Because the scripts for data extraction and analysis at the DICs were already written in R, attaching the DQ scripts was straightforward.
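dataquieR is an R package; purely to illustrate what a Completeness Verification and a Plausibility Verification DQI compute, here is a minimal Python sketch. It does not mirror dataquieR's API. The `Record` type, the heart-rate example and the limits of 20 to 250 beats per minute are hypothetical assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Record:
    """Hypothetical flattened extract of one data item for one patient."""
    patient_id: str
    value: Optional[float]  # None means the item is missing

def completeness_rate(records: List[Record]) -> float:
    """Completeness Verification: share of records with a non-missing value."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.value is not None)
    return present / len(records)

def range_violations(records: List[Record], low: float, high: float) -> List[str]:
    """Plausibility Verification: IDs whose value lies outside [low, high]."""
    return [r.patient_id for r in records
            if r.value is not None and not (low <= r.value <= high)]

if __name__ == "__main__":
    # Hypothetical heart-rate values in beats per minute.
    data = [Record("p1", 72.0), Record("p2", None),
            Record("p3", 310.0), Record("p4", 55.0)]
    print(completeness_rate(data))          # 0.75
    print(range_violations(data, 20, 250))  # ['p3']
```

Missing values are counted only by the completeness indicator and skipped by the range check, so the two indicators do not penalise the same record twice.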

Additionally, we deposited the DQIs in CEDAR, a metadata repository in which medical codes can be included (via BioPortal) and CDS elements can be referenced via FHIRPath. By linking the CDS elements, the medical codes and the DQIs, we enable reuse of the DQIs beyond a single use case: subsequent use cases based on the same codes or CDS elements can build on them. Since these DQIs are now available through CEDAR's REST API, we refer to them as FAIR DQIs.
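The linking idea can be sketched as a record that ties a DQI to a CDS element and a medical code, plus a lookup by either key. This is a hypothetical Python illustration, not CEDAR's template schema or REST API: the field names, the FHIRPath expression, the LOINC code 8867-4 (heart rate) and the rule text are assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class FairDQI:
    """Hypothetical tool-independent DQI record (not CEDAR's actual template)."""
    dqi_id: str
    kahn_level: str    # e.g. "Plausibility"
    kahn_context: str  # e.g. "Verification"
    fhirpath: str      # CDS element the DQI refers to
    code_system: str   # terminology of the linked medical code
    code: str
    rule: str          # human-readable rule, not tied to any DQ tool

def reusable_dqis(registry: List[FairDQI], code_system: Optional[str] = None,
                  code: Optional[str] = None,
                  fhirpath: Optional[str] = None) -> List[FairDQI]:
    """Return DQIs linked to the given medical code or to the given CDS element."""
    matches = []
    for dqi in registry:
        same_code = code is not None and (dqi.code_system, dqi.code) == (code_system, code)
        same_element = fhirpath is not None and dqi.fhirpath == fhirpath
        if same_code or same_element:
            matches.append(dqi)
    return matches

if __name__ == "__main__":
    registry = [
        FairDQI("dqi-hr-range", "Plausibility", "Verification",
                "Observation.value.ofType(Quantity).value",
                "http://loinc.org", "8867-4",
                "heart rate must lie between 20 and 250 /min"),
    ]
    # A later use case that needs the same LOINC code finds the existing DQI.
    found = reusable_dqis(registry, code_system="http://loinc.org", code="8867-4")
    print([d.dqi_id for d in found])  # ['dqi-hr-range']
```

Keeping the rule as tool-neutral text rather than as a dataquieR call is what would let a site with different tooling re-implement the same check.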

Results: We defined a formal representation of DQIs specific to the distributed context of the MII [3]. We also created DQIs for all Kahn definitions based on the elements of the CDS [4]. We created a framework for generating DQ reports when extracting data from the DICs, and implemented the DQIs in R using dataquieR as part of the seventh Projectathon. The developed scripts are available on the MII GitHub page [5], and we additionally provide FAIR DQIs through CEDAR. These allow our DQ efforts to be reused and our results to be compared across implementations. Figure 1 shows the framework for the decentralized clinical use case “Atrial Fibrillation” of the 7th MII Projectathon. A similar, centralized use case had previously been addressed in the 6th Projectathon [6]. In the Preparation Phase, existing FAIR DQIs can be reused during script implementation if another use case has provided them in its Feedback Phase. In the Operational Phase, the scripts are run independently at each DIC. We were able to reuse the FAIR DQIs from the 6th Projectathon, demonstrating the value of the outlined framework.
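The three phases can be sketched as plain functions over a shared DQI registry. This is a hypothetical Python illustration of the workflow, not the Projectathon scripts: DQIs are reduced to named functions over value lists, the item names and values are invented, and only aggregated indicator values leave each DIC.

```python
from typing import Callable, Dict, List, Tuple

# A DQI is modelled here simply as a function over the values of one item.
Check = Callable[[List[object]], float]

def preparation(catalogue: List[str],
                registry: Dict[str, Check]) -> Tuple[Dict[str, Check], List[str]]:
    """Reuse DQIs that already exist for catalogue items; report the gaps."""
    reused = {item: registry[item] for item in catalogue if item in registry}
    missing = [item for item in catalogue if item not in registry]
    return reused, missing

def operational(dqis: Dict[str, Check],
                dic_data: Dict[str, Dict[str, List[object]]]) -> Dict[str, Dict[str, float]]:
    """Run the same DQIs at each DIC independently; only the results are returned."""
    return {dic: {item: check(data.get(item, [])) for item, check in dqis.items()}
            for dic, data in dic_data.items()}

def feedback(new_dqis: Dict[str, Check], registry: Dict[str, Check]) -> None:
    """Publish newly written DQIs so that later use cases can reuse them."""
    for item, check in new_dqis.items():
        registry.setdefault(item, check)

if __name__ == "__main__":
    def completeness(values):
        """Share of non-missing values (a Completeness Verification DQI)."""
        return sum(v is not None for v in values) / len(values) if values else 0.0

    registry = {"heart_rate": completeness}  # left behind by an earlier use case
    reused, missing = preparation(["heart_rate", "diagnosis"], registry)
    new = {"diagnosis": completeness}        # written for the remaining gap
    reports = operational({**reused, **new}, {
        "DIC-A": {"heart_rate": [70, None], "diagnosis": ["I48.0"]},
        "DIC-B": {"heart_rate": [80, 90], "diagnosis": [None]},
    })
    feedback(new, registry)
    print(missing)           # ['diagnosis']
    print(reports)           # {'DIC-A': {'heart_rate': 0.5, 'diagnosis': 1.0}, 'DIC-B': {'heart_rate': 1.0, 'diagnosis': 0.0}}
    print(sorted(registry))  # ['diagnosis', 'heart_rate']
```

`feedback` uses `setdefault` so that a DQI already published by an earlier use case is not silently overwritten.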

Discussion & conclusion: Data requirements not only change with secondary use but can also vary between use cases. It is therefore difficult to make general statements about the quality of a dataset as a whole. Instead, the suitability of a dataset for answering a research question should be assessed by the use case experts. It proved necessary to represent DQIs as more than tool-specific metadata. Creating FAIR DQIs before implementation enables the sharing and reuse of DQIs, especially in highly distributed contexts such as the MII, where not all sites use the same tools. While the Projectathons were limited in their choice of suitable DQ tools, the tool-independent conceptualisation of FAIR DQIs allows them to be reused in future use cases that require different tooling. This is especially relevant for the path forward in the MII, as different tools for data quality assessment (DQA) are already being developed in the MIRACUM consortium [7], while use-case-specific DQA implementations are being adopted at the same time [8]. The approach could also benefit from other established DQA tools designed specifically for secondary use, such as those developed within OHDSI, which already have some support in Germany [9].

The Kahn framework provided a starting point for possible DQIs, but Completeness and Plausibility Validation remained unresolved. The creation of the necessary reference distributions, so-called “gold standards”, would particularly benefit from the wealth of data in the MII. We hope to address this in a dedicated DQ-focused use case.
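In Kahn's terms, Validation compares the data with an external reference. Assuming such a reference distribution existed, a Plausibility Validation check could look roughly like the hypothetical Python sketch below. It compares observed category frequencies with the reference using total variation distance, a metric chosen here only for illustration; the reference values and the tolerance are placeholders, not gold standards.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

def proportions(values: Iterable[Optional[str]]) -> Dict[str, float]:
    """Relative frequency of each category, ignoring missing values."""
    present = [v for v in values if v is not None]
    if not present:
        return {}
    counts = Counter(present)
    return {k: c / len(present) for k, c in counts.items()}

def total_variation(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Total variation distance between two categorical distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def validate_against_reference(values: Iterable[Optional[str]],
                               reference: Dict[str, float],
                               tolerance: float) -> Tuple[float, bool]:
    """Return the distance to the reference and whether it is within tolerance."""
    distance = total_variation(proportions(values), reference)
    return distance, distance <= tolerance

if __name__ == "__main__":
    # Placeholder reference; a real one would first have to be derived.
    reference = {"female": 0.5, "male": 0.5}
    observed = ["female", "female", "female", "male", None]
    print(validate_against_reference(observed, reference, tolerance=0.1))  # (0.25, False)
```

Deriving the reference distributions themselves is the unresolved part; the comparison step is comparatively simple once they exist.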

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

This contribution has already been published: the outlined framework was already applied in practice in the 6th Projectathon of the MII, and we previously published a German-language abstract about the 6th Projectathon [6]. This abstract builds on that work; we use this opportunity to update the community on the ongoing efforts in the MII Projectathons and to present new results as well as the updated framework from the 7th Projectathon.


References

1. Kahn MG, Callahan TJ, Barnard J, Bauck AE, Brown J, Davidson BN, Estiri H, Goerg C, Holve E, Johnson SG, Liaw ST, Hamilton-Lopez M, Meeker D, Ong TC, Ryan P, Shang N, Weiskopf NG, Weng C, Zozus MN, Schilling L. A Harmonized Data Quality Assessment Terminology and Framework for the Secondary Use of Electronic Health Record Data. EGEMS (Wash DC). 2016 Sep 11;4(1):1244. DOI: 10.13063/2327-9214.1244
2. Schmidt CO, Struckmann S, Enzenbach C, Reineke A, Stausberg J, Damerow S, Huebner M, Schmidt B, Sauerbrei W, Richter A. Facilitating harmonized data quality assessments. A data quality framework for observational health research data collections with software implementations in R. BMC Med Res Methodol. 2021 Apr 2;21(1):63. DOI: 10.1186/s12874-021-01252-7
3. Tute E, Draeger C, Gierend K, Löbe M, Palm J, Schmidt CO. A glimpse at representing data quality rules for their collaborative governance in the Medical Informatics Initiative. In: 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF); 2022 Aug 21-25; online. Düsseldorf: GMS; 2022. DocAbstr. 166. DOI: 10.3205/22GMDS018
4. Draeger C, Tute E, Schmidt CO, Waltemath D, Boeker M, Winter A, et al. Identifying Relevant FHIR Elements for Data Quality Assessment in the German Core Data Set. Studies in Health Technology and Informatics. IOS Press; 2023. DOI: 10.3233/SHTI230117
5. Medical Informatics Initiative. Projectathon7-VHF. GitHub repository for the scripts for the seventh Projectathon [Internet]. GitHub; [cited 2023 Jun 7]. Available from: https://github.com/medizininformatik-initiative/Projectathon7-VHF
6. Draeger C, Löbe M. Datenqualitätsanalysen im Rahmen der MII-Projectathons. In: SMITH Science Day 2022. Aachen, 23.11.2022. Düsseldorf: GMS; 2023. DocP25. DOI: 10.3205/22smith36
7. Kapsner LA, Mang JM, Mate S, Seuchter SA, Vengadeswaran A, Bathelt F, et al. Linking a consortium-wide data quality assessment tool with the MIRACUM Metadata Repository. Applied Clinical Informatics. 2021;12(04):826–35. DOI: 10.1055/s-0041-1733847
8. Tahar K, Martin T, Mou Y, Verbuecheln R, Graessner H, Krefting D. Rare diseases in hospital information systems: an interoperable methodology for distributed data quality assessments. Methods of Information in Medicine. 2023 Sep;62(3-04):71-89. DOI: 10.1055/a-2006-1018
9. Peng Y, Henke E, Reinecke I, Zoch M, Sedlmayr M, Bathelt F. An ETL-process design for data harmonization to participate in international research with German real-world data based on FHIR and OMOP CDM. International Journal of Medical Informatics. 2023;169:104925. DOI: 10.1016/j.ijmedinf.2022.104925