gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Evaluating Automated Methods for Metadata Quality in Healthcare

Meeting Abstract

  • Alexandra Banach - Universität zu Lübeck, Lübeck, Germany
  • Ann-Kristin Kock-Schoppenhauer - Universität zu Lübeck, Lübeck, Germany
  • Hannes Ulrich - Universität zu Lübeck, Lübeck, Germany
  • Josef Ingenerf - Universität zu Lübeck, Lübeck, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 221

doi: 10.3205/20gmds192, urn:nbn:de:0183-20gmds1922

Published: February 26, 2021

© 2021 Banach et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Background: High-quality medical data is essential for adequate patient care and for meaningful results in clinical trials. Schema-level metadata help to ensure quality at the instance-data level, e.g. by defining suitable data types or by specifying value ranges [1]. As the amount of instance data grows, the amount of metadata describing it increases continuously as well. An automatic evaluation of metadata quality is therefore essential.

The validation of metadata is often based on quality metrics. However, existing metrics focus on the instance-data level, do not support metadata standards and are not automatically computable. Building on the results of Deppenwiese et al. [2], we checked whether existing quality criteria could be adapted to ISO 11179-3 conformant metadata [3] and added self-defined criteria. The computable metrics were implemented in the web application MetaCheck, which is based on Java and the open-source metadata repository Samply.MDR [4].

Methods: We agreed on a set of quality criteria that have proven useful for assessing the quality of metadata. String elements should contain a regular expression for validation, and numeric elements should have a unit of measure and a specified range that sets lower and upper limits for values. Enumerations should contain between 3 and 10 values; for more than 10 entries, catalogs should be used. Enumerated data elements with two values should be specified as Boolean elements. Since a Boolean element relies on a detailed description to make its context comprehensible, its designation should contain at least ten words. The definition of a metadata element should be more detailed than its designation and should therefore be at least twice as long. For aggregating metadata elements by their semantic meaning, Samply.MDR provides data element groups; a group should contain between two and ten elements, since larger groups seem too complex and are hard to survey. For every criterion, both the proportion of elements that fulfill it and the complementary proportion, e.g. string elements without regular expressions, are measured.
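The criteria above can be read as simple per-element checks. The following Python sketch illustrates that reading; it is not the MetaCheck implementation, which is based on Java and Samply.MDR. The DataElement class, its field names and the criterion labels are hypothetical, the unit and range requirements are treated as two separate checks, and "twice as long" is assumed to be measured in words.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataElement:
    """Hypothetical, simplified stand-in for an ISO 11179-3 data element."""
    kind: str  # "string", "numeric", "enumerated" or "boolean"
    designation: str
    definition: str
    regex: Optional[str] = None
    unit: Optional[str] = None
    minimum: Optional[float] = None
    maximum: Optional[float] = None
    permitted_values: list = field(default_factory=list)


def word_count(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def check_element(e: DataElement) -> dict:
    """Map each criterion that applies to e to True (fulfilled) or False."""
    results = {
        # Assumption: length is compared by word count.
        "definition_twice_designation":
            word_count(e.definition) >= 2 * word_count(e.designation),
    }
    if e.kind == "string":
        results["string_has_regex"] = bool(e.regex)
    elif e.kind == "numeric":
        results["numeric_has_unit"] = bool(e.unit)
        results["numeric_has_range"] = (
            e.minimum is not None and e.maximum is not None
        )
    elif e.kind == "enumerated":
        # Two values should be a Boolean element; more than ten, a catalog.
        results["enum_size_3_to_10"] = 3 <= len(e.permitted_values) <= 10
    elif e.kind == "boolean":
        results["boolean_designation_10_words"] = (
            word_count(e.designation) >= 10
        )
    return results


def check_group(size: int) -> bool:
    """A data element group should contain between two and ten elements."""
    return 2 <= size <= 10


# Invented example element, for illustration only.
weight = DataElement(
    kind="numeric",
    designation="Body weight",
    definition="Weight of the patient measured at admission",
    unit="kg",
    minimum=0.0,
    maximum=500.0,
)
print(check_element(weight))
```

Counting the True and False results per criterion over all elements of a metadata set would yield the fulfilled and complementary proportions described above.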

To compare metadata sets more effectively, a score was developed. It is computed by adding the metric values of all desired criteria and dividing this sum by the sum of all computed metric values, including those of the complementary criteria. The score is normalized to the range from 0 to 1.
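Read literally, the score is the sum of the desired metric values divided by the sum of the desired and complementary metric values. The Python sketch below shows this calculation under that reading; the function name, the data layout and the example values are hypothetical and are not taken from the evaluated metadata sets.

```python
def quality_score(metrics: dict) -> float:
    """Compute the score from criterion -> (desired, complementary) values.

    Assumption: each criterion contributes one desired metric value and one
    complementary metric value, e.g. the share of string elements with and
    without a regular expression.
    """
    desired = sum(d for d, _ in metrics.values())
    total = sum(d + c for d, c in metrics.values())
    if total == 0:
        raise ValueError("no metric values to score")
    return desired / total


# Invented values for illustration only.
example_metrics = {
    "string_has_regex": (0.25, 0.75),
    "numeric_has_unit": (0.5, 0.5),
    "definition_twice_designation": (0.75, 0.25),
}
print(quality_score(example_metrics))  # prints 0.5
```

Because the denominator contains every desired value plus its complement, the result cannot fall below 0 or exceed 1, which matches the stated normalization.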

Results: The metrics were evaluated automatically on 12 metadata sets of a German biomedical research project, resulting in an average score of 0.47. The highest score achieved was 0.614, the lowest 0.250. The metrics showed that the main reasons for score reduction were string elements without regular expressions and element definitions that were too short. A manual check confirmed this.

Conclusion: The results show that MetaCheck can detect a lack of metadata quality on a technical level using the developed metrics. Nevertheless, semantic aspects of metadata remain disregarded, as they are difficult to evaluate automatically.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Weiskopf NG, Weng C. Methods and dimensions of electronic health record data quality assessment: enabling reuse for clinical research. J Am Med Inform Assoc. 2013;20:144–151.
2. Deppenwiese N, Kock-Schoppenhauer AK, Ulrich H, Duhm-Harbeck P, Ingenerf J. Automatic Evaluation of Metadata Quality in ISO 11179-3 Conformant Healthcare Metadata Repositories. In: Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie, Hrsg. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 86. DOI: 10.3205/19gmds036
3. ISO/IEC. ISO/IEC 11179-3:2013 – Information technology -- Metadata registries (MDR) -- Part 3: Registry metamodel and basic attributes. 2013. Available from: https://www.iso.org/standard/50340.html
4. Kadioglu D, Breil B, Knell C, Lablans M, Mate S, Schlue D, Serve H, Storf H, Ückert F, Wagner T, Weingardt P, Prokosch HU. Samply.MDR – A Metadata Repository and Its Application in Various Research Networks. Stud Health Technol Inform. 2018;253:50–54.