gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

Automatic Evaluation of Metadata Quality in ISO 11179-3 Conformant Healthcare Metadata Repositories

Meeting Abstract

  • Noemi Deppenwiese - Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Universität zu Lübeck, Lübeck, Germany
  • Ann-Kristin Kock-Schoppenhauer - Universität zu Lübeck, Lübeck, Germany
  • Hannes Ulrich - Universität zu Lübeck, Lübeck, Germany
  • Petra Duhm-Harbeck - Universität zu Lübeck, Lübeck, Germany
  • Josef Ingenerf - Universität zu Lübeck, Lübeck, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 86

doi: 10.3205/19gmds036, urn:nbn:de:0183-19gmds0362

Published: 6 September 2019

© 2019 Deppenwiese et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Well-maintained metadata repositories are expected to facilitate data integration and improve quality, because required data elements do not have to be re-specified; existing, high-quality predefined elements can be reused instead [1]. To yield this benefit, the included metadata must themselves be of sufficient quality. Manual verification of metadata elements is time-consuming and prone to bias. Our aim is to apply quality criteria to metadata in healthcare and medical research settings and to develop computable validations. To be easily adaptable to new metadata repository systems, the quality criteria should support the standardized ISO 11179-3 metamodel [2], [3], which can be mapped to other metadata standards used in clinical research [1].

We carried out a literature review and applied the identified criteria to metadata in the medical context, checking whether the metrics could be calculated automatically. The metrics developed by Ochoa and Duval were computable, but had not been evaluated in a healthcare and medical research context [4]. We analyzed these metrics for their applicability in medical contexts and found that most of them need to be heavily adapted in order to work with healthcare metadata. For example, their approach works with free-text fields, of which the ISO 11179-3 standard contains only two. McMahon and Denaxas proposed a framework for assessing metadata quality in epidemiological and public health research settings, but it was developed to guide human experts in reviewing the metadata and does not contain computable metrics [5]. Additionally, new, domain-specific criteria can be derived from clinical data quality criteria [6]. These criteria test the metadata's ability to ensure data quality, e.g., by counting the presence of validation information. Other criteria that focus more on understandability can be created by using specifics of the ISO 11179-3 standard, e.g., the number of distinct values used as keys in slot annotations.
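The two example criteria named above (presence of validation information, and distinct keys in slot annotations) could be computed roughly as follows. This is a minimal sketch and not the authors' implementation: the `DataElement` class, its fields `slots` and `validation`, and the sample values are hypothetical simplifications of an ISO 11179-3 data element.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataElement:
    """Hypothetical, heavily simplified stand-in for an ISO 11179-3 data element."""
    designation: str
    slots: dict[str, str] = field(default_factory=dict)  # slot annotations: key -> value
    validation: Optional[str] = None  # any validation information; None if absent


def validation_presence(elements: list[DataElement]) -> float:
    """Share of elements that carry any validation information."""
    if not elements:
        return 0.0
    return sum(e.validation is not None for e in elements) / len(elements)


def distinct_slot_keys(elements: list[DataElement]) -> set[str]:
    """Distinct keys used in slot annotations across all elements."""
    return {key for e in elements for key in e.slots}


# Illustrative sample data (invented for this sketch).
elements = [
    DataElement("Body weight", {"Terminology": "code-1"}, validation="0-500"),
    DataElement("Diagnosis text", {"terminology": "code-2", "source": "registry"}),
    DataElement("Sex", {"source": "registry"}, validation="permitted values"),
]

print(validation_presence(elements))
print(sorted(distinct_slot_keys(elements)))
```

In this invented sample, `Terminology` and `terminology` count as two distinct keys. Whether such near-duplicates should be flagged is a design decision the abstract does not specify.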

While there have been attempts to establish metadata quality criteria, none of them fulfills all our requirements. We therefore selected and adapted some criteria from the sources mentioned above: completeness, conformance to expectations, consistency, findability, timeliness and provenance. The newly developed criteria include the number of slots and distinct keys in a repository, the proportion of numerical elements without units of measure, the length of definition values, the share of string elements, and the portion of those string elements that does not provide validation rules. The quality criteria can be evaluated automatically so that human reviewers can focus on potential problems and on other, non-calculable criteria (e.g., the accuracy of definitions).
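Several of the newly developed criteria are simple proportions and averages over the elements of a repository. The sketch below illustrates how they might be computed. It rests on assumptions: the flattened `Element` record, the datatype labels `"numeric"` and `"string"`, and the choice of the mean as the summary of definition length are all hypothetical and not taken from the abstract.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Element:
    """Hypothetical flattened view of a data element and its value domain."""
    definition: str
    datatype: str                          # assumed labels: "numeric", "string"
    unit_of_measure: Optional[str] = None
    validation_rule: Optional[str] = None


def share(part: int, whole: int) -> float:
    """Proportion that returns 0.0 for an empty denominator."""
    return part / whole if whole else 0.0


def quality_report(elements: list[Element]) -> dict[str, float]:
    """Compute the proportion-based criteria for a list of elements."""
    numeric = [e for e in elements if e.datatype == "numeric"]
    strings = [e for e in elements if e.datatype == "string"]
    return {
        "numeric_without_unit": share(
            sum(e.unit_of_measure is None for e in numeric), len(numeric)),
        "mean_definition_length": (
            mean(len(e.definition) for e in elements) if elements else 0.0),
        "string_share": share(len(strings), len(elements)),
        "string_without_validation": share(
            sum(e.validation_rule is None for e in strings), len(strings)),
    }


# Illustrative sample data (invented for this sketch).
sample = [
    Element("Body weight in kilograms", "numeric", "kg", "0-500"),
    Element("Height", "numeric"),
    Element("Free-text comment", "string"),
    Element("Postal code", "string", validation_rule="[0-9]{5}"),
]

for name, value in quality_report(sample).items():
    print(f"{name}: {value}")
```

A reviewer could then sort elements or repositories by these values and inspect the worst cases first, in line with the stated goal of letting human reviewers focus on potential problems.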

We performed a first evaluation of the developed metrics and were able to identify potential quality problems in a set of research metadata. More advanced methods of analysis, e.g., semantic analysis of free-text fields, are conceivable but would require semantic annotations and ontology support [7], which are not provided by current ISO 11179-3 conformant implementations such as Samply.MDR [8]. Furthermore, metadata repositories that support user authentication and provenance information could calculate reputation scores for each user based on the quality of their submitted metadata and thereby encourage users to improve their contributions.
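The abstract does not say how such a reputation score would be defined. As one possible reading, the sketch below averages per-element quality scores by submitting user. The function name, the 0-to-1 score scale, the use of the arithmetic mean, and the user names are all assumptions made for illustration.

```python
from collections import defaultdict


def reputation_scores(submissions: list[tuple[str, float]]) -> dict[str, float]:
    """Average the quality scores (assumed range 0..1) of each user's elements.

    `submissions` holds (user, quality_score) pairs, one per submitted element.
    """
    by_user: dict[str, list[float]] = defaultdict(list)
    for user, score in submissions:
        by_user[user].append(score)
    return {user: sum(scores) / len(scores) for user, scores in by_user.items()}


# Illustrative sample data (invented for this sketch).
submissions = [("alice", 1.0), ("bob", 0.25), ("alice", 0.5), ("bob", 0.75)]
print(reputation_scores(submissions))
```

A plain mean treats all elements equally; a real repository might weight recent submissions or specific criteria differently.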

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Ngouongo SM, Löbe M, Stausberg J. The ISO/IEC 11179 norm for metadata registries: Does it cover healthcare standards in empirical research? Journal of biomedical informatics. 2013 Apr 1;46(2):318-27.
2. ISO/IEC. ISO/IEC 11179-3:2013 - Information technology - Metadata registries (MDR) - Part 3: Registry metamodel and basic attributes. 2013. [Accessed 17 July 2019]. Available from: https://www.iso.org/standard/50340.html
3. Meta Object Facility Specification Version 2.5.1. 2016. [Accessed 17 July 2019]. Available from: https://www.omg.org/spec/MOF/2.5.1/
4. Ochoa X, Duval E. Automatic evaluation of metadata quality in digital repositories. International journal on digital libraries. 2009 Aug 1;10(2-3):67-91. DOI: 10.1007/s00799-009-0054-4
5. McMahon C, Denaxas S. A novel framework for assessing metadata quality in epidemiological and public health research settings. AMIA Summits on Translational Science Proceedings. 2016;2016:199.
6. Nonnemacher M, Nasseh D, Stausberg J. Datenqualität in der medizinischen Forschung. Berlin: Medizinisch Wissenschaftliche Verlagsgesellschaft; 2014.
7. Ulrich H, Kock-Schoppenhauer AK, Andersen B, Ingenerf J. Analysis of Annotated Data Models for Improving Data Quality. Studies in Health Technology and Informatics. 2017;243:190-194. DOI: 10.3233/978-1-61499-808-2-190
8. Kadioglu D, Breil B, Knell C, Lablans M, Mate S, Schlue D, Serve H, Storf H, Ückert F, Wagner TO, Weingardt P. Samply.MDR - A Metadata Repository and Its Application in Various Research Networks. In: 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018.