gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Challenges and potential bottlenecks for the accomplishment of provenance in biomedical data sets and workflows

Meeting Abstract

  • Kerstin Gierend - Department of Biomedical Informatics, Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim of Heidelberg University, Mannheim, Germany
  • Frank Krüger - Department of Communications Engineering, University of Rostock, Rostock, Germany
  • Sascha Genehr - Department of Communications Engineering, University of Rostock, Rostock, Germany
  • Francisca Hartmann - Department of Biomedical Informatics, Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim of Heidelberg University, Mannheim, Germany
  • Thomas Ganslandt - Chair of Medical Informatics, Friedrich-Alexander-University Erlangen-Nürnberg, Erlangen, Germany
  • Dagmar Waltemath - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany
  • Atinkut Alamirrew Zeleke - Department of Medical Informatics, Institute for Community Medicine, University Medicine Greifswald, Greifswald, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 163

doi: 10.3205/22gmds062, urn:nbn:de:0183-22gmds0625

Published: 19 August 2022

© 2022 Gierend et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.


Text

Implementing provenance in the biomedical context involves several hurdles. Challenges and potential bottlenecks were investigated in a scoping review and are presented here as preliminary results.

1. Introduction: Scrutinizing provenance information improves the understanding of data genesis and of the relationships between scientific results and source data. Accordingly, research data management practices and the FAIR principles treat provenance as one of the research pillars within the German Medical Informatics Initiative (MII) [1], [2]. Implementing provenance in biomedical data sets and workflows entails context- and domain-specific challenges that need to be identified and classified.

2. Methods: We carried out a scoping review following a protocol that addresses five research questions to examine existing evidence on provenance tracking approaches [3]. Here we highlight first insights into the results for the third research question, namely “What are the challenges and potential problems or bottlenecks for the accomplishment of provenance?”, with a focus on the biomedical domain.

3. Results: A total of 564 papers discussing provenance approaches in the wider biomedical research domain were extracted from the PubMed and Web of Science databases. Of the 469 papers remaining after de-duplication, 54 studies fulfilled the screening criteria [3]. Of these, 46 studies published between 2006 and 2020 reported challenges while implementing provenance (Figure 1). The most frequently mentioned group of challenges refers to an ‘Information Lack’. These are challenges related to an incomplete model or insufficient metadata [4], to policy issues, and to various other causes, including lack of incentive [5] or black-box processing. Knowledge bottlenecks, mainly related to isolated knowledge residing with individual stakeholders, were also encountered [6]. ‘Granularity’ issues covered aspects of the provenance information itself. Performance and storage matters also arose, as did usability topics pointing to interface complexity. Scalability, reusability of the infrastructure, and quality-related factors were reported as well.

4. Discussion and conclusion: The challenges encountered have increased over the past five years, as growing legal and scientific demands require research projects to be implemented transparently. Awareness of these challenges can facilitate scalable provenance construction and consumption while enforcing FAIR principles in the MII [7]. Harmonized engineering efforts are crucial to overcome these hurdles. Our upcoming paper will detail the challenges in the overall context of provenance tracking.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Semler SC, Wissing F, Heyder R. German Medical Informatics Initiative. Methods Inf Med. 2018;57(S 01):e50-e56. DOI: 10.3414/ME18-03-0003
2. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data. Sci Data. 2016 Mar 15;3:160018. [with addendum: 2019 Mar 19;6:6.]. DOI: 10.1038/sdata.2016.18
3. Gierend K, Krüger F, Waltemath D, Fünfgeld M, Ganslandt T, Zeleke AA. Approaches and Criteria for Provenance in Biomedical Data Sets and Workflows: Protocol for a Scoping Review. JMIR Res Protoc. 2021;10(11):e31750. DOI: 10.2196/31750
4. Sahoo SS, Valdez J, Kim M, Rueschman M, Redline S. ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata. Int J Med Inform. 2019;121:10-18. DOI: 10.1016/j.ijmedinf.2018.10.009
5. Curcin V. Embedding data provenance into the Learning Health System to facilitate reproducible research. Learn Health Syst. 2016;1(2):e10019. DOI: 10.1002/lrh2.10019
6. Monnin P, Legrand J, Husson G, Ringot P, Tchechmedjiev A, Jonquet C, et al. PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison. BMC Bioinformatics. 2019;20(Suppl 4):139. DOI: 10.1186/s12859-019-2693-9
7. Pugliese P, Knell C, Christoph J. Exchange of Clinical and Omics Data According to FAIR Principles: A Review of Open Source Solutions. Methods Inf Med. 2020;59(S 01):e13-e20. DOI: 10.1055/s-0040-1712968