Challenges and potential bottlenecks for the accomplishment of provenance in biomedical data sets and workflows
Published: 19 August 2022
Accomplishing provenance in the biomedical context involves a number of hurdles. We investigated these challenges and potential bottlenecks in a scoping review and present preliminary results here.
1. Introduction: Scrutinizing provenance information improves the understanding of data genesis and of the relationships between scientific results and source data. Accordingly, research data management practices and the FAIR principles consider provenance one of the research pillars within the German Medical Informatics Initiative (MII) [1], [2]. Accomplishing provenance in biomedical data sets and workflows entails context- and domain-specific challenges that need to be identified and classified.
2. Methods: We carried out a scoping review based on a protocol that addresses five research questions to examine existing evidence on provenance tracking approaches [3]. Here we present first insights into the results for the third research question, namely “What are the challenges and potential problems or bottlenecks for the accomplishment of provenance?” With this question, we investigated the specific challenges, potential problems, and bottlenecks for accomplishing provenance in the biomedical domain.
3. Results: A total of 564 papers discussing provenance approaches in the wider biomedical research domain were extracted from the PubMed and Web of Science databases. Of the 469 papers remaining after de-duplication, 54 studies fulfilled the screening criteria [3]. Of these, 46 studies from 2006 to 2020 reported challenges encountered while implementing provenance (Figure 1). The most frequently mentioned group of challenges refers to an ‘Information Lack’. These include challenges related to an incomplete model or insufficient metadata [4], policy issues, and various other causes, including a lack of incentive [5] or black-box processing. Knowledge bottlenecks, mainly related to isolated knowledge residing with individual stakeholders, were also encountered [6]. ‘Granularity’ issues concerned the level of detail of provenance information; performance and storage matters also arose, as did usability issues reflecting interface complexity. Scalability, reusability of the infrastructure, and quality-related factors were also implicated.
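The screening counts reported in the Results can be re-derived as simple arithmetic. The following is a minimal sketch, assuming only the four numbers stated in the text; the class name `Funnel` and its method names are hypothetical labels for illustration, not terminology from the review protocol. It shows that 95 records were removed as duplicates, 415 de-duplicated papers did not meet the screening criteria, and roughly 85% of the included studies reported challenges.

```python
# Minimal sketch: re-derive the screening funnel from the counts in the Results.
# "Funnel" and its method names are hypothetical labels, not the review's terminology.
from dataclasses import dataclass


@dataclass(frozen=True)
class Funnel:
    retrieved: int             # papers extracted from PubMed and Web of Science
    deduplicated: int          # papers remaining after de-duplication
    included: int              # studies fulfilling the screening criteria
    reporting_challenges: int  # included studies that reported challenges

    def duplicates_removed(self) -> int:
        return self.retrieved - self.deduplicated

    def not_meeting_criteria(self) -> int:
        return self.deduplicated - self.included

    def share_reporting_challenges(self) -> float:
        return self.reporting_challenges / self.included


review = Funnel(retrieved=564, deduplicated=469, included=54, reporting_challenges=46)

print("Duplicates removed:", review.duplicates_removed())
print("Not meeting screening criteria:", review.not_meeting_criteria())
print(f"Share reporting challenges: {review.share_reporting_challenges():.1%}")
```

Running it prints 95, 415, and 85.2%, which are derived directly from the reported figures rather than taken from the review itself.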
4. Discussion and conclusion: The challenges encountered have mounted over the past five years, as increasing legal and scientific demands require research projects to be implemented transparently. Awareness of these challenges can facilitate more easily scalable provenance construction and consumption while enforcing the FAIR principles in the MII [7]. Harmonized engineering efforts are crucial to overcoming these hurdles. Our upcoming paper will present the challenges in detail, embedded in the overall context of provenance tracking.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Semler SC, Wissing F, Heyder R. German Medical Informatics Initiative. Methods Inf Med. 2018;57(S 01):e50-e56. DOI: 10.3414/ME18-03-0003
2. Wilkinson MD, Dumontier M, Aalbersberg IJ, et al. The FAIR Guiding Principles for scientific data. Sci Data. 2016;3:160018. Addendum: Sci Data. 2019;6:6. DOI: 10.1038/sdata.2016.18
3. Gierend K, Krüger F, Waltemath D, Fünfgeld M, Ganslandt T, Zeleke AA. Approaches and Criteria for Provenance in Biomedical Data Sets and Workflows: Protocol for a Scoping Review. JMIR Res Protoc. 2021;10(11):e31750. DOI: 10.2196/31750
4. Sahoo SS, Valdez J, Kim M, Rueschman M, Redline S. ProvCaRe: Characterizing scientific reproducibility of biomedical research studies using semantic provenance metadata. Int J Med Inform. 2019;121:10-18. DOI: 10.1016/j.ijmedinf.2018.10.009
5. Curcin V. Embedding data provenance into the Learning Health System to facilitate reproducible research. Learn Health Syst. 2016;1(2):e10019. DOI: 10.1002/lrh2.10019
6. Monnin P, Legrand J, Husson G, Ringot P, Tchechmedjiev A, Jonquet C, et al. PGxO and PGxLOD: a reconciliation of pharmacogenomic knowledge of various provenances, enabling further comparison. BMC Bioinformatics. 2019;20(Suppl 4):139. DOI: 10.1186/s12859-019-2693-9
7. Pugliese P, Knell C, Christoph J. Exchange of Clinical and Omics Data According to FAIR Principles: A Review of Open Source Solutions. Methods Inf Med. 2020;59(S 01):e13-e20. DOI: 10.1055/s-0040-1712968