Article
Towards a Standardized Instruction Manual for Data Use: Biomedical Data Use Metadata Revisited
Published: September 6, 2024
Text
Introduction: Data sets are rarely annotated sufficiently, which hinders their reuse for new research purposes. As data reuse is particularly interesting in networked research, many projects have established and validated data sharing frameworks and pipelines that differ in the details of their data reuse options. Bönisch et al. [1] presented types of metadata that ought to be provided when requesting data use. Additionally, standardization of the actual data use objects is urgently needed. The Global Alliance for Genomics and Health (GA4GH) developed an ontology specifying human- and machine-readable terms. Because it is not limited to individual research areas, this domain-agnostic Data Use Ontology (DUO) [2] could benefit data discoverability and transparency. The DUO offers a hierarchical vocabulary for expressing future data use, yet it was unclear whether it can be applied to NFDI4Health data sources and to larger data repositories outside of GA4GH.
We investigate the capabilities of the DUO and compare them with the data use agreement metadata proposed by Bönisch et al. [1].
Methods: We revisited the data use agreement terms of Bönisch et al. and divided these terms and the DUO terms into several categories. Next, we sketched a mapping between the data use agreement terms and the DUO.
Results: We identified three categories: “Research Project”, which contains project-related restrictions; “Prerequisites”, which must be considered before data are received; and “Data Usage Guidelines (DUG)”, which define how researchers have to handle the data after receipt.
The two approaches have different scopes: Bönisch et al. focused on DUG, whereas the DUO focuses on Research Projects and Prerequisites. A suitable mapping is therefore laborious. Since the DUO is based on OWL, it is possible to add information semantically. However, this leads to less standardization.
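The categorization and the resulting coverage gap can be illustrated with a minimal Python sketch. It is not the authors' mapping: the assignment of DUO terms to the three categories is an assumption made for illustration, the agreement terms are hypothetical examples rather than the terms of Bönisch et al., and the DUO identifiers should be checked against the current DUO release before any reuse.

```python
from dataclasses import dataclass
from typing import Optional

# The three categories named in the Results.
RESEARCH_PROJECT = "Research Project"
PREREQUISITES = "Prerequisites"
DUG = "Data Usage Guidelines (DUG)"

# Small subset of DUO terms. The category assigned to each term is an
# ASSUMPTION of this sketch, not a published classification.
DUO_CATEGORY = {
    "DUO:0000042": ("general research use", RESEARCH_PROJECT),
    "DUO:0000007": ("disease specific research", RESEARCH_PROJECT),
    "DUO:0000027": ("project specific restriction", RESEARCH_PROJECT),
    "DUO:0000021": ("ethics approval required", PREREQUISITES),
    "DUO:0000020": ("collaboration required", PREREQUISITES),
    "DUO:0000029": ("return to database or resource", DUG),
}


@dataclass(frozen=True)
class AgreementTerm:
    """One clause of a data use agreement, optionally mapped to a DUO term."""
    label: str
    category: str
    duo_id: Optional[str] = None  # None means no DUO counterpart was found


# HYPOTHETICAL agreement terms, invented for this example.
AGREEMENT_TERMS = [
    AgreementTerm("Ethics vote must be presented", PREREQUISITES, "DUO:0000021"),
    AgreementTerm("Results must be returned to the data holder", DUG, "DUO:0000029"),
    AgreementTerm("Data must be deleted after project end", DUG),
    AgreementTerm("No attempt to re-identify participants", DUG),
]


def unmapped_by_category(terms):
    """Group the labels of agreement terms without a DUO counterpart by category."""
    gaps = {}
    for term in terms:
        if term.duo_id is None:
            gaps.setdefault(term.category, []).append(term.label)
    return gaps


def duo_terms_per_category(duo_category):
    """Count how many of the listed DUO terms fall into each category."""
    counts = {}
    for _label, category in duo_category.values():
        counts[category] = counts.get(category, 0) + 1
    return counts


if __name__ == "__main__":
    print(duo_terms_per_category(DUO_CATEGORY))
    print(unmapped_by_category(AGREEMENT_TERMS))
```

In this toy data most of the listed DUO terms sit in the first two categories, while the unmapped agreement terms all belong to the DUG, which mirrors the difference in scope described above.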
Discussion: Neither of the examined vocabularies completely covers all required information. The DUO is a good approach to improving the machine readability of data use terms [2]. However, because of its focus, the current version is not sufficient to annotate all required information. One solution might be to extend the DUO with further terms, especially terms covering the DUG. Alternatively, the DUO could be combined with other ontologies that cover the DUG.
In light of the European Health Data Space (EHDS), we see an urgent need to sustainably annotate data sources with standardized, machine-readable terms describing data use permissions, and to offer options for automating data use requests and data use pipelines. Because we see practical value in applying the DUO to our own and other NFDI projects, we also see the need to subsequently annotate the relevant metadata items in the NFDI4Health metadata schema V3.3 [3].
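How machine-readable permissions could support automated handling of data use requests can be sketched as follows. This is a deliberately simplified illustration, not a GA4GH matching algorithm: the purpose labels, the requirement names and the classes `DatasetAnnotation` and `UseRequest` are hypothetical, and only a few DUO terms are modelled.

```python
from dataclasses import dataclass, field

# Which request purposes a DUO permission term is taken to cover.
# Simplified ASSUMPTION: only two permission terms and two purpose labels.
PERMITTED_PURPOSES = {
    "DUO:0000042": {"general", "health"},  # general research use
    "DUO:0000006": {"health"},             # health or medical or biomedical research
}

# What a requester must provide for a given DUO modifier (hypothetical names).
MODIFIER_REQUIREMENT = {
    "DUO:0000021": "ethics_approval",         # ethics approval required
    "DUO:0000020": "collaboration_agreed",    # collaboration required
    "DUO:0000019": "publication_commitment",  # publication required
}


@dataclass
class DatasetAnnotation:
    """DUO annotation of a data source: one permission plus optional modifiers."""
    permission: str
    modifiers: list = field(default_factory=list)


@dataclass
class UseRequest:
    """A data use request with its purpose and the requirements already met."""
    purpose: str
    fulfilled: set = field(default_factory=set)


def check_request(dataset, request):
    """Return a list of problems; an empty list means nothing blocks the request."""
    problems = []
    allowed = PERMITTED_PURPOSES.get(dataset.permission, set())
    if request.purpose not in allowed:
        problems.append(
            f"purpose '{request.purpose}' not covered by {dataset.permission}"
        )
    for modifier in dataset.modifiers:
        requirement = MODIFIER_REQUIREMENT.get(modifier)
        if requirement is None:
            # Terms this sketch does not model are routed to a human reviewer.
            problems.append(f"modifier {modifier} needs manual review")
        elif requirement not in request.fulfilled:
            problems.append(f"missing requirement: {requirement}")
    return problems


if __name__ == "__main__":
    dataset = DatasetAnnotation("DUO:0000006", ["DUO:0000021", "DUO:0000019"])
    request = UseRequest("health", {"ethics_approval"})
    for problem in check_request(dataset, request):
        print(problem)
```

Conditions that fall under the DUG, such as handling rules after receipt, have no counterpart in this check; they would still have to be reviewed manually or be expressed in an additional vocabulary.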
The use of shared data is still limited. The “willingness to share” data [4] could be expressed by providing metadata about the actual data and about how the data should be reused. This requires action from data-holding organizations but opens up options for projects preparing for the EHDS [5], including MII, NUM and NFDI4Health.
Acknowledgements: This research was supported by DFG grant SA 1009/3-2, WI 1605/10-2 and BMBF grant NFDI4Health 442326535.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Bönisch C, Hanß S, Spicher N, Sax U, Krefting D. Reusing Biomedical Data as Agreed – Towards Structured Metadata for Data Use Agreements. In: German Medical Data Sciences 2023 – Science Close to People. IOS Press; 2023. p. 31–8. DOI: 10.3233/SHTI230690
2. Lawson J, Cabili MN, Kerry G, Boughtwood T, Thorogood A, Alper P, et al. The Data Use Ontology to streamline responsible access to human biomedical datasets. Cell Genomics. 2021 Nov 10;1(2):100028. DOI: 10.1016/j.xgen.2021.100028
3. Abaza H, Shutsko A, Golebiewski M, Klopfenstein S, Schmidt C, Vorisek C, et al. NFDI4Health Task Force COVID-19. The NFDI4Health Metadata Schema (V3.3). Fachrepositorium Lebenswissenschaften; 2023. DOI: 10.4126/FRL01-006472531
4. Mansmann U, Locher C, Prasser F, et al. Implementing clinical trial data sharing requires training a new generation of biomedical researchers. Nat Med. 2023;29:298–301. DOI: 10.1038/s41591-022-02080-y
5. European Commission. Proposal for a regulation - The European Health Data Space. [cited 2023 Mar 31]. Available from: https://health.ec.europa.eu/publications/proposal-regulation-european-health-data-space_en
