Technical Aspects of Data Provenance in Clinical Trials
Published: August 29, 2017
Introduction: Translational research projects generate high-volume, highly diverse datasets. Questions about the history of such datasets and the workflows involved in creating them, commonly described as data and process provenance, often remain unanswered [1]. Provenance data has the potential to improve trust, reproducibility, collaboration and understanding of a project's results [2].
The GCP guidelines require that the history of an object and the changes it has undergone be captured [3], [4]. Common software systems capture this information as audit records in order to conform to GCP. Because GCP does not prescribe a data format for audit records, every software system stores them in a proprietary format, which makes them difficult to analyze.
The World Wide Web Consortium (W3C) recommends W3C-PROV as a specification for storing provenance data [5]. We used PROV-XML, the XML representation of W3C-PROV, to extract provenance data from a CDISC-ODM record and store it in a standardized format. The record was exported with the Export Search tool of secuTrial, a data capture tool for clinical trials. This abstract gives a brief overview of provenance and the lessons we learned from extracting provenance data.
Methods: We mainly utilized two technologies in our approach: CDISC-ODM and W3C-PROV.
The Operational Data Model (ODM) by the Clinical Data Interchange Standards Consortium (CDISC) is an XML-based, GCP-compliant format for clinical trial data [6]. It is designed for the interchange and archiving of clinical trial data and the associated metadata.
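To illustrate where audit information sits in an ODM document, the following minimal Python sketch reads the AuditRecord elements attached to ItemData elements. The sample document, its OIDs and the helper name list_audit_records are invented for illustration and do not come from the secuTrial export described here; the element and attribute names are assumed to follow ODM 1.3.

```python
import xml.etree.ElementTree as ET

# ODM 1.3 namespace. The sample document is hypothetical, not a real export.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

SAMPLE_ODM = """
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileType="Transactional">
  <ClinicalData StudyOID="ST.DEMO" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="SUBJ-001">
      <StudyEventData StudyEventOID="SE.VISIT1">
        <FormData FormOID="F.VITALS">
          <ItemGroupData ItemGroupOID="IG.VITALS">
            <ItemData ItemOID="I.WEIGHT" Value="72" TransactionType="Insert">
              <AuditRecord>
                <UserRef UserOID="USR.NURSE1"/>
                <LocationRef LocationOID="LOC.SITE1"/>
                <DateTimeStamp>2017-01-10T09:15:00</DateTimeStamp>
              </AuditRecord>
            </ItemData>
            <ItemData ItemOID="I.WEIGHT" Value="74" TransactionType="Update">
              <AuditRecord>
                <UserRef UserOID="USR.MONITOR1"/>
                <LocationRef LocationOID="LOC.SITE1"/>
                <DateTimeStamp>2017-01-12T14:30:00</DateTimeStamp>
                <ReasonForChange>Typing error</ReasonForChange>
              </AuditRecord>
            </ItemData>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>
"""


def list_audit_records(odm_xml):
    """Return the ODM FileType and one dict per ItemData that has an AuditRecord."""
    root = ET.fromstring(odm_xml)
    records = []
    for item in root.iterfind(".//odm:ItemData", ODM_NS):
        audit = item.find("odm:AuditRecord", ODM_NS)
        if audit is None:
            continue  # nothing to report for items without an audit record
        user = audit.find("odm:UserRef", ODM_NS)
        records.append({
            "item": item.get("ItemOID"),
            "value": item.get("Value"),
            "transaction": item.get("TransactionType"),
            "user": user.get("UserOID") if user is not None else None,
            "timestamp": audit.findtext("odm:DateTimeStamp", namespaces=ODM_NS),
            "reason": audit.findtext("odm:ReasonForChange", namespaces=ODM_NS),
        })
    return root.get("FileType"), records


file_type, records = list_audit_records(SAMPLE_ODM)
print(file_type)
for record in records:
    print(record)
```

In this invented sample, the two ItemData elements for the same item represent an insert followed by a correction, which is the kind of change history that audit records are meant to preserve.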
The W3C released W3C-PROV to provide a model for representing and accessing provenance data. Its core elements are defined in PROV-DM, which comprises three terms (Entity, Activity and Agent) that can be linked by seven main properties. The model can be extended to cover further concepts. PROV-DM can be serialized in formats such as XML (PROV-XML) and provides links to other ontologies such as Dublin Core (PROV-DC).
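As a rough illustration of how the three core terms and some of the linking properties look in PROV-XML, the sketch below builds a small document with Python's standard library. The identifiers under the ex: prefix, the example.org namespace and the helper names are assumptions made for this example; only three of the properties (wasGeneratedBy, wasAssociatedWith, wasAttributedTo) are shown.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"   # PROV namespace used by PROV-XML
EX = "http://example.org/"            # hypothetical namespace for identifiers
ET.register_namespace("prov", PROV)


def q(name):
    """Qualified name in the PROV namespace, in ElementTree notation."""
    return f"{{{PROV}}}{name}"


def node(parent, kind, identifier):
    """Add an entity, activity or agent carrying a prov:id attribute."""
    return ET.SubElement(parent, q(kind), {q("id"): identifier})


def relation(parent, kind, **refs):
    """Add a relation whose children point to nodes via prov:ref (in keyword order)."""
    rel = ET.SubElement(parent, q(kind))
    for role, target in refs.items():
        ET.SubElement(rel, q(role), {q("ref"): target})
    return rel


doc = ET.Element(q("document"))
# Declare the ex: prefix by hand, because it only occurs inside attribute values.
doc.set("xmlns:ex", EX)

node(doc, "entity", "ex:weight-v1")        # a recorded data value
node(doc, "activity", "ex:data-entry-1")   # the act of recording it
node(doc, "agent", "ex:nurse1")            # the person responsible

relation(doc, "wasGeneratedBy", entity="ex:weight-v1", activity="ex:data-entry-1")
relation(doc, "wasAssociatedWith", activity="ex:data-entry-1", agent="ex:nurse1")
relation(doc, "wasAttributedTo", entity="ex:weight-v1", agent="ex:nurse1")

prov_xml = ET.tostring(doc, encoding="unicode")
print(prov_xml)
```

The resulting string is ordinary XML, so it can be stored, validated or parsed again with any XML toolchain.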
Results: In a pragmatic approach, several elements of ODM were mapped to PROV-XML using Extensible Stylesheet Language Transformations (XSLT). The resulting PROV-XML dataset can be queried easily with common XML tools. ODM documents come in two types, snapshot and transactional, and we observed that transactional documents are richer in information than snapshot documents. This practical implementation of access to provenance data outside the data collection software provides a basis for further discussion and development of unified provenance handling.
Discussion: Common software systems that capture clinical trial data in conformance with GCP can deliver a valuable set of provenance data within their datasets. Nevertheless, the lack of a common format for provenance data makes this data hard to use interoperably. W3C-PROV proved to be a viable data format for provenance information and already provides a means of access and querying (PROV-AQ).
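The following sketch shows what such a mapping and a subsequent query could look like. It is written in Python rather than XSLT, and the mapping rules (one entity and one activity per audited ItemData, one agent per UserOID, identifiers under a hypothetical ex: prefix) are assumptions made for illustration, not the mapping used in this work. The sample ODM content is invented.

```python
import xml.etree.ElementTree as ET

ODM = "http://www.cdisc.org/ns/odm/v1.3"
PROV = "http://www.w3.org/ns/prov#"
ET.register_namespace("prov", PROV)

# Invented transactional ODM fragment: one value inserted, then corrected.
SAMPLE_ODM = f"""
<ODM xmlns="{ODM}" FileType="Transactional">
  <ClinicalData StudyOID="ST.DEMO" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="SUBJ-001"><StudyEventData StudyEventOID="SE.V1">
      <FormData FormOID="F.VITALS"><ItemGroupData ItemGroupOID="IG.VITALS">
        <ItemData ItemOID="I.WEIGHT" Value="72" TransactionType="Insert">
          <AuditRecord><UserRef UserOID="USR.NURSE1"/><LocationRef LocationOID="LOC.1"/>
            <DateTimeStamp>2017-01-10T09:15:00</DateTimeStamp></AuditRecord>
        </ItemData>
        <ItemData ItemOID="I.WEIGHT" Value="74" TransactionType="Update">
          <AuditRecord><UserRef UserOID="USR.MONITOR1"/><LocationRef LocationOID="LOC.1"/>
            <DateTimeStamp>2017-01-12T14:30:00</DateTimeStamp></AuditRecord>
        </ItemData>
      </ItemGroupData></FormData>
    </StudyEventData></SubjectData>
  </ClinicalData>
</ODM>"""


def p(name):
    """Qualified name in the PROV namespace, in ElementTree notation."""
    return f"{{{PROV}}}{name}"


def ref(parent, role, target):
    """Add a child element that points to another node via prov:ref."""
    ET.SubElement(parent, p(role), {p("ref"): target})


def odm_to_prov(odm_xml):
    """Hypothetical mapping: audited ItemData -> entity, activity, agent, relations."""
    ns = {"odm": ODM}
    doc = ET.Element(p("document"))
    doc.set("xmlns:ex", "http://example.org/")  # assumed identifier namespace
    agents, previous = set(), {}
    items = ET.fromstring(odm_xml).iterfind(".//odm:ItemData", ns)
    for n, item in enumerate(items, 1):
        audit = item.find("odm:AuditRecord", ns)
        if audit is None:
            continue  # no audit record, so no provenance to map
        oid = item.get("ItemOID")
        user = "ex:" + audit.find("odm:UserRef", ns).get("UserOID")
        entity = f"ex:{oid}-v{n}"
        activity = f"ex:{item.get('TransactionType', 'Edit')}-{n}"
        ET.SubElement(doc, p("entity"), {p("id"): entity})
        ET.SubElement(doc, p("activity"), {p("id"): activity})
        if user not in agents:
            agents.add(user)
            ET.SubElement(doc, p("agent"), {p("id"): user})
        gen = ET.SubElement(doc, p("wasGeneratedBy"))
        ref(gen, "entity", entity)
        ref(gen, "activity", activity)
        ET.SubElement(gen, p("time")).text = audit.findtext("odm:DateTimeStamp", namespaces=ns)
        assoc = ET.SubElement(doc, p("wasAssociatedWith"))
        ref(assoc, "activity", activity)
        ref(assoc, "agent", user)
        if oid in previous:  # link a corrected value to the one it replaces
            der = ET.SubElement(doc, p("wasDerivedFrom"))
            ref(der, "generatedEntity", entity)
            ref(der, "usedEntity", previous[oid])
        previous[oid] = entity
    return doc


def editors_of(prov_doc, item_oid):
    """Query the PROV-XML tree: who generated each version of an item, and when?"""
    ns = {"prov": PROV}
    agent_of = {
        a.find("prov:activity", ns).get(p("ref")): a.find("prov:agent", ns).get(p("ref"))
        for a in prov_doc.iterfind("prov:wasAssociatedWith", ns)
    }
    history = []
    for gen in prov_doc.iterfind("prov:wasGeneratedBy", ns):
        if gen.find("prov:entity", ns).get(p("ref")).startswith(f"ex:{item_oid}-v"):
            activity = gen.find("prov:activity", ns).get(p("ref"))
            history.append((gen.findtext("prov:time", namespaces=ns), agent_of[activity]))
    return history


prov_doc = odm_to_prov(SAMPLE_ODM)
history = editors_of(prov_doc, "I.WEIGHT")
print(history)
```

A real mapping would also need to include the subject, study event and form in the entity identifiers; the sketch omits this to stay short. It also hints at why transactional documents are more useful here: a snapshot holds only the current state, so a chain of successive versions such as the one above could not be reconstructed from it.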
Sahoo et al. [7] recently built an ontology that stores provenance metadata using W3C-PROV in order to enhance reproducibility in biomedical research, and tested it on six clinical studies from the US. Future work will examine whether their promising approach is viable for clinical studies conducted at the University Medical Center Göttingen.
Acknowledgements: This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the research and funding concepts e:Med (01ZX1306C/sysINFLAME) and i:DSem (031L0024A/MyPathSem).
The authors declare that they have no competing interests.
References
1. De Lusignan S, Liaw ST, Krause P, Curcin V, Vicente MT, Michalakidis G, et al. Key concepts to assess the readiness of data for International research: Data quality, lineage and provenance, extraction and processing errors, traceability, and curation. Yearb Med Inform. 2011;6(1):112–120.
2. Ragan ED, Endert A, Sanyal J, Chen J. Characterizing Provenance in Visualization and Data Analysis: An Organizational Framework of Provenance Types and Purposes. IEEE Transactions on Visualization and Computer Graphics. 2016 Jan;22(1):31–40.
3. World Health Organization. Guidelines for good clinical practice (GCP) for trials on pharmaceutical products. WHO Technical Report Series. 1995;850:97–137.
4. ICH Harmonised Guideline. Integrated Addendum to ICH E6(R1): Guideline for Good Clinical Practice E6(R2). Available from: http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R2__Step_4.pdf [cited 2017 Apr 24].
5. Groth P, Moreau L. PROV-Overview. An overview of the PROV Family of Documents. 2013 [cited 2017 Jun 01]. Available from: https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/, archived at: http://www.webcitation.org/6qtBsjYbg
6. Clinical Data Interchange Standards Consortium. Operational Data Model (ODM)-XML [Internet]. CDISC; 2013 [cited 2017 Feb 17]. Available from: https://www.cdisc.org/standards/foundational/odm
7. Sahoo SS, Valdez J, Rueschman M. Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description. AMIA Annu Symp Proc. 2017 Feb 10;2016:1070–9.