gms | German Medical Science

62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

17.09. - 21.09.2017, Oldenburg

Technical Aspects of Data Provenance in Clinical Trials

Meeting Abstract

  • Marcel Parciak - Universitätsmedizin Göttingen, Göttingen, Germany
  • Christian R. Bauer - Universitätsmedizin Göttingen, Göttingen, Germany
  • Benjamin Baum - Universitätsmedizin Göttingen, Göttingen, Germany
  • Harald Kusch - Universitätsmedizin Göttingen, Göttingen, Germany
  • Ulrich Sax - Universitätsmedizin Göttingen, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 288

doi: 10.3205/17gmds155, urn:nbn:de:0183-17gmds1558

Published: August 29, 2017

© 2017 Parciak et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Introduction: Translational research projects generate high-volume and highly diverse datasets. Questions about the history of such datasets and the workflows involved in creating them, commonly described as data and process provenance, often remain unanswered [1]. Provenance data has the potential to improve trust, reproducibility, collaboration and understanding of a project's results [2].

The Good Clinical Practice (GCP) guidelines require that the history of an object and the changes it has undergone be captured [3], [4]. Common software systems capture this information as audit records in order to conform to GCP. Because GCP does not prescribe a data format for audit records, each software system stores them in a proprietary format, which makes them difficult to analyze.

The World Wide Web Consortium (W3C) has recommended W3C-PROV as a specification for storing provenance data [5]. We extracted provenance data from a CDISC-ODM record, which had been exported with the Export Search tool of secuTrial, a data capture tool for clinical trials, and stored it in a standardized format: PROV-XML, the XML representation of W3C-PROV. This abstract provides a brief overview of provenance and of the lessons we learned from extracting provenance data.

Methods: We mainly utilized two technologies in our approach: CDISC-ODM and W3C-PROV.

The Operational Data Model (ODM) by the Clinical Data Interchange Standards Consortium (CDISC) is an XML-based and GCP-compliant format for clinical trial data [6]. It is designed for the interchange and archiving of clinical trial data and associated metadata.
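
To make the structure concrete, the following sketch parses a small ODM fragment with Python's standard library and lists the audit records attached to individual items. The fragment is hand-written for illustration: its OIDs, values and the helper name list_audit_records are assumptions and do not come from the secuTrial export described here. Only the element and attribute names follow the ODM 1.3 schema.

```python
import xml.etree.ElementTree as ET

# Namespace of ODM 1.3 documents.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, schematic ODM fragment (not real trial data).
ODM_SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileType="Transactional">
  <ClinicalData StudyOID="ST.DEMO" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="S001">
      <StudyEventData StudyEventOID="SE.VISIT1">
        <FormData FormOID="F.VITALS">
          <ItemGroupData ItemGroupOID="IG.BP" TransactionType="Update">
            <ItemData ItemOID="I.SYSBP" Value="120">
              <AuditRecord>
                <UserRef UserOID="U.NURSE1"/>
                <LocationRef LocationOID="L.SITE1"/>
                <DateTimeStamp>2017-03-01T10:15:00</DateTimeStamp>
                <ReasonForChange>Typo corrected</ReasonForChange>
              </AuditRecord>
            </ItemData>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""


def list_audit_records(odm_xml):
    """Return one dict per ItemData element that carries an AuditRecord."""
    root = ET.fromstring(odm_xml)
    records = []
    for item in root.iterfind(".//odm:ItemData", ODM_NS):
        audit = item.find("odm:AuditRecord", ODM_NS)
        if audit is None:
            continue  # item without audit information
        records.append({
            "item": item.get("ItemOID"),
            "value": item.get("Value"),
            "user": audit.find("odm:UserRef", ODM_NS).get("UserOID"),
            "timestamp": audit.findtext("odm:DateTimeStamp", namespaces=ODM_NS),
            # ReasonForChange is optional, so fall back to None.
            "reason": audit.findtext("odm:ReasonForChange", default=None,
                                     namespaces=ODM_NS),
        })
    return records


if __name__ == "__main__":
    for record in list_audit_records(ODM_SAMPLE):
        print(record)
```

Each resulting dictionary answers who changed which item, when and why, which is the information a provenance record needs.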

The W3C released W3C-PROV to provide a model for representing and accessing provenance data. Its core elements are defined in PROV-DM, which comprises three types (Entity, Activity and Agent) that can be linked by seven main properties. This model can be extended to cover more concepts. PROV-DM can be serialized in formats such as XML (PROV-XML), and W3C-PROV provides links to other ontologies such as Dublin Core (PROV-DC).
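
As an illustration of the three core types, the sketch below builds a minimal PROV-XML document with Python's standard library. It contains one Entity, one Activity and one Agent and links them with two of the main properties, wasGeneratedBy and wasAssociatedWith. The identifiers, the ex: prefix with its example.org namespace, and the helper names are assumptions made for this example; only the prov: element and attribute names follow the PROV-XML note.

```python
import xml.etree.ElementTree as ET

PROV = "http://www.w3.org/ns/prov#"
ET.register_namespace("prov", PROV)


def q(name):
    """Qualify a local name with the PROV namespace (ElementTree notation)."""
    return "{%s}%s" % (PROV, name)


def build_prov_document():
    doc = ET.Element(q("document"))
    # Identifiers in PROV-XML are QNames. ElementTree only emits namespace
    # declarations for element and attribute names, so the hypothetical
    # ex: prefix is declared by hand.
    doc.set("xmlns:ex", "http://example.org/")

    # The three core PROV-DM types.
    ET.SubElement(doc, q("entity"), {q("id"): "ex:sysbp-v2"})    # a data value
    ET.SubElement(doc, q("activity"), {q("id"): "ex:edit-1"})    # the change
    ET.SubElement(doc, q("agent"), {q("id"): "ex:nurse1"})       # who did it

    # Entity was generated by Activity.
    gen = ET.SubElement(doc, q("wasGeneratedBy"))
    ET.SubElement(gen, q("entity"), {q("ref"): "ex:sysbp-v2"})
    ET.SubElement(gen, q("activity"), {q("ref"): "ex:edit-1"})
    ET.SubElement(gen, q("time")).text = "2017-03-01T10:15:00"

    # Activity was associated with Agent.
    assoc = ET.SubElement(doc, q("wasAssociatedWith"))
    ET.SubElement(assoc, q("activity"), {q("ref"): "ex:edit-1"})
    ET.SubElement(assoc, q("agent"), {q("ref"): "ex:nurse1"})
    return doc


if __name__ == "__main__":
    print(ET.tostring(build_prov_document(), encoding="unicode"))
```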

Results: In a pragmatic approach, several elements of ODM were mapped to PROV-XML using Extensible Stylesheet Language Transformations (XSLT). The resulting PROV-XML dataset can easily be queried with common XML tools. ODM documents come in two types, snapshot and transactional, and we observed that transactional documents carry more information than snapshot documents. This practical implementation of access to provenance data outside of the data collection software provides a basis for further discussion and development of unified provenance handling.
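
The idea of such a mapping, and of querying its result, can be sketched as follows. This is not the XSLT stylesheet used in this work: the sketch uses Python's standard library instead of XSLT, and the choice of which ODM element becomes which PROV element is an assumption made for illustration (each audited ItemData becomes an Entity, each change an Activity, each UserRef an Agent). The sample data, the ex: prefix and the function names are hypothetical as well.

```python
import xml.etree.ElementTree as ET

ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}
PROV = "http://www.w3.org/ns/prov#"
PROV_NS = {"prov": PROV}
ET.register_namespace("prov", PROV)

# Hypothetical, schematic transactional ODM fragment: one item, two changes.
ODM_SAMPLE = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileType="Transactional">
 <ClinicalData StudyOID="ST.DEMO" MetaDataVersionOID="MDV.1">
  <SubjectData SubjectKey="S001"><StudyEventData StudyEventOID="SE.V1">
   <FormData FormOID="F.VITALS">
    <ItemGroupData ItemGroupOID="IG.BP" TransactionType="Insert">
     <ItemData ItemOID="I.SYSBP" Value="210"><AuditRecord>
      <UserRef UserOID="U.NURSE1"/><LocationRef LocationOID="L.SITE1"/>
      <DateTimeStamp>2017-03-01T10:15:00</DateTimeStamp>
     </AuditRecord></ItemData>
    </ItemGroupData>
    <ItemGroupData ItemGroupOID="IG.BP" TransactionType="Update">
     <ItemData ItemOID="I.SYSBP" Value="120"><AuditRecord>
      <UserRef UserOID="U.MONITOR1"/><LocationRef LocationOID="L.SITE1"/>
      <DateTimeStamp>2017-03-02T09:00:00</DateTimeStamp>
     </AuditRecord></ItemData>
    </ItemGroupData>
   </FormData>
  </StudyEventData></SubjectData>
 </ClinicalData>
</ODM>"""


def q(name):
    """Qualify a local name with the PROV namespace."""
    return "{%s}%s" % (PROV, name)


def odm_to_prov(odm_root):
    """Map every audited ItemData to an entity, an activity and an agent."""
    doc = ET.Element(q("document"))
    doc.set("xmlns:ex", "http://example.org/")  # assumed prefix for identifiers
    seen_agents = set()
    items = odm_root.iterfind(".//odm:ItemData", ODM_NS)
    for n, item in enumerate(items, start=1):
        audit = item.find("odm:AuditRecord", ODM_NS)
        if audit is None:
            continue
        entity_id = "ex:%s.%d" % (item.get("ItemOID"), n)
        activity_id = "ex:edit.%d" % n
        agent_id = "ex:" + audit.find("odm:UserRef", ODM_NS).get("UserOID")
        ET.SubElement(doc, q("entity"), {q("id"): entity_id})
        ET.SubElement(doc, q("activity"), {q("id"): activity_id})
        if agent_id not in seen_agents:  # declare each agent only once
            seen_agents.add(agent_id)
            ET.SubElement(doc, q("agent"), {q("id"): agent_id})
        gen = ET.SubElement(doc, q("wasGeneratedBy"))
        ET.SubElement(gen, q("entity"), {q("ref"): entity_id})
        ET.SubElement(gen, q("activity"), {q("ref"): activity_id})
        ET.SubElement(gen, q("time")).text = audit.findtext(
            "odm:DateTimeStamp", namespaces=ODM_NS)
        assoc = ET.SubElement(doc, q("wasAssociatedWith"))
        ET.SubElement(assoc, q("activity"), {q("ref"): activity_id})
        ET.SubElement(assoc, q("agent"), {q("ref"): agent_id})
    return doc


def items_changed_by(prov_doc, agent_id):
    """Query: which entities were generated by activities of this agent?"""
    activities = set()
    for assoc in prov_doc.iterfind("prov:wasAssociatedWith", PROV_NS):
        if assoc.find("prov:agent", PROV_NS).get(q("ref")) == agent_id:
            activities.add(assoc.find("prov:activity", PROV_NS).get(q("ref")))
    return [gen.find("prov:entity", PROV_NS).get(q("ref"))
            for gen in prov_doc.iterfind("prov:wasGeneratedBy", PROV_NS)
            if gen.find("prov:activity", PROV_NS).get(q("ref")) in activities]


if __name__ == "__main__":
    prov_doc = odm_to_prov(ET.fromstring(ODM_SAMPLE))
    print(items_changed_by(prov_doc, "ex:U.NURSE1"))
```

In this sample the transactional document keeps both the initial entry and the later correction, so each change appears as its own entity. A snapshot export would presumably contain only the latest state.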

Discussion: Common software systems that capture clinical trial data in conformance with GCP can deliver a valuable set of provenance data within their datasets. Nevertheless, the lack of a common format for provenance data makes it hard to use this data interoperably. W3C-PROV proved to be a viable data format for provenance information and already provides a means of access and querying (PROV-AQ).

Sahoo et al. [7] recently built an ontology to store provenance metadata using W3C-PROV in order to enhance reproducibility in biomedical research and tested it on six clinical studies from the US. In future work, we intend to verify the viability of their promising approach for clinical studies conducted at the University Medical Center Göttingen.

Acknowledgements: This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the research and funding concepts e:Med (01ZX1306C/sysINFLAME) and i:DSem (031L0024A/MyPathSem).

The authors declare that they have no competing interests.


References

1. De Lusignan S, Liaw ST, Krause P, Curcin V, Vicente MT, Michalakidis G, et al. Key concepts to assess the readiness of data for International research: Data quality, lineage and provenance, extraction and processing errors, traceability, and curation. Yearb Med Inform. 2011;6(1):112–120.
2. Ragan ED, Endert A, Sanyal J, Chen J. Characterizing Provenance in Visualization and Data Analysis: An Organizational Framework of Provenance Types and Purposes. IEEE Transactions on Visualization and Computer Graphics. 2016 Jan;22(1):31–40.
3. World Health Organization. Guidelines for good clinical practice (GCP) for trials on pharmaceutical products. WHO Technical Report Series. 1995;850:97–137.
4. ICH Harmonised Guideline. Integrated Addendum to ICH E6(R1): Guideline for Good Clinical Practice ICH. E6(R2). http://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R2__Step_4.pdf, last accessed April 24, 2017
5. Groth P, Moreau L. PROV-Overview. An overview of the PROV Family of Documents. 2013 [cited 2017 Jun 01]; Available from: https://www.w3.org/TR/2013/NOTE-prov-overview-20130430/, Archived at: http://www.webcitation.org/6qtBsjYbg
6. Clinical Data Interchange Standards Consortium. Operational Data Model (ODM)-XML [Internet]. CDISC; 2013 [cited 2017 Feb 17]. Available from: https://www.cdisc.org/standards/foundational/odm
7. Sahoo SS, Valdez J, Rueschman M. Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description. AMIA Annu Symp Proc. 2017 Feb 10;2016:1070–9.