gms | German Medical Science

62nd Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

17.09. - 21.09.2017, Oldenburg

Technical Aspects of Data Provenance in Clinical Trials

Meeting Abstract


  • Marcel Parciak - Universitätsmedizin Göttingen, Göttingen, Germany
  • Christian R. Bauer - Universitätsmedizin Göttingen, Göttingen, Germany
  • Benjamin Baum - Universitätsmedizin Göttingen, Göttingen, Germany
  • Harald Kusch - Universitätsmedizin Göttingen, Göttingen, Germany
  • Ulrich Sax - Universitätsmedizin Göttingen, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 288

doi: 10.3205/17gmds155, urn:nbn:de:0183-17gmds1558

Published: 29 August 2017

© 2017 Parciak et al.
This article is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: Translational research projects generate high-volume, highly diverse datasets. Questions about the history of such datasets and the workflows involved in creating them, commonly described as data and process provenance, often remain unanswered [1]. Provenance data has the potential to improve trust, reproducibility, collaboration and the understanding of a project's results [2].

The Good Clinical Practice (GCP) guidelines require that the history of an object and the changes it has undergone be recorded [3], [4]. Software systems commonly used in clinical trials capture this information, the audit records, in order to conform to GCP. Because GCP does not prescribe a data format for audit records, each software system stores them in its own proprietary format, which makes them difficult to analyze.

The World Wide Web Consortium (W3C) has published W3C-PROV as a specification for representing provenance data [5]. We used PROV-XML, the XML representation of W3C-PROV, to extract provenance data from a CDISC-ODM record and store it in a standardized format. The record had been exported with the Export Search tool of secuTrial, a data capture system for clinical trials. This abstract gives a brief overview of provenance and of the lessons we learned while extracting provenance data.

Methods: Our approach relies mainly on two technologies: CDISC-ODM and W3C-PROV.

The Operational Data Model (ODM) of the Clinical Data Interchange Standards Consortium (CDISC) is an XML-based, GCP-compliant format for clinical trial data [6]. It is designed for the interchange and archiving of clinical trial data and the associated metadata.
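To illustrate where ODM keeps provenance-relevant information, the following minimal Python sketch uses only the standard library to read the `AuditRecord` elements (user, location, timestamp, reason for change) attached to `ItemData` in an ODM 1.3 document. The ODM fragment, its OIDs, and the names `AuditEntry` and `extract_item_audits` are hypothetical; a real secuTrial export may place its audit information differently.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

# Default namespace of ODM 1.3 documents.
ODM_NS = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}

# Hypothetical, hand-written ODM 1.3 fragment; OIDs, values and users are
# illustrative and do not reproduce an actual secuTrial export.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileType="Transactional"
     FileOID="F.EXAMPLE" CreationDateTime="2017-06-01T10:00:00" ODMVersion="1.3.2">
  <ClinicalData StudyOID="S.EXAMPLE" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="SUBJ-001">
      <StudyEventData StudyEventOID="SE.BASELINE">
        <FormData FormOID="F.VITALS">
          <ItemGroupData ItemGroupOID="IG.VITALS">
            <ItemData ItemOID="I.WEIGHT" Value="72" TransactionType="Update">
              <AuditRecord>
                <UserRef UserOID="U.NURSE1"/>
                <LocationRef LocationOID="L.SITE1"/>
                <DateTimeStamp>2017-05-02T09:15:00</DateTimeStamp>
                <ReasonForChange>Transcription error corrected</ReasonForChange>
              </AuditRecord>
            </ItemData>
            <ItemData ItemOID="I.HEIGHT" Value="180"/>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""


@dataclass
class AuditEntry:
    """Who changed which item value, where, when and why (hypothetical record type)."""
    item_oid: str
    value: Optional[str]
    user_oid: str
    location_oid: str
    timestamp: str
    reason: Optional[str]


def extract_item_audits(odm_xml: str) -> List[AuditEntry]:
    """Collect the AuditRecord attached to each ItemData element, if there is one."""
    root = ET.fromstring(odm_xml)
    entries = []
    for item in root.iterfind(".//odm:ItemData", ODM_NS):
        audit = item.find("odm:AuditRecord", ODM_NS)
        if audit is None:
            continue  # items without an audit record carry no who/when/why
        entries.append(AuditEntry(
            item_oid=item.get("ItemOID"),
            value=item.get("Value"),
            user_oid=audit.find("odm:UserRef", ODM_NS).get("UserOID"),
            location_oid=audit.find("odm:LocationRef", ODM_NS).get("LocationOID"),
            timestamp=audit.findtext("odm:DateTimeStamp", namespaces=ODM_NS),
            reason=audit.findtext("odm:ReasonForChange", namespaces=ODM_NS),
        ))
    return entries


if __name__ == "__main__":
    for entry in extract_item_audits(SAMPLE_ODM):
        print(entry.item_oid, entry.value, entry.user_oid, entry.timestamp)
```

Run as a script, this prints `I.WEIGHT 72 U.NURSE1 2017-05-02T09:15:00`; `I.HEIGHT` is skipped because it has no audit record.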

The W3C released W3C-PROV to provide a common data model for provenance together with mechanisms for accessing it. Its core is defined in PROV-DM, which comprises three types (Entity, Activity and Agent) that can be linked by seven core relations. The model can be extended to cover further concepts. PROV-DM can be serialized in formats such as XML (PROV-XML), and the PROV family provides a mapping to other vocabularies such as Dublin Core (PROV-DC).
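As a rough sketch of the PROV-DM core described above, the Python code below lists the three core types and the seven core relations, together with the child elements PROV-XML uses to reference each relation's two ends, and builds a tiny PROV-XML document with `xml.etree.ElementTree`. The helper names (`add_node`, `add_relation`, `build_example_document`) and identifiers such as `weight_value` are illustrative assumptions, and the sketch omits optional details such as timestamps and extra attributes.

```python
import xml.etree.ElementTree as ET

PROV_URI = "http://www.w3.org/ns/prov#"
ET.register_namespace("prov", PROV_URI)  # serialize with the conventional "prov:" prefix


def q(local: str) -> str:
    """Return the namespaced tag/attribute name for a PROV-XML local name."""
    return f"{{{PROV_URI}}}{local}"


# The three core types of PROV-DM.
CORE_TYPES = ("entity", "activity", "agent")

# The seven core relations of PROV-DM, each with the two PROV-XML child
# elements that reference the things it links (first -> second).
CORE_RELATIONS = {
    "wasGeneratedBy": ("entity", "activity"),
    "used": ("activity", "entity"),
    "wasInformedBy": ("informed", "informant"),
    "wasDerivedFrom": ("generatedEntity", "usedEntity"),
    "wasAttributedTo": ("entity", "agent"),
    "wasAssociatedWith": ("activity", "agent"),
    "actedOnBehalfOf": ("delegate", "responsible"),
}


def add_node(doc: ET.Element, kind: str, node_id: str) -> ET.Element:
    """Add an entity, activity or agent with the given identifier."""
    if kind not in CORE_TYPES:
        raise ValueError(f"not a PROV core type: {kind}")
    return ET.SubElement(doc, q(kind), {q("id"): node_id})


def add_relation(doc: ET.Element, relation: str, first: str, second: str) -> ET.Element:
    """Add one of the seven core relations between two identifiers."""
    first_role, second_role = CORE_RELATIONS[relation]
    rel = ET.SubElement(doc, q(relation))
    ET.SubElement(rel, q(first_role), {q("ref"): first})
    ET.SubElement(rel, q(second_role), {q("ref"): second})
    return rel


def build_example_document() -> ET.Element:
    """Hypothetical example: a nurse's data entry generated a recorded weight value."""
    doc = ET.Element(q("document"))
    add_node(doc, "entity", "weight_value")
    add_node(doc, "activity", "data_entry")
    add_node(doc, "agent", "nurse1")
    add_relation(doc, "wasGeneratedBy", "weight_value", "data_entry")
    add_relation(doc, "wasAssociatedWith", "data_entry", "nurse1")
    add_relation(doc, "wasAttributedTo", "weight_value", "nurse1")
    return doc


if __name__ == "__main__":
    print(ET.tostring(build_example_document(), encoding="unicode"))
```

Keeping the relation table explicit makes it easy to check that every relation written to the document uses the role names PROV-XML expects for it.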

Results: Following a pragmatic approach, we mapped several ODM elements to PROV-XML using Extensible Stylesheet Language Transformations (XSLT). The resulting PROV-XML dataset can easily be queried with common XML tools. ODM files come in two types, snapshot and transactional, and we observed that transactional documents carry more information than snapshot documents: a snapshot file contains only the current state of the data, whereas a transactional file can also record how the data changed. This practical implementation, which makes provenance data accessible outside the data capture software, provides a basis for further discussion and for the development of unified provenance handling.
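The authors performed the mapping with XSLT. As a stand-in that runs without an XSLT processor, the sketch below performs a comparable mapping directly in Python with `xml.etree.ElementTree`. The mapping itself (each audited `ItemData` becomes a `prov:entity`, each audit entry a `prov:activity`, each `UserRef` a `prov:agent`), the ID scheme and all sample values are assumptions for illustration, not the mapping used in the study. The hypothetical transactional input records a value that is entered and later corrected, and the final function shows the kind of question a plain XML toolkit can answer on the result.

```python
import xml.etree.ElementTree as ET
from typing import List

ODM = {"odm": "http://www.cdisc.org/ns/odm/v1.3"}
PROV_URI = "http://www.w3.org/ns/prov#"
PROV = {"prov": PROV_URI}
ET.register_namespace("prov", PROV_URI)

# Hypothetical transactional ODM export: the same item entered, then corrected.
SAMPLE_ODM = """<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" FileType="Transactional"
     FileOID="F.EXAMPLE" CreationDateTime="2017-06-01T10:00:00" ODMVersion="1.3.2">
  <ClinicalData StudyOID="S.EXAMPLE" MetaDataVersionOID="MDV.1">
    <SubjectData SubjectKey="SUBJ-001">
      <StudyEventData StudyEventOID="SE.BASELINE">
        <FormData FormOID="F.VITALS">
          <ItemGroupData ItemGroupOID="IG.VITALS">
            <ItemData ItemOID="I.WEIGHT" Value="27" TransactionType="Insert">
              <AuditRecord><UserRef UserOID="U.NURSE1"/><LocationRef LocationOID="L.SITE1"/>
                <DateTimeStamp>2017-05-02T09:00:00</DateTimeStamp></AuditRecord>
            </ItemData>
            <ItemData ItemOID="I.WEIGHT" Value="72" TransactionType="Update">
              <AuditRecord><UserRef UserOID="U.MONITOR1"/><LocationRef LocationOID="L.SITE1"/>
                <DateTimeStamp>2017-05-03T14:30:00</DateTimeStamp></AuditRecord>
            </ItemData>
          </ItemGroupData>
        </FormData>
      </StudyEventData>
    </SubjectData>
  </ClinicalData>
</ODM>"""


def p(local: str) -> str:
    """Namespaced PROV-XML name for a local name such as 'entity'."""
    return f"{{{PROV_URI}}}{local}"


def link(doc: ET.Element, relation: str, *roles) -> None:
    """Append a PROV relation whose children reference (role, id) pairs."""
    rel = ET.SubElement(doc, p(relation))
    for role, ref in roles:
        ET.SubElement(rel, p(role), {p("ref"): ref})


def odm_to_prov(odm_xml: str) -> ET.Element:
    """Map each audited ItemData to a PROV entity, activity and agent (hypothetical mapping)."""
    root = ET.fromstring(odm_xml)
    doc = ET.Element(p("document"))
    seen_agents = set()
    for n, item in enumerate(root.iterfind(".//odm:ItemData", ODM)):
        audit = item.find("odm:AuditRecord", ODM)
        if audit is None:
            continue  # no audit record: nothing to say about who changed it or when
        entity_id, activity_id = f"item_{n}", f"change_{n}"  # hypothetical ID scheme
        agent_id = audit.find("odm:UserRef", ODM).get("UserOID")

        entity = ET.SubElement(doc, p("entity"), {p("id"): entity_id})
        ET.SubElement(entity, p("label")).text = item.get("ItemOID")
        ET.SubElement(entity, p("value")).text = item.get("Value")

        activity = ET.SubElement(doc, p("activity"), {p("id"): activity_id})
        ET.SubElement(activity, p("startTime")).text = audit.findtext(
            "odm:DateTimeStamp", namespaces=ODM)

        if agent_id not in seen_agents:  # declare each user only once
            ET.SubElement(doc, p("agent"), {p("id"): agent_id})
            seen_agents.add(agent_id)

        link(doc, "wasGeneratedBy", ("entity", entity_id), ("activity", activity_id))
        link(doc, "wasAssociatedWith", ("activity", activity_id), ("agent", agent_id))
        link(doc, "wasAttributedTo", ("entity", entity_id), ("agent", agent_id))
    return doc


def entities_attributed_to(prov_doc: ET.Element, agent_id: str) -> List[str]:
    """Example query with plain ElementTree: which values does an agent account for?"""
    return [
        rel.find("prov:entity", PROV).get(p("ref"))
        for rel in prov_doc.iterfind("prov:wasAttributedTo", PROV)
        if rel.find("prov:agent", PROV).get(p("ref")) == agent_id
    ]


if __name__ == "__main__":
    prov_doc = odm_to_prov(SAMPLE_ODM)
    print(ET.tostring(prov_doc, encoding="unicode"))
    print(entities_attributed_to(prov_doc, "U.MONITOR1"))  # ['item_1']
```

With the sample data, the query for `U.MONITOR1` returns `['item_1']`, the corrected weight value; the original entry `item_0` remains in the document, attributed to `U.NURSE1`.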

Discussion: Software systems that capture clinical trial data in conformance with GCP can deliver a valuable set of provenance data along with the datasets. Nevertheless, the lack of a common format for provenance data makes it hard to use these data interoperably. W3C-PROV proved to be a viable format for provenance information, and the PROV family already specifies how provenance can be accessed and queried (PROV-AQ).

Sahoo et al. [7] recently built an ontology, based on W3C-PROV, for storing provenance metadata in order to enhance reproducibility in biomedical research, and tested it on six clinical studies in the US. In future work, we intend to verify whether their promising approach is viable for clinical studies conducted at the University Medical Center Göttingen.

Acknowledgements: This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the research and funding concepts e:Med (01ZX1306C/sysINFLAME) and i:DSem (031L0024A/MyPathSem).

The authors declare that they have no competing interests.


References

[1] De Lusignan S, Liaw ST, Krause P, Curcin V, Vicente MT, Michalakidis G, et al. Key concepts to assess the readiness of data for international research: Data quality, lineage and provenance, extraction and processing errors, traceability, and curation. Yearb Med Inform. 2011;6(1):112–120.
[2] Ragan ED, Endert A, Sanyal J, Chen J. Characterizing Provenance in Visualization and Data Analysis: An Organizational Framework of Provenance Types and Purposes. IEEE Transactions on Visualization and Computer Graphics. 2016 Jan;22(1):31–40.
[3] World Health Organization. Guidelines for good clinical practice (GCP) for trials on pharmaceutical products. WHO Technical Report Series. 1995;850:97–137.
[4] ICH Harmonised Guideline. Integrated Addendum to ICH E6(R1): Guideline for Good Clinical Practice E6(R2). Last accessed April 24, 2017.
[5] Groth P, Moreau L. PROV-Overview. An overview of the PROV Family of Documents. 2013 [cited 2017 Jun 01].
[6] Clinical Data Interchange Standards Consortium. Operational Data Model (ODM)-XML [Internet]. CDISC; 2013 [cited 2017 Feb 17].
[7] Sahoo SS, Valdez J, Rueschman M. Scientific Reproducibility in Biomedical Research: Provenance Metadata Ontology for Semantic Annotation of Study Description. AMIA Annu Symp Proc. 2017 Feb 10;2016:1070–9.