gms | German Medical Science

62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

17.09. - 21.09.2017, Oldenburg

Persistent Identifiers (PID) as means to improve data protection and transparency of clinical trials

Meeting Abstract


  • Wolfgang Kuchinke - Heinrich-Heine Universität Düsseldorf, Düsseldorf, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 063

doi: 10.3205/17gmds143, urn:nbn:de:0183-17gmds1436

Published: 29 August 2017

© 2017 Kuchinke.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: Persistent identifiers (PIDs) have grown in importance for data handling as the amount and complexity of data increase in many areas of research. PID-centred data management might be a solution for the automatisation of data flows and for enhanced data provenance. By providing a constant reference to data sets, PIDs render data citable for primary and secondary analysis. In medical research, however, PID usage still plays no prominent role in supporting data workflow processes. The EU project CORBEL searches for ways to make individual-level data of clinical trials available for re-use and secondary analysis. As a step towards opening clinical trial data for research purposes, we suggest a model for the consistent adoption of PIDs for clinical trials.

Methods: Based on the work of the RDA (Research Data Alliance) PID group and the recommendations of the GEDE group on PID system specifications, a clinical trials model was created that connects the data flow of clinical trials with the labelling of digital objects (DOs) by PIDs. In addition, the PID concept was applied to the zone model of data privacy protection.

Results: In our concept, PIDs are attached to all data objects (DOs) of clinical trials, including patients, investigators, institutions and sites, as well as data sets, documents, software versions, etc. PIDs are assigned as early as possible, already during the generation of data in the electronic data capture (EDC) system, by a patient-reported outcome (PRO) system or by sensors. Once attached to data and documents, PIDs accompany these DOs through the different steps of clinical trial conduct until the storage of clinical trial DOs and the publication of trial results. Because the PID is attached to the DO like a label, it can be used for provenance tracking to demonstrate how and by whom the data sets and documents were created, processed and analysed. PIDs can also label data according to the sensitivity of their data sources. In this way, PIDs act as data labels pointing to metadata that include information about the sensitivity of the data and their data protection requirements. This allows privacy filters to operate automatically on the data in the different data protection zones through which the data move during the clinical trial life cycle.
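The idea of a PID that is assigned at data capture and then carries a provenance trail can be illustrated with a minimal sketch. It is not the model's actual implementation: the class names, the `hypothetical-prefix/` PID format (a random UUID stands in for a real PID service) and the sensitivity labels are all assumptions made for illustration.

```python
import uuid
from dataclasses import dataclass, field
from typing import List


@dataclass
class ProvenanceEvent:
    actor_pid: str  # PID of the person or system that acted on the object
    action: str     # e.g. "created", "processed", "analysed"


@dataclass
class DigitalObject:
    kind: str         # e.g. "dataset", "document", "site", "sensor"
    sensitivity: str  # hypothetical label, e.g. "identifying" or "none"
    # Assumption: a random UUID under a made-up prefix stands in for a
    # PID minted by a real resolver service.
    pid: str = field(
        default_factory=lambda: f"hypothetical-prefix/{uuid.uuid4()}"
    )
    provenance: List[ProvenanceEvent] = field(default_factory=list)

    def record(self, actor_pid: str, action: str) -> None:
        """Append one provenance event; the PID itself never changes."""
        self.provenance.append(ProvenanceEvent(actor_pid, action))


def mint_at_capture(kind: str, sensitivity: str, source_pid: str) -> DigitalObject:
    """Create a digital object at data generation and log its source."""
    obj = DigitalObject(kind=kind, sensitivity=sensitivity)
    obj.record(source_pid, "created")
    return obj


# Usage: a sensor produces a data set, which an analyst later analyses.
sensor = DigitalObject(kind="sensor", sensitivity="none")
dataset = mint_at_capture("dataset", "identifying", sensor.pid)
dataset.record("hypothetical-prefix/analyst-1", "analysed")
for event in dataset.provenance:
    print(dataset.pid, event.action, "by", event.actor_pid)
```

Because every actor and every object is referenced by a PID, the provenance list alone is enough to answer how and by whom a data set was created and processed.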

Discussion: Recently, accelerating discussions about PID-centred data management have resulted in recommendations for standardised PID employment, worked out by RDA with the aim of creating a Global Digital Object Cloud. With our model, we participate in this discussion and suggest extending the usage of PIDs to sensitive data in clinical trials for referencing, provenance tracking and data privacy protection. PIDs with associated metadata can instruct services such as privacy filters how to apply suitable data processing technologies, such as pseudonymisation, obfuscation and anonymisation, to the data sets. In this manner, the PID-centric approach to data management can open a way towards a new and more open, but controlled, form of data handling and can build trust that the data of data subjects are used properly.
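How PID metadata could instruct a privacy filter can be sketched as a lookup from (sensitivity, target zone) to a processing step. This is an illustrative sketch only: the zone names ("clinical", "research", "public"), the sensitivity labels, the PIDs, the record fields and the demo key are assumptions and are not taken from the zone model itself.

```python
import hashlib
import hmac
from typing import Callable, Dict, Tuple

Record = Dict[str, str]

# Assumption: a demo key; a real service would manage keys securely.
SECRET_KEY = b"demo-key-not-for-production"


def passthrough(rec: Record) -> Record:
    return dict(rec)


def pseudonymise(rec: Record) -> Record:
    """Replace the direct identifier with a keyed hash (HMAC-SHA256)."""
    out = dict(rec)
    digest = hmac.new(SECRET_KEY, rec["patient_id"].encode(), hashlib.sha256)
    out["patient_id"] = digest.hexdigest()[:16]
    return out


def obfuscate(rec: Record) -> Record:
    """Coarsen the birth date to the year only."""
    out = dict(rec)
    out["birth_date"] = rec["birth_date"][:4]
    return out


def anonymise(rec: Record) -> Record:
    """Drop identifying and quasi-identifying fields entirely."""
    return {k: v for k, v in rec.items() if k not in ("patient_id", "birth_date")}


# Hypothetical metadata that the PIDs resolve to.
PID_METADATA: Dict[str, Dict[str, str]] = {
    "hypothetical-prefix/ds-001": {"sensitivity": "identifying"},
    "hypothetical-prefix/ds-002": {"sensitivity": "quasi-identifying"},
}

# Hypothetical policy: which operation applies when data enter a zone.
POLICY: Dict[Tuple[str, str], Callable[[Record], Record]] = {
    ("identifying", "clinical"): passthrough,
    ("identifying", "research"): pseudonymise,
    ("identifying", "public"): anonymise,
    ("quasi-identifying", "research"): obfuscate,
    ("quasi-identifying", "public"): anonymise,
}


def privacy_filter(pid: str, rec: Record, target_zone: str) -> Record:
    """Resolve the PID to its metadata and apply the matching operation.

    Unknown PIDs or unlisted (sensitivity, zone) pairs raise KeyError,
    so data without an explicit rule are not passed on.
    """
    sensitivity = PID_METADATA[pid]["sensitivity"]
    return POLICY[(sensitivity, target_zone)](rec)


record = {"patient_id": "P-17", "birth_date": "1980-05-04", "hb": "13.2"}
print(privacy_filter("hypothetical-prefix/ds-001", record, "research"))
print(privacy_filter("hypothetical-prefix/ds-001", record, "public"))
```

The filter never inspects the record to decide what to do; the decision comes entirely from the metadata behind the PID, which is the point of treating the PID as a data label.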

The authors declare that they have no conflict of interest.

The authors declare that no ethics committee vote is required.