Report generator for the external validation of conformity of local clinical data to a global metadata repository (MDR)
Published: 29 August 2017
Introduction (incl. Objective / Requirements): The Clinical Communication Platform (CCP) [1] aims to enable clinical research across multiple sites within the German Cancer Consortium (DKTK) [2], [3]. Given the variations in local tumour documentation, data harmonization is an essential prerequisite for such research and must be verified by quality assurance. The purpose of the generated report is to externally check the integrity, completeness and conformity of clinical data in local databases against a format defined in a global MDR [4].
State of the art (related work & shortcomings): Validation algorithms are commonly found in extract-transform-load (ETL) systems [5], [6], [7]. They usually run automatically and prospectively, focus on data plausibility and can include self-correction mechanisms. Within a decentralized system, however, new sites with heterogeneous and complex databases are constantly joining the network. To store their data in compliance with the global MDR, the sites need to implement their own ETL processes. Ideally, they would also create their own validation mechanisms; however, this would be time-consuming, error-prone and inconsistent across the network. As an alternative, we present a global, external, retrospective and semi-automatic validation tool that focuses on syntactic correctness.
Concept: As the clinical data are read from the local database, the obtained attribute values are compared with the format specified in the MDR to validate their conformity and integrity. Additionally, the occurrences of each possible value are counted in order to assess the completeness of the data. Because a report is generated in each local system, both local problems and problems shared with other systems in the network can be identified.
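The comparison and counting step could look roughly like the following Java sketch (Java is assumed only because the implementation builds on Jersey and Apache POI). The MDR data element model is deliberately simplified: the hypothetical `DataElement` type carries either an enumerated set of permitted values or a regular expression for the value format, whereas real MDR definitions are usually richer. The names `ConformityReport`, `ElementStats` and `validate`, as well as the example elements and records, are illustrative assumptions and not part of the described tool.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Sketch of MDR-based conformity checking and value counting (all names hypothetical). */
public class ConformityReport {

    /**
     * Hypothetical, simplified MDR data element: either an enumerated value domain
     * (permittedValues != null) or a format given as a regular expression.
     */
    record DataElement(String name, Set<String> permittedValues, Pattern format) {
        boolean conforms(String value) {
            if (permittedValues != null) {
                return permittedValues.contains(value);
            }
            return format.matcher(value).matches();
        }
    }

    /** Per-element tallies collected while reading the local data. */
    static final class ElementStats {
        final Map<String, Integer> valueCounts = new TreeMap<>(); // occurrences of each observed value
        int missing;  // absent or blank values (completeness)
        int invalid;  // present values that violate the definition (conformity)
    }

    /** Compares every attribute value with its data element definition and counts occurrences. */
    static Map<String, ElementStats> validate(List<DataElement> elements,
                                             List<Map<String, String>> rows) {
        Map<String, ElementStats> stats = new LinkedHashMap<>();
        for (DataElement element : elements) {
            stats.put(element.name(), new ElementStats());
        }
        for (Map<String, String> row : rows) {
            for (DataElement element : elements) {
                ElementStats s = stats.get(element.name());
                String value = row.get(element.name());
                if (value == null || value.isBlank()) {
                    s.missing++;
                    continue;
                }
                s.valueCounts.merge(value, 1, Integer::sum);
                if (!element.conforms(value)) {
                    s.invalid++;
                }
            }
        }
        return stats;
    }

    public static void main(String[] args) {
        // Illustrative data elements and records, not taken from the DKTK MDR.
        List<DataElement> elements = List.of(
                new DataElement("gender", Set.of("M", "F", "U"), null),
                new DataElement("diagnosisDate", null, Pattern.compile("\\d{4}-\\d{2}-\\d{2}")));
        List<Map<String, String>> rows = List.of(
                Map.of("gender", "F", "diagnosisDate", "2016-03-01"),
                Map.of("gender", "female", "diagnosisDate", "01.03.2016"),
                Map.of("gender", "M"));
        validate(elements, rows).forEach((name, s) ->
                System.out.printf("%s: counts=%s missing=%d invalid=%d%n",
                        name, s.valueCounts, s.missing, s.invalid));
    }
}
```

Counting every observed value, including nonconforming ones such as `female` instead of `F`, lets a report show not only that an element is invalid but also which local codes cause the problem. Missing values are tallied separately because they affect completeness rather than conformity.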
Implementation: Although the generator could operate autonomously and independently of a specific backend, it has been integrated into a bridgehead, which interacts with a decentralized system. The report generator is divided into four abstraction layers: a user interface (UI), a Jersey-based [8] RESTful API [9], a process chain and runtime statistics. The UI displays a button for starting the report generation along with a list of previously generated files available for download. The runtime statistics compute the percentage completed and the estimated time remaining. Result statistics are stored in an intermediate file. The final Excel reports are created with Apache POI [10].
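The runtime statistics layer is described only by its outputs (percentage completed and time remaining). One plausible way to compute them, assumed here rather than taken from the source, is to extrapolate linearly from the average processing time per item. The class `RuntimeStatistics` and its methods are hypothetical; timestamps are passed in explicitly (in practice they might come from `System.nanoTime()`) so that the arithmetic is easy to check.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical runtime statistics: progress and remaining time by linear extrapolation. */
public class RuntimeStatistics {
    private final long totalItems;  // items to process; the unit (e.g. patients) is an assumption
    private final long startNanos;  // start timestamp, e.g. from System.nanoTime()
    private long processedItems;

    public RuntimeStatistics(long totalItems, long startNanos) {
        this.totalItems = totalItems;
        this.startNanos = startNanos;
    }

    /** Called by the process chain after each item has been handled. */
    public void itemProcessed() {
        processedItems++;
    }

    /** Percentage of items processed so far; an empty job counts as complete. */
    public double percentCompleted() {
        return totalItems == 0 ? 100.0 : 100.0 * processedItems / totalItems;
    }

    /**
     * Remaining time = average time per processed item * number of remaining items.
     * Empty until at least one item has been processed.
     */
    public Optional<Duration> estimatedTimeRemaining(long nowNanos) {
        if (processedItems == 0) {
            return Optional.empty();
        }
        double nanosPerItem = (double) (nowNanos - startNanos) / processedItems;
        long remainingItems = Math.max(0, totalItems - processedItems);
        return Optional.of(Duration.ofNanos(Math.round(nanosPerItem * remainingItems)));
    }

    public static void main(String[] args) {
        RuntimeStatistics stats = new RuntimeStatistics(400, 0L);
        for (int i = 0; i < 100; i++) {
            stats.itemProcessed();
        }
        long now = Duration.ofSeconds(30).toNanos(); // simulate 30 s elapsed
        System.out.println(stats.percentCompleted() + " % completed");
        stats.estimatedTimeRemaining(now)
                .ifPresent(d -> System.out.println(d.toSeconds() + " s remaining"));
    }
}
```

With 100 of 400 items processed after 30 seconds, the sketch reports 25 % completed and 90 seconds remaining. Linear extrapolation assumes that items take roughly equal time, so the estimate will fluctuate if some items are much more expensive than others. If the process chain runs in a background thread while the REST API is polled for progress, the counter would need to be thread-safe (for example an `AtomicLong`).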
Lessons Learned: The proposed report generator provides a uniform way to carry out quality control, not only for collaboration within a research network but also for improving local documentation processes. Because it is integrated into the bridgehead (rather than into the local database), the functionality can, in principle, be used with any compatible database as the local storage backend. However, the proposed algorithm can only assess syntactic conformity with the data definitions stored in the MDR. Evaluating the semantic correctness and meaningfulness of the data still relies on review by domain experts in organized quality assurance rounds.
The authors declare that they have no competing interests.
The authors declare that no ethics committee vote is required.
References
1. Lablans M, Kadioglu D, Mate S, Leb I, Prokosch HU, Ückert F. Strategien zur Vernetzung von Biobanken. Klassifizierung verschiedener Ansätze zur Probensuche und Ausblick auf die Zukunft in der BBMRI-ERIC. Bundesgesundheitsblatt. 2016;59(3):373–378. DOI: 10.1007/s00103-015-2299-y
2. Lablans M, Kadioglu D, Muscholl M, Ückert F. Exploiting Distributed, Heterogeneous and Sensitive Data Stocks while Maintaining the Owner's Data Sovereignty. Methods Inf Med. 2015;54(4):346–352. DOI: 10.3414/ME14-01-0137
3. Deutsches Konsortium für Translationale Krebsforschung. Available from: https://dktk.dkfz.de/de/home
4. ISO/IEC JTC1 SC32 WG2. ISO/IEC 11179 Information Technology - Metadata registries. 2014. Available from: http://metadata-standards.org/11179/
5. Kettle Pentaho Data Integration Validation. Available from: http://wiki.pentaho.com/display/EAI/Data+Validator
6. Clover ETL Validator. Available from: http://doc.cloveretl.com/documentation/UserGuide/index.jsp?topic=/com.cloveretl.gui.docs/docs/validator.html
7. Barton R. Talend Open Studio Cookbook. Chapter 3: Validating Data. Packt Publishing; 2013.
8. Jersey RESTful Web Services in Java. Available from: https://jersey.java.net/
9. Richardson L, Ruby S, Heinemeier D. RESTful Web Services. O’Reilly Media; 2007.
10. Apache POI. Available from: https://poi.apache.org/