gms | German Medical Science

62nd Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

17-21 September 2017, Oldenburg

Report generator for the external validation of conformity of local clinical data to a global metadata repository (MDR)

Meeting Abstract

  • David Juárez - Deutsches Krebsforschungszentrum, Heidelberg, Germany
  • Esther Schmidt - Deutsches Krebsforschungszentrum, Heidelberg, Germany
  • Karsten Senghas - Deutsches Krebsforschungszentrum, Heidelberg, Germany
  • Frank Ückert - Deutsches Krebsforschungszentrum, Heidelberg, Germany
  • Martin Lablans - Deutsches Krebsforschungszentrum, Heidelberg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 293

doi: 10.3205/17gmds169, urn:nbn:de:0183-17gmds1698

Published: 29 August 2017

© 2017 Juárez et al.
This article is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License (Attribution). For license details, see



Introduction (incl. Objective / Requirements): The Clinical Communication Platform (CCP) [1] aims to enable clinical research across multiple sites within the German Cancer Consortium (DKTK) [2], [3]. Given the variations in local tumour documentation, data harmonization is an essential prerequisite for such research, and its result needs to be verified by quality assurance. The generated report externally checks the integrity, completeness and conformity of clinical data in local databases against the format defined in a global metadata repository (MDR) [4].

State of the art (related work & shortcomings): Validation algorithms are commonly found in extract-transform-load (ETL) systems [5], [6], [7]. They usually run automatically and prospectively, focus on data plausibility and allow the inclusion of self-correction mechanisms. However, in a decentralized system, new sites with heterogeneous and complex databases are constantly joining the network. To store their data in compliance with the global MDR, the sites need to implement their own ETL processes. Ideally, each site would also build its own validation mechanism; however, this would be time-consuming, error-prone and inconsistent across the network. As an alternative, we present a global, external, retrospective and semi-automatic validation tool that focuses on syntactic correctness.

Concept: As the clinical data are read from the local database, each attribute value is compared with the format specified in the MDR to validate its conformity and integrity. In addition, the occurrences of each possible value are counted to assess the completeness of the data. Because a report is generated in each local system, both site-specific problems and problems shared with other systems in the network can be identified.
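The concept can be illustrated with a minimal Java sketch. The abstract does not describe the MDR's data model or API, so everything here is a hypothetical simplification: a `DataElement` is either an enumerated value domain (`permittedValues`) or a free-format element checked against a regular expression (`format`), and `validate` is an illustrative name. Each value read from a column is checked against its definition, and every observed value, including missing ones, is counted so that gaps and unexpected values show up in the totals.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.regex.Pattern;

public class ConformityCheck {

    /** Hypothetical, simplified MDR data element: an enumerated value domain or a format pattern. */
    record DataElement(String id, Set<String> permittedValues, Pattern format) {
        boolean isConform(String value) {
            if (value == null || value.isEmpty()) {
                return false; // missing values are not conform (they are still counted below)
            }
            if (permittedValues != null) {
                return permittedValues.contains(value); // enumerated domain
            }
            return format == null || format.matcher(value).matches(); // free format with pattern
        }
    }

    /** Per-element result: occurrences of each raw value and the number of non-conform values. */
    record Result(Map<String, Integer> occurrences, int nonConform) {}

    static Result validate(DataElement element, List<String> values) {
        Map<String, Integer> occurrences = new LinkedHashMap<>(); // keeps first-seen order
        int nonConform = 0;
        for (String value : values) {
            // Count every value; missing entries share a placeholder key (completeness).
            String key = (value == null || value.isEmpty()) ? "<missing>" : value;
            occurrences.merge(key, 1, Integer::sum);
            if (!element.isConform(value)) {
                nonConform++;
            }
        }
        return new Result(occurrences, nonConform);
    }

    public static void main(String[] args) {
        // Illustrative element with an enumerated domain (e.g. a tumour stage); not taken from the MDR.
        DataElement stage = new DataElement("stage", Set.of("0", "I", "II", "III", "IV"), null);
        List<String> column = Arrays.asList("I", "II", "II", "V", null, "IV");
        Result r = validate(stage, column);
        System.out.println(r.occurrences()); // {I=1, II=2, V=1, <missing>=1, IV=1}
        System.out.println("non-conform: " + r.nonConform()); // non-conform: 2
    }
}
```

Keeping the counts separate from the conformity flag mirrors the two goals in the text: the counts describe completeness, while `nonConform` reflects syntactic conformity only.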

Implementation: Although the generator could operate autonomously and independently of a specific backend, it has been integrated into a bridgehead, which interacts with the decentralized system. The report generator is divided into four abstraction layers: a user interface (UI), a Jersey-based [8] RESTful API [9], a process chain and runtime statistics. The UI displays a button for generating a report along with a list of previously generated reports available for download. The runtime statistics calculate the percentage completed and the estimated time remaining. Result statistics are stored in an intermediate file. The final Excel reports are created with Apache POI [10].
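The runtime statistics layer can be sketched as a small progress tracker. This is an assumption about how such a layer might work, not the authors' code: the class name `RuntimeStatistics`, the per-item callback `itemProcessed` and the linear (constant-rate) extrapolation of the remaining time are all illustrative choices. The clock is passed in as nanoseconds so that the example is deterministic; in practice `System.nanoTime()` would be used.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical progress tracker: percentage completed and a linear estimate of the time remaining. */
public class RuntimeStatistics {
    private final long totalItems;
    private final long startNanos;
    private long processedItems;

    public RuntimeStatistics(long totalItems, long startNanos) {
        if (totalItems <= 0) {
            throw new IllegalArgumentException("totalItems must be positive");
        }
        this.totalItems = totalItems;
        this.startNanos = startNanos;
    }

    /** Called by the process chain after each processed item (e.g. one patient record). */
    public void itemProcessed() {
        if (processedItems < totalItems) {
            processedItems++;
        }
    }

    public double percentCompleted() {
        return 100.0 * processedItems / totalItems;
    }

    /** Assumes a constant rate: remaining = elapsed * remainingItems / processedItems. */
    public Optional<Duration> timeRemaining(long nowNanos) {
        if (processedItems == 0) {
            return Optional.empty(); // no rate can be estimated yet
        }
        long elapsed = nowNanos - startNanos;
        long remainingItems = totalItems - processedItems;
        // Compute in double to avoid long overflow for long runs with many items.
        double estimate = (double) elapsed * remainingItems / processedItems;
        return Optional.of(Duration.ofNanos((long) estimate));
    }

    public static void main(String[] args) {
        RuntimeStatistics stats = new RuntimeStatistics(200, 0L);
        for (int i = 0; i < 50; i++) {
            stats.itemProcessed();
        }
        // Simulated clock: 50 of 200 items took 10 s, so 150 remaining items need about 30 s.
        System.out.println(stats.percentCompleted()); // 25.0
        System.out.println(stats.timeRemaining(Duration.ofSeconds(10).toNanos()).orElseThrow()); // PT30S
    }
}
```

A constant-rate estimate is the simplest option; if per-item processing times vary strongly between parts of the database, a moving average over recent items would give a steadier estimate.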

Lessons Learned: The proposed report generator provides a uniform way to carry out quality control, not only for collaboration within a research network but also for improving local documentation processes. Because it is integrated into the bridgehead rather than into the local database, the functionality can, in principle, be used with any compatible database as the local storage backend. However, the proposed algorithm can only assess syntactic conformity with the data definitions stored in the MDR. Evaluating the semantic correctness and meaningfulness of the data still relies on review by domain experts in organized quality assurance rounds.

The authors declare that they have no competing interests.

The authors declare that no ethics committee approval is required.


[1] Lablans M, Kadioglu D, Mate S, Leb I, Prokosch HU, Ückert F. Strategien zur Vernetzung von Biobanken. Klassifizierung verschiedener Ansätze zur Probensuche und Ausblick auf die Zukunft in der BBMRI-ERIC. Bundesgesundheitsblatt. 2016;59(3):373–378. DOI: 10.1007/s00103-015-2299-y
[2] Lablans M, Kadioglu D, Muscholl M, Ückert F. Exploiting Distributed, Heterogeneous and Sensitive Data Stocks while Maintaining the Owner's Data Sovereignty. Methods Inf Med. 2015;54(4):346–352. DOI: 10.3414/ME14-01-0137
[3] Deutsches Konsortium für Translationale Krebsforschung (DKTK).
[4] ISO/IEC JTC1 SC32 WG2. ISO/IEC 11179 Information Technology - Metadata registries. 2014.
[5] Kettle Pentaho Data Integration Validation.
[6] CloverETL Validator.
[7] Barton R. Talend Open Studio Cookbook. Chapter 3: Validating Data. Packt Publishing; 2013.
[8] Jersey RESTful Web Services in Java.
[9] Richardson L, Ruby S, Heinemeier D. RESTful Web Services. O'Reilly Media; 2007.
[10] Apache POI.