gms | German Medical Science

51st Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (gmds)

10-14 September 2006, Leipzig

An integrated analysis platform for experimental and clinical data in modern cancer research studies

Meeting Abstract


  • Jörg Lange - University of Leipzig, Leipzig
  • Toralf Kirsten - University of Leipzig, Leipzig
  • Erhard Rahm - University of Leipzig, Leipzig

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (gmds). 51st Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. Leipzig, 10-14 September 2006. Düsseldorf, Köln: German Medical Science; 2006. Doc06gmds215


Published: 1 September 2006

© 2006 Lange et al.
This is an Open Access article distributed under the terms of the Creative Commons license. It may be copied, distributed, and made publicly accessible, provided that the author and source are credited.




To investigate the molecular-genetic causes and effects of diseases and their therapies, it is becoming increasingly important to combine data from clinical trials with the high volumes of experimental data generated by various chip technologies, together with their annotations. We present our approach to integrating such data for two large collaborative cancer research studies in Germany: the Molecular Mechanism of Malignant Lymphoma (MMML) and the German Glioma Network. Our platform interconnects a commercial study management system (eRN) with a data-warehouse-based gene expression analysis system (GeWare) [1]. We use a generic approach to import various anonymized pathological and patient-related annotations into the warehouse. The platform also integrates different forms of experimental data as well as public molecular-genetic annotation data, and thus supports a wide range of collaborative analyses over both clinical and non-clinical parameters.


We have developed a comprehensive data integration and analysis platform at the University of Leipzig that interconnects two existing data management systems. On the one hand, the study management system eRN allows users at participating institutions to remotely enter all data typically handled in traditional clinical trials, such as patient-related personal, clinical, and pathological data. To ensure high data quality, the system implements various rule-based input and consistency checks that flag inconsistent or missing entries for users to correct. On the other hand, the GeWare system manages chip-based gene expression and array-CGH data and provides various reports and analysis methods. Chip data are far more voluminous than the patient-related data and cannot be stored within eRN. GeWare provides web interfaces for uploading new experimental data and for specifying further annotations such as laboratory parameters. To analyze patient-related and chip-based data together, GeWare also imports a subset of the patient-related data from eRN in a generic manner, using so-called annotation templates. Whereas patient-related data are identified by a patient identifier, chip-based data use a chip identifier from which the patient identifier cannot be derived. We therefore maintain a mapping table that associates each chip identifier with the corresponding patient identifier, so that clinical, pathological, and experimental data can be combined correctly and analyzed jointly across all data sources. In addition, GeWare integrates publicly available gene and clone annotation data to extend the analysis options; this annotation data is integrated using a query mediator approach [2].
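The linking step described above (importing a templated subset of patient data and joining it to chip data through the mapping table) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not GeWare's implementation: the annotation template is modeled as a hypothetical dict from study-system attribute names to warehouse column names, and the mapping table as a dict from chip identifier to patient identifier. All identifiers, attribute names, and values are made up.

```python
# Minimal sketch: linking chip-level data to templated patient annotations.
# All identifiers, attribute names, and the dict-based template are hypothetical.

# Hypothetical annotation template: study-system attribute -> warehouse column.
ANNOTATION_TEMPLATE = {
    "diagnosis": "diagnosis",
    "age_at_diagnosis": "age",
}


def apply_template(patient_record: dict, template: dict) -> dict:
    """Keep only the attributes listed in the template, renamed to warehouse columns."""
    return {target: patient_record.get(source) for source, target in template.items()}


def combine(chip_values: dict, chip_to_patient: dict,
            patient_records: dict, template: dict) -> dict:
    """Join chip measurements with patient annotations via the chip -> patient mapping table.

    chip_values:     chip_id -> {probe_id: measured value}
    chip_to_patient: chip_id -> patient_id (the mapping table)
    patient_records: patient_id -> raw record exported from the study system
    """
    combined = {}
    for chip_id, values in chip_values.items():
        patient_id = chip_to_patient.get(chip_id)
        if patient_id is None or patient_id not in patient_records:
            # The patient cannot be derived from the chip ID itself,
            # so an unmapped chip is left out rather than guessed.
            continue
        annotations = apply_template(patient_records[patient_id], template)
        combined[chip_id] = {"patient_id": patient_id, **annotations, "values": values}
    return combined


# Usage with made-up data: CHIP-003 has no mapping entry and is skipped.
chips = {
    "CHIP-001": {"PROBE_A": 7.2},
    "CHIP-002": {"PROBE_A": 5.1},
    "CHIP-003": {"PROBE_A": 6.0},
}
mapping = {"CHIP-001": "P17", "CHIP-002": "P42"}
patients = {
    "P17": {"diagnosis": "DLBCL", "age_at_diagnosis": 61, "free_text_note": "not imported"},
    "P42": {"diagnosis": "BL", "age_at_diagnosis": 34},
}
result = combine(chips, mapping, patients, ANNOTATION_TEMPLATE)
print(sorted(result))  # ['CHIP-001', 'CHIP-002']
```

Silently dropping unmapped chips is only one possible design choice; an alternative would be to report them so that the mapping table can be completed.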


We established a warehouse-based platform that combines clinical and experimental chip data for large-scale collaborative cancer research studies and builds on two dedicated subsystems, one for managing clinical trials and one for gene expression analysis. Selected clinical annotations are imported daily from the study management system and combined with data from centrally performed molecular-biological high-throughput experiments. Annotations are managed generically, so that different studies and changing analysis needs are easily supported. Grouping functions for genes, probes, and samples are available, and the resulting groups can be reused in later analyses. Interactive analyses for data visualization (e.g. heatmaps, as shown in Figure 1) provide a quick overview for hypothesis generation, and statistical reports highlight significant values in the large-scale array data. Furthermore, selected data can be exported for specific analyses outside the platform.
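The abstract does not state which statistics the reports use. As a purely illustrative sketch, the following Python code groups samples by one clinical annotation (a hypothetical `subtype` attribute) and ranks genes by Welch's two-sample t-statistic between two of the resulting groups. The gene names, sample IDs, and the |t| threshold are assumptions; a real report would additionally convert t to p-values and correct for multiple testing across thousands of probes, both of which are omitted here.

```python
# Minimal sketch: sample grouping plus a per-gene Welch t-statistic report.
# Annotation names, gene names, and the |t| threshold are hypothetical.
import math
from statistics import mean, variance


def group_samples(annotations: dict, attribute: str) -> dict:
    """Group sample IDs by the value of one clinical annotation attribute."""
    groups = {}
    for sample_id, ann in annotations.items():
        groups.setdefault(ann[attribute], []).append(sample_id)
    return groups


def welch_t(x: list, y: list) -> float:
    """Welch's two-sample t-statistic (no equal-variance assumption).

    Needs at least two values per group; returns NaN if both groups are constant.
    """
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return math.nan
    return (mean(x) - mean(y)) / se


def report(expression: dict, group_a: list, group_b: list, threshold: float) -> list:
    """Return (gene, t) pairs with |t| >= threshold, strongest first.

    expression: gene -> {sample_id: value}
    """
    hits = []
    for gene, values in expression.items():
        t = welch_t([values[s] for s in group_a], [values[s] for s in group_b])
        if not math.isnan(t) and abs(t) >= threshold:
            hits.append((gene, t))
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


# Usage with made-up data: GENE_X separates the subtypes, GENE_Y does not.
annotations = {
    "S1": {"subtype": "A"}, "S2": {"subtype": "A"}, "S3": {"subtype": "A"},
    "S4": {"subtype": "B"}, "S5": {"subtype": "B"}, "S6": {"subtype": "B"},
}
expression = {
    "GENE_X": {"S1": 8.0, "S2": 8.2, "S3": 7.9, "S4": 5.0, "S5": 5.3, "S6": 4.9},
    "GENE_Y": {"S1": 6.0, "S2": 6.5, "S3": 5.5, "S4": 6.1, "S5": 5.9, "S6": 6.2},
}
groups = group_samples(annotations, "subtype")
hits = report(expression, groups["A"], groups["B"], threshold=4.0)
print([gene for gene, _ in hits])  # ['GENE_X']
```

Keeping the grouping separate from the test mirrors the idea that sample groups are defined once and then reused across several analyses.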


The analysis platform described here has proved to be a valuable tool for storing, accessing, and analyzing high-dimensional gene expression and array-CGH data together with clinical, histopathological, and other experimental data. Its web-based interface enables experimenters to perform interactive analyses, and the results are stored for use in further analysis methods. The platform is in successful use within the MMML cancer project and will be extended to meet the aims of the German Glioma Network.


[1] Kirsten T, Lange J, Rahm E. An integrated platform for analyzing molecular-biological data within clinical studies. Proc. Workshop Information Integration in Healthcare Applications at 10th EDBT Conference. Munich; March 2006.
[2] Kirsten T, Do HH, Rahm E, Körner C. Hybrid integration of molecular-biological annotation data. Proc. 2nd Int. Workshop on Data Integration in the Life Sciences. San Diego: Springer LNBI; 2005.