54. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)

September 7 to 10, 2009, Essen

Improved interpretation of microarray data with gene groups: Cancer classification and survival prognosis

Meeting Abstract

  • Jörg Rahnenführer - TU Dortmund, Fakultät Statistik, Dortmund

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 54. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds). Essen, 07.-10.09.2009. Düsseldorf: German Medical Science GMS Publishing House; 2009. Doc09gmds109

DOI: 10.3205/09gmds109, URN: urn:nbn:de:0183-09gmds1092

Published: September 2, 2009

© 2009 Rahnenführer.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Text

Bioinformatics research in the post-genomic era has to cope with a flood of high-dimensional data sets. The ultimate goal is personalized medicine that uses measurements from individual patients to improve the diagnosis and therapy of diseases. The high complexity and noise levels of the data require the development and application of suitable statistical models and algorithmic procedures. To answer biologically relevant questions, however, expertise in statistics and computer science has to be combined with meaningful biological modelling.

The result of a typical microarray experiment is a long list of genes with corresponding expression measurements. Interpreting such high-dimensional data is difficult, both statistically and from a biological and medical point of view. A popular and promising approach to meaningful dimension reduction is to integrate biological a priori knowledge into the analysis in the form of predefined functional gene groups, for example groups based on the Gene Ontology (GO). Instead of identifying important single genes, the analysis detects relevant groups of genes that share a common biological function.
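
As an illustration of how a predefined gene group can be scored, the following minimal sketch uses a hypergeometric over-representation test: given a list of genes selected as differentially expressed, it asks whether a group contains more of them than expected by chance. This is one common choice and not necessarily the scoring procedure used in this work. The function names, gene identifiers and group names are hypothetical, and the sketch treats groups independently, ignoring relationships between them. When many groups are tested, the resulting p-values would also need a multiple-testing adjustment.

```python
from math import comb


def enrichment_pvalue(n_total, n_selected, group_size, group_hits):
    """Upper-tail hypergeometric probability P(X >= group_hits).

    X is the number of group members among n_selected genes drawn at
    random, without replacement, from n_total genes.
    """
    max_hits = min(group_size, n_selected)
    tail = sum(
        comb(group_size, k) * comb(n_total - group_size, n_selected - k)
        for k in range(group_hits, max_hits + 1)
    )
    return tail / comb(n_total, n_selected)


def score_gene_groups(universe, selected, gene_groups):
    """Return (group name, p-value) pairs sorted by increasing p-value."""
    universe = set(universe)
    selected = set(selected) & universe
    scores = []
    for name, members in gene_groups.items():
        members = set(members) & universe  # ignore genes not on the array
        hits = len(members & selected)
        p = enrichment_pvalue(len(universe), len(selected), len(members), hits)
        scores.append((name, p))
    return sorted(scores, key=lambda pair: pair[1])


# Toy data, invented for illustration only (not taken from the study).
universe = [f"g{i}" for i in range(1, 11)]
selected = ["g1", "g2", "g3"]
gene_groups = {
    "hypothetical_group_A": ["g1", "g2", "g3", "g4"],
    "hypothetical_group_B": ["g5", "g6", "g7", "g8"],
}
for name, p in score_gene_groups(universe, selected, gene_groups):
    print(f"{name}: p = {p:.4f}")
```

In the toy data, the first group contains all three selected genes and therefore ranks ahead of the second group, which contains none.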

We present two applications of this approach: cancer classification and survival prognosis. In the first part, we describe the general procedure for scoring the statistical significance of gene groups and thereby the impact of the corresponding biological processes on cancer classification. In addition, we demonstrate how this approach can be improved by integrating information on the relationships between gene groups. In the second part, we show how gene groups can be used to build survival prediction models based on the Cox regression model. We apply several feature selection procedures to generate predictive models for future patients. We show that adding gene groups as covariates to survival models built from single genes improves interpretability while prediction performance remains stable.
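
To make the idea of gene groups as covariates in a Cox model concrete, the sketch below summarizes a group by the mean expression of its member genes and evaluates the Cox log partial likelihood for a model containing one single gene and one group score. Both the mean-expression summary and all names and values are assumptions made for illustration; the abstract does not state how groups enter the model. A real analysis would fit the coefficients with an established survival package rather than evaluate the likelihood by hand.

```python
from math import exp, log


def group_score(expression, group):
    """Mean expression of a group's genes for one patient.

    The mean is an assumed, hypothetical summary of the group.
    """
    values = [expression[g] for g in group if g in expression]
    if not values:
        raise ValueError("no gene of the group was measured")
    return sum(values) / len(values)


def cox_log_partial_likelihood(beta, covariates, times, events):
    """Cox log partial likelihood, with Breslow handling of tied times.

    covariates: one list of covariate values per patient
    events:     1 if the event was observed, 0 if censored
    """
    linear = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            # Risk set: all patients still under observation at time t_i.
            risk = sum(exp(linear[j]) for j, t_j in enumerate(times) if t_j >= t_i)
            total += linear[i] - log(risk)
    return total


# Toy data, invented for illustration only (not taken from the study).
patients = [
    {"gene_x": 2.0, "g1": 3.0, "g2": 2.6},
    {"gene_x": 1.5, "g1": 2.1, "g2": 2.3},
    {"gene_x": 1.0, "g1": 1.2, "g2": 0.8},
    {"gene_x": 0.5, "g1": 0.4, "g2": 0.6},
]
group = ["g1", "g2"]  # hypothetical gene group
times = [1.0, 2.0, 3.0, 4.0]
events = [1, 1, 0, 1]

# One single-gene covariate plus one gene-group covariate per patient.
covariates = [[p["gene_x"], group_score(p, group)] for p in patients]
print(cox_log_partial_likelihood([0.0, 0.0], covariates, times, events))
print(cox_log_partial_likelihood([0.0, 1.0], covariates, times, events))
```

Comparing the two printed values shows whether a positive coefficient on the group score explains the toy survival times better than a model with no effect.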