gms | German Medical Science

GMS Medizinische Informatik, Biometrie und Epidemiologie

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS)

ISSN 1860-9171

Book review: The analysis of gene expression data

Book Review

  • corresponding author Martin Eisenacher - Universitätsklinikum Münster, Integrierte Funktionelle Genomik (IFG), Münster, Germany

GMS Med Inform Biom Epidemiol 2005;1(2):Doc07

The electronic version of this article is the complete one and can be found online at: http://www.egms.de/en/journals/mibe/2005-1/mibe000007.shtml

Published: June 20, 2005

© 2005 Eisenacher.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Bibliographic details

"The Analysis of Gene Expression Data"

G. Parmigiani, E. S. Garrett, R. A. Irizarry, S. L. Zeger (Editors)

Springer-Verlag; 1st edition (April 8, 2003)

Language: English

Hardcover: 504 pages, Price: $95.00

ISBN: 0387955771


Review

The book "The Analysis of Gene Expression Data" is part of the series "Statistics for Biology and Health". Its editors work as biostatisticians and thus provide a link between applied clinical research and theoretical statistics.

In the comprehensive introductory chapter, the editors themselves describe the basics of microarray technology and give a systematic overview of the four phases of expression analysis (experimental design; signal extraction; data analysis; validation and interpretation) and of the various methods applied in each phase. This chapter is very well suited for researchers who are new to the gene expression field. Noteworthy is the necessary but often neglected distinction between (spotted, two-channel) cDNA arrays and oligonucleotide arrays (consisting of probe sets), which require different treatment, especially in the preprocessing/signal extraction step. Sources of technical and biological variance, their effects and the demands they create (for example normalization) are discussed, as are the different "classical" analysis types (screening for differentially expressed genes, unsupervised analysis for the detection of new classes, supervised analysis for class prediction and classification). One section discusses the challenges of genome biometry analyses, for example the use of large expression databases to generate global hypotheses about cellular interrelations. The first chapter closes with a list of tools and packages that are not described in detail in the book but are nonetheless helpful for microarray analysis.

In the remaining chapters, different authors describe in detail their own tools and packages, which are free for academic use. Each of these specialized chapters gives a comprehensive explanation of either a statistical concept (with implementation), a programming library, or a web-based or stand-alone computer program. The explanations are user-friendly: they provide installation instructions, code examples, screenshots of the graphical user interface or of analysis results, and plotted graphs (38 color plates and many b/w figures). Despite the application-oriented style and the attention to practical problems, the statistical background always remains comprehensible through formal introductions or summaries and references to further literature.

Three Bioconductor R packages are described first: one for visualization and annotation of genomic experiments, one for exploratory analysis and normalization of cDNA microarray data, and one for the analysis of Affymetrix oligonucleotide arrays. Other chapters explain dChip, Expression Profiler, an S-PLUS library for differential expression, DRAGON, SNOMAD, Microarray Explorer, SAM, Adaptive Gene Picking, MAANOVA, GeneClust and POE. Three chapters are dedicated to Bayesian methods (Parametric Empirical Bayes Methods, Bayesian Decomposition and Bayesian Clustering of Gene Expression Dynamics). The last chapter is a first step towards systems biology: with relevance networks, positive and negative correlations between genes can be visualized and searched, so that it may be possible to find genetic regulatory networks within microarray data.
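To illustrate the idea behind relevance networks for readers unfamiliar with it, the following is a minimal sketch in Python. It is not the implementation described in the book. It assumes that the association between two genes is measured by the Pearson correlation of their expression profiles and that an edge is drawn whenever the absolute correlation reaches a fixed threshold. The function names, the threshold of 0.9 and the toy data are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def relevance_network(expression, threshold=0.9):
    """Return edges (gene_a, gene_b, r) for all gene pairs with |r| >= threshold.

    `expression` maps a gene name to its expression values across arrays.
    The threshold is an assumed, illustrative cut-off.
    """
    edges = []
    for gene_a, gene_b in combinations(sorted(expression), 2):
        r = pearson(expression[gene_a], expression[gene_b])
        if abs(r) >= threshold:
            edges.append((gene_a, gene_b, r))
    return edges


# Hypothetical toy data: four genes measured on five arrays.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],  # rises together with geneA
    "geneC": [5.0, 4.1, 2.9, 2.0, 1.1],  # falls as geneA rises
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],  # no clear relation to the others
}

for gene_a, gene_b, r in relevance_network(expression):
    sign = "positive" if r > 0 else "negative"
    print(f"{gene_a} -- {gene_b}: r = {r:+.3f} ({sign})")
```

In this toy example only the pairs among geneA, geneB and geneC pass the threshold, while geneD remains unconnected. Keeping the sign of the correlation on each edge is what allows positive and negative relationships to be distinguished when the network is displayed or searched.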

Topics that one may miss in the book include image analysis and sample size calculations (as the editors themselves mention), the analysis of expression data in relation to pathway maps and Gene Ontology identifiers (possible, for example, with the free GenMAPP tool), and its analysis in relation to protein interaction data.

Updated links to the discussed packages are maintained at http://www.arraybook.org.