gms | German Medical Science

50th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds)
12th Annual Meeting of the Deutsche Arbeitsgemeinschaft für Epidemiologie (dae)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie
Deutsche Arbeitsgemeinschaft für Epidemiologie

September 12-15, 2005, Freiburg im Breisgau

Empirical Assessment of the Diagnostic Potential of Mass Spectroscopy for Cancer Detection

Meeting Abstract


  • Patrick Warnat - Deutsches Krebsforschungszentrum, Heidelberg
  • B. Brors - Deutsches Krebsforschungszentrum, Heidelberg
  • R. Eils - Deutsches Krebsforschungszentrum, Heidelberg

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. Deutsche Arbeitsgemeinschaft für Epidemiologie. 50. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds), 12. Jahrestagung der Deutschen Arbeitsgemeinschaft für Epidemiologie. Freiburg im Breisgau, 12.-15.09.2005. Düsseldorf, Köln: German Medical Science; 2005. Doc05gmds123

The electronic version of this article is the complete one and can be found online.

Published: September 8, 2005

© 2005 Warnat et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to share (copy, distribute and transmit) the work, provided the original author and source are credited.


Introduction

In recent years, many studies have used the analysis of blood or urine samples by mass spectrometry (MS), combined with advanced data-mining algorithms, to detect protein patterns associated with malignancies. As this approach has been reported to be a promising tool for noninvasive, early cancer detection, its further evaluation is an active area of research.

Recent publications [1], [2], [3] have raised concerns about whether the published results of protein-profiling studies are reproducible, effective and unbiased. We systematically assessed studies that used mass spectrometry to generate predictive models for cancer detection, in order to investigate the strength of the current evidence for the predictive performance of this approach.

Materials and Methods

We used an approach similar to that of Ntzani & Ioannidis [4], who assessed the strength of the presented evidence on predictive performance in studies that used DNA microarrays to predict major clinical cancer outcomes.

Original studies were selected that used mass spectrometry for the analysis of human plasma, serum or urine samples and that attempted to generate a predictive model for classifying samples into the categories 'sample from a patient with a malignant cancer disease' versus 'sample from a patient without a malignant cancer disease'. Studies were excluded that only analysed MS data to detect 'differentially expressed' proteins between sample groups without generating a predictive model. In addition, studies that did not validate their predictive model, either by cross-validation or on an independent set of test samples, were excluded. We performed a PubMed search and recorded, for each eligible study, the characteristics of the sample population used, how the results on predictive performance were obtained and presented, and which predictive performance was achieved.

Results

Of all articles retrieved from PubMed, 23 were eligible, all published between 2002 and 2004. Most studies analysed serum samples using the SELDI technology for mass spectrometry [5], and most aimed at detecting prostate or ovarian cancer. The median total number of samples used was 184, and the median number of samples from patients with the malignant disorder under study was 99. For validation of the generated predictive models, 13 studies used an independent test set, 8 used cross-validation, and two presented results for both approaches. Among the studies that used an independent test set, the median number of samples set aside for this purpose was 88. Only 4 of the studies that used an independent test set presented confidence intervals for their performance measures, and only 2 of these presented a receiver operating characteristic (ROC) curve obtained on the independent test set. Of the studies using cross-validation, 2 presented results obtained from incomplete cross-validation, i.e. the entire model-building process was not repeated for each cross-validation training set. This procedure is not a proper way to assess predictive performance, as its results can be strongly biased [6].

Most studies reported a high predictive performance. The studies that used an independent test set reported a median classification accuracy of 85%; for the studies using cross-validation, a median accuracy of 91.5% was reported. Notably, 8 of the studies using independent test sets used test sets smaller than 100 samples. All of these studies reported moderate to high classification accuracies above 76%.
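The imprecision that comes with test sets of this size can be made explicit with a standard binomial confidence interval. The following sketch uses the median values reported above (85% accuracy, 88 test samples); the Wilson score interval is one common choice for such a proportion, not the method of any particular assessed study:

```python
import math

def wilson_interval(successes, n, z=1.96):
    """95% Wilson score interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# median reported values: roughly 85% accuracy on an 88-sample test set
lo, hi = wilson_interval(75, 88)
print(f"95% CI for accuracy: {lo:.3f} - {hi:.3f}")
```

Even at the median test-set size, the 95% interval spans roughly fifteen percentage points, which is exactly the kind of uncertainty that reporting a point estimate alone conceals.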

Discussion

Of the 23 studies assessed, most reported a high predictive performance for the detection of cancer by mass spectrometry of blood or urine samples. Most studies used an independent test set to assess the generated predictive models, although in most cases the test sets were only of moderate size. As pointed out by Ntzani & Ioannidis [4], 'small studies can give over-promising results in molecular medicine', for example because in smaller studies the population of samples investigated is more likely to be homogeneous. Larger test sets are needed in the future, ideally also containing samples processed at a later time point or in another laboratory, to strengthen the evidence for the predictive performance and reproducibility of the presented methods. A first step in this direction was taken in the study of Rogers et al. [7], which also presented results on a 'late' test set consisting of samples analysed 10 months after the training samples.

The presentation of results could also be improved by rigorously reporting confidence intervals for all measures of predictive performance and by reporting ROC curves, which visualise the trade-off between the diagnostic sensitivity and specificity of a predictive model. Only a minority of the assessed studies presented confidence intervals or ROC curves.

Cross-validation is a valid approach for assessing the predictive performance of a model when performed correctly. Incomplete cross-validation, however, can yield inflated estimates of predictive performance, an effect that has already been described extensively in the context of DNA microarray analysis [8], and should therefore be strictly avoided.
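An ROC curve of the kind recommended here is straightforward to compute from a classifier's continuous scores. The following is a minimal numpy sketch with made-up scores and labels, purely for illustration (it assumes untied scores for simplicity):

```python
import numpy as np

def roc_curve(labels, scores):
    """ROC points (FPR, TPR) obtained by sweeping the decision threshold."""
    order = np.argsort(scores)[::-1]      # sort cases by score, descending
    labels = np.asarray(labels)[order]
    tpr = np.concatenate([[0.0], np.cumsum(labels) / labels.sum()])
    fpr = np.concatenate([[0.0], np.cumsum(1 - labels) / (1 - labels).sum()])
    return fpr, tpr

# hypothetical test-set scores: label 1 = cancer, 0 = control
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
fpr, tpr = roc_curve(labels, scores)

# area under the curve by the trapezoid rule
auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))
print(f"AUC = {auc:.2f}")
```

Unlike a single accuracy figure at one fixed threshold, the full (FPR, TPR) curve lets readers judge whether a model can reach the sensitivity/specificity operating point that a given screening application requires.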

In summary, the current strength of evidence for the predictive performance reported in the assessed studies was variable, with a few good exceptions. In the future, more studies with large sample sizes, a correct study design, and a rigorous and complete presentation of the achievable predictive performance are needed to support the first promising results obtained with mass spectrometry for cancer detection.

References

1. Ransohoff DF. Bias as a threat to the validity of cancer molecular-marker research. Nat Rev Cancer. 2005;5(2):142-9.
2. Diamandis EP. Analysis of serum proteomic patterns for early cancer diagnosis: drawing attention to potential problems. J Natl Cancer Inst. 2004;96(5):353-6.
3. Baggerly KA, Morris JS, Coombes KR. Reproducibility of SELDI-TOF protein patterns in serum: comparing datasets from different experiments. Bioinformatics. 2004;20(5):777-85.
4. Ntzani EE, Ioannidis JP. Predictive ability of DNA microarrays for cancer outcomes and correlates: an empirical assessment. Lancet. 2003;362(9394):1439-44.
5. Merchant M, Weinberger SR. Recent advancements in surface-enhanced laser desorption/ionization-time of flight-mass spectrometry. Electrophoresis. 2000;21:1164-77.
6. Ransohoff DF. Rules of evidence for cancer molecular-marker discovery and validation. Nat Rev Cancer. 2004;4(4):309-14.
7. Rogers MA, Clarke P, Noble J, Munro NP, Paul A, Selby PJ, Banks RE. Proteomic profiling of urinary proteins in renal cancer by surface enhanced laser desorption ionization and neural-network analysis: identification of key issues affecting potential clinical utility. Cancer Res. 2003;63(20):6971-83.
8. Simon R, Radmacher MD, Dobbin K, McShane LM. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst. 2003;95(1):14-8.