gms | German Medical Science

50th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds)
12th Annual Meeting of the Deutsche Arbeitsgemeinschaft für Epidemiologie (dae)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie
Deutsche Arbeitsgemeinschaft für Epidemiologie

12 to 15 September 2005, Freiburg im Breisgau

Development of classifiers for diagnosis - regression models, trees or tree ensembles?

Meeting Abstract

  • Willi Sauerbrei - Institut für Medizinische Biometrie und Informatik, Universitätsklinikum Freiburg, Freiburg
  • R. Stollhoff - Institut für Medizinische Biometrie und Informatik, Universitätsklinikum Freiburg, Freiburg

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. Deutsche Arbeitsgemeinschaft für Epidemiologie. 50. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds), 12. Jahrestagung der Deutschen Arbeitsgemeinschaft für Epidemiologie. Freiburg im Breisgau, 12.-15.09.2005. Düsseldorf, Köln: German Medical Science; 2005. Doc05gmds284

The electronic version of this article is the complete one and can be found online at: http://www.egms.de/en/meetings/gmds2005/05gmds287.shtml

Published: September 8, 2005

© 2005 Sauerbrei et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.



Introduction

Several approaches are used to derive classification rules for diagnosing patients. Most of them belong to one of three classes: regression models, classification trees or neural networks. Within each class, a variety of modifications has been proposed in the literature; in addition, several approaches that aggregate classifiers have been suggested to improve classification rules. Issues such as complexity, (in)stability and interpretability remain controversial, so a general assessment of the advantages and disadvantages of specific approaches is difficult.

Subject and methods

Using data from diagnostic studies, we will compare classifiers that distinguish between two groups. We will compare error rates, the Brier score and other criteria for classifiers derived from logistic regression models, classification trees, boosted trees and random forests. We will also discuss their interpretability and practical usefulness.
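The two named criteria can be written down for a sample of n patients with observed outcomes y_i in {0, 1} and predicted probabilities p_i for group 1. The error rate is the proportion of misclassified patients once a cut-off is applied to p_i; the Brier score is the mean of (p_i - y_i)^2, so it assesses the predicted probabilities themselves rather than only the resulting class labels. The functions below are a minimal sketch, assuming a 0/1 outcome coding and a cut-off of 0.5 for the error rate; the function names and the default cut-off are illustrative choices, not taken from the study.

```python
from typing import Sequence


def _check(y_true: Sequence[int], p_hat: Sequence[float]) -> None:
    # Both criteria need one predicted probability per observed outcome.
    if len(y_true) == 0 or len(y_true) != len(p_hat):
        raise ValueError("inputs must be non-empty and of equal length")


def error_rate(y_true: Sequence[int], p_hat: Sequence[float],
               threshold: float = 0.5) -> float:
    """Share of patients misclassified when group 1 is predicted for p >= threshold.

    The default threshold of 0.5 is an assumption for illustration.
    """
    _check(y_true, p_hat)
    wrong = sum(1 for y, p in zip(y_true, p_hat) if (p >= threshold) != (y == 1))
    return wrong / len(y_true)


def brier_score(y_true: Sequence[int], p_hat: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and 0/1 outcome."""
    _check(y_true, p_hat)
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)
```

For example, with outcomes [1, 0, 1, 0] and predictions [0.9, 0.2, 0.4, 0.6], two of the four patients fall on the wrong side of 0.5, so the error rate is 0.5, while the Brier score is (0.01 + 0.04 + 0.36 + 0.36) / 4, approximately 0.1925. Two classifiers with the same error rate can therefore still differ in Brier score.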

Results

With respect to error rates and other statistical criteria, the differences between the approaches are small. Ensemble methods can improve classification trees. With respect to interpretability and practical usefulness, regression models that include the strong factors have several advantages.
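The idea behind tree ensembles such as bagging and random forests is to fit many trees to bootstrap samples of the data and combine them by majority vote. The sketch below is a minimal illustration under simplifying assumptions: each "tree" is a decision stump (a tree with a single split), there is no random selection of candidate variables at each split as in random forests, and there is no sequential reweighting as in boosting. The names `fit_stump`, `fit_bagged_stumps` and `predict_bagged` are hypothetical; this is not the procedure used in the study.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A stump is (variable index, threshold, class predicted when value > threshold).
Stump = Tuple[int, float, int]


def fit_stump(X: Sequence[Sequence[float]], y: Sequence[int]) -> Stump:
    """Choose the single split with the fewest training misclassifications."""
    best: Stump = (0, float("-inf"), 1)
    best_err = float("inf")
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for above in (0, 1):
                err = sum(
                    1 for row, yi in zip(X, y)
                    if (above if row[j] > t else 1 - above) != yi
                )
                if err < best_err:  # strict: keep the first best split found
                    best, best_err = (j, t, above), err
    return best


def predict_stump(stump: Stump, row: Sequence[float]) -> int:
    j, t, above = stump
    return above if row[j] > t else 1 - above


def fit_bagged_stumps(X: Sequence[Sequence[float]], y: Sequence[int],
                      n_trees: int = 25, seed: int = 0) -> List[Stump]:
    """Fit one stump per bootstrap sample (n draws with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps: Sequence[Stump], row: Sequence[float]) -> int:
    """Aggregate the stumps by majority vote (an odd n_trees avoids ties)."""
    votes = Counter(predict_stump(s, row) for s in stumps)
    return votes.most_common(1)[0][0]
```

Usage, on a toy data set with one variable: `stumps = fit_bagged_stumps([[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]], [0, 0, 0, 1, 1, 1])`, then `predict_bagged(stumps, [0.05])` returns 0. A single stump fitted to an unlucky bootstrap sample can be badly wrong, but the vote over many samples is generally expected to be more stable, which is one way ensembles address the instability of single trees.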

Discussion

Ensemble methods can be used to improve tree-based classifiers. However, if interpretability, transportability and practical usefulness are regarded as important criteria, regression models that include the strong factors remain the method of choice.
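One reason the interpretability of regression models is valued can be made concrete for logistic regression. In the generic model below, the covariates x_1, ..., x_p are placeholders rather than the factors of the study, and the cut-off c is an assumed, user-chosen value. Each coefficient translates directly into an odds ratio, and a classification rule follows from thresholding the predicted probability.

```latex
\log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p,
\qquad \pi(x) = P(Y = 1 \mid x)

\frac{\mathrm{odds}(x_1, \dots, x_j + 1, \dots, x_p)}{\mathrm{odds}(x_1, \dots, x_j, \dots, x_p)}
  = \frac{\exp\bigl(\beta_0 + \cdots + \beta_j (x_j + 1) + \cdots + \beta_p x_p\bigr)}
         {\exp\bigl(\beta_0 + \cdots + \beta_j x_j + \cdots + \beta_p x_p\bigr)}
  = \exp(\beta_j)

\text{assign to group 1 if } \hat{\pi}(x) \ge c
```

In a model without interactions the ratio is the same whatever values the other covariates take, so the effect of each factor can be summarised by a single number, exp(β_j).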

