Development of classifiers for diagnosis - regression models, trees or tree ensembles?
Published: 8 September 2005
Introduction
Several approaches are used to derive classification rules for the diagnosis of patients. Most belong to one of three classes: regression models, classification trees or neural networks. Within each class, a variety of modifications has been proposed in the literature, and several approaches to aggregating classifiers have been suggested to improve classification rules. Issues such as complexity, (in)stability and interpretability are debated, and a general assessment of the advantages and disadvantages of specific approaches is difficult.
Subject and methods
Using data from diagnostic studies, we will compare classifiers that differentiate between two groups. We will compare error rates, the Brier score and other criteria for classifiers developed from logistic regression models, classification trees, boosted trees and random forests. We will also discuss issues of interpretability and practical usefulness.
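The two criteria named above can be illustrated with a minimal sketch. It assumes a binary outcome coded 0/1, predicted probabilities for class 1, and a classification threshold of 0.5; the function names and the small data set are hypothetical and do not come from the studies discussed here.

```python
def error_rate(y_true, p_hat, threshold=0.5):
    """Fraction of cases whose thresholded prediction differs from the true class."""
    if len(y_true) != len(p_hat) or len(y_true) == 0:
        raise ValueError("y_true and p_hat must be non-empty and of equal length")
    wrong = sum(int(p >= threshold) != y for y, p in zip(y_true, p_hat))
    return wrong / len(y_true)


def brier_score(y_true, p_hat):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    if len(y_true) != len(p_hat) or len(y_true) == 0:
        raise ValueError("y_true and p_hat must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, p_hat)) / len(y_true)


# Hypothetical outcomes and predicted probabilities from two classifiers.
y_true = [1, 0, 1, 1, 0, 0]
p_model_a = [0.9, 0.2, 0.6, 0.4, 0.1, 0.7]  # more confident predictions
p_model_b = [0.6, 0.4, 0.6, 0.4, 0.4, 0.6]  # same classifications, less confident

for name, p_hat in [("A", p_model_a), ("B", p_model_b)]:
    print(name, round(error_rate(y_true, p_hat), 3), round(brier_score(y_true, p_hat), 3))
```

In this made-up example both classifiers misclassify the same two cases, so their error rates are equal, while the Brier score still separates them because it also accounts for how far each predicted probability lies from the observed outcome.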
Results
Concerning error rates and other statistical criteria, differences between the approaches are small. Trees can be improved by ensemble methods. Concerning interpretability and practical usefulness, regression models that include the strong factors have several advantages.
Discussion
Ensemble methods can be used to improve classifiers based on trees. However, considering interpretability, transportability and practical usefulness as important criteria, regression models with the strong factors included are still the method of choice.
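One reason a logistic regression model is easy to interpret and to transport is that it reduces to a handful of coefficients, each of which is a log odds ratio. The sketch below illustrates this under stated assumptions: the intercept, the coefficients and the factor names are hypothetical and are not estimates from any study.

```python
import math

# Hypothetical fitted model on the log-odds scale (values are made up).
intercept = -2.0
coefficients = {"factor_a": 1.2, "factor_b": 0.7}


def odds_ratios(coefs):
    """Exponentiate each coefficient to obtain the odds ratio per unit increase."""
    return {name: math.exp(beta) for name, beta in coefs.items()}


def predicted_probability(intercept, coefs, patient):
    """Apply the logistic function to the linear predictor for one patient."""
    linear = intercept + sum(coefs[name] * patient[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-linear))


# Hypothetical patient with factor_a present and factor_b absent.
patient = {"factor_a": 1, "factor_b": 0}

print(odds_ratios(coefficients))
print(predicted_probability(intercept, coefficients, patient))
```

The whole rule fits in one short table of coefficients, so it can be applied elsewhere with a calculator, whereas a tree ensemble has to be distributed as software.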