gms | German Medical Science

49. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds)
19. Jahrestagung der Schweizerischen Gesellschaft für Medizinische Informatik (SGMI)
Jahrestagung 2004 des Arbeitskreises Medizinische Informatik (ÖAKMI)

26 to 30 September 2004, Innsbruck/Tyrol

Pseudo R-squared measures for generalised linear regression models

Meeting Abstract (gmds2004)

  • Martina Mittlböck (corresponding author, presenting speaker) - Medical University of Vienna, Vienna, Austria
  • Harald Heinzl - Medical University of Vienna, Vienna, Austria

Kooperative Versorgung - Vernetzte Forschung - Ubiquitäre Information. 49. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (gmds), 19. Jahrestagung der Schweizerischen Gesellschaft für Medizinische Informatik (SGMI) und Jahrestagung 2004 des Arbeitskreises Medizinische Informatik (ÖAKMI) der Österreichischen Computer Gesellschaft (OCG) und der Österreichischen Gesellschaft für Biomedizinische Technik (ÖGBMT). Innsbruck, 26.-30.09.2004. Düsseldorf, Köln: German Medical Science; 2004. Doc04gmds074

The electronic version of this article is the complete one and can be found online at: http://www.egms.de/en/meetings/gmds2004/04gmds074.shtml

Published: September 14, 2004

© 2004 Mittlböck et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Text

A regression model provides a rather simple picture of a situation that is, in general, intrinsically complex. The importance of potential prognostic factors is assessed in order to improve the prognosis of the outcome variable of interest. The closer the model is to reality, the more of the variability of the outcome variable it can explain. In the common linear regression model with a normally distributed outcome, the fraction of explained variability is quantified by the coefficient of determination, also called the R-squared measure. It provides information in addition to the parameter estimates, p-values and confidence intervals of the covariates. In the ideal case of a perfect prognosis, R-squared equals one. If, on the other hand, the model explains nothing at all, R-squared attains its lower bound of zero.
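As a minimal sketch of the coefficient of determination described above, the following Python snippet computes R-squared for a simple linear regression with a single covariate and checks the two boundary cases. The function name `r_squared` and the toy data are illustrative assumptions, not part of the original abstract.

```python
def r_squared(x, y):
    """R-squared of the least-squares line y ~ a + b*x (illustrative helper)."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope and intercept for a single covariate
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    # Fraction of explained variability: 1 - residual SS / total SS
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Outcome is an exact linear function of x: perfect prognosis
perfect = r_squared([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
# Outcome is uncorrelated with x: the fitted line is flat at the mean
nothing = r_squared([-1, 0, 1], [1, 0, 1])
```

With these toy data, `perfect` reaches the upper bound of one and `nothing` the lower bound of zero.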

As R-squared is well known and commonly used in the linear regression model, attempts have been made to define pseudo R-squared values for generalised linear models as well [1], [2], [3], [4], although the generalisation to non-normal data is not straightforward. Recommended pseudo R-squared measures are based either on the concept of deviance or on sums of squares, and the two approaches may yield different estimates for the same data. Advantages and disadvantages of the different approaches are compared and discussed.
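To illustrate how the two approaches can disagree on the same data, the sketch below computes a deviance-based and a sum-of-squares-based pseudo R-squared for a small Poisson example. It uses the common definitions 1 - D(model)/D(null) and 1 - SS(model)/SS(null); the function names and the toy counts are assumptions made for illustration, and the abstract itself does not spell out these formulas. To avoid an iterative fit, the example uses a Poisson model with an intercept and one binary covariate, for which the maximum-likelihood fitted means are simply the group means.

```python
import math


def poisson_deviance(y, mu):
    """Poisson deviance 2*sum(y*log(y/mu) - (y - mu)), taking 0*log(0) as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total


def pseudo_r_squared(y, mu_model):
    """Return (deviance-based, sum-of-squares-based) pseudo R-squared."""
    y_bar = sum(y) / len(y)
    mu_null = [y_bar] * len(y)  # intercept-only model predicts the overall mean
    r2_dev = 1.0 - poisson_deviance(y, mu_model) / poisson_deviance(y, mu_null)
    ss_model = sum((yi - mi) ** 2 for yi, mi in zip(y, mu_model))
    ss_null = sum((yi - y_bar) ** 2 for yi in y)
    r2_ss = 1.0 - ss_model / ss_null
    return r2_dev, r2_ss


# Hypothetical counts for two groups (binary covariate 0/1)
y = [0, 1, 2, 1, 3, 5, 4, 4]
group = [0, 0, 0, 0, 1, 1, 1, 1]

# Fitted means of the Poisson model with a group indicator are the group means
group_mean = {
    g: sum(yi for yi, gi in zip(y, group) if gi == g) / group.count(g)
    for g in set(group)
}
mu_model = [group_mean[g] for g in group]

r2_dev, r2_ss = pseudo_r_squared(y, mu_model)
```

For these toy counts the deviance-based value is about 0.70 and the sum-of-squares-based value about 0.82, so the two measures differ noticeably even though both describe the same fitted model.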

Regression models are often used to screen for prognostic factors, even in situations where the sample size is rather small compared to the number of covariates. By definition, an R-squared measure never decreases when covariates are added to the model, even if they are not correlated with the outcome of interest. Unadjusted R-squared measures may therefore be substantially inflated, which jeopardises valid interpretation. R-squared values of 30 percent or higher can easily be reached even when there is no association at all between the independent and dependent variables. The use of bias-adjusted R-squared measures, which also take the number of fitted parameters into account, is well established in linear regression models. For generalised linear models, a shrinkage-based adjustment of the deviance-based pseudo R-squared measure is proposed, so that the expectation of the adjusted pseudo R-squared measure corresponds to the underlying population value [3], [4], [5], [6]. Furthermore, we show that the resulting adjustment coincides with the adjusted R-squared measure in linear regression. The adjustment can also be generalised to over- and under-dispersed Poisson regression models [7].
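The inflation of unadjusted R-squared can be made concrete with a small simulation for the linear regression case only. The sketch below regresses pure-noise outcomes on pure-noise covariates and averages the unadjusted and the classical adjusted R-squared over many replications. The sample size, number of covariates, seed and all function names are assumptions chosen for illustration; the shrinkage-based adjustment for generalised linear models proposed in the abstract is not reproduced here.

```python
import random


def residual_ss(columns, y):
    """Residual sum of squares of y after least-squares projection onto columns."""
    basis = []
    resid = list(y)
    for col in columns:
        v = list(col)
        for q in basis:  # modified Gram-Schmidt against earlier columns
            c = sum(a * b for a, b in zip(q, v))
            v = [a - c * b for a, b in zip(v, q)]
        norm = sum(a * a for a in v) ** 0.5
        if norm < 1e-12:  # skip (numerically) collinear columns
            continue
        q = [a / norm for a in v]
        basis.append(q)
        c = sum(a * b for a, b in zip(q, resid))
        resid = [a - c * b for a, b in zip(resid, q)]
    return sum(r * r for r in resid)


def r_squared_pair(n, k, rng):
    """Unadjusted and adjusted R-squared for k noise covariates and a noise outcome."""
    y = [rng.gauss(0, 1) for _ in range(n)]
    covariates = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
    ones = [1.0] * n
    ss_null = residual_ss([ones], y)
    ss_model = residual_ss([ones] + covariates, y)
    r2 = 1.0 - ss_model / ss_null
    # Classical adjustment: compare variance estimates instead of raw sums of squares
    r2_adj = 1.0 - (ss_model / (n - k - 1)) / (ss_null / (n - 1))
    return r2, r2_adj


def simulate(n=20, k=8, reps=500, seed=1):
    """Average both measures over reps replications with no true association."""
    rng = random.Random(seed)
    pairs = [r_squared_pair(n, k, rng) for _ in range(reps)]
    mean_r2 = sum(p[0] for p in pairs) / reps
    mean_adj = sum(p[1] for p in pairs) / reps
    return mean_r2, mean_adj


mean_r2, mean_adj = simulate()
```

With no true association, the expected unadjusted R-squared in this setting is k/(n-1), which is 8/19 or roughly 42 percent for the assumed n = 20 and k = 8, whereas the adjusted measure averages close to zero.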

In summary, correctly adjusted R-squared values provide essential information in addition to the usual modelling results, since they make it possible to quantify the current knowledge (or lack of it) about the outcome variable of interest.


References

1. Mittlböck M, Schemper M. Explained variation for logistic regression. Statistics in Medicine 1996; 15: 1987-1997.
2. Mittlböck M, Schemper M. Computing measures of explained variation for logistic regression models. Computer Methods and Programs in Biomedicine 1999; 58: 17-24.
3. Mittlböck M, Heinzl H. Measures of explained variation in Gamma regression model. Communications in Statistics - Simulation and Computation 2002; 31: 61-73.
4. Heinzl H, Mittlböck M. R-squared measures for the inverse Gaussian regression model. Computational Statistics 2002; 17: 525-544.
5. Mittlböck M, Waldhör T. Adjustments for R²-Measures for Poisson regression models. Computational Statistics & Data Analysis 2000; 34: 461-472.
6. Mittlböck M. Calculating Adjusted R² Measures for Poisson Regression Models. Computer Methods and Programs in Biomedicine 2002; 68: 205-214.
7. Heinzl H, Mittlböck M. Pseudo R-squared measures for Poisson regression models with over- or under-dispersion. Computational Statistics & Data Analysis 2003; 44: 253-271.