### Article

## Multivariable regression models with continuous covariates

Published: September 14, 2004

### Text

Regression models play a central role in epidemiology and clinical studies. In epidemiology the emphasis is typically either on determining whether a given risk factor affects the outcome of interest (adjusted for confounders), or on estimating a dose-response curve for a given factor, again adjusting for confounders. An important class of clinical studies is the so-called prognostic factor studies, in which the outcome for patients with chronic diseases such as cancer is predicted from various clinical features. In both application areas, it is almost always necessary to build a multivariable model that incorporates known or suspected influential variables while eliminating those found to be unimportant.

Risk or prognostic factors are commonly measured on a continuous scale, an obvious example being a person's age. Conventionally, such factors are either modelled as linear functions or converted into categories according to some chosen set of cut-points. However, categorisation, and the use of the resulting estimates, is known to be fraught with difficulty; see for example Altman et al. [Ref. 1]. A linear function, in turn, may fit the data badly and give misleading estimates of risk. Reliable approaches for representing the effects of continuous factors in multivariable models are therefore urgently needed.

The main concern of this lecture is building multivariable regression models for data from clinical and epidemiological studies, which involves both selecting influential covariates and determining the functional form of the relationship between each continuous covariate and the outcome. Systematic procedures that combine selection of influential variables with determination of functional form for continuous factors are rare. Analysts often apply their own subjective preferences to each part of the model-building process, estimate parameters for several models and then decide on the final strategy according to the results they find. By contrast, we present the multivariable fractional polynomial (MFP) approach as a systematic way to determine a multivariable regression model. Major concerns, including robustness and possible model instability, will be discussed. Regarding determination of the functional form, we will also discuss some alternatives that place more emphasis on local estimation of the function (e.g. splines). The MFP procedure may be used with various types of regression model (linear, logistic, Cox, etc.). A clinical and an epidemiological example will be used to illustrate and compare the approaches. Software for R, Stata and SAS is generally available.
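To make the idea of choosing a functional form concrete, the following is a minimal sketch of selecting a first-degree fractional polynomial (FP1) for a single continuous covariate. It assumes the conventional power set {-2, -1, -0.5, 0, 0.5, 1, 2, 3}, with power 0 read as log(x), and an ordinary least-squares linear model. The function names (`fp_transform`, `fit_simple_ols`, `best_fp1`) and the data are hypothetical and purely illustrative. The sketch covers neither second-degree functions, nor significance testing, nor the variable selection that the full MFP procedure combines with function selection.

```python
import math

# Conventional FP power set (assumed here); power 0 denotes log(x).
FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_transform(x, p):
    """Return x**p, with p == 0 read as the natural log (x must be positive)."""
    if x <= 0:
        raise ValueError("fractional polynomials require positive x")
    return math.log(x) if p == 0 else x ** p


def fit_simple_ols(z, y):
    """Least-squares fit of y = b0 + b1*z; returns (b0, b1, rss)."""
    n = len(z)
    z_mean = sum(z) / n
    y_mean = sum(y) / n
    szz = sum((zi - z_mean) ** 2 for zi in z)
    szy = sum((zi - z_mean) * (yi - y_mean) for zi, yi in zip(z, y))
    b1 = szy / szz
    b0 = y_mean - b1 * z_mean
    rss = sum((yi - b0 - b1 * zi) ** 2 for zi, yi in zip(z, y))
    return b0, b1, rss


def best_fp1(x, y, powers=FP_POWERS):
    """Fit every FP1 candidate and return (power, b0, b1, rss) with the smallest RSS."""
    fits = []
    for p in powers:
        z = [fp_transform(xi, p) for xi in x]
        b0, b1, rss = fit_simple_ols(z, y)
        fits.append((p, b0, b1, rss))
    return min(fits, key=lambda f: f[3])


# Illustrative data only: a noiseless logarithmic dose-response curve.
ages = [20, 25, 30, 35, 40, 50, 60, 70, 80]
outcome = [1.0 + 2.0 * math.log(a) for a in ages]

power, b0, b1, rss = best_fp1(ages, outcome)
linear_rss = fit_simple_ols(ages, outcome)[2]  # the conventional linear fit
```

On these made-up data the logarithmic transformation is selected and the straight-line fit leaves a larger residual sum of squares. With real data the candidates would be compared by deviance in the chosen regression model rather than by raw residual sum of squares, and the simplest adequate function would be preferred.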

### References

1. Altman DG, Lausen B, Sauerbrei W, Schumacher M. 1994. The dangers of using 'optimal' cutpoints in the evaluation of prognostic factors. Journal of the National Cancer Institute 86: 829-835.
2. Royston P, Altman DG. 1994. Regression using fractional polynomials of continuous covariates: parsimonious parametric modelling (with Discussion). Applied Statistics 43: 429-467.
3. Royston P, Ambler G, Sauerbrei W. 1999. The use of fractional polynomials to model continuous risk variables in epidemiology. International Journal of Epidemiology 28: 964-974.
4. Royston P, Sauerbrei W. 2003. Stability of multivariable fractional polynomial models with selection of variables and transformations: a bootstrap investigation. Statistics in Medicine 22: 639-659.
5. Royston P, Sauerbrei W. 2004. Improving the robustness of fractional polynomial models by preliminary covariate transformation. Statistical Modelling, submitted.
6. Sauerbrei W, Royston P. 1999. Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society (A) 162: 71-94. Corrigendum: JRSS (A) 165: 399-400 (2002).