## A relative survival model for clustered responses

Published: September 8, 2005

### Text

#### Introduction

Relative survival is the ratio of the observed overall survival of a group of patients to the expected survival of a demographically similar group from a reference population, where the expected survival is derived from published age-, sex-, and calendar-time-specific mortality rates [Ref. 1]. It is commonly used to estimate the effect of a particular disease when the true cause of death is not reliably known. Because it avoids the problem of inaccurate or unavailable death certificates, it is the preferred analysis of survival experience in cancer registries [Ref. 2]. The relative survival approach has several attractive features. For example, it allows cure to be claimed (in a statistical, not a clinical sense) when the survival in a group of patients equals the expected survival in the population [Ref. 3]. Moreover, it facilitates comparisons between international registries, because the survival experience in each country is adjusted for the respective underlying population [Ref. 3].
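The ratio described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and two simplifying assumptions are made that the abstract does not state: each patient's expected survival is computed from yearly population hazards that are constant within a year, and the group's expected survival is taken as the plain average over patients (one of several conventions in use).

```python
import math


def expected_survival(annual_mortality_rates):
    """Expected survival probability of one person from the reference
    population, given the yearly population hazards (per person-year)
    that match this person's age, sex and calendar year.

    Assumption: the hazard is constant within each year, so the
    cumulative hazard is simply the sum of the yearly rates.
    """
    cumulative_hazard = sum(annual_mortality_rates)
    return math.exp(-cumulative_hazard)


def relative_survival(observed_survival, expected_survivals):
    """Observed survival of the patient group divided by the average
    expected survival of the matched reference individuals.
    """
    mean_expected = sum(expected_survivals) / len(expected_survivals)
    return observed_survival / mean_expected
```

A value of 1 would mean that the patients survive as well as the matched general population, and values below 1 indicate excess mortality in the patient group.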

Going beyond the pure description of relative survival, regression models for relative survival have been proposed [Ref. 2], [Ref. 4] to assess the influence of prognostic or risk factors on relative survival. Owing to the definition of relative survival as a ratio and the relation between survival distribution and hazard function, all regression models for relative survival are necessarily additive hazard regression models. Dickman et al. [Ref. 5] have further shown that the relative survival model of Estève [Ref. 2] can also be interpreted and fitted as a Generalized Linear Model with a Poisson response, an offset, and a specific link function that is different for each observation. To our knowledge, the current relative survival regression models have not yet been generalized to allow for clustered responses.
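The step from a ratio of survival functions to an additive hazard model, and the form of the Poisson formulation, can be written out as follows. The notation is assumed here for illustration and does not appear in the abstract: $S$ and $\lambda$ denote the observed survival and hazard, $S^{*}$ and $\lambda^{*}$ the expected ones, $r$ the relative survival, $\nu$ the excess hazard, and, for observation $j$, $d_j$ the observed deaths, $d_j^{*}$ the expected deaths, $y_j$ the person-time at risk and $x_j$ the covariates.

$$
\begin{aligned}
S(t) &= S^{*}(t)\, r(t), \\
\lambda(t) &= -\frac{d}{dt}\ln S(t) = \lambda^{*}(t) + \nu(t), \qquad \nu(t) = -\frac{d}{dt}\ln r(t), \\
d_j &\sim \mathrm{Poisson}(\mu_j), \qquad \ln\!\left(\mu_j - d_j^{*}\right) = \ln(y_j) + x_j^{\top}\beta .
\end{aligned}
$$

Under this sketch the mean is $\mu_j = d_j^{*} + y_j \exp(x_j^{\top}\beta)$, so $\ln(y_j)$ acts as the offset, and the link depends on $d_j^{*}$, which is why it differs from one observation to the next.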

#### Material and Methods

The data that motivated our work come from the HALLUCA (Halle Lung Carcinoma) study, an observational study that investigated the provision of medical care for lung cancer patients in the region of Halle, in the eastern part of Germany. In close cooperation with the regional clinical tumour registries, all lung cancer patients in the study region were recorded in a standardized way from April 1996 to September 1999; follow-up lasted until September 2000. A total of 1696 lung cancer patients were recorded. Survival was defined as the time from clinical, histological or cytological diagnosis to death or censoring. Of the 1696 patients, 1349 (79.5%) died before the end of follow-up; median survival in the study population was 284 days (9.3 months). Data on population mortality were obtained from the Federal Statistical Office of the State of Saxony-Anhalt, of which the study region is a part. To assess the influence of prognostic and risk factors on relative survival, five fixed-effects covariates were investigated, all of them categorical. To avoid p-values biased by model selection procedures, all covariates were chosen and categorized before the analysis and independently of the response values.

Even in the purely descriptive analysis of the HALLUCA study, we noticed a very heterogeneous survival experience across the 55 diagnosing units in our study region. For example, among the units with 5 or more observations, the unit with the longest observed survival had a median survival of 995 days (N=24). At the other extreme, one unit contributed only 16 DCO (Death Certificate Only) cases, so that its median survival time is 0 days.

To account for this heterogeneity in survival between clusters in the relative survival analysis, we extended Dickman's Generalized Linear Model with Poisson response, offset and specific link function to a Generalized Linear Mixed Model (GLMM) by adding a normally distributed random cluster effect to the linear predictor. To additionally allow for potential overdispersion in the response, we also extended the Negative Binomial Mixed Model of Booth et al. [Ref. 6] to our relative survival case. Parameters were estimated by numerical integration (SAS PROC NLMIXED) and by MCMC (WinBUGS). For model comparison we used the Bayes Information Criterion (BIC).
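The marginal likelihood that such numerical integration targets can be sketched in plain Python. This is not the authors' code: the function and argument names are hypothetical, the mean structure `expected deaths + person-time * exp(linear predictor + u)` is the usual Poisson formulation of the relative survival model and is assumed here, and a simple trapezoidal rule stands in for whatever quadrature scheme the software applies.

```python
import math


def poisson_log_pmf(d, mu):
    """Log probability of observing d events under a Poisson mean mu."""
    return d * math.log(mu) - mu - math.lgamma(d + 1.0)


def cluster_log_likelihood(deaths, expected_deaths, person_time,
                           linear_predictors, sigma,
                           n_points=401, z_max=8.0):
    """Marginal log-likelihood of one cluster with a N(0, sigma^2)
    random intercept u added to every linear predictor in the cluster.

    The random effect is integrated out on a grid of standardised
    values z in [-z_max, z_max] (u = sigma * z) with the trapezoidal rule.
    """
    step = 2.0 * z_max / (n_points - 1)
    log_terms = []
    for k in range(n_points):
        z = -z_max + k * step
        u = sigma * z
        # Conditional Poisson log-likelihood of the cluster given u
        log_lik = 0.0
        for d, d_star, y, eta in zip(deaths, expected_deaths,
                                     person_time, linear_predictors):
            mu = d_star + y * math.exp(eta + u)  # expected + excess deaths
            log_lik += poisson_log_pmf(d, mu)
        # Standard normal log-density of z
        log_density = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
        # Trapezoidal weights: half weight at the two end points
        weight = step * (0.5 if k in (0, n_points - 1) else 1.0)
        log_terms.append(math.log(weight) + log_density + log_lik)
    # Log-sum-exp keeps the summation numerically stable
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))
```

Summing this quantity over all clusters would give the log-likelihood to be maximised over the regression coefficients and `sigma`. With `sigma = 0` the function reduces to the ordinary Poisson log-likelihood without a cluster effect.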

#### Results

Comparing the results from the different models, the parameter estimates for the covariates differ but do not lead to different subject-matter conclusions. We see a gender effect, with females having lower death hazards, and an age effect, with larger hazards for patients older than 65 years. We also find increased hazards for patients with SCLC tumours and for those in higher ECOG classes. In addition, the hazard increases very strongly and strictly monotonically across the tumour stages. As expected, the parameter estimates for the covariates are further away from the null and have slightly larger standard errors in the random effects models.

More interesting in our context, however, is the comparison of the models with respect to the cluster effect and the additional overdispersion. Comparing the standard relative survival model with the random effects model, we see a clear fall in the BIC value, so the random effects model is preferred. The BIC for the model with additional overdispersion is larger than that of the random effects model, so we do not have to account for overdispersion.
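The comparison rule used here (smaller BIC is better) can be sketched as follows. The helper names are hypothetical, and any inputs passed to them are placeholders rather than results from the HALLUCA analysis. Software also differs in what it counts as the sample size for mixed models, so the `n_observations` argument is left to the caller.

```python
import math


def bic(log_likelihood, n_parameters, n_observations):
    """Bayes Information Criterion: -2 log L + k * ln(n)."""
    return -2.0 * log_likelihood + n_parameters * math.log(n_observations)


def preferred_model(bic_by_model):
    """Name of the model with the smallest BIC in a {name: BIC} mapping."""
    return min(bic_by_model, key=bic_by_model.get)
```

Because the penalty grows with the number of parameters, adding a variance component or an overdispersion parameter lowers the BIC only if the log-likelihood improves by enough to offset it.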

#### Discussion

Using the equivalence of the Estève relative survival model and a Poisson GLM with a specific link function, we showed how relative survival models for clustered responses can easily be defined, estimated and interpreted. For our data set, adjusting for the heterogeneous survival experience between diagnosing units improved the model fit in terms of the BIC. Parameters for the covariates were estimated to be further away from the null and had larger standard errors in the random effects models. Several extensions of the model are possible. First, we could include more hierarchy levels in the analysis if the data set requires it. Second, we could replace the implicit assumption of exponentially distributed survival times with a more flexible distribution (e.g., the Weibull). Last, we could generalize the random effects distribution beyond the normal, for instance with a non-parametric ML approach in which the random effects distribution is also estimated.

### References

1. Buckley JD. Additive and multiplicative models for relative survival rates. Biometrics 1984; 40: 51-62.
2. Estève J, Benhamou E, Croasdale M, Raymond L. Relative survival and the estimation of net survival: Elements for further discussion. Stat Med 1990; 9: 529-538.
3. Estève J, Benhamou E, Raymond L. Statistical Methods in Cancer Research, Volume IV, Descriptive Epidemiology. Lyon: International Agency for Research on Cancer; 1994.
4. Hakulinen T, Tenkanen L. Regression analysis of relative survival rates. Appl Stat 1987; 36: 309-317.
5. Dickman PW, Sloggett A, Hills M, Hakulinen T. Regression models for relative survival. Stat Med 2004; 23: 51-64.
6. Booth JG, Casella G, Friedl H, Hobert JP. Negative binomial loglinear mixed models. Stat Modelling 2003; 3: 179-191.