### Article

## Late entry in Cox regression analysis with time varying covariates

### Authors

Published: September 14, 2004

### Text

#### Introduction

Korn et al. [Ref. 1], among others, recommend using age rather than study time as the time axis for Cox regression analysis of epidemiological studies; study time is the common choice in survival analysis of clinical trials. This approach places the individual zero point of the time axis at birth. Without further steps, the regression analysis then treats each person as if the individual observation window started at birth. In most cohort studies, and especially in occupational cohort studies, this is not the case: because of late entry, the survival times are left truncated.

The issue of late entry seems to be underappreciated in the epidemiological literature. We therefore conducted a simulation study to assess the effect of ignoring late entry in survival analysis with age as the time scale. For comparison, we also simulated the more traditional approach with time on study as the time axis. We concentrate here on the situation with time-varying covariates.

#### Methods

The time to event of each person was simulated iteratively in time intervals of 1 year (see equation 2). The influence of age was simulated by a Gompertz survival time model [Ref. 2]. Additional explanatory variables were time of employment (dynamically allocated), grouped into three categories and used as the measure of exposure, and a dichotomous time-independent covariable. During the iterative simulation, the time points at which employment started and ended were each set by means of a Bernoulli-distributed dichotomous random variable. The survival distribution function of the simulated data is given by:

(1) *
*

where

- t denotes age

- **x₁(t)**, **x₂(t)** represent time dependent dichotomous covariables for exposure categories 2 and 3

- **z** represents a dichotomous time independent covariable

- **τ** and **δ** are the parameters of the Gompertz distribution defining the effect of age
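The definitions above can be sketched as a Gompertz proportional hazards model. This is only a plausible form under stated assumptions: the roles of τ (baseline rate) and δ (age slope) and the coefficient names β₁, β₂, β₃ are not given in the text and are chosen here for illustration.

```latex
% Assumed parametrisation: \tau = baseline rate, \delta = age slope of the
% Gompertz hazard; \beta_1, \beta_2, \beta_3 are hypothetical coefficient names.
\begin{align*}
h(t) &= \tau\, e^{\delta t}\,
        \exp\{\beta_1 x_1(t) + \beta_2 x_2(t) + \beta_3 z\}, \\
S(t) &= \exp\Bigl\{-\int_0^{t} h(u)\, du\Bigr\}, \\
% With the covariates held constant over a one-year interval [t_{k-1}, t_k]:
\frac{S(t_k)}{S(t_{k-1})} &= \exp\Bigl\{-\frac{\tau}{\delta}
        \bigl(e^{\delta t_k} - e^{\delta t_{k-1}}\bigr)\,
        e^{\beta_1 x_1 + \beta_2 x_2 + \beta_3 z}\Bigr\}.
\end{align*}
```

The last line is what makes a year-by-year simulation convenient: within each interval the covariates are fixed, so the cumulative hazard can be accumulated one interval at a time.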

The event probability relevant for a distinct time interval was calculated as the difference between the survival probabilities at the time interval limits:

(2) *
*

The simulated scenarios were varied by changing the effect of the time dependent exposure (HR 0.45 to HR 4.95), the correlation between age and exposure (ρ ≈ 0.37 and ρ ≈ 0.64), and the sample size. For each set of parameters, 5000 datasets were simulated.
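The iterative procedure described above might look roughly like the following sketch. It is not the authors' program: the Gompertz parametrisation (`tau` as baseline rate, `delta` as age slope), the coefficient names, the employment probabilities, the working-age limits, and the exposure cut points are all hypothetical values chosen for illustration. A single uniform draw `u` is compared with the running survival probability, so the probability of an event in interval k equals S(t_{k-1}) - S(t_k).

```python
import math
import random


def exposure_category(years_employed, cut_points=(5, 15)):
    """Map cumulative years of employment to category 1, 2 or 3.

    The cut points are hypothetical; the source does not state them.
    """
    if years_employed < cut_points[0]:
        return 1
    if years_employed < cut_points[1]:
        return 2
    return 3


def simulate_event_age(rng, tau=1e-4, delta=0.09, beta1=0.4, beta2=0.8,
                       beta3=0.3, p_start=0.10, p_stop=0.05,
                       min_work_age=16, max_work_age=65, max_age=110):
    """Simulate one person in 1-year steps; all defaults are assumptions.

    Returns (age_at_end, event_observed, z, years_employed).
    """
    z = rng.randint(0, 1)      # dichotomous time-independent covariable
    u = rng.random()           # event occurs once S(t) drops to u or below
    employed = False
    years_employed = 0
    cum_hazard = 0.0
    for age in range(max_age):
        # Bernoulli draws decide when employment starts and stops.
        if min_work_age <= age < max_work_age:
            if employed:
                employed = rng.random() >= p_stop
            else:
                employed = rng.random() < p_start
        else:
            employed = False
        if employed:
            years_employed += 1
        category = exposure_category(years_employed)
        x1 = 1 if category == 2 else 0
        x2 = 1 if category == 3 else 0
        # Gompertz cumulative baseline hazard over [age, age + 1).
        baseline = (tau / delta) * (math.exp(delta * (age + 1))
                                    - math.exp(delta * age))
        cum_hazard += baseline * math.exp(beta1 * x1 + beta2 * x2 + beta3 * z)
        if math.exp(-cum_hazard) <= u:
            return age + 1, True, z, years_employed
    return max_age, False, z, years_employed


rng = random.Random(2004)
print(simulate_event_age(rng))
```

Because the covariate path is generated alongside the hazard, the exposure category at each age is available for building the time-varying records that a Cox model needs.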

The Cox regression model using age as the time axis is given by:

(3) *
*

where the same covariables are used as in the simulation model.

The corresponding Cox regression model using study time as the time axis is given by:

(4) *
*

where an additional time independent covariable for age at study entry is introduced.
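In standard proportional hazards notation, the two models described above would take roughly the following form. The symbols are assumptions made for illustration: t is age, s is time on study, a₀ is age at study entry, and β₁ to β₄ are hypothetical coefficient names.

```latex
% Hypothetical notation: t = age, s = time on study, a_0 = age at study entry.
\begin{align*}
\text{age as time axis:}\quad
  h(t) &= h_0(t)\,\exp\{\beta_1 x_1(t) + \beta_2 x_2(t) + \beta_3 z\}, \\
\text{study time as time axis:}\quad
  h(s) &= h_0(s)\,\exp\{\beta_1 x_1(s) + \beta_2 x_2(s) + \beta_3 z
          + \beta_4 a_0\}.
\end{align*}
```

In the first model the effect of age is absorbed by the unspecified baseline hazard h₀(t); in the second it enters only through the entry-age covariable.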

Four types of analysis were performed:

**A** total cohort, follow up beginning at birth, no censoring (= best possible situation for estimation)

**Bza1** occupational cohort, censored, age as time axis, no 'left truncation'

**Bza2** occupational cohort, censored, age as time axis, with 'left truncation'

**Bzs1** occupational cohort, censored, study time as time axis, with cofactor 'age at study entry'
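The difference between Bza1 and Bza2 comes down to who counts as being at risk at each event age. The following sketch illustrates this with made-up entry and exit ages; the function name and the data are hypothetical.

```python
def risk_set(entry_ages, exit_ages, event_age, left_truncation=True):
    """Return the indices of persons at risk at a given event age.

    With left truncation, a person is at risk only after study entry.
    Without it, everyone is treated as observed from birth.
    """
    members = []
    for i, (entry, exit_age) in enumerate(zip(entry_ages, exit_ages)):
        still_followed = exit_age >= event_age
        already_entered = entry < event_age
        if still_followed and (already_entered or not left_truncation):
            members.append(i)
    return members


# Hypothetical cohort: age at study entry and age at event or censoring.
entry = [30, 45, 50, 25]
exit_ = [60, 70, 55, 40]

print(risk_set(entry, exit_, 35, left_truncation=True))   # [0, 3]
print(risk_set(entry, exit_, 35, left_truncation=False))  # [0, 1, 2, 3]
```

Ignoring left truncation adds persons 1 and 2 to the risk set at age 35 although neither had entered the study by then, which changes the denominator of every partial likelihood term at early ages.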

For each type of analysis and each set of parameters, the bias, defined as the mean difference between the parameter estimate and the true parameter value, was calculated for each covariable of the Cox model.
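As a minimal sketch of this bias measure, assuming it is computed on the regression coefficient (log hazard ratio) scale, which the text does not state explicitly; the estimates below are invented numbers, not results from the study.

```python
import math
from statistics import mean


def bias(estimates, true_value):
    """Mean difference between the estimates and the true parameter value."""
    return mean(est - true_value for est in estimates)


# Hypothetical example: true HR of 2.0, i.e. a true coefficient of log(2).
true_beta = math.log(2.0)
simulated_estimates = [0.61, 0.72, 0.66, 0.70]  # invented coefficient estimates
print(bias(simulated_estimates, true_beta))
```

A negative value indicates estimates that are on average below the true coefficient; for a positive true coefficient this corresponds to bias towards zero.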

#### Results and Discussion

Several factors influence the bias of partial likelihood estimates in Cox regression analysis. Their influences vary in magnitude and direction and interact with each other.

In our simulations with sample sizes of 400 and without censoring, the partial likelihood estimation itself led to an underestimation of the regression coefficient (bias directed towards zero), whereas in small samples a bias directed away from zero is usually observed [Ref. 3], [Ref. 4].

It is not yet possible to give a general rule for the magnitude and direction of the bias, e.g. which influence is dominant in a specific situation. With censored data, a correlation between age and time dependent exposure leads to a positive shift of the exposure estimates, which is enhanced if the number of events is low. Using either age or study time as the time axis in Cox regression yields only marginal differences in the parameter estimates (cases Bza2 and Bzs1 in [Fig. 1]).

When age is used as the time axis, ignoring the left truncation (case Bza1 in [Fig. 1]) can dramatically increase the bias, especially for estimates of small effects. However, the influencing factors do not change the direction of the estimates relative to the true parameter value.

In conclusion, when age is used as the time axis in Cox regression analyses, left truncation of the survival times must be taken into account.

### References

1. Korn EL, Graubard BI, Midthune D. Time-to-event analysis of longitudinal follow-up of a survey: choice of the time scale. Am J Epidemiol 1997; 145: 72-80.
2. Bender R, Augustin T, Blettner M. Generating survival times to simulate Cox proportional hazards models. Stat Med (in revision).
3. Firth D. Bias reduction of maximum likelihood estimates. Biometrika 1993; 80: 27-38.
4. Langner I, Lenz-Tönjes R, Bender R, Blettner M. Bias von Maximum-Likelihood-Schätzern: ein Vergleich der logistischen Regression und der Cox-Regression. Abstract-Band zum 49. Biometrischen Kolloquium (Herausgeber: International Biometric Society) 2003; 48