Article
Better tools for better estimates: Improving approaches to handling missing data in Swiss Cancer registries
Published: September 15, 2023
Introduction: An individual's vital status is essential for many estimates in cancer research. Excluding cases with missing vital status can bias estimates or render them invalid, and various approaches to handling this missingness have been used in the literature. We aimed to compare these approaches to determine which, if any, led to the least biased estimates in typical analytic tasks of cancer registries.
Methods: A simulation study was performed using data from the Swiss National Institute of Cancer Epidemiology and Registration (NICER) for the following tumor types, where n is the number of cases: breast cancer (n = 109004), prostate cancer (n = 108045), colorectal cancer (n = 77235), pancreatic cancer (n = 20397), laryngeal cancer (n = 5461) and lung, bronchus and trachea cancer (n = 73080). First, 5%, 10% and 15% missingness in vital status was introduced artificially into the complete data. Second, the missing data were simulated by applying the proposed data generating mechanisms, which were versions of no imputation, single imputation and multiple imputation approaches. No imputation meant that the data were left as is. Single imputation meant that single values for follow-up times were simulated using either Swiss or Dutch cancer registry data. For the multiple imputation approaches, missing values were simulated 20 times on the basis of other relevant covariates (e.g. age and language region), the analysis was performed on each completed data set, and the results were pooled to give final estimates. Third, 5-year overall survival estimates were computed. Lastly, the simulated results were compared with the true value using performance measures such as bias, computed as the difference between the simulated and true value. This procedure was repeated 1650 times per combination of tumor type and missingness proportion.
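The simulation loop described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation and uses no NICER data. Everything in it is an assumption made for illustration: the synthetic cohort, the two age groups and their survival probabilities, a missingness mechanism in which older cases lose their vital status more often, a binary 5-year survival indicator in place of follow-up times, and a crude single imputation that treats every missing case as alive. Only the number of imputations (20), the pooling of estimates by averaging, and the definition of bias as the simulated minus the true value follow the text.

```python
import random


def simulate_cohort(n, rng):
    """Hypothetical cohort of (age_group, alive_at_5_years) pairs; rates are made up."""
    survival = {"young": 0.8, "old": 0.5}
    groups = [rng.choice(["young", "old"]) for _ in range(n)]
    return [(g, 1 if rng.random() < survival[g] else 0) for g in groups]


def add_missingness(cohort, proportion, rng):
    """Blank out vital status (None); older cases are three times as likely (assumption)."""
    weight = {"young": 0.5, "old": 1.5}  # averages to `proportion` for equal-sized groups
    return [(g, None if rng.random() < proportion * weight[g] else s) for g, s in cohort]


def no_imputation(data):
    """Complete-case estimate: ignore cases with missing vital status."""
    observed = [s for _, s in data if s is not None]
    return sum(observed) / len(observed)


def single_imputation(data):
    """Illustrative single imputation: treat every missing case as alive."""
    return sum(1 if s is None else s for _, s in data) / len(data)


def multiple_imputation(data, rng, m=20):
    """Impute m times from a per-group model, analyse each data set, pool by averaging."""
    counts = {}
    for g, s in data:
        if s is not None:
            alive, dead = counts.get(g, (0, 0))
            counts[g] = (alive + s, dead + 1 - s)
    estimates = []
    for _ in range(m):
        # Draw each group's survival probability from its Beta posterior (uniform
        # prior) so that the imputations also reflect parameter uncertainty.
        p = {g: rng.betavariate(a + 1, d + 1) for g, (a, d) in counts.items()}
        completed = [s if s is not None else (1 if rng.random() < p[g] else 0)
                     for g, s in data]
        estimates.append(sum(completed) / len(completed))
    return sum(estimates) / m  # pooled point estimate


def average_bias(proportion, repetitions=100, n=1000, seed=1):
    """Average bias (simulated minus true value) of each approach over repetitions."""
    rng = random.Random(seed)
    bias = {"none": [], "single": [], "multiple": []}
    for _ in range(repetitions):
        cohort = simulate_cohort(n, rng)
        truth = sum(s for _, s in cohort) / n
        data = add_missingness(cohort, proportion, rng)
        bias["none"].append(no_imputation(data) - truth)
        bias["single"].append(single_imputation(data) - truth)
        bias["multiple"].append(multiple_imputation(data, rng) - truth)
    return {name: sum(values) / len(values) for name, values in bias.items()}


if __name__ == "__main__":
    for proportion in (0.05, 0.10, 0.15):
        print(proportion, average_bias(proportion))
```

Under these assumed settings the imputation model conditions on the covariate that drives the missingness, which is the reason multiple imputation can recover the true value where the other two approaches cannot. The numbers the sketch prints depend entirely on the made-up parameters and say nothing about the magnitudes reported for the registry data.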
Results: Multiple imputation approaches yielded 5-year overall survival estimates closest to the true value. Single imputation approaches generally showed higher average bias than using no imputation at all. The largest average bias in overall survival estimates across all simulations was observed with single imputation, and was approximately 16% in all tumor types. Naive approaches led to an average bias ranging from 1% in pancreatic cancer to 13% in lung cancer. However, multiple imputation approaches tended to have the widest confidence intervals. No major differences were observed when comparing data generating mechanisms between tumor groups, but bias was higher when the proportion of missing vital status data was larger.
Conclusion: This simulation study indicated that commonly used single imputation techniques for filling in missing vital status data are likely too biased to be useful in practice. Multiple imputation approaches yielded overall survival estimates with the least bias. Additional simulation studies will evaluate statistical methods for handling missing vital status data when estimating relative survival or the standardized incidence ratio.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.