gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

Better tools for better estimates: Improving approaches to handling missing data in Swiss Cancer registries

Meeting Abstract

  • Cornelia Richter - University of Zurich, Epidemiology, Biostatistics and Prevention Institute, Zurich, Switzerland
  • Lea Wildeisen - National Institute for Cancer Epidemiology and Registration, Zurich, Switzerland; University of Zurich, Epidemiology, Biostatistics and Prevention Institute, Zurich, Switzerland
  • Sabine Rohrmann - Cancer Registry of the Cantons of Zurich, Zug, Schaffhausen and Schwyz, Zurich, Switzerland; University of Zurich, Epidemiology, Biostatistics and Prevention Institute, Zurich, Switzerland
  • Sarah Haile - University of Zurich, Epidemiology, Biostatistics and Prevention Institute, Zurich, Switzerland

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 146

doi: 10.3205/23gmds084, urn:nbn:de:0183-23gmds0845

Published: September 15, 2023

© 2023 Richter et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Introduction: The vital status of an individual is essential for a number of estimates in cancer research. Excluding cases whose vital status is missing can bias estimates or render them invalid, and various approaches have been used in the literature to handle this missingness. We aimed to compare these approaches to determine which, if any, yields the least biased estimates in typical analytic tasks of cancer registries.

Methods: A simulation study was performed using data from the Swiss National Institute for Cancer Epidemiology and Registration (NICER) for the following tumor types, with n referring to the number of cases: breast cancer (n = 109004), prostate cancer (n = 108045), colorectal cancer (n = 77235), pancreatic cancer (n = 20397), laryngeal cancer (n = 5461) and lung, bronchus and trachea cancer (n = 73080). First, 5%, 10% and 15% missingness in the vital status was artificially introduced into the complete data. Second, the missing data were simulated by applying the proposed data generating mechanisms, which were versions of no imputation, single imputation and multiple imputation approaches. With no imputation, the data were left as is. With single imputation, single values for follow-up times were simulated using either Swiss or Dutch cancer registry data. With multiple imputation, missing values were simulated 20 times on the basis of other relevant covariates (e.g. age and language region), the analysis was performed on each resulting dataset, and the results were pooled to give final estimates. Third, 5-year overall survival estimates were computed. Lastly, the simulated results were compared with the true value using performance measures such as bias, which was computed as the difference between the simulated and the true value. This procedure was repeated 1650 times per combination of tumor type and missingness proportion.
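The four steps described in the Methods can be illustrated with a small sketch. It is not the authors' implementation: the cohort is synthetic, vital status is reduced to a binary "alive at 5 years" flag without censoring, missingness is assumed to be completely at random, and the single covariate `older`, the survival probabilities and all function names are hypothetical. The multiple imputation variant draws the stratum-specific survival probability from a Beta posterior (uniform prior) before each of the 20 imputations and pools the point estimates by averaging.

```python
import random
from statistics import mean


def make_cohort(n, rng):
    """Synthetic cohort of (older, alive_at_5y) pairs; probabilities are made up."""
    cohort = []
    for _ in range(n):
        older = rng.random() < 0.5
        p_alive = 0.5 if older else 0.8  # hypothetical 5-year survival
        cohort.append((older, rng.random() < p_alive))
    return cohort


def introduce_missingness(cohort, proportion, rng):
    """Step 1: set vital status to None for a random share of cases (assumed MCAR)."""
    n_missing = round(proportion * len(cohort))
    missing = set(rng.sample(range(len(cohort)), n_missing))
    return [(older, None if i in missing else alive)
            for i, (older, alive) in enumerate(cohort)]


def survival(statuses):
    """Step 3: proportion alive at 5 years (no censoring in this toy setting)."""
    return sum(statuses) / len(statuses)


def complete_case(data):
    """No imputation: cases with missing vital status are simply dropped."""
    return survival([alive for _, alive in data if alive is not None])


def multiple_imputation(data, rng, m=20):
    """Step 2: impute m times conditional on the covariate, then pool by averaging."""
    counts = {}
    for group in (True, False):
        observed = [a for o, a in data if o == group and a is not None]
        counts[group] = (sum(observed), len(observed) - sum(observed))
    estimates = []
    for _ in range(m):
        # Draw the survival probability per stratum from its Beta posterior.
        p = {g: rng.betavariate(1 + alive, 1 + dead)
             for g, (alive, dead) in counts.items()}
        completed = [a if a is not None else rng.random() < p[o] for o, a in data]
        estimates.append(survival(completed))
    return mean(estimates)


def average_bias(method, cohort, proportion, repetitions, rng):
    """Step 4: mean of (estimate - true value) over repeated simulations."""
    truth = survival([alive for _, alive in cohort])
    return mean(method(introduce_missingness(cohort, proportion, rng)) - truth
                for _ in range(repetitions))
```

A call such as `average_bias(complete_case, cohort, 0.10, 100, random.Random(1))` returns the average bias for one combination of method and missingness proportion. Under the missing-completely-at-random assumption made here, both methods should show bias close to zero, so the sketch shows only the mechanics of the procedure and does not reproduce the reported results.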

Results: Multiple imputation approaches yielded 5-year overall survival estimates closest to the true value. Single imputation approaches generally showed higher average bias than using no imputation at all. The largest average bias in overall survival estimates across all simulations was observed with single imputation and was approximately 16% in all tumor types. Naive approaches led to an average bias ranging from 1% in pancreatic cancer to 13% in lung cancer. However, multiple imputation approaches tended to have the widest confidence intervals. No major differences were observed when comparing data generating mechanisms between tumor types, but bias was higher when the proportion of missing vital status data was larger.

Conclusion: This simulation study indicated that commonly used single imputation techniques for filling in missing vital status data are likely too biased to be useful in practice. Multiple imputation approaches yielded overall survival estimates with the least bias. Additional simulation studies will evaluate statistical methods for handling missing vital status data when estimating relative survival or standardized incidence ratios.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.