GMDS 2015: 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

06.09. - 09.09.2015, Krefeld

Missing data imputation in clinical studies – ignorable missingness and sensitivity analysis

Meeting Abstract

  • Thomas Kumke - UCB Pharma, Monheim am Rhein, Deutschland
  • Josef Smolen - Medical University of Vienna and Hietzing Hospital, Wien, Österreich
  • Vibeke Strand - Biopharmaceutical Consulting, Portola Valley, USA
  • Irina Mountian - UCB Pharma, Brüssel, Belgium
  • Geert Molenberghs - Interuniversity Institute for Biostatistics and Statistical Bioinformatics Centre, University of Hasselt and Leuven, Hasselt, Belgium

GMDS 2015. 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Krefeld, 06.-09.09.2015. Düsseldorf: German Medical Science GMS Publishing House; 2015. DocAbstr. 169

doi: 10.3205/15gmds142, urn:nbn:de:0183-15gmds1421

Published: August 27, 2015

© 2015 Kumke et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at



Introduction: Missing data are a common problem in longitudinal clinical studies because they affect the reliability of the results, reduce statistical power, and can introduce bias. Choosing an imputation method that limits this bias depends strongly on whether the method's statistical assumptions are appropriate for the data.

The data model consists of the joint probability of the measured outcomes and missingness. A pattern-mixture model (PMM) can be used to decompose this joint probability [1]. Missingness is ignorable, and the missing-data mechanism does not need to be modeled, if three conditions hold: the data are missing at random (MAR), the parameters describing the two components are distinct, and likelihood or Bayesian inference is used. If missingness is non-ignorable, missing not at random (MNAR) strategies might be needed to obtain unbiased analysis results.
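The factorizations behind these statements can be written out as a short sketch. The notation is an assumption introduced here for illustration and does not appear in the abstract: $y = (y_{obs}, y_{mis})$ are the outcomes, $r$ the missingness indicators, and $\theta$ and $\psi$ the parameters of the outcome model and the missingness model.

```latex
\begin{align*}
\text{Pattern-mixture factorization:}\quad
  & f(y, r) = f(y \mid r)\, f(r) \\
\text{Selection factorization:}\quad
  & f(y, r \mid \theta, \psi) = f(y \mid \theta)\, f(r \mid y, \psi) \\
\text{MAR:}\quad
  & f(r \mid y_{obs}, y_{mis}, \psi) = f(r \mid y_{obs}, \psi) \\
\text{Observed-data likelihood under MAR:}\quad
  & f(y_{obs}, r \mid \theta, \psi)
    = \int f(y_{obs}, y_{mis} \mid \theta)\, f(r \mid y_{obs}, \psi)\, dy_{mis} \\
  & \phantom{f(y_{obs}, r \mid \theta, \psi)}
    = f(y_{obs} \mid \theta)\, f(r \mid y_{obs}, \psi)
\end{align*}
```

When $\theta$ and $\psi$ are distinct, the second factor carries no information about $\theta$, so likelihood or Bayesian inference on $\theta$ can be based on $f(y_{obs} \mid \theta)$ alone.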

In clinical trials, missingness is very often assumed to be ignorable. Frequently, approaches that assume data are missing completely at random (MCAR) or MAR (ignorable likelihood or Bayesian inference) [2] are used.

Among other methods, multiple imputation (MI) has become popular during the last decade and can be used for both ignorable and non-ignorable missingness [3].
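For readers unfamiliar with MI, the pooling step can be sketched as follows. This is a minimal illustration of Rubin's rules, not the code used in the analysis; the function name `pool_rubin` and the input numbers are hypothetical.

```python
import numpy as np

def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    q = np.asarray(estimates, dtype=float)
    u = np.asarray(variances, dtype=float)
    m = q.size
    q_bar = q.mean()                  # pooled point estimate
    w = u.mean()                      # within-imputation variance
    b = q.var(ddof=1)                 # between-imputation variance
    t = w + (1.0 + 1.0 / m) * b       # total variance of the pooled estimate
    return q_bar, t

# Hypothetical estimates from m = 3 imputed data sets (illustrative values only).
est, total_var = pool_rubin([2.0, 2.2, 1.8], [0.04, 0.05, 0.03])
```

The between-imputation term is what makes the pooled variance reflect the uncertainty caused by the missing values, which single imputation ignores.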

The objectives of this analysis are (i) the development of imputation models for patients with random and non-random discontinuation from two clinical trials in rheumatoid arthritis (RA), (ii) application of these models to two continuous variables that represent the main disease characteristics in RA, and (iii) comparison of imputation results using MAR and MNAR strategies.

Materials and Methods: We used data from a pooled analysis of two randomized, placebo-controlled clinical trials, including their open-label extensions (OLE), that assessed the efficacy and safety of certolizumab pegol in RA. The total trial duration was 256 weeks, with assessments at scheduled visits spaced 1 to 4 weeks apart during the double-blind period and every 12 weeks during the OLE. Altogether, 1601 RA patients were included in the two trials, and 1020 patients discontinued before trial termination; the most frequent reasons were lack of efficacy, adverse events, and patient decision.

The results for two continuous variables, DAS28(ESR) and HAQ-DI, which represent disease activity and functional activity, respectively, are presented. Missing data imputation was performed post hoc using MI and two PMMs. For MI, a monotone regression model was used for discontinued patients (ie, monotone missing data) and a Markov chain Monte Carlo approach for intermediate missing values (ie, non-monotone missing data).
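The monotone regression step can be illustrated with a small sketch on simulated data. It is an assumption-laden illustration and not the analysis code: the function `impute_monotone`, the simulated outcomes, and the choice of a linear model with all earlier visits as predictors are hypothetical, and the Markov chain Monte Carlo step for intermediate missing values is not covered.

```python
import numpy as np

def impute_monotone(y, rng):
    """Return one completed copy of y (patients x visits, NaN = missing).

    Assumes a monotone pattern: visit 0 is fully observed and, once a value
    is missing, all later visits of that patient are missing too.
    """
    y = y.copy()
    n, k = y.shape
    for j in range(1, k):
        miss = np.isnan(y[:, j])
        if not miss.any():
            continue
        # Predictors: intercept plus all earlier visits (observed or already imputed).
        x = np.column_stack([np.ones(n), y[:, :j]])
        x_obs, y_obs = x[~miss], y[~miss, j]
        xtx_inv = np.linalg.inv(x_obs.T @ x_obs)
        beta_hat = xtx_inv @ x_obs.T @ y_obs
        resid = y_obs - x_obs @ beta_hat
        dof = x_obs.shape[0] - x_obs.shape[1]
        # Draw the regression parameters so that imputations reflect their uncertainty.
        sigma2 = resid @ resid / rng.chisquare(dof)
        chol = np.linalg.cholesky(sigma2 * xtx_inv)
        beta = beta_hat + chol @ rng.standard_normal(beta_hat.size)
        # Impute as prediction plus residual noise.
        y[miss, j] = x[miss] @ beta + rng.normal(0.0, np.sqrt(sigma2), miss.sum())
    return y

# Simulated example (hypothetical data): 200 patients, 4 visits, monotone dropout.
rng = np.random.default_rng(0)
n, k = 200, 4
data = np.cumsum(rng.normal(0.0, 1.0, (n, k)), axis=1) + 5.0
visits_attended = rng.integers(1, k + 1, n)   # between 1 and k visits per patient
for i, s in enumerate(visits_attended):
    data[i, s:] = np.nan
completed = [impute_monotone(data, rng) for _ in range(5)]   # m = 5 imputations
```

Each completed data set would then be analyzed separately and the results pooled across imputations.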

PMMs were set up using an MI approach in which MI was applied sequentially to each individual pattern [4]. Information on missingness was borrowed from patients who completed their trial. Completer data were modeled using a latent mixture model to extract trajectories [5] that describe patients with potential random (ie, adverse events, other reasons) and non-random (ie, lack of efficacy, patient decision) discontinuation reasons, depending on their treatment response. Completers with small treatment responses were assumed to be potential dropouts due to non-random discontinuation reasons, while completers with high treatment responses were assumed to be potential dropouts due to random discontinuation reasons.

The first model (PMM I) used completers with a potential non-random discontinuation reason for imputation of each pattern. The second model (PMM II) was a hierarchical model that imputed patients with random and non-random discontinuation reasons from their respective potential counterparts among the completers.
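The difference between the two models lies in which completers act as donors for a given dropout. The sketch below encodes one reading of the description above; the function `donor_class` and the class labels are hypothetical, and the actual imputation within each donor class is not shown.

```python
# Discontinuation reasons treated as non-random in the description above.
NON_RANDOM_REASONS = {"lack of efficacy", "patient decision"}

def donor_class(reason, model):
    """Return the completer class ('random' or 'non_random') used as donor pool."""
    if model == "PMM I":
        # Every pattern is imputed from completers with a potential
        # non-random discontinuation reason.
        return "non_random"
    if model == "PMM II":
        # Each dropout is imputed from its counterpart class among the completers.
        return "non_random" if reason in NON_RANDOM_REASONS else "random"
    raise ValueError(f"unknown model: {model}")

donors_pmm1 = donor_class("adverse events", "PMM I")
donors_pmm2 = donor_class("adverse events", "PMM II")
```

Under this reading, the two models differ only for dropouts with random discontinuation reasons, who borrow from low-response completers in PMM I and from high-response completers in PMM II.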

It should be noted that the partitioning of completer patients into potential dropouts with random and non-random discontinuation reasons is based on plausibility only and not on statistical assumptions.

Results: The MI results show overall consistency with the observed cases analysis for both variables. However, when subgrouped by discontinuation reason, the MI vs time curves for patients who discontinued due to lack of efficacy and/or patient decision were outside the 95% confidence interval (CI) of the overall MI vs time curve. The MI vs time curve for patients who discontinued due to adverse events was within the 95% CI of the overall MI vs time curve.

The PMM I imputation vs time curve showed deviations from the MI vs time curve after approximately 100 weeks of treatment for both DAS28(ESR) and HAQ-DI. Results of the hierarchical model (PMM II) were more consistent with the MI vs time curve for both variables.

Discussion: Results of the PMMs indicate that introducing the particular missingness mechanism defined above (ie, random/non-random discontinuation) has a distinct effect on the imputed values. If a sufficient number of patients discontinued the trial due to lack of efficacy or their own decision, the PMM-imputed time curve deviates markedly from the MI time curve. The main conclusion of this modeling approach is that introducing a specified missingness mechanism leads to results that differ from those of a MAR approach. If a considerable number of patients withdrew due to non-random discontinuation reasons (eg, lack of efficacy), an MNAR approach should be considered in addition to a MAR approach as a sensitivity analysis.


[1] Molenberghs G, Kenward MG. Missing Data in Clinical Studies. Chichester: John Wiley and Sons; 2007.
[2] Siddiqui O, Hung HMJ, O’Neill R. MMRM vs LOCF: a comprehensive comparison based on simulation study and 25 NDA datasets. J Biopharmaceut Stat. 2009;19:227-246.
[3] Carpenter JR, Kenward MG. Multiple Imputation and its Application. Chichester: John Wiley and Sons; 2013.
[4] Ratitch B, O’Kelly M, Tosiello R. Missing data in clinical trials: from clinical assumptions to statistical analysis using pattern mixture models. Pharmaceut Stat. 2013;12:337-347.
[5] Jones BL, Nagin DS. Advances in group-based trajectory modelling and an SAS procedure for estimating them. Sociol Meth Res. 2007;35:542-571.