Analysing propensity score methods on two randomized controlled clinical trials
Published: 26 February 2021
Background: Propensity score methods have been widely used in observational studies and are increasingly seen as a solution for evaluating treatment effects in non-randomized controlled trials that use external controls. They account only for observed confounders, through the analysis. In contrast, randomized trials account for both observed and unobserved confounders by design. Single-arm studies now tend to predominate in oncology, especially for assessing proof of principle or in indications with high unmet need. Propensity score methods offer an opportunity to expose as many patients as possible to a promising treatment and potentially to speed up development. Here, we investigate their potential limitations.
Objectives: Our research question is: how do propensity score methods perform in an ideal setting, i.e. with a control group for which confounding is not a concern, when applied to real data and different sample sizes?
Methods: To investigate propensity score methods in this ideal setting, we compare the treatment groups of two randomized controlled clinical trials with a time-to-event endpoint. Here, the experimental and control groups are expected to be drawn from the same population, with all confounders equally distributed between the treatment arms. If propensity score methods do not perform well in this setting, their use in non-randomized settings with data from potentially different populations might be questionable.
The treatment effect to be reproduced is measured by a hazard ratio. The sample sizes were planned to reach 90% power at a one-sided significance level of 2.5%.
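To illustrate what such a planning target implies, the sketch below computes the number of events needed under Schoenfeld's approximation for 1:1 allocation. This formula is an assumption chosen for illustration; the abstract does not state which method the trials used, and the hazard ratio of 0.7 and the function name `required_events` are hypothetical.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha_one_sided=0.025, power=0.90):
    """Approximate number of events for a two-arm trial with 1:1 allocation
    (Schoenfeld's formula): d = 4 * (z_alpha + z_beta)^2 / log(HR)^2."""
    z_alpha = NormalDist().inv_cdf(1.0 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    return 4.0 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2


# Hypothetical target effect; not taken from the trials in this abstract.
events = required_events(0.7)
events_rounded_up = math.ceil(events)
```

Under these assumed inputs, the formula gives a little over 330 events. The required number grows quickly as the target hazard ratio approaches 1.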
Using a pool of available baseline covariates, we select the best model for the time-to-event endpoint according to the Akaike information criterion (AIC). The covariates in this best model are then included in a logistic regression model to estimate the propensity scores.
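The propensity score step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the covariates have already been selected, fits the logistic regression by plain gradient ascent, and uses simulated data. All names (`fit_logistic`, `propensity_scores`, the two covariates) are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def linear_predictor(beta, row):
    # beta[0] is the intercept, beta[1:] are the covariate coefficients.
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def fit_logistic(X, y, lr=0.5, n_iter=1000):
    """Fit a logistic regression by batch gradient ascent on the mean
    log-likelihood. X: covariate rows, y: 0/1 treatment indicator."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for row, t in zip(X, y):
            resid = t - sigmoid(linear_predictor(beta, row))
            grad[0] += resid
            for j, x in enumerate(row):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    # Estimated probability of receiving the experimental treatment.
    return [sigmoid(linear_predictor(beta, row)) for row in X]


# Simulated data (assumption): two standardized baseline covariates,
# with treatment assignment depending on both.
random.seed(0)
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
treated = [1 if random.random() < sigmoid(0.8 * x1 - 0.5 * x2) else 0
           for x1, x2 in X]

beta = fit_logistic(X, treated)
scores = propensity_scores(X, beta)
```

In practice a statistics package would be used for the fit; the hand-written optimizer only keeps the sketch free of dependencies.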
The following propensity score methods are evaluated: matching, weighting (inverse probability of treatment weighting) and stratification. Hazard ratios from the propensity score analyses are then compared with those obtained from standard Cox modelling.
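How the three methods use an estimated propensity score can be sketched as below. The abstract does not specify the matching algorithm, caliper, weighting estimand or number of strata, so the choices here (greedy 1:1 nearest-neighbour matching without replacement, a caliper of 0.1, weights of 1/ps for treated subjects and 1/(1 - ps) for controls, rank-based strata) are assumptions, and the scores are made up.

```python
def greedy_match(treated, ps, caliper=0.1):
    """1:1 nearest-neighbour matching without replacement on the propensity
    score. Returns (treated_index, control_index) pairs within the caliper."""
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i in [k for k, t in enumerate(treated) if t == 1]:
        if not controls:
            break
        j = min(controls, key=lambda c: abs(ps[c] - ps[i]))
        if abs(ps[j] - ps[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)  # each control is used at most once
    return pairs


def iptw_weights(treated, ps):
    """Inverse probability of treatment weights:
    1/ps for treated subjects, 1/(1 - ps) for controls."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p)
            for t, p in zip(treated, ps)]


def assign_strata(ps, n_strata=5):
    """Assign each subject to a stratum by the rank of its propensity score,
    giving strata of (nearly) equal size."""
    order = sorted(range(len(ps)), key=lambda i: ps[i])
    labels = [0] * len(ps)
    for rank, i in enumerate(order):
        labels[i] = rank * n_strata // len(ps)
    return labels


# Made-up example: 3 treated subjects and 5 external controls.
treated = [1, 1, 1, 0, 0, 0, 0, 0]
ps = [0.62, 0.45, 0.80, 0.60, 0.40, 0.30, 0.48, 0.20]

pairs = greedy_match(treated, ps)
weights = iptw_weights(treated, ps)
strata = assign_strata(ps, n_strata=4)
```

In this example the treated subject with a score of 0.80 has no control within the caliper and stays unmatched, which shows how poor overlap of the score distributions shrinks the matched sample. The matched pairs, weights or strata would then enter a Cox model to estimate the hazard ratio.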
To assess the quality of matching, we use standardized mean differences as well as histograms of the propensity score distributions before and after matching.
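A standardized mean difference for a continuous covariate can be computed as in the following sketch. It assumes the common definition with a pooled standard deviation of the two groups; the abstract does not say which variant was used, and the age values are invented.

```python
import math
import statistics


def standardized_mean_difference(x_treated, x_control):
    """Difference in group means divided by the pooled standard deviation
    sqrt((s_t^2 + s_c^2) / 2), using sample variances."""
    pooled_sd = math.sqrt(
        (statistics.variance(x_treated) + statistics.variance(x_control)) / 2.0
    )
    return (statistics.mean(x_treated) - statistics.mean(x_control)) / pooled_sd


# Invented ages: treated group, all controls, and a hypothetical matched subset.
age_treated = [60, 62, 64, 66]
age_control_all = [48, 52, 56, 60, 64, 68]
age_control_matched = [60, 64, 56, 68]

smd_before = standardized_mean_difference(age_treated, age_control_all)
smd_after = standardized_mean_difference(age_treated, age_control_matched)
```

A smaller absolute value after matching indicates better balance on that covariate; the same check is repeated for each covariate in the propensity score model.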
To investigate the applicability to studies with small sample sizes, we also apply all methods using only 20% of the experimental-arm subjects from the randomized trial. This leads to 5 random subsets, each of which is matched with the complete control arm of the randomized trial as external control.
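One way to obtain such subsets is sketched below. It assumes that the 5 subsets are disjoint and together cover the whole experimental arm, which the abstract suggests but does not state; the function name, seed and subject identifiers are hypothetical.

```python
import random


def split_into_random_subsets(subject_ids, n_subsets=5, seed=2021):
    """Shuffle the experimental-arm subjects and split them into n_subsets
    disjoint subsets of (nearly) equal size."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return [shuffled[k::n_subsets] for k in range(n_subsets)]


# Hypothetical experimental arm with 100 subjects.
experimental_ids = list(range(100))
subsets = split_into_random_subsets(experimental_ids)
```

Each subset would then be analysed separately against the full control arm, so that the spread of the 5 resulting hazard ratio estimates reflects the variability at the reduced sample size.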
Results: The results of the propensity score methods were comparable to the standard regression results under randomized treatment allocation in only one of the two clinical trial examples. There were no substantial differences between the investigated propensity score methods. Propensity score methods also failed when the propensity score distributions of the treated and control groups differed strongly. In addition, we observed high variability in the hazard ratio estimates under small sample sizes.
Conclusion: If external controls with propensity score methods are considered in drug development, a careful evaluation in the planning phase is essential (sample size, inclusion and exclusion criteria, method evaluation ...).
Heiko Götte and Daliah Dieckmann are employees of Merck Healthcare KGaA.
The authors declare that an ethics committee vote is not required.