gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Analysing propensity score methods on two randomized controlled clinical trials

Meeting Abstract

  • Daliah Dieckmann - Merck Healthcare KGaA, Darmstadt, Germany; Hochschule Darmstadt – University of Applied Sciences, Darmstadt, Germany
  • Heiko Götte - Merck Healthcare KGaA, Darmstadt, Germany
  • Antje Jahn - Hochschule Darmstadt – University of Applied Sciences, fbmn Fachbereich Mathematik und Naturwissenschaften, Darmstadt, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 372

doi: 10.3205/20gmds319, urn:nbn:de:0183-20gmds3195

Published: February 26, 2021

© 2021 Dieckmann et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Background: Propensity score methods have been widely used in observational studies and are increasingly seen as a solution for evaluating treatment effects in non-randomized trials that use external controls. They account only for observed confounders, via analysis. In contrast, randomized trials account for both observed and unobserved confounders via design. Nowadays, single-arm studies tend to prevail in oncology, especially for assessing proof-of-principle or in indications with high unmet need. Propensity score methods offer an opportunity to expose the maximum number of patients to a promising treatment and potentially to speed up development. However, these methods have potential limitations, which we investigate here.

Objectives: Our research question is: How do propensity score methods perform in an ideal setting, i.e. with a control group for which confounding is not a concern, when applied to real data and different sample sizes?

Methods: To investigate propensity score methods under this ideal setting, we compare the treatment groups of two randomized controlled clinical trials with a time-to-event endpoint. Here, the experimental and control groups are expected to be drawn from the same population, with all confounders equally distributed between treatment arms. If propensity score methods do not perform well in this setting, their use in non-randomized settings with data from potentially different populations might be questionable.

The treatment effect to be reproduced is measured by a hazard ratio. The sample sizes were planned to reach 90% power at a one-sided significance level of 2.5%.
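
To illustrate what such a power requirement implies, the sketch below uses the Schoenfeld approximation for the number of events needed in a two-arm trial with 1:1 allocation. The abstract does not state which formula or which target hazard ratio was used in planning, so both the formula choice and the value 0.7 are assumptions made purely for illustration.

```python
import math
from statistics import NormalDist


def required_events(hazard_ratio, alpha_one_sided=0.025, power=0.90):
    """Approximate number of events (Schoenfeld formula, 1:1 allocation assumed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha_one_sided)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(4 * (z_alpha + z_beta) ** 2 / math.log(hazard_ratio) ** 2)


# Hypothetical target hazard ratio; the abstract does not report one.
print(required_events(0.7))
```

The required number of events grows quickly as the target hazard ratio approaches 1, which is one reason small single-arm studies have limited ability to reproduce a modest effect.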

Using a pool of available baseline covariates, we select the best model for the time-to-event endpoint via the Akaike information criterion (AIC). The covariates that form this best model are included in a logistic regression model to estimate the propensity scores.
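
The propensity score estimation step can be sketched as follows. This is a minimal illustration that assumes the covariate set has already been chosen (the AIC-based selection on the time-to-event model is not shown) and fits the logistic regression by plain gradient ascent in standard-library Python. The function name `fit_propensity_scores`, the learning rate, the iteration count and the toy data are hypothetical; a real analysis would normally use an established statistics package.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_propensity_scores(covariates, treated, lr=0.1, n_iter=10000):
    """Fit a logistic regression by gradient ascent; return one score per subject.

    covariates: list of rows, one list of numeric baseline covariates per subject
    treated:    list of 0/1 indicators (1 = experimental arm)
    """
    n, p = len(covariates), len(covariates[0])
    rows = [[1.0] + list(row) for row in covariates]  # prepend intercept term
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for x, t in zip(rows, treated):
            resid = t - sigmoid(sum(b * v for b, v in zip(beta, x)))
            for k in range(p + 1):
                grad[k] += resid * x[k] / n  # mean log-likelihood gradient
        beta = [b + lr * g for b, g in zip(beta, grad)]
    return [sigmoid(sum(b * v for b, v in zip(beta, x))) for x in rows]


# Toy data (hypothetical): one covariate, six subjects.
x = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
t = [0, 0, 1, 0, 1, 1]
scores = fit_propensity_scores(x, t)
print([round(s, 3) for s in scores])
```

Each score is the estimated probability of belonging to the experimental arm given the covariates. With covariates on very different scales, standardizing them first would help this simple optimizer converge.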

The following propensity score methods are evaluated: matching, weighting (inverse probability of treatment weighting) and stratification. Hazard ratios from the propensity score analyses are then compared to those obtained from standard Cox modelling.
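
The three approaches differ in how they use the estimated scores. The sketch below shows one common variant of each; the abstract does not specify the details, so these are assumptions rather than the authors' choices: weights targeting the average treatment effect, strata formed by rank of the score, and greedy 1:1 nearest-neighbour matching without replacement within a caliper. All function names and the caliper value are hypothetical.

```python
def iptw_weights(treated, scores):
    """Inverse probability of treatment weights: 1/e if treated, 1/(1-e) otherwise."""
    return [1.0 / e if t == 1 else 1.0 / (1.0 - e) for t, e in zip(treated, scores)]


def stratify(scores, n_strata=5):
    """Assign each subject to a stratum by rank of its score (0 = lowest scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    strata = [0] * len(scores)
    for rank, i in enumerate(order):
        strata[i] = rank * n_strata // len(scores)
    return strata


def greedy_match(treated, scores, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    Returns (treated_index, control_index) pairs; treated subjects with no
    control within the caliper remain unmatched.
    """
    controls = [i for i, t in enumerate(treated) if t == 0]
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not controls:
            continue
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


# Toy example (hypothetical scores for two treated and three control subjects).
treated = [1, 0, 1, 0, 0]
scores = [0.5, 0.25, 0.8, 0.55, 0.79]
weights = iptw_weights(treated, scores)
strata = stratify(scores)
pairs = greedy_match(treated, scores)
```

In each case the output (weights, stratum labels or matched pairs) would then feed into a Cox model to obtain the hazard ratio, for example as case weights or as a stratification factor.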

To examine the quality of matching, we use standardized mean differences as well as histograms of the propensity score distributions before and after matching.
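
For a continuous covariate, the standardized mean difference is commonly computed as the difference in arm means divided by a pooled standard deviation. The sketch below uses that definition, which is an assumption here since the abstract does not give the exact formula; the function name and toy values are hypothetical.

```python
import math
import statistics


def standardized_mean_difference(treated_values, control_values):
    """Difference in means divided by the pooled standard deviation of both arms."""
    mean_diff = statistics.fmean(treated_values) - statistics.fmean(control_values)
    pooled_sd = math.sqrt(
        (statistics.variance(treated_values) + statistics.variance(control_values)) / 2
    )
    return mean_diff / pooled_sd


# Toy covariate values (hypothetical), e.g. before matching.
smd = standardized_mean_difference([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(smd)
```

Computing this quantity for every covariate before and after matching shows whether matching has brought the two groups closer together; values near zero indicate good balance.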

To investigate the applicability to studies with small sample sizes, we also apply all methods using only 20% of the experimental arm subjects from the randomized trial. This leads to 5 random subsets, each of which is matched with the complete control arm from the randomized trial as external control.
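
One way to obtain such subsets is to shuffle the experimental arm once and split it into five parts. The sketch below assumes the five subsets are disjoint and together cover the whole arm, which the abstract suggests but does not state explicitly; the function name, seed and subject identifiers are hypothetical.

```python
import random


def split_into_random_subsets(subject_ids, n_subsets=5, seed=0):
    """Shuffle the subjects and deal them into n_subsets disjoint groups."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    return [shuffled[i::n_subsets] for i in range(n_subsets)]


# Hypothetical experimental arm of 100 subjects, identified by integers.
subsets = split_into_random_subsets(range(100))
print([len(s) for s in subsets])
```

Each subset would then be analysed separately against the complete control arm, so that the spread of the five resulting hazard ratio estimates reflects the variability at the reduced sample size.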

Results: Results of the propensity score methods were comparable to standard regression results under randomized treatment allocation in only one of the two clinical trial examples. There were no substantial differences between the investigated propensity score methods. Propensity score methods failed when the propensity score distributions differed strongly between the treated and control groups. We also observed high variability in the hazard ratio estimates under small sample sizes.

Conclusion: If external controls with propensity score methods are considered in drug development, careful evaluation in the planning phase is essential (sample size, inclusion and exclusion criteria, method evaluation, ...).

Heiko Götte and Daliah Dieckmann are employees of Merck Healthcare KGaA.

The authors declare that an ethics committee vote is not required.