gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

Using Real World Data to Construct a Synthetic Control Arm – the Prediction Design

Meeting Abstract

  • Stella Erdmann - Institute of Medical Biometry, University of Heidelberg, Heidelberg, Germany
  • Dominic Edelmann - Division of Biostatistics, German Cancer Research Center, Heidelberg, Germany
  • Meinhard Kieser - Institute of Medical Biometry, University of Heidelberg, Heidelberg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 129

doi: 10.3205/21gmds084, urn:nbn:de:0183-21gmds0843

Published: September 24, 2021

© 2021 Erdmann et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: The gold standard for investigating the effectiveness of a new therapy, the (pragmatic) randomized controlled trial (pRCT) [1], is costly and time-consuming [2]. At the same time, large amounts of real world data (RWD) in analyzable format are underused or ignored altogether. To address this shortcoming, alternative study designs that use data more efficiently would be desirable.

Methods: Assume a new intervention is to be tested in a routine care setting and that large amounts of standard care patient data can be accessed (e.g. in the form of claims data provided by the health care provider). The idea is to fit a prediction model to the routine care data and then apply it to estimate the (unobserved) outcome under standard therapy for the patients receiving the new intervention, that is, to conduct a single-arm study with a synthetic control arm.

To investigate whether and how such a design, the prediction design, could be used to provide information on treatment effects based on existing infrastructure and data sources, we explored the assumptions under which a linear regression model predicts the patients' counterfactual outcomes accurately enough to construct a test for the treatment effect. This investigation was motivated by two (hypothetical) studies, each exploring the effect of a new intervention on HbA1c reduction in patients with type 2 diabetes. In the first example, the data used for the synthetic control arm is a fixed historical data set and the aim is to compare the effect of liraglutide with that of sitagliptin. In the second example, the prediction model is fitted to claims data collected in parallel to the single-arm trial, i.e., the data is not known until analysis and is hence considered random. Here, it is assumed that patients participate in a disease management program in which HbA1c reduction and other relevant variables for the prediction model are collected.
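The basic mechanics of the prediction design can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a single hypothetical covariate (baseline HbA1c), made-up data-generating parameters, and a naive z-test that treats the fitted model as fixed and therefore ignores the uncertainty of the prediction model. All function names (`fit_line`, `prediction_design_test`, `simulate_arm`) are invented for this example.

```python
import random
from statistics import NormalDist, mean, stdev


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def prediction_design_test(x_ctrl, y_ctrl, x_new, y_new):
    """Naive two-sided z-test of observed minus predicted outcomes.

    Assumption: the fitted model is treated as fixed, so the estimation
    error of the prediction model is not reflected in the standard error.
    """
    a, b = fit_line(x_ctrl, y_ctrl)
    # Predicted counterfactual outcome under standard care for each
    # patient of the single-arm study, subtracted from the observed outcome.
    diffs = [yi - (a + b * xi) for xi, yi in zip(x_new, y_new)]
    est = mean(diffs)
    se = stdev(diffs) / len(diffs) ** 0.5
    p = 2 * (1 - NormalDist().cdf(abs(est / se)))
    return est, p


def simulate_arm(rng, n, effect=0.0):
    """Hypothetical data: baseline HbA1c (x) and HbA1c reduction (y)."""
    x = [rng.gauss(8.0, 1.0) for _ in range(n)]
    y = [0.5 + 0.3 * (xi - 8.0) + effect + rng.gauss(0.0, 0.8) for xi in x]
    return x, y


rng = random.Random(1)
x_ctrl, y_ctrl = simulate_arm(rng, 2000)            # routine care data
x_new, y_new = simulate_arm(rng, 150, effect=0.4)   # single-arm study
est, p = prediction_design_test(x_ctrl, y_ctrl, x_new, y_new)
print(f"estimated effect: {est:.2f}, p-value: {p:.4f}")
```

In this sketch the control data only enters through the fitted coefficients, which is why the amount of control data and the quality of the prediction model both matter for the validity of the test.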

Results: Simulations were used to examine the amount of data needed for the control condition as well as for the single-arm study in order to control the (average) type I and type II error rates. Depending on the amount of available control condition data and the performance of the prediction model, the sample size could be reduced compared to a conventional pRCT.

Discussion: If historical data is used for the prediction model, control of the type I error cannot be formally guaranteed for each possible historical data set. To estimate the amount of “considerable” violation, the proportion of historical data sets in the simulation for which the type I error exceeds 0.07 is calculated. Moreover, the average type I error and the corresponding 2.5% and 97.5% quantiles are presented (compare [3]).
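The summary measures described above can be illustrated with a small simulation. The following is a sketch under strong assumptions and not the authors' simulation setup: the true standard care model, its parameters, the sample sizes, and the nominal two-sided level of 0.05 are all hypothetical, and the type I error conditional on one historical data set is computed with a large-sample normal approximation of a naive z-test rather than by repeated sampling of the single-arm study.

```python
import random
from statistics import NormalDist, mean, quantiles

STD_NORMAL = NormalDist()
Z_CRIT = STD_NORMAL.inv_cdf(0.975)  # assumed two-sided nominal level 0.05

# Hypothetical true model under standard care: y = A + B * x + noise
A, B, MU_X, SD_X, SD_NOISE = 0.5, 0.3, 8.0, 1.0, 0.8


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with a single covariate."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def conditional_type1_error(a_hat, b_hat, n_new):
    """Approximate rejection probability under the null hypothesis for one
    fixed fitted model (normal approximation of a naive z-test)."""
    # Mean and standard deviation of observed minus predicted outcomes
    # when the new intervention has no effect.
    bias = (A - a_hat) + (B - b_hat) * MU_X
    sd_diff = ((B - b_hat) ** 2 * SD_X ** 2 + SD_NOISE ** 2) ** 0.5
    shift = bias * n_new ** 0.5 / sd_diff
    return STD_NORMAL.cdf(-Z_CRIT - shift) + 1 - STD_NORMAL.cdf(Z_CRIT - shift)


def summarize(n_hist, n_new, n_datasets, seed=1):
    """Draw historical data sets and summarize their conditional type I errors."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_datasets):
        x = [rng.gauss(MU_X, SD_X) for _ in range(n_hist)]
        y = [A + B * xi + rng.gauss(0.0, SD_NOISE) for xi in x]
        errors.append(conditional_type1_error(*fit_line(x, y), n_new))
    cuts = quantiles(errors, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return {
        "mean": mean(errors),
        "q2.5": cuts[0],
        "q97.5": cuts[-1],
        "prop_above_0.07": sum(e > 0.07 for e in errors) / len(errors),
    }


summary = summarize(n_hist=200, n_new=100, n_datasets=500)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

Because the fitted model is fixed for a given historical data set, any prediction bias shifts the test statistic in the same direction for every single-arm study based on it, which is why the conditional type I error in this sketch never falls below the nominal level and varies from one historical data set to the next.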

Conclusion: The proposed approach could be of use in specific applications. However, methodological aspects concerning type I and type II error inflation for the historical data set application and availability of specific covariates for the prediction model, especially in claims data, are challenging.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Dal-Ré R, Janiaud P, Ioannidis JP. Real-world evidence: How pragmatic are randomized controlled trials labeled as pragmatic? BMC Medicine. 2018;16(1):1-6.
2. Patsopoulos NA. A pragmatic view on pragmatic trials. Dialogues in Clinical Neuroscience. 2011;13(2):217.
3. Edelmann D, Habermehl C, Schlenk RF, Benner A. Adjusting Simon's optimal two-stage design for heterogeneous populations based on stratification or using historical controls. Biometrical Journal. 2020;62(2):311-329.