gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Covariate selection in external validity – a classification of variables to (not) include

Meeting Abstract

  • Fabian Manke-Reimers - Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  • Vincent Brugger - Center for Preventive Medicine and Digital Health, Medical Faculty Mannheim, Heidelberg University, Mannheim, Germany
  • Michael Webster-Clark - Department of Epidemiology and Biostatistics, McGill University, Montreal, Canada; Department of Epidemiology, University of North Carolina at Chapel Hill, Chapel Hill, United States

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 951

doi: 10.3205/24gmds323, urn:nbn:de:0183-24gmds3230

Published: September 6, 2024

© 2024 Manke-Reimers et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Introduction: It is well known that shifted (that is, differently distributed) effect measure modifiers (EMMs) can cause average treatment effects to differ between populations. They may therefore threaten the external validity of a study when the aim is to draw inferences about a specific target population that does not coincide with the study sample [1]. The literature frames this problem as selection bias, a generalizability issue, or a transportability issue [1]. In any case, the selection of covariates is crucial to identify the effect. However, the recommendations for covariate selection in the literature are conflicting. For instance, one recommendation is to include as many EMMs as possible, while another is to include only shifted EMMs. These recommendations are not in line with the theoretical literature, which selects covariates on the basis of causal relationships [2]. We aim to bridge the gap between practical recommendations and theoretical necessity by outlining types of EMMs to (not) include.

Methods: To characterize different types of EMMs and their relation to the selection variable S (indicating membership in the source or target sample), we use directed acyclic graphs (DAGs), the four prototypical types of EMMs (direct, indirect, by proxy, by common cause) from VanderWeele and Robins [3], and a fifth type that we refer to as an EMM by common effect. More precisely, we inspect how sufficient covariate sets for unbiased estimation in the target sample differ depending on the types of EMMs and the direction of their relation to the selection variable (that is, S→EMM versus EMM→S). Further, we simulate data to underpin the theoretical results, using estimators developed specifically for external validity evaluations [4], [5].
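
As a minimal sketch of the kind of simulation described above, the following Python code generates data for the EMM→S case with one shifted direct EMM and compares the unadjusted source estimate with an inverse-odds-weighted estimate for the target sample. Every name and parameter value here (the binary EMM `z`, the selection probabilities 0.8 and 0.2, the effect `1 + 2*z`) is an assumption made for illustration. The weighting estimator is one simple member of the family discussed in [4], [5], not necessarily the estimator used by the authors.

```python
import numpy as np

def simulate_emm_to_s(n=200_000, seed=0):
    """Simulate a direct EMM Z that affects sample membership S (EMM -> S)."""
    rng = np.random.default_rng(seed)
    z = rng.binomial(1, 0.5, n)                  # binary direct EMM (assumed)
    p_source = np.where(z == 1, 0.8, 0.2)        # assumed selection model
    s = rng.binomial(1, p_source)                # 1 = source, 0 = target
    a = rng.binomial(1, 0.5, n)                  # randomised treatment
    # Treatment effect is 1 + 2*Z, so Z modifies the effect on the difference scale.
    y = 1.0 + a * (1.0 + 2.0 * z) + rng.normal(0.0, 1.0, n)
    return z, s, a, y

def weighted_effect(y, a, w):
    """Weighted difference in mean outcomes between treated and untreated."""
    treated = np.average(y[a == 1], weights=w[a == 1])
    control = np.average(y[a == 0], weights=w[a == 0])
    return treated - control

def transport_estimates(z, s, a, y):
    """Return (unadjusted source estimate, weighted estimate, true target ATE)."""
    src = s == 1
    # Nonparametric estimate of P(S=1 | Z) for each unit.
    p_src = np.array([s[z == v].mean() for v in (0, 1)])[z]
    odds = (1.0 - p_src) / p_src                 # inverse odds of being in the source
    unadjusted = weighted_effect(y[src], a[src], np.ones(src.sum()))
    weighted = weighted_effect(y[src], a[src], odds[src])
    # Known only because the data are simulated.
    true_target = 1.0 + 2.0 * z[~src].mean()
    return unadjusted, weighted, true_target

if __name__ == "__main__":
    print(transport_estimates(*simulate_emm_to_s()))
```

Under these assumed parameters, the expected source effect is 2.6 and the expected target effect is 1.4. The weighted estimate should therefore land near the target value, while the unadjusted estimate stays near the source value.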

Results: When EMM→S, a sufficient set must include all four prototypical types of EMMs whenever they are shifted. However, when S→EMM, conditioning on shifted EMMs by proxy or by common cause induces collider bias because the selection variable then takes the shape of an EMM by common effect. Hence, only EMMs that potentially interact with the treatment/exposure on the outcome should be included in the analysis when they are shifted. Non-shifted EMMs can be included in the analysis under both EMM→S and S→EMM when they potentially interact with the treatment/exposure on the outcome. In contrast, non-shifted EMMs without the potential for interaction might induce collider bias due to an M-bias type structure. It follows that the sufficiency of a covariate set might depend on the direction of the relation between the selection variable and the EMMs, and on the types of EMMs. When a covariate set is sufficient only for EMM→S or only for S→EMM, we refer to this set as s-dependent.
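
The collider mechanism for S→EMM can be illustrated with a small simulation. The sketch below assumes one possible data-generating structure, chosen by us and not taken from the authors' simulation: a direct EMM `x` that is independent of S (so not shifted), and a proxy `q` that is caused by both `x` and S (X→Q←S), which makes `q` shifted. All variable names and parameter values are hypothetical.

```python
import numpy as np

def simulate_proxy(n=400_000, seed=1):
    """Simulate an EMM by proxy Q with X -> Q <- S (assumed structure)."""
    rng = np.random.default_rng(seed)
    s = rng.binomial(1, 0.5, n)                    # 1 = source, 0 = target
    x = rng.binomial(1, 0.5, n)                    # direct EMM, independent of S
    q = rng.binomial(1, 0.1 + 0.4 * x + 0.4 * s)   # proxy: common effect of X and S
    a = rng.binomial(1, 0.5, n)                    # randomised treatment
    y = a * (1.0 + 2.0 * x) + rng.normal(0.0, 1.0, n)
    return s, x, q, a, y

def weighted_effect(y, a, w):
    """Weighted difference in mean outcomes between treated and untreated."""
    treated = np.average(y[a == 1], weights=w[a == 1])
    control = np.average(y[a == 0], weights=w[a == 0])
    return treated - control

def compare_adjustments(s, x, q, a, y):
    """Return (unadjusted estimate, estimate weighted on Q, true target ATE)."""
    src = s == 1
    # Inverse odds weights that condition on the shifted proxy Q.
    p_src = np.array([s[q == v].mean() for v in (0, 1)])[q]
    odds = (1.0 - p_src) / p_src
    unadjusted = weighted_effect(y[src], a[src], np.ones(src.sum()))
    adjusted = weighted_effect(y[src], a[src], odds[src])
    # Known only because the data are simulated.
    true_target = 1.0 + 2.0 * x[~src].mean()
    return unadjusted, adjusted, true_target

if __name__ == "__main__":
    print(compare_adjustments(*simulate_proxy()))
```

Because `x` has the same distribution in both samples, the unadjusted source estimate is already close to the target effect of about 2.0. Weighting on `q` opens the path between S and `x`, and under these assumed parameters the weighted estimate converges to roughly 1.6 instead.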

Conclusion: Following current recommendations on covariate selection might bias an external validity analysis. For instance, including as many EMMs as possible, or all shifted EMMs, biases the estimate when the covariate set contains EMMs by proxy or by common cause. Thus, researchers should examine the causal relationships of the EMMs before conducting an external validity analysis, possibly by following the principles of covariate selection presented in this work.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Degtiar I, Rose S. A review of generalizability and transportability. Annu Rev Stat Appl. 2023;10(1):501-24.
2. Bareinboim E, Pearl J. Causal inference and the data-fusion problem. Proc Natl Acad Sci USA. 2016;113(27):7345-52.
3. VanderWeele TJ, Robins JM. Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology. 2007;18(5):561-8.
4. Dahabreh IJ, Robertson SE, Steingrimsson JA, Stuart EA, Hernán MA. Extending inferences from a randomized trial to a new target population. Stat Med. 2020;39(14):1999-2014.
5. Dahabreh IJ, Robertson SE, Tchetgen EJ, Stuart EA, Hernán MA. Generalizing causal inferences from individuals in randomized trials to all trial-eligible individuals. Biometrics. 2019;75(2):685-94.