gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Balancing data of observational studies by adjustment of the endpoint

Meeting Abstract

  • Marcus Oswald - Institute of Infectious Diseases and Infection Control, University Hospital Jena, Jena, Germany
  • Mathias Pletz - Institute of Infectious Diseases and Infection Control, University Hospital Jena, Jena, Germany; CAPNETZ Stiftung, Hannover, Germany
  • Rainer König - Institute of Infectious Diseases and Infection Control, University Hospital Jena, Jena, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 188

doi: 10.3205/22gmds111, urn:nbn:de:0183-22gmds1111

Published: 19 August 2022

© 2022 Oswald et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: In observational data, treatment decisions are made subjectively and therefore introduce a selection bias. The methods of choice for analyzing treatment effects in observational data are mostly based on matching [1] or on the computation of weights (e.g. propensity-score-based IPTW [2] or entropy balancing [3]). The matchings or weights are constructed such that the treatment group and the control group are balanced with respect to the covariates. Once they are calculated, the treatment group can be compared with the control group by computing the weighted means of the given endpoints. One disadvantage of these methods is that each patient is assigned an individual weight, so some patients have more influence on the analysis than others. This is particularly problematic when the analysis is combined with machine learning approaches.
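To make the weighting idea concrete, the following is a minimal sketch of entropy balancing in the spirit of [3], applied to a synthetic cohort. It is not the authors' implementation: the function name entropy_balance, the damped Newton solver, the two covariates, and the simulated treatment effect of 1.0 are all assumptions chosen for illustration.

```python
import numpy as np

def entropy_balance(X, target_means, n_iter=100, tol=1e-8):
    """Return weights (summing to 1) whose weighted column means of X equal target_means.

    Minimises the convex dual f(lam) = log sum_i exp(lam . (x_i - target)) with
    damped Newton steps; the weights are proportional to exp(lam . (x_i - target)).
    """
    Z = X - target_means                      # balance means: sum_i w_i * Z_i = 0
    lam = np.zeros(X.shape[1])

    def dual(l):
        s = Z @ l
        return s.max() + np.log(np.exp(s - s.max()).sum())

    def weights(l):
        s = Z @ l
        w = np.exp(s - s.max())               # shift by the max for numerical stability
        return w / w.sum()

    for _ in range(n_iter):
        w = weights(lam)
        grad = w @ Z                          # weighted mean of the centred covariates
        if np.abs(grad).max() < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        step = np.linalg.solve(hess + 1e-10 * np.eye(lam.size), grad)
        t = 1.0                               # backtracking line search
        while t > 1e-8 and dual(lam - t * step) > dual(lam) - 1e-4 * t * (grad @ step):
            t *= 0.5
        lam = lam - t * step
    return weights(lam)

# Synthetic cohort (assumption): treatment assignment depends on the first covariate.
rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 2))
treated = rng.random(n) < 1 / (1 + np.exp(-0.8 * X[:, 0]))
y = 1.0 * treated + 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)

# Reweight the control group so that its covariate means match the treated group.
X_control, y_control = X[~treated], y[~treated]
target = X[treated].mean(axis=0)
w = entropy_balance(X_control, target)

naive_diff = y[treated].mean() - y_control.mean()
balanced_diff = y[treated].mean() - w @ y_control
print(f"naive: {naive_diff:.2f}, balanced: {balanced_diff:.2f}")
```

On this synthetic cohort the naive difference is confounded by the first covariate, whereas the weighted difference should lie closer to the simulated effect. The resulting control weights are unequal, which is exactly the property described above as a disadvantage.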

Method: Here we present a new balancing approach that leaves the patients' weights untouched and modifies the endpoint instead. For each patient, a “balanced endpoint” is estimated that answers the question of how the endpoint would have changed if the patient had behaved like the “mean patient in the cohort”. This definition makes all patients comparable, so two groups can be compared directly by computing the (ordinary) mean of their balanced endpoints. Given a certain endpoint definition and a certain split of the cohort into two parts, we compute the mean balanced endpoint within each of the two parts. For this we use entropy balancing, since this method works very precisely. This step is repeated for a large number of random splits. The number of repetitions should be at least the number of patients; we usually use 2 to 10 times as many repetitions as patients. We end up with a table that contains the mean balanced endpoints for many random patient subsets, together with the subset-patient incidence matrix. In a second step, we use this table to derive balanced endpoints for individual patients by solving a “reverse engineering” problem.
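The second step can be sketched as follows. The abstract does not state how the “reverse engineering” problem is solved; ordinary least squares is assumed here purely for illustration. The entropy-balancing step is also replaced by a stand-in: the subset means are simulated from hypothetical patient-level values (b_true) rather than estimated from data. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
n_patients = 50
n_splits = 3 * n_patients            # within the 2 to 10 times range mentioned in the text

# Hypothetical patient-level balanced endpoints; unknown in a real analysis.
b_true = rng.normal(loc=10.0, scale=2.0, size=n_patients)

# Each random split yields two subsets: one part of the cohort and its complement.
part = rng.random((n_splits, n_patients)) < 0.5
incidence = np.vstack([part, ~part]).astype(float)   # subset-patient incidence matrix
incidence = incidence[incidence.sum(axis=1) > 0]     # drop empty subsets, if any

# Stand-in for the entropy-balancing step (assumption): the mean balanced
# endpoint of each subset is simulated directly from b_true.
sizes = incidence.sum(axis=1, keepdims=True)
averaging = incidence / sizes                        # row r averages over subset r
subset_means = averaging @ b_true

# Assumed least-squares formulation: find b such that averaging @ b matches
# the table of subset means as closely as possible.
b_hat, *_ = np.linalg.lstsq(averaging, subset_means, rcond=None)
print(np.abs(b_hat - b_true).max())
```

With at least as many subsets as patients, the averaging matrix generically has full column rank, which is consistent with the requirement that the number of repetitions be at least the number of patients. In a real analysis the subset means are noisy estimates, so using more splits than patients over-determines the system and averages out part of that noise.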

Results/discussion: We performed several analyses on hospitalized patients with community-acquired pneumonia (CAP) from the observational, prospective, multinational CAPNETZ study [4]. With this new balancing approach, the comparison of two subgroups reduces to comparing their mean balanced endpoints. Whenever the subgroups are large enough for other balancing methods to be applied, the results of those methods are very close to the results of our new method. To summarize, we close with some pros and cons of the new method compared with state-of-the-art methods for estimating causal effects:

Pros:

  • Smaller subgroups can be compared (in principle even single patients).
  • Since all patient weights are equal, p-values can be computed more easily.
  • Using ML is much easier, since no patient has more influence than any other and since subgroups of interest are often not known a priori.

Cons:

  • Estimating the balanced endpoints requires considerable computational effort.
  • This computation has to be repeated if the definition of the endpoint changes.
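
As an illustration of the point about p-values: once every patient carries the same weight, a standard unweighted test can be applied to the balanced endpoints of two subgroups. The sketch below uses a two-sided permutation test on the difference of ordinary means. It assumes that the balanced endpoints can be treated as exchangeable observations, which the abstract does not establish; the function name and the synthetic data are hypothetical.

```python
import numpy as np

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for the difference of ordinary (unweighted) means."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)        # random relabelling of the patients
        diff = abs(perm[:len(a)].mean() - perm[len(a):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)         # add-one correction avoids p = 0

# Synthetic balanced endpoints for two subgroups (assumption, not study data).
rng = np.random.default_rng(2)
group_a = rng.normal(loc=0.0, scale=1.0, size=40)
group_b = rng.normal(loc=1.5, scale=1.0, size=35)

p_shifted = permutation_p_value(group_a, group_b)
p_same = permutation_p_value(group_a, group_a.copy())
print(p_shifted, p_same)
```

With individually weighted patients, the same comparison would require a weighted test statistic and a variance estimate that accounts for the weights.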

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Ho DE, Imai K, King G, Stuart EA. Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference. Political Analysis. 2007;15(3):199-236.
2. Hernán MA, Robins JM. Estimating causal effects from epidemiological data. J Epidemiol Community Health. 2006;60(7):578-586. DOI: 10.1136/jech.2004.029496
3. Hainmueller J. Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies. Political Analysis. 2012;20(1):25-46.
4. Suttorp N, Welte T, Marre R, Stenger S, Pletz M, Rupp J, et al. CAPNETZ. The competence network for community-acquired pneumonia (CAP). Bundesgesundheitsblatt Gesundheitsforschung Gesundheitsschutz. 2016;59(4):475-81.