gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

A Novel Gradient-Boosting Approach for Linear Mixed Models

Meeting Abstract

  • Robert Kuchen - Universitätsmedizin Mainz, Institut für Medizinische Biometrie, Epidemiologie und Informatik (IMBEI), Mainz, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 125

doi: 10.3205/22gmds102, urn:nbn:de:0183-22gmds1028

Published: August 19, 2022

© 2022 Kuchen.
This article is an Open Access article and is published under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Griesbach et al. [1] proposed a novel gradient-boosting approach for clustered data, which was claimed to yield unbiased fixed- and random-effect estimates in a general setting. Simulations show, however, that it tends to produce biased results, in particular when applied to datasets with imbalanced cluster sizes. To solve this problem, I introduce both a slightly amended version of their algorithm and a novel approach.

Statistical gradient boosting [2], [3] is a powerful machine-learning algorithm that implicitly performs variable selection and shrinks the estimated coefficients. It minimizes a chosen loss function by descending the empirical risk along the steepest gradient in function space. To this end, an additive regression model is built up incrementally from a set of base-learners, which are themselves simple regression models. For clustered data, random-effect base-learners are often included alongside the fixed-effect base-learners to account for the cluster structure.
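To make this generic scheme concrete, the following minimal Python sketch implements component-wise L2-boosting with simple least-squares base-learners; the function and parameter names (boost_l2, n_iter, nu) are illustrative and not taken from the cited algorithms.

import numpy as np

def boost_l2(X, y, n_iter=100, nu=0.1):
    # Component-wise L2-boosting: each base-learner is a least-squares fit
    # of a single covariate to the current pseudo-residuals.
    n, p = X.shape
    coef = np.zeros(p)
    fit = np.full(n, y.mean())                # offset: start from the mean
    for _ in range(n_iter):
        u = y - fit                           # negative gradient of 0.5*(y - f)^2
        betas = [(X[:, j] @ u) / (X[:, j] @ X[:, j]) for j in range(p)]
        rss = [np.sum((u - b * X[:, j]) ** 2) for j, b in enumerate(betas)]
        j_star = int(np.argmin(rss))          # base-learner that best fits u
        coef[j_star] += nu * betas[j_star]    # shrunken update: selection + shrinkage
        fit = fit + nu * betas[j_star] * X[:, j_star]
    return coef, fit

Because only the best-fitting base-learner receives a small, shrunken update per iteration, covariates that are never selected keep a coefficient of exactly 0, which is what produces the implicit variable selection.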

How best to include random effects in the boosting framework has, however, been the subject of ongoing debate. Friedman et al. [2] and Kneib et al. [4] suggested treating random-effect base-learners the same way as fixed-effect base-learners, i.e. a base-learner is selected if it best fits the negative gradient vector in a given iteration, so that in each iteration either a fixed-effect or a random-effect base-learner is chosen, but never both. This approach, however, tends to produce an asynchronous updating scheme for covariates that enter the model as both fixed and random effects, often resulting in biased random-effect and possibly fixed-effect estimates. Griesbach et al. [1] therefore proposed a new algorithm in which the random effects are specified in advance and are updated in each iteration separately from the fixed-effect base-learners. This approach constitutes a significant improvement over the algorithms of Friedman and Kneib and yields accurate estimates in many settings.
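The separate-update scheme can be sketched as a single boosting iteration as follows; the ridge-type fit for the random effects b is a simplification chosen for illustration and not the exact estimator used in [1].

import numpy as np

def iteration_separate_update(X, Z, y, coef, b, nu=0.1, lam=1.0):
    # Step 1: fixed-effect base-learners compete for selection as usual.
    u = y - X @ coef - Z @ b                  # pseudo-residuals
    betas = [(X[:, j] @ u) / (X[:, j] @ X[:, j]) for j in range(X.shape[1])]
    rss = [np.sum((u - bj * X[:, j]) ** 2) for j, bj in enumerate(betas)]
    j_star = int(np.argmin(rss))
    coef[j_star] += nu * betas[j_star]
    # Step 2: the random effects are always updated in a separate step,
    # rather than competing with the fixed-effect base-learners.
    u = y - X @ coef - Z @ b                  # refreshed residuals
    b_step = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ u)
    b = b + nu * b_step                       # incremental random-effect update
    return coef, b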

Methods: In this presentation, however, I show that the algorithm of Griesbach et al. still yields considerably biased random-effect estimates in general, and biased estimates of both fixed and random effects in settings with imbalanced cluster sizes. First, I show that their algorithm can easily be amended by initializing all random effects at 0 and by allowing a random-effect-specific learning rate, which yields substantially more accurate fixed-effect estimates in exactly those settings in which the original algorithm tends to perform poorly. Subsequently, I present a novel approach in which the random effects are not updated incrementally but are instead fitted anew in every iteration, conditional on the current fixed-effect estimates.
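The core difference of the novel approach can be sketched as a single replacement step; the penalized least-squares (BLUP-type) refit below is an illustrative assumption about the concrete estimator, not a specification of the actual algorithm.

import numpy as np

def refit_random_effects(X, Z, y, coef, lam=1.0):
    # Instead of adding another shrunken increment to b, the random effects
    # are re-estimated from scratch, conditional on the current fixed-effect
    # fit, and the previous estimate is replaced entirely.
    u = y - X @ coef                          # residuals given fixed effects only
    b = np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ u)
    return b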

Results: In a simulation study, both of my boosting algorithms yielded accurate fixed-effect estimates and outperformed a conventional mixed model in terms of out-of-sample prediction performance. The novel approach, however, additionally yielded accurate random-effect estimates and caused both fixed and random effects to converge to their maximum likelihood estimates even in difficult settings.

Conclusion: Since, in contrast to the amended algorithm of Griesbach et al., the novel approach also yields accurate random-effect estimates, I recommend its use in the future.

The author declares that he has no competing interests.

The author declares that an ethics committee vote is not required.


References

1. Griesbach C, Säfken B, Waldmann E. Gradient boosting for linear mixed models. Int J Biostat. 2021;17(2):317-329.
2. Friedman J, Hastie T, Tibshirani R. Additive logistic regression: a statistical view of boosting (with discussion and a rejoinder by the authors). The Annals of Statistics. 2000;28(2):337-407.
3. Friedman J. Greedy function approximation: A gradient boosting machine. The Annals of Statistics. 2001;29(5):1189-1232.
4. Kneib T, Hothorn T, Tutz G. Variable selection and model choice in geoadditive regression models. Biometrics. 2009;65(2):626-634.