gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17-21 September 2023, Heilbronn

Regularized regression to identify U-shaped relationships of highly correlated predictors

Meeting Abstract


  • Gregor Buch - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; German Center for Cardiovascular Research (DZHK), partner site RhineMain, Mainz, Germany
  • Alexander Gieswinkel - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  • Philipp Wild - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; German Center for Cardiovascular Research (DZHK), partner site RhineMain, Mainz, Germany; Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; Institute of Molecular Biology (IMB), Mainz, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 298

doi: 10.3205/23gmds074, urn:nbn:de:0183-23gmds0747

Published: 15 September 2023

© 2023 Buch et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.



Text

Introduction: Established techniques such as elastic net regression select variables while accounting for collinearity between predictors, but they can only detect linear relationships [1]. The group least absolute shrinkage and selection operator (G-LASSO), by contrast, can select spline representations of variables as whole groups, but it does not account for collinearity among predictors [2]. Combining both approaches may therefore allow non-linear relationships to be identified while remaining tolerant to collinearity.
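The contrast between the two penalties can be illustrated by their proximal (shrinkage) operators. The following is a minimal sketch of the textbook penalty forms and is not the authors' code. It assumes a glmnet-style elastic net parametrization, lam * (l1_ratio * ||b||_1 + (1 - l1_ratio)/2 * ||b||_2^2), and the sqrt(p_g) group weights of Yuan and Lin [2]. The mixing parameter is called `l1_ratio` here so that it is not confused with the α in the Methods. All function names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: proximal operator of t * ||b||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_prox(z, lam, l1_ratio):
    """Proximal operator of lam * (l1_ratio * ||b||_1 + (1 - l1_ratio) / 2 * ||b||_2^2).
    Each coefficient is shrunk on its own, so a spline basis can be split up."""
    return soft_threshold(z, lam * l1_ratio) / (1.0 + lam * (1.0 - l1_ratio))

def group_lasso_prox(z, lam, groups):
    """Proximal operator of lam * sum_g sqrt(p_g) * ||b_g||_2 (block soft-thresholding).
    A whole group, e.g. all spline columns of one predictor, is kept or set to zero."""
    out = np.zeros_like(z, dtype=float)
    for g in np.unique(groups):
        idx = groups == g
        norm = np.linalg.norm(z[idx])
        thr = lam * np.sqrt(idx.sum())
        if norm > thr:
            out[idx] = (1.0 - thr / norm) * z[idx]
    return out

z = np.array([0.3, -0.2, 2.0, 1.5])       # two groups of two coefficients each
groups = np.array([0, 0, 1, 1])
print(elastic_net_prox(z, 0.5, 0.5))       # first group only partly removed
print(group_lasso_prox(z, 0.5, groups))    # first group removed as a whole
```

The elastic net keeps a shrunken first coefficient of the first group while dropping the second. The group operator treats the pair as one unit, which is what makes it suitable for selecting spline representations of a predictor.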

Methods: To select nonlinear relationships in a collinearity-tolerant way, G-LASSO was combined with an additional L1-norm penalty applied at the group level. The functional form of each predictor was modeled with linear splines, and the additional group-level L1-norm was weighted by a tuning parameter α. In a simulation study, the variable selection performance of this approach was compared with that of classical LASSO and elastic net using different fractional polynomials. Several values of α were considered, and for all approaches the shrinkage parameter λ was determined by 10-fold cross-validation.
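The abstract does not give the penalty formula, so the following is only a sketch of one plausible reading, the sparse-group-lasso form. The objective is least squares plus λ(α Σ_g sqrt(p_g) ||β_g||_2 + (1 − α) ||β||_1). The assignment of α to the group term and 1 − α to the L1 term is an assumption. It was chosen to match the Results statement that α = 0.9 leaves the L1 term with limited influence. Several further elements are illustrative choices and not taken from the source: the truncated linear spline basis with knots at interior quantiles, the accelerated proximal gradient (FISTA) solver, and all function names.

```python
import numpy as np

def linear_spline_basis(x, n_knots=3):
    """Truncated linear spline basis: x plus hinges (x - k)_+ at interior quantile knots."""
    knots = np.quantile(x, np.linspace(0.0, 1.0, n_knots + 2)[1:-1])
    return np.column_stack([x] + [np.maximum(x - k, 0.0) for k in knots])

def sgl_prox(z, t, alpha, groups):
    """Prox of t * (alpha * sum_g sqrt(p_g) * ||b_g||_2 + (1 - alpha) * ||b||_1).
    ASSUMPTION: alpha weights the group term and (1 - alpha) the L1 term."""
    b = np.sign(z) * np.maximum(np.abs(z) - t * (1.0 - alpha), 0.0)  # L1 part
    for g in np.unique(groups):                                      # group part
        idx = groups == g
        norm = np.linalg.norm(b[idx])
        thr = t * alpha * np.sqrt(idx.sum())
        b[idx] = 0.0 if norm <= thr else (1.0 - thr / norm) * b[idx]
    return b

def fit_sgl(X, y, groups, lam, alpha, n_iter=2000):
    """FISTA for (1 / 2n) * ||y - b0 - X b||^2 + lam * penalty; the intercept is unpenalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    n = len(y)
    step = n / np.linalg.norm(Xc, 2) ** 2     # 1 / Lipschitz constant of the gradient
    beta = z = np.zeros(X.shape[1])
    t_k = 1.0
    for _ in range(n_iter):
        grad = -Xc.T @ (yc - Xc @ z) / n
        beta_new = sgl_prox(z - step * grad, step * lam, alpha, groups)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t_k ** 2)) / 2.0
        z = beta_new + ((t_k - 1.0) / t_next) * (beta_new - beta)  # momentum step
        beta, t_k = beta_new, t_next
    return y_mean - x_mean @ beta, beta

def cv_lambda(X, y, groups, lambdas, alpha, k=10, seed=0):
    """Pick the lambda with the lowest k-fold cross-validated mean squared error."""
    folds = np.random.default_rng(seed).permutation(len(y)) % k
    cv_mse = []
    for lam in lambdas:
        mse = 0.0
        for f in range(k):
            tr, te = folds != f, folds == f
            b0, beta = fit_sgl(X[tr], y[tr], groups, lam, alpha)
            mse += np.mean((y[te] - b0 - X[te] @ beta) ** 2)
        cv_mse.append(mse / k)
    return lambdas[int(np.argmin(cv_mse))]

# Usage on simulated data: a U-shaped predictor x1 and a pure-noise predictor x2
rng = np.random.default_rng(1)
x1, x2 = rng.uniform(-1, 1, 300), rng.uniform(-1, 1, 300)
y = 4 * x1 ** 2 + 0.1 * rng.standard_normal(300)
X = np.hstack([linear_spline_basis(x1), linear_spline_basis(x2)])
groups = np.repeat([0, 1], 4)              # four spline columns per predictor
lam = cv_lambda(X, y, groups, [0.02, 0.1, 1.0], alpha=0.9)
b0, beta = fit_sgl(X, y, groups, lam, alpha=0.9)
```

Each predictor's spline columns form one group, so the group term decides whether a predictor enters at all. The small L1 term can still zero out individual hinge columns within a selected predictor.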

In a real-world application using data from a heart failure study (MyoVasc, ClinicalTrials.gov Identifier: NCT04064450), the approaches were used to select markers of heart rate variability (i.e., the variation in the time interval between heartbeats) associated with the augmentation index, a measure of arterial stiffness [3]. The predictive performance of the resulting models was evaluated on a hold-out dataset.
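Hold-out evaluation with R² can be sketched as follows. The abstract does not state how large the hold-out set was. The 30% split, the function names and the least-squares stand-in model are hypothetical choices for illustration only.

```python
import numpy as np

def holdout_split(n, test_frac=0.3, seed=0):
    """Randomly split row indices into a training part and a hold-out part."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]

def r_squared(y_true, y_pred):
    """Hold-out coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Usage with ordinary least squares as a stand-in for any fitted model
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + rng.standard_normal(200)
train, test = holdout_split(len(y))
Xd = np.column_stack([np.ones(len(y)), X])       # add an intercept column
coef, *_ = np.linalg.lstsq(Xd[train], y[train], rcond=None)
print(round(r_squared(y[test], Xd[test] @ coef), 2))
```

The model is fitted on the training rows only, so the R² on the hold-out rows estimates performance on new data.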

Results: G-LASSO combined with additional L1-norm regularization was highly effective in variable selection, regardless of whether the relationships between the predictors and the outcome were purely linear, purely U-shaped, or a combination of both. This approach outperformed the classical methods, even when these were supplied with the correct functional forms of the predictors. The additional group-level L1-norm yielded a moderate improvement over the same strategy without it, especially in scenarios with weakly correlated predictors (Pearson's r ≤ 0.2). The best performance was obtained with high values of α, such as 0.9, indicating that the influence of the L1-norm regularization was limited. In the real-world application, the compared methods achieved similar predictive performance (R² of around 0.5), but the resulting models differed in size and overlapped only moderately. Here, G-LASSO with an additional L1-norm produced the sparsest model, while elastic net produced the largest. These results were largely consistent with those of the simulation study.
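The abstract does not describe how the simulation data were generated. The following sketch is hypothetical and only illustrates what a scenario with a given pairwise correlation and a mix of linear and U-shaped effects could look like. The equicorrelated Gaussian design, the use of x² as the U shape, the unit effect sizes, the noise level and the function name are all assumptions.

```python
import numpy as np

def simulate(n, p, rho, n_linear, n_ushaped, seed=0):
    """Hypothetical design: equicorrelated Gaussian predictors (pairwise r = rho).
    The first n_linear predictors act linearly, the next n_ushaped through x**2."""
    rng = np.random.default_rng(seed)
    cov = np.full((p, p), rho) + (1.0 - rho) * np.eye(p)   # unit variances
    X = rng.multivariate_normal(np.zeros(p), cov, size=n)
    linear_part = X[:, :n_linear].sum(axis=1)
    u_part = (X[:, n_linear:n_linear + n_ushaped] ** 2).sum(axis=1)
    y = linear_part + u_part + rng.standard_normal(n)      # remaining predictors are noise
    return X, y

X, y = simulate(20000, 5, 0.2, n_linear=1, n_ushaped=1)
print(np.round(np.corrcoef(X, rowvar=False), 2))
```

A U-shaped predictor is nearly uncorrelated with the outcome on the linear scale. This is why methods restricted to linear terms can miss such predictors unless their functional form is supplied.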

Conclusions: Combining a linear spline modeling strategy with a group selection operator that includes additional L1-norm regularization yields a pragmatic technique with several appealing properties. It reduces the feature space to a predictive subset while taking the high correlation between predictors into account. The resulting models are as predictive as those of classical approaches, while better capturing the functional form of the predictors. Despite these additional properties, the models remain easy to interpret because the functional form is estimated with simple splines, which makes the technique particularly attractive for practical applications.
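The interpretability of linear splines can be made concrete. Assume a truncated linear basis f(x) = b_0 x + Σ_j b_j (x − k_j)_+. The slope to the right of knot k_j is then the running sum b_0 + … + b_j, and a change in the sign of the slope marks the bottom of a U shape. The sketch below uses hypothetical function names and illustrative coefficients: the piecewise-linear interpolation of 4x² at knots −0.5, 0 and 0.5.

```python
import numpy as np

def segment_slopes(beta, knots):
    """Slopes of f(x) = b_0 * x + sum_j b_j * (x - k_j)_+ on each segment, left to right."""
    beta = np.asarray(beta, dtype=float)
    if len(beta) != len(knots) + 1:
        raise ValueError("expected one linear term plus one coefficient per knot")
    return np.cumsum(beta)

def u_shape_minimum(beta, knots):
    """Knot at which the slope turns from negative to non-negative, or None if it never does."""
    s = segment_slopes(beta, knots)
    for j in range(1, len(s)):
        if s[j - 1] < 0 <= s[j]:
            return knots[j - 1]
    return None

beta, knots = [-6.0, 4.0, 4.0, 4.0], [-0.5, 0.0, 0.5]
print(segment_slopes(beta, knots))    # slope of each segment
print(u_shape_minimum(beta, knots))   # location of the minimum
```

Each coefficient can be read directly as the change in slope at its knot. No curve has to be plotted to see where the relationship turns.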

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B. 2005;67(2):301-20.
2. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B. 2006;68(1):49-67.
3. Göbel S, Prochaska JH, Tröbs SO, Panova-Noeva M, Espinola-Klein C, Michal M, et al. Rationale, design and baseline characteristics of the MyoVasc study: a prospective cohort study investigating development and progression of heart failure. European Journal of Preventive Cardiology. 2021;28(9):1009-18.