Regularized regression to identify U-shaped relationships of highly correlated predictors
| Published: | 15 September 2023 |
|---|---|
Text
Introduction: Established techniques such as elastic net regression support variable selection while accounting for collinearity between predictors, but they are restricted to detecting linear relationships [1]. In contrast, the Group Least Absolute Shrinkage and Selection Operator (G-LASSO) can select spline representations of variables but does not account for collinearity among predictors [2]. To identify nonlinear relationships while remaining tolerant of collinearity, a combination of both approaches may be useful.
Methods: To perform a collinearity-tolerant selection of nonlinear relationships, G-LASSO was used in combination with an additional L2-norm applied at the group level. The functional form of the predictors was modeled with linear splines, and the additional L2-norm at the group level was weighted by a tuning parameter α. In a simulation study, the variable selection performance of this approach was compared with that of classical LASSO and elastic net supplied with different fractional polynomials. Different values for α were considered, and the shrinkage parameter λ was determined for all approaches by 10-fold cross-validation.
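The abstract does not give the penalty in closed form. As a minimal sketch, assume an elastic-net-style group penalty λ[α Σ_g √(p_g)·‖β_g‖₂ + (1 − α)/2 · Σ_g ‖β_g‖₂²], where each group g holds the linear spline terms of one predictor. The parametrization, the √(p_g) group weights, the knot positions, the proximal gradient solver, and all function names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def linear_spline_basis(x, knots):
    """Linear term plus one hinge max(x - k, 0) per knot (assumed basis)."""
    cols = [x] + [np.maximum(x - k, 0.0) for k in knots]
    return np.column_stack(cols)

def group_prox(z, groups, step, lam, alpha):
    """Proximal operator of the assumed penalty: group soft-thresholding
    followed by a ridge-type rescaling."""
    out = np.zeros_like(z)
    for idx in groups:
        norm = np.linalg.norm(z[idx])
        thresh = step * lam * alpha * np.sqrt(len(idx))
        if norm > thresh:
            shrunk = (1.0 - thresh / norm) * z[idx]
            out[idx] = shrunk / (1.0 + step * lam * (1.0 - alpha))
    return out

def fit_group_enet(X, y, groups, lam, alpha, n_iter=2000):
    """Proximal gradient descent on (1/2n)||y - X b||^2 + penalty."""
    n, p = X.shape
    step = n / np.linalg.norm(X, 2) ** 2  # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p)
    for _ in range(n_iter):
        grad = X.T @ (X @ beta - y) / n
        beta = group_prox(beta - step * grad, groups, step, lam, alpha)
    return beta

# Toy data (hypothetical): predictor 0 is U-shaped, 1 is linear, 2 is noise.
rng = np.random.default_rng(0)
n = 300
x = rng.normal(size=(n, 3))
y = x[:, 0] ** 2 + 0.5 * x[:, 1] + rng.normal(scale=0.5, size=n)

# One group of four spline columns per predictor, standardized column-wise.
blocks = [linear_spline_basis(x[:, j], knots=[-1.0, 0.0, 1.0]) for j in range(3)]
X = np.column_stack(blocks)
X = (X - X.mean(axis=0)) / X.std(axis=0)
groups = [np.arange(4 * j, 4 * j + 4) for j in range(3)]

beta = fit_group_enet(X, y - y.mean(), groups, lam=0.1, alpha=0.9)
selected = [j for j, idx in enumerate(groups) if np.any(beta[idx] != 0)]
print("selected predictors:", selected)
```

With α = 1 the ridge-type term vanishes and the sketch reduces to a plain group LASSO; lowering α increases the weight of the squared group-level norm. In practice λ would be chosen by cross-validation rather than fixed as it is here.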
In a real-world application analyzing data from a heart failure study (MyoVasc, ClinicalTrials.gov Identifier: NCT04064450), the approaches were used to select markers of heart rate variability, i.e., variation in the time interval between heartbeats, associated with augmentation index, a measure of arterial vascular stiffness [3]. The prediction performance of the generated models was evaluated using a hold-out dataset.
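The hold-out evaluation can be illustrated as follows. This is a sketch under stated assumptions: the abstract specifies neither the split proportion nor the exact definition of R², so the 30% hold-out fraction, the helper names, and the simulated data are hypothetical, and ordinary least squares stands in for the regularized models.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def train_holdout_split(n, holdout_fraction, rng):
    """Random, disjoint index sets for training and hold-out data."""
    perm = rng.permutation(n)
    n_hold = int(round(holdout_fraction * n))
    return perm[n_hold:], perm[:n_hold]

# Simulated data (hypothetical, not the MyoVasc data).
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -0.5, 0.0]) + rng.normal(size=n)

train, hold = train_holdout_split(n, 0.3, rng)

# Fit on the training part only (OLS with intercept as a stand-in model).
X_train = np.column_stack([np.ones(len(train)), X[train]])
coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)

# Evaluate on data the model has not seen.
X_hold = np.column_stack([np.ones(len(hold)), X[hold]])
print("hold-out R2:", r_squared(y[hold], X_hold @ coef))
```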
Results: G-LASSO combined with an additional L2-norm regularization was highly effective in variable selection, regardless of whether the relationships between predictors are purely linear, purely U-shaped, or a combination of both. This approach outperformed classical methods, even when those methods were supplied with accurate functional forms for the predictors. The additional group-level L2-norm resulted in a moderate improvement over a strategy without it, especially in scenarios with weakly correlated predictors (Pearson’s r ≤ 0.2). The optimal performance was obtained with high α values, such as 0.9, indicating that the impact of the L2-norm regularization was limited. In the real-world application, the compared methods produced similar predictive performance (R² around 0.5), but the resulting models differed in size and showed moderate overlap. Here, the sparsest model was created by G-LASSO with an additional L2-norm, while elastic net created the largest model. These results were largely consistent with those from the simulation study.
Conclusions: Combining a linear spline modeling strategy with a regularized L2-norm group selection operator yields a pragmatic technique with several appealing properties. The approach is capable of reducing the feature space to a predictive subset while taking into account the high correlation between predictors. The generated models are just as predictive as those of classical approaches, while doing better justice to the functional form. Despite these additional properties, the resulting models remain easy to interpret because the functional form is estimated using simple splines, which makes this technique particularly attractive for practical applications.
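The interpretability of a linear spline fit can be made explicit. The notation below is an assumption for illustration (the abstract states neither the knots nor the parametrization): predictor $x_j$ enters through a linear term and hinge terms at knots $\kappa_1 < \dots < \kappa_K$.

```latex
f_j(x_j) = \beta_{j0}\, x_j + \sum_{k=1}^{K} \beta_{jk}\,(x_j - \kappa_k)_+ ,
\qquad (u)_+ = \max(u, 0)

% Slope of f_j on the segment [\kappa_m, \kappa_{m+1}):
s_{jm} = \beta_{j0} + \sum_{k=1}^{m} \beta_{jk}
```

Under this parametrization each hinge coefficient is the change in slope at its knot, so a U-shaped relationship appears as a negative slope on the leftmost segments followed by a positive slope on the rightmost ones. Because the coefficients $(\beta_{j0}, \dots, \beta_{jK})$ form one group, a group-wise penalty keeps or drops the predictor as a whole.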
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B. 2005;67(2):301-20.
2. Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B. 2006;68(1):49-67.
3. Göbel S, Prochaska JH, Tröbs SO, Panova-Noeva M, Espinola-Klein C, Michal M, et al. Rationale, design and baseline characteristics of the MyoVasc study: a prospective cohort study investigating development and progression of heart failure. European Journal of Preventive Cardiology. 2021;28(9):1009-18.
