gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Bi-level variable selection with the sparse group penalty framework

Meeting Abstract

  • Gregor Buch - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; German Center for Cardiovascular Research (DZHK), partner site Rhine-Main, Mainz, Germany; Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  • Andreas Schulz - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  • Irene Schmidtmann - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  • Konstantin Strauch - Institute of Medical Biostatistics, Epidemiology and Informatics (IMBEI), University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany
  • Philipp Wild - Preventive Cardiology and Preventive Medicine, Department of Cardiology, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany; German Center for Cardiovascular Research (DZHK), partner site Rhine-Main, Mainz, Germany; Clinical Epidemiology and Systems Medicine, Center for Thrombosis and Hemostasis, University Medical Center of the Johannes Gutenberg University Mainz, Mainz, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 54

doi: 10.3205/22gmds078, urn:nbn:de:0183-22gmds0783

Published: 19 August 2022

© 2022 Buch et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Bi-level selection methods account for grouped predictors in the selection process in order to identify relevant variable groups and highlight their predictive members. This property is particularly helpful when analyzing omics datasets, as such data often have a natural group structure due to high correlations or contextual similarities among features. One of the best-known bi-level selection approaches, the sparse group LASSO (SGL) [3], additively combines the least absolute shrinkage and selection operator (LASSO) [1] with the group LASSO [2].
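For orientation, the additive structure of SGL can be written out as below. This is a sketch in the notation of [3] for the linear model, not a formula taken from this abstract: the symbols n (observations), G (number of groups), p_g (size of group g), β^(g) (coefficients of group g) and λ (overall penalty strength) are assumptions introduced here for illustration. In this convention α weights the variable-level LASSO term, so small α shifts weight to the group term.

```latex
\min_{\beta}\;
\frac{1}{2n}\Bigl\lVert y - \sum_{g=1}^{G} X^{(g)}\beta^{(g)} \Bigr\rVert_2^2
\;+\; (1-\alpha)\,\lambda \sum_{g=1}^{G} \sqrt{p_g}\,\bigl\lVert \beta^{(g)} \bigr\rVert_2
\;+\; \alpha\,\lambda\,\lVert \beta \rVert_1 ,
\qquad \alpha \in [0,1].
```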

A generalization of SGL that enables combinations of other shrinkage terms is desirable, as the LASSO components have some shortcomings that can be addressed by using alternative penalties.

Methods: To enable the combination of various shrinkage terms as in SGL, a framework for sparse group penalties (SGP) is proposed. Within this framework, we combined the minimax concave penalty (MCP) [4], the smoothly clipped absolute deviation (SCAD) [5] and the exponential penalty (EP) [6] with their respective group versions, analogous to SGL. The resulting methods are the sparse group MCP (SGM), the sparse group SCAD (SGS) and the sparse group EP (SGE). A coordinate descent algorithm based on local linear approximation [7] was implemented in C++ to solve their objective functions for linear and logistic regression. Simulated datasets were used to determine optimal values for the tuning parameter α, a mixing parameter that determines the influence of the group information in the selection process. The performance of the new methods in variable and group selection was compared with that of other bi-level selection methods (group exponential LASSO [6], composite MCP [7] and group Bridge [8]) in simulation studies. Finally, the novel approaches were applied to the problem of detecting regulated lipids in an interventional trial (EmDia study, ClinicalTrials.gov Identifier: NCT02932436).
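The additive construction described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' C++ implementation: the function names, the default γ values, the √(group size) weights and the way α splits λ between the variable-level and group-level terms are all hypothetical and simply mirror the SGL convention of [3]. Only the LASSO, MCP and SCAD penalty functions are shown; an EP variant would slot in the same way.

```python
import numpy as np


def lasso(t, lam, gamma=None):
    """L1 penalty evaluated at a magnitude t >= 0 (gamma is unused)."""
    return lam * np.asarray(t, dtype=float)


def mcp(t, lam, gamma=3.0):
    """Minimax concave penalty [4] evaluated at a magnitude t >= 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)


def scad(t, lam, gamma=3.7):
    """SCAD penalty [5] evaluated at a magnitude t >= 0."""
    t = np.asarray(t, dtype=float)
    middle = (2.0 * gamma * lam * t - t**2 - lam**2) / (2.0 * (gamma - 1.0))
    flat = 0.5 * lam**2 * (gamma + 1.0)
    return np.where(t <= lam, lam * t,
                    np.where(t <= gamma * lam, middle, flat))


def sparse_group_penalty(beta, groups, lam, alpha, pen=mcp, gamma=3.0):
    """Assumed SGP form: variable-level penalty plus group-level penalty.

    alpha weights the variable-level term and (1 - alpha) the group-level
    term, as in SGL; this weighting is an assumption, not taken from the text.
    """
    beta = np.asarray(beta, dtype=float)
    groups = np.asarray(groups)
    # Variable level: penalty applied to each |beta_j| with strength alpha * lam.
    variable_part = float(np.sum(pen(np.abs(beta), alpha * lam, gamma)))
    # Group level: penalty applied to each group's Euclidean norm.
    group_part = 0.0
    for g in np.unique(groups):
        b = beta[groups == g]
        lam_g = (1.0 - alpha) * lam * np.sqrt(b.size)  # assumed sqrt(p_g) weight
        group_part += float(pen(np.linalg.norm(b), lam_g, gamma))
    return variable_part + group_part
```

With `pen=lasso` the function reduces to the SGL penalty; passing `pen=mcp` or `pen=scad` gives penalties in the spirit of SGM and SGS, for example `sparse_group_penalty([3.0, 4.0, 0.0], [0, 0, 1], lam=1.0, alpha=1/3, pen=mcp)`.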

Results: Low values of α, such as 1/10, make the SGPs emphasize selection at the group level, while higher values, such as 1/2, lead to better results at the variable level. Setting α to 1/3 provides a balanced performance at both levels. Using this value, SGE was superior for variable and group selection in almost all cases where the number of variables was smaller than the number of observations. In settings with more variables than observations, SGE was the best approach when few groups were relevant, SGM when a moderate number of groups were predictive, and SGS when many groups contained predictive signals. Classical SGL was consistently inferior to the other bi-level selection methods with regard to variable and group selection, but its predictive performance was strong in some situations. In the applied example, the results of the SGPs differed mainly in their sparsity at the group and variable levels. SGE generated the most parsimonious model, followed by SGM and SGS, while SGL created the largest model.

Conclusions: Replacing the LASSO components in SGL with other shrinkage terms provides improvements in multiple performance criteria, making methods such as SGM, SGS, and SGE preferable over SGL. The advantages of these novel techniques are underscored by their ability to achieve better performance than alternative bi-level selection approaches, which the original SGL fails to do.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B. 1996;58(1):267-88.
2.
Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B. 2006;68(1):49-67.
3.
Simon N, Friedman J, Hastie T, Tibshirani R. A sparse-group lasso. Journal of Computational and Graphical Statistics. 2013;22(2):231-45.
4.
Zhang CH. Nearly unbiased variable selection under minimax concave penalty. The Annals of Statistics. 2010;38(2):894-942.
5.
Fan J, Li R. Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association. 2001;96(456):1348-60.
6.
Breheny P. The group exponential lasso for bi-level variable selection. Biometrics. 2015;71(3):731-40.
7.
Breheny P, Huang J. Penalized methods for bi-level variable selection. Statistics and Its Interface. 2009;2(3):369.
8.
Huang J, Ma S, Xie H, Zhang CH. A group bridge approach for variable selection. Biometrika. 2009;96(2):339-55.