gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Variable selection and shrinkage in low-dimensional data

Meeting Abstract

  • Edwin Kipruto - Institut für Medizinische Biometrie und Informatik, Universitätsklinikum Freiburg, Freiburg, Germany
  • Willi Sauerbrei - Institut für Medizinische Biometrie und Informatik, Universitätsklinikum Freiburg, Freiburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 376

doi: 10.3205/20gmds320, urn:nbn:de:0183-20gmds3200

Published: February 26, 2021

© 2021 Kipruto et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Background: Over the last two decades, the combination of variable selection and shrinkage has grown in popularity due to the need to correct for overestimation bias and to improve the prediction accuracy of a model. Several statistical approaches have been developed, such as the non-negative garrote (NNG) [1] and the lasso [2], but the former has received little attention despite some conceptual advantages, especially when more than the derivation of a good predictor is of interest. Descriptive modelling aims to summarize the data structure in a compact manner, which means that variables with a stronger effect need to be identified and estimates of selected models should be (nearly) unbiased [3]. Besides penalized methods, post-selection cross-validation methods such as parameterwise shrinkage factors (PWSF) have been proposed to correct for overestimation bias [4]. The main aim of this study is to compare the performance of NNG with the lasso, the adaptive lasso [5] and post-selection cross-validation methods for descriptive modelling in low-dimensional data. For small sample sizes, descriptive modelling is very likely to fail, and we exclude such situations.
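To make the idea of parameterwise shrinkage factors concrete, the following minimal sketch (our own illustration, not the authors' code) estimates PWSFs for a selected linear model by K-fold cross-validation, in the spirit of [4]: the model is refitted on each training part, the held-out per-variable partial predictors are collected, and the outcome is then regressed on these columns; the resulting coefficients are the shrinkage factors.

```python
import numpy as np

def pwsf(X, y, n_splits=5, seed=1):
    """Parameterwise shrinkage factors via K-fold cross-validation.

    For each fold k, refit the selected model on the training part and form
    the held-out partial linear predictors x_ij * betahat_j^(-k); then
    regress y on these columns. The coefficients are the PWSFs; a factor
    near 1 indicates that the corresponding estimate needs little shrinkage.
    """
    n, p = X.shape
    idx = np.random.default_rng(seed).permutation(n)
    Xc = np.column_stack([np.ones(n), X])             # design with intercept
    Z = np.zeros((n, p))
    for test in np.array_split(idx, n_splits):
        train = np.setdiff1d(idx, test)
        beta = np.linalg.lstsq(Xc[train], y[train], rcond=None)[0]
        Z[test] = X[test] * beta[1:]                  # partial predictors, intercept dropped
    Zc = np.column_stack([np.ones(n), Z])
    return np.linalg.lstsq(Zc, y, rcond=None)[0][1:]  # shrinkage factors
```

With strong effects and a reasonable sample size, the estimated factors come out close to 1, matching the interpretation that such estimates need little correction.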

Methods: We will illustrate conceptual differences between the approaches and compare results in two real data applications; one of them has highly correlated predictors, which allows us to investigate the effects of collinearity. NNG depends on the initial estimates, and Breiman [1] suggested using least squares (LS) estimates. However, LS performs poorly in highly correlated settings, which in turn can affect the selection of an NNG model. We will investigate whether replacing the LS estimates by ridge estimates can improve NNG for highly correlated data [6]. We also intend to investigate the extent of bias in the regression coefficients and to provide bootstrap confidence intervals. To compare the approaches, we conducted a simulation study based on the design of van Houwelingen and Sauerbrei [4].
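As an illustration of the garrote step (a sketch under our own assumptions, not the authors' implementation): given initial estimates, NNG solves a nonnegatively constrained, penalized least-squares problem for per-variable scaling factors c_j and returns c_j times the initial estimates. For small problems, projected gradient descent suffices; the hypothetical `ridge_init` helper shows how ridge initial estimates could replace the LS fit, assuming centered X and y.

```python
import numpy as np

def nng(X, y, beta_init, lam, n_iter=5000):
    """Non-negative garrote: min_c ||y - Z c||^2 + lam * sum(c), c >= 0,
    with Z_j = beta_init_j * x_j; returns the shrunken coefficients
    c * beta_init. Solved by projected gradient descent (illustrative)."""
    Z = X * beta_init                          # scale columns by initial estimates
    step = 0.5 / np.linalg.norm(Z, 2) ** 2     # 1/L for the quadratic part
    c = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        grad = 2 * Z.T @ (Z @ c - y) + lam
        c = np.maximum(c - step * grad, 0.0)   # project onto the constraint c >= 0
    return c * beta_init

def ridge_init(X, y, alpha):
    """Ridge initial estimates (hypothetical helper; assumes centered X, y)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T @ y)
```

Passing `ridge_init(X, y, alpha)` instead of the LS fit as `beta_init` is the modification investigated for highly correlated data [6]. Because c_j is nonnegative, a garrote coefficient can shrink to exactly zero but never changes the sign of its initial estimate.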

Results: In the two examples, NNG and the adaptive lasso selected smaller models than the lasso. For variables with a stronger effect, the adaptive lasso estimates are closer to those of NNG than to those of the lasso. For such variables, the parameterwise shrinkage factors are close to 1, indicating that shrinkage is hardly needed for them. In highly correlated settings, using ridge instead of LS initial estimates in NNG seems to perform well.

Conclusion: Given the good conceptual properties of NNG, this method can be used not only to select important variables but also to correct for overestimation bias, and it can be applied in different statistical models. Results of the simulation study are needed for an informative comparison of the procedures.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Breiman L. Better subset regression using the nonnegative garrote. Technometrics. 1995;37(4):373-384.
2. Tibshirani R. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological). 1996;58(1):267-288.
3. Shmueli G. To explain or to predict? Statistical Science. 2010;25(3):289-310.
4. van Houwelingen HC, Sauerbrei W. Cross-validation, shrinkage and variable selection in linear regression revisited. 2013.
5. Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association. 2006;101(476):1418-1429.
6. Yuan M, Lin Y. On the non-negative garrotte estimator. Journal of the Royal Statistical Society: Series B (Statistical Methodology). 2007;69(2):143-161.