gms | German Medical Science

65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS)

06.09. - 09.09.2020, Berlin (online conference)

Variable selection and model instability [abstract of the session]

Meeting Abstract

  • Daniela Dunkler - Medical University of Vienna, Vienna, Austria
  • Christine Wallisch - Medical University of Vienna, Vienna, Austria
  • Georg Heinze - Medical University of Vienna, Vienna, Austria
  • Riccardo De Bin - University of Oslo, Oslo, Norway

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 65th Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS), Meeting of the Central European Network (CEN: German Region, Austro-Swiss Region and Polish Region) of the International Biometric Society (IBS). Berlin, 06.-09.09.2020. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 335

doi: 10.3205/20gmds113, urn:nbn:de:0183-20gmds1131

Published: February 26, 2021

© 2021 Dunkler et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Statistical modeling with data-driven variable selection is a key part of the analysis of observational data, and most participants of the conference use these methods regularly. However, the resulting model instability is very often neither acknowledged nor adequately dealt with.

Christine Wallisch will start the session with a short talk presenting case studies that highlight the problem of unstable statistical models. Afterwards, Ewout Steyerberg, who has been interested in this topic for many years, will discuss his view on variable selection and model instability. Riccardo De Bin will present an investigation of how single observations affect the optimal choice of the tuning parameter (the penalty) in penalized regression methods. The penalty is often chosen via cross-validation, and its value heavily influences the regression coefficient estimates and the predictions. Finally, Georg Heinze will talk about recent work on the consistency and accuracy of statistical measures that quantify model instability. He will also present a proposal on how to adequately present statistical models derived by data-driven variable selection (based on the R package ABE).
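To make the point about single observations and the cross-validated penalty concrete, the following minimal sketch may help. It is not the speakers' analysis: the simulated data, the use of ridge regression, the penalty grid, and the function names `ridge_fit` and `cv_best_penalty` are all assumptions chosen for illustration. The sketch selects the penalty by 5-fold cross-validation on the full data, then repeats the selection after deleting each observation in turn, keeping the fold assignment of the remaining observations fixed.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept is fitted)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cv_best_penalty(X, y, lambdas, folds):
    """Return the penalty in `lambdas` with the smallest cross-validated
    mean squared prediction error; `folds` holds a fold label per row."""
    cv_error = []
    for lam in lambdas:
        sq_err = 0.0
        for k in np.unique(folds):
            test = folds == k
            beta = ridge_fit(X[~test], y[~test], lam)
            sq_err += np.sum((y[test] - X[test] @ beta) ** 2)
        cv_error.append(sq_err / len(y))
    return lambdas[int(np.argmin(cv_error))]

# Simulated data (illustrative assumption): 2 true predictors, 6 noise variables.
rng = np.random.default_rng(0)
n, p = 40, 8
X = rng.normal(size=(n, p))
beta_true = np.array([1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y = X @ beta_true + rng.normal(size=n)

lambdas = np.logspace(-2, 3, 30)   # candidate penalties (assumed grid)
folds = np.arange(n) % 5           # fixed 5-fold assignment

# Penalty chosen on the full data set.
full_choice = cv_best_penalty(X, y, lambdas, folds)

# Penalty chosen after deleting each single observation in turn.
loo_choices = np.array([
    cv_best_penalty(np.delete(X, i, axis=0), np.delete(y, i),
                    lambdas, np.delete(folds, i))
    for i in range(n)
])

print("penalty chosen on full data:", full_choice)
print("range after single deletions:", loo_choices.min(), "to", loo_choices.max())
print("deletions that change the choice:", int(np.sum(loo_choices != full_choice)))
```

If the selected penalty changes when a single observation is removed, the coefficient estimates and predictions change with it, which is one simple way to make this source of instability visible in a given data set.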

The aim of the session is to raise awareness of this important problem in statistical modeling, especially among younger or more applied researchers. We will present some answers on how to address the issue, based on extensive simulation studies. After the session, attendees should know principles and practical approaches for coping with model instability induced by variable selection.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.