GMS | 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS) | Efficient permutation testing of variable importance measures in machine learning

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

Efficient permutation testing of variable importance measures in machine learning

Meeting Abstract

Alexander Hapfelmeier - Institute of AI and Informatics in Medicine, Technical University of Munich, Munich, Germany; Institute of General Practice and Health Services Research, Technical University of Munich, Munich, Germany
Roman Hornung - Institute for Medical Information Processing, Biometry and Epidemiology, University of Munich, Munich, Germany
Bernhard Haller - Institute of AI and Informatics in Medicine, Technical University of Munich, Munich, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 54

doi: 10.3205/23gmds001, urn:nbn:de:0183-23gmds0017

Published: September 15, 2023

© 2023 Hapfelmeier et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.

Introduction: Variable importance measures (VIMPs) are a popular means of assessing the relevance of a predictor variable in a prediction model. They are particularly useful for gaining insight into machine learning models, which are often referred to as black boxes. There have also been many attempts to assess the statistical significance of VIMPs through hypothesis testing, e.g. to perform variable selection or to identify prognostic and predictive factors. Especially for Random Forests (RF), which serve as the application example here, this topic remains a subject of ongoing research. Heuristic approaches to parametric testing have been proposed, but they often rely on distributional assumptions that are supported only by empirical evidence. More recently, formal tests have been derived analytically, but these can be computationally expensive or even infeasible in practice. In earlier work of our own, nonparametric permutation testing was proposed as a very general, distribution-free approach that can be applied to any type of model and VIMP [1]. However, it shares the problem of limited computational feasibility, especially with computationally expensive prediction models or VIMPs, or with Big Data.
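
To make the cost of the conventional approach concrete, the following is a minimal sketch of a permutation test for an importance measure. It is an illustration under stated assumptions, not the procedure of [1]: the function names (`abs_correlation`, `permutation_p_value`) are hypothetical, the absolute correlation is only a cheap stand-in for a real VIMP, the null distribution is assumed to be obtained by permuting the predictor of interest, and the add-one correction of the p-value is a common convention chosen here. With an RF, every call to `vimp` would mean refitting the forest, which is why a fixed number of permutations quickly becomes expensive.

```python
import random
from statistics import mean


def abs_correlation(x, y):
    """Toy stand-in for a VIMP: absolute Pearson correlation of x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_p_value(x, y, vimp, n_perm=500, rng=None):
    """Conventional permutation test: always runs all n_perm permutations."""
    rng = rng or random.Random()
    observed = vimp(x, y)
    exceed = 0
    x_perm = list(x)
    for _ in range(n_perm):
        rng.shuffle(x_perm)  # break any association between x and y
        if vimp(x_perm, y) >= observed:
            exceed += 1
    # Assumed convention: add-one correction keeps the p-value above zero.
    return (exceed + 1) / (n_perm + 1)


# Simulated data: one response depends on x, the other is pure noise.
data_rng = random.Random(1)
x = [data_rng.gauss(0, 1) for _ in range(100)]
y_related = [xi + data_rng.gauss(0, 1) for xi in x]
y_noise = [data_rng.gauss(0, 1) for _ in range(100)]

p_related = permutation_p_value(x, y_related, abs_correlation, rng=random.Random(2))
p_noise = permutation_p_value(x, y_noise, abs_correlation, rng=random.Random(3))
print(p_related, p_noise)
```

In this sketch both tests pay for all 500 permutations, even though the outcome for the noise variable is typically clear after only a handful of them.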

Methods: To address the feasibility issue of conventional permutation testing, we propose the use of sequential permutation testing and sequential p-value estimation [2]. We use RF and its popular permutation VIMP, both of which are computationally expensive, to demonstrate the practicality and relevance of our approach. Several simulation studies were performed to investigate whether the theoretical properties of the statistical tests hold when sequential methods are applied. The Pima Indians Diabetes Database was used to investigate the numerical stability of the methods in a well-known setting. An additional application to data from a SARS-CoV-2 diagnostic study illustrates the potentially large savings in computational cost.
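
The idea of sequential p-value estimation can be sketched with one classical scheme, the sequential Monte Carlo p-value of Besag and Clifford (1991). It is used here purely as an illustration and is not claimed to be the exact procedure evaluated in [2]. The names `sequential_p_value` and `draw_null`, and the defaults `h=10` and `max_perm=500`, are assumptions made for this sketch. In a real application, `draw_null` would permute the predictor, refit the model and return the resulting VIMP; here it simply draws from a standard normal distribution.

```python
import random


def sequential_p_value(observed, draw_null, h=10, max_perm=500):
    """Besag-Clifford style sequential Monte Carlo p-value.

    Sampling stops as soon as h null values reach or exceed the observed
    statistic, or after max_perm draws. Returns (p_value, draws_used).
    """
    exceed = 0
    for used in range(1, max_perm + 1):
        if draw_null() >= observed:
            exceed += 1
            if exceed == h:
                # Early stop: the variable is clearly not significant.
                return h / used, used
    # No early stop: usual add-one estimate over all draws.
    return (exceed + 1) / (max_perm + 1), max_perm


rng = random.Random(0)


def draw_null():
    # Hypothetical placeholder for "permute, refit, recompute the VIMP".
    return rng.gauss(0, 1)


# An unremarkable observed value: sampling stops early.
p_null, used_null = sequential_p_value(0.0, draw_null)
# A very large observed value: all permutations are needed.
p_sig, used_sig = sequential_p_value(5.0, draw_null)
print(p_null, used_null, p_sig, used_sig)
```

The savings arise under the null hypothesis, where exceedances accumulate quickly and sampling stops early; clearly relevant variables still require the full permutation budget.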

Results: The theoretical properties of the methods were met in the simulation studies. The type-I error probability was controlled at the nominal level, and high power was maintained (≥97% relative to conventional permutation testing). Considerably fewer permutations were required (e.g. ≤40 instead of a maximum of 500 under H0 in the simulation studies, and 18.6% of 500 in the application study). The numerical stability of the results was problematic for variables of “borderline” significance, but could be improved by reducing the additional variability introduced by model building and VIMP estimation.
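
What it means for the type-I error probability to be controlled at the nominal level can be checked empirically with a small simulation. The sketch below is an assumption-laden toy example and does not reproduce the simulation design or results of the study: the importance measure (`toy_vimp`), the sample sizes and the function names are hypothetical, and a conventional rather than sequential permutation test is used for brevity. Data are generated with the predictor independent of the response, so every rejection is a false positive.

```python
import random


def toy_vimp(x, y):
    """Toy importance: absolute centred cross-product of x and y."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y)))


def permutation_p_value(x, y, n_perm, rng):
    """Permutation p-value with add-one correction (assumed convention)."""
    observed = toy_vimp(x, y)
    x_perm = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        if toy_vimp(x_perm, y) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


def rejection_rate(n_sim=200, n_obs=30, n_perm=99, alpha=0.05, seed=0):
    """Share of simulated null data sets in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]
        y = [rng.gauss(0, 1) for _ in range(n_obs)]  # independent of x: H0 holds
        if permutation_p_value(x, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


rate = rejection_rate()
print(f"empirical type-I error: {rate:.3f}")
```

If the test is valid, the printed rate should lie close to the nominal level of 0.05, up to simulation noise.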

Discussion: The sequential methods showed error control and power that were almost as good as those of conventional permutation testing. They can therefore be recommended for assessing the statistical significance of VIMPs at considerably reduced computational cost. When RF is used as the prediction model, a large number of trees should be used to obtain stable results. Although RF's permutation VIMP has been used here as a relevant example, the proposed methods can be applied to any kind of prediction model or VIMP.

Conclusion: Theoretically sound sequential p-value estimation and permutation testing of VIMPs are possible and less computationally expensive than conventional permutation testing. With complex prediction models or VIMPs, or with Big Data, the proposed methods can lead to considerable savings. An implementation is provided in the R package ‘rfvimptest’ on the Comprehensive R Archive Network (CRAN).

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

References

1. Hapfelmeier A, Ulm K. A new variable selection approach using random forests. Computational Statistics & Data Analysis. 2013;60:50-69.
2. Hapfelmeier A, Hornung R, Haller B. Efficient permutation testing of variable importance measures by the example of random forests. Computational Statistics & Data Analysis. 2023. pii: 107689.
