gms | German Medical Science

62nd Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS)

German Association for Medical Informatics, Biometry and Epidemiology

17.09. - 21.09.2017, Oldenburg

The cutpointr package: Improved and tidy estimation of optimal cutpoints

Meeting Abstract

  • Christian Thiele - Hochschule Osnabrück, Osnabrück, Germany
  • Gerrit Hirschfeld - Hochschule Osnabrück, Osnabrück, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 199

doi: 10.3205/17gmds045, urn:nbn:de:0183-17gmds0454

Published: August 29, 2017

© 2017 Thiele et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Clinicians often use cutpoints, or decision thresholds, to decide, for example, whether a patient with a score of 20 on the Beck Depression Inventory (BDI) needs treatment for depression. Because these cutpoints are so important for clinical decision making, a large number of studies use receiver operating characteristic (ROC) methods to empirically establish optimal cutpoints for diagnostic tools. Optimal cutpoints are those cutpoints that maximize or minimize a specific metric, e.g. maximize the Youden index or minimize the absolute difference between sensitivity and specificity.
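
The two metrics named above can be written out as follows. This is a sketch under an assumption not stated in the abstract: scores greater than or equal to a candidate cutpoint c are classified as positive. Se(c) and Sp(c) denote sensitivity and specificity at c, and the symbols J and c* are introduced here for illustration only.

```latex
\begin{align}
  J(c) &= \mathrm{Se}(c) + \mathrm{Sp}(c) - 1
       && \text{(Youden index)} \\
  c^{*}_{J} &= \arg\max_{c} \, J(c)
       && \text{(cutpoint maximizing the Youden index)} \\
  c^{*}_{\Delta} &= \arg\min_{c} \, \bigl|\mathrm{Se}(c) - \mathrm{Sp}(c)\bigr|
       && \text{(cutpoint balancing sensitivity and specificity)}
\end{align}
```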

State of the art: At present, optimal cutpoints are almost always determined “empirically”, i.e. the metric to be optimized is calculated at every score of the diagnostic tool that occurs in the sample, and the score associated with the best metric value is taken as the optimal cutpoint. A number of R packages [1] and web-based Shiny interfaces [2] exist to help applied researchers use these empirical methods. In contrast, the biometric literature has shown that optimal cutpoints can be estimated more efficiently using distribution-based or kernel-based methods [3], [4]. Some recent studies found that the variability of the latter methods is lower than that of empirical methods, particularly in small samples [5].
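
The contrast between the two approaches can be illustrated with a minimal Python sketch. It is not the code of cutpointr or of any package cited here. The function names, the toy data and the convention that scores at or above the cutpoint count as positive are all assumptions made for this illustration. The distribution-based variant assumes normally distributed scores in both groups, with a higher mean in the positive group, and takes the crossing point of the two fitted normal densities that lies between the group means.

```python
import math
import statistics


def empirical_cutpoint(scores, labels):
    """Evaluate the Youden index at every observed score; return the best one.

    Assumes labels are 0/1, both classes are present, and scores >= cutpoint
    are classified as positive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_c, best_j = None, float("-inf")
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c, best_j


def normal_cutpoint(scores, labels):
    """Youden-optimal cutpoint under a two-normal model (illustrative only).

    Assumes the positive group has the higher mean.
    """
    neg = [s for s, y in zip(scores, labels) if y == 0]
    pos = [s for s, y in zip(scores, labels) if y == 1]
    m_h, sd_h = statistics.mean(neg), statistics.stdev(neg)
    m_d, sd_d = statistics.mean(pos), statistics.stdev(pos)
    a = m_d - m_h
    b = sd_d / sd_h
    if abs(b * b - 1) < 1e-12:
        # Equal variances: the densities cross midway between the means.
        return (m_h + m_d) / 2
    root = math.sqrt(a * a + (b * b - 1) * sd_h ** 2 * math.log(b * b))
    return (m_h * (b * b - 1) - a + b * root) / (b * b - 1)


# Toy data, invented for illustration: six negatives, six positives.
scores = [5, 9, 11, 14, 16, 23, 18, 20, 24, 27, 29, 31]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

emp_c, emp_j = empirical_cutpoint(scores, labels)
norm_c = normal_cutpoint(scores, labels)
print(emp_c, round(emp_j, 3), round(norm_c, 2))
```

The empirical estimate can only take values that occur in the sample, whereas the model-based estimate varies smoothly with the group means and standard deviations. This is one intuition for why the latter may be less variable in small samples.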

Concept: We developed the R package cutpointr [6] to improve on the empirical method of determining optimal cutpoints and to allow for tidy and efficient estimation of optimal cutpoints. It improves on existing solutions in at least four respects. First, in addition to empirical methods, cutpointr provides kernel estimation and distribution-based methods to estimate optimal cutpoints. Second, cutpointr supports a large number of the metrics that have been put forward to date and lets users easily define their own metric functions. Third, cutpointr provides a parallelizable routine to estimate the variability of cutpoints and various in- and out-of-bag performance metrics. This is important because only parallelization and efficient calculation of optimal cutpoints make large-scale simulation studies and resampling methods for determining cutpoint variability feasible. Fourth, cutpointr provides several convenient built-in plotting functions, e.g. for ROC curves and precision-recall plots.
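
The idea behind estimating cutpoint variability by resampling can be sketched in a few lines of Python. This is not cutpointr's routine. It assumes that the in- and out-of-bag terminology refers to bootstrap resampling, and the function names, toy data, seed and number of runs are hypothetical. Each run draws a sample with replacement, re-estimates the empirical Youden cutpoint, and the spread of the resulting cutpoints serves as a variability estimate. The runs here execute serially, but each run is independent of the others, which is what makes such a routine easy to parallelize.

```python
import random
import statistics


def youden_cutpoint(scores, labels):
    """Empirical cutpoint maximizing sensitivity + specificity - 1.

    Assumes 0/1 labels, both classes present, scores >= cutpoint are positive.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_c, best_j = None, float("-inf")
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= c and y == 1)
        tn = sum(1 for s, y in zip(scores, labels) if s < c and y == 0)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c


def bootstrap_cutpoints(scores, labels, runs=200, seed=1):
    """Re-estimate the cutpoint on `runs` bootstrap resamples."""
    rng = random.Random(seed)
    n = len(scores)
    results = []
    while len(results) < runs:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        boot_labels = [labels[i] for i in idx]
        if len(set(boot_labels)) < 2:
            continue  # skip resamples that lack one of the two classes
        boot_scores = [scores[i] for i in idx]
        results.append(youden_cutpoint(boot_scores, boot_labels))
    return results


# Toy data, invented for illustration.
scores = [5, 9, 11, 14, 16, 23, 18, 20, 24, 27, 29, 31]
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]

boot_cutpoints = bootstrap_cutpoints(scores, labels)
cutpoint_sd = statistics.stdev(boot_cutpoints)
print(f"bootstrap SD of the cutpoint: {cutpoint_sd:.2f}")
```

Observations not drawn in a given run (the out-of-bag cases) could additionally be used to evaluate the re-estimated cutpoint on data it was not fitted to. That step is omitted here for brevity.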

Implementation: The cutpointr package is distributed as an R package [6]. It follows current tidy programming practices to allow for efficient estimation, use in simulation studies, and interplay with functions from the tidyverse. A Shiny interface is currently under development.

Lessons Learned: We found that using the tidy framework for package development made it necessary to extensively adapt existing functions. At the same time, this is the main reason why cutpointr compares favorably to existing solutions with regard to usability.



The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. López-Ratón M, Rodríguez-Álvarez MX, Cadarso-Suárez C, Gude-Sampedro F. OptimalCutpoints: An R Package for Selecting Optimal Cutpoints in Diagnostic Tests. Journal of Statistical Software. 2014;61(8).
2. Goksuluk D, Korkmaz S, Zararsiz G, Karaagaoglu AE. easyROC: An Interactive Web-tool for ROC Curve Analysis Using R Language Environment. The R Journal. 2016;8(2):213–230.
3. Fluss R, Faraggi D, Reiser B. Estimation of the Youden Index and its associated cutoff point. Biometrical Journal. 2005;47(4):458–472.
4. Leeflang MM, Moons KG, Reitsma JB, Zwinderman AH. Bias in sensitivity and specificity caused by data-driven selection of optimal cutoff values: mechanisms, magnitude, and solutions. Clinical Chemistry. 2008;(4):729–738.
5. Hirschfeld G, do Brasil PE. A simulation study into the performance of “optimal” diagnostic thresholds in the population: “Large” effect sizes are not enough. Journal of Clinical Epidemiology. 2014;67(4):449–453.
6. Thie1e/cutpointr. https://github.com/Thie1e/cutpointr