gms | German Medical Science

GMDS 2012: 57. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

16. - 20.09.2012, Braunschweig

Probability machines: probability estimation for personalized medicine using machine learning methods

Meeting Abstract

  • Jochen Kruppa - Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Germany
  • Andreas Ziegler - Universität zu Lübeck, Universitätsklinikum Schleswig-Holstein, Campus Lübeck, Germany

GMDS 2012. 57. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Braunschweig, 16.-20.09.2012. Düsseldorf: German Medical Science GMS Publishing House; 2012. Doc12gmds143

DOI: 10.3205/12gmds143, URN: urn:nbn:de:0183-12gmds1439

Published: September 13, 2012

© 2012 Kruppa et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.en). You are free: to Share – to copy, distribute and transmit the work, provided the original author and source are credited.


Text

Machine learning (ML) is increasingly used for data mining in biomedicine, as in many other fields. Recent work has shown that machine learning can also be used for probability estimation by embedding the probability estimation problem in nonparametric regression estimation. As a result, probability machines directly inherit the properties of the underlying nonparametric regression machines, such as consistency and convergence rate. Their advantage over standard parametric statistical approaches, such as logistic regression, is that probability machines do not require a correct specification of the functional relationship between the dependent variable and the independent variables. Instead, these methods provide robust nonparametric modeling of the regression function with minimal assumptions about the form of the relationship.
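
The embedding in regression estimation rests on a standard identity, sketched here for a binary outcome coded 0/1. The symbols Y (outcome), X (covariates), f and \hat{f}_n are notation chosen for this illustration and do not appear in the abstract.

```latex
\[
E[Y \mid X = x]
  = 1 \cdot P(Y = 1 \mid X = x) + 0 \cdot P(Y = 0 \mid X = x)
  = P(Y = 1 \mid X = x) =: f(x)
\]
% Hence, if a nonparametric regression estimate \hat{f}_n(x) is consistent
% for the regression function f(x), it is also consistent for the
% conditional probability P(Y = 1 | X = x).
```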

Probability estimation is central to personalized medicine as personal risks need to be assessed. Probability machines directly apply to assessing the probability of outcomes of interest based on different patient characteristics and interventions. They can also be used for computing propensity scores for adjustment in observational studies. They easily extend to dependent variables with multiple categories.

In this contribution we explore some consistent probability machines, such as random forests, k-nearest neighbors, and bagged nearest neighbors, for the purpose of probability estimation. We show how probabilities can be estimated with probability machines using standard software. We illustrate the approach using data from the literature as well as from our own applications.
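
To make the idea concrete, the following is a minimal pure-Python sketch of two of the machines named above, k-nearest neighbors and bagged nearest neighbors, used in regression mode on a 0/1 outcome. It is not the authors' implementation and not the standard software they refer to. The data are simulated under the assumption that the true probability is P(Y = 1 | x) = x, and all function names, the subsample size, and the tuning values are hypothetical choices made for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def knn_probability(train_x, train_y, x, k):
    """Estimate P(Y=1 | X=x) as the mean 0/1 outcome of the k nearest points."""
    order = sorted(range(len(train_x)), key=lambda i: sq_dist(train_x[i], x))
    return sum(train_y[i] for i in order[:k]) / k


def bagged_nn_probability(train_x, train_y, x, n_bags, subsample_size, rng):
    """Average the outcome of the single nearest neighbor over many resamples.

    Assumption: each resample is drawn with replacement and is much smaller
    than the training set, so that the average smooths over many neighbors.
    """
    n = len(train_x)
    total = 0
    for _ in range(n_bags):
        idx = rng.choices(range(n), k=subsample_size)
        nearest = min(idx, key=lambda i: sq_dist(train_x[i], x))
        total += train_y[nearest]
    return total / n_bags


# Simulated training data (assumption: true probability equals x).
rng = random.Random(2012)
train_x = [(rng.random(),) for _ in range(2000)]
train_y = [1 if rng.random() < x[0] else 0 for x in train_x]

# Probability estimates for two hypothetical patients.
knn_low = knn_probability(train_x, train_y, (0.2,), k=100)
knn_high = knn_probability(train_x, train_y, (0.8,), k=100)
bag_low = bagged_nn_probability(train_x, train_y, (0.2,), 300, 40, rng)
bag_high = bagged_nn_probability(train_x, train_y, (0.8,), 300, 40, rng)

print("k-NN estimates:", knn_low, knn_high)
print("bagged NN estimates:", bag_low, bag_high)
```

Both functions return an average of 0/1 outcomes rather than a majority vote, which is what turns a classifier into a probability estimate in this sketch. In practice, k and the resample size would need to be tuned, for example by cross-validation.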


References

1. Malley JD, Kruppa J, Dasgupta A, Malley KG, Ziegler A. Probability machines – Consistent probability estimation using nonparametric learning machines. Meth Inf Med. 2012;51:74-81. DOI: 10.3414/ME00-01-0052
2. Kruppa J, Ziegler A, König IR. Risk estimation and risk prediction using machine learning methods. Hum Gen. 2012, in press.