gms | German Medical Science

GMDS 2013: 58th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

1-5 September 2013, Lübeck

Classification Algorithms for Autoimmune Diabetes

Meeting Abstract

  • Rainer Schmidt - Universität Rostock, Rostock, DE
  • Georg Fuellen - Universität Rostock, Rostock, DE

GMDS 2013. 58th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Lübeck, 1-5 September 2013. Düsseldorf: German Medical Science GMS Publishing House; 2013. DocAbstr.4

doi: 10.3205/13gmds061, urn:nbn:de:0183-13gmds0612

Published: 27 August 2013

© 2013 Schmidt et al.
This is an Open Access article distributed under the terms of the Creative Commons license (http://creativecommons.org/licenses/by-nc-nd/3.0/deed.de). It may be reproduced, distributed, and made publicly available, provided that the author and source are credited.


Text

Introduction: The aim of our project is to elucidate how the modulating anti-CD4 antibody RIB5/2 prevents autoimmune destruction of beta cells in the insulin-dependent diabetes mellitus (IDDM) rat model. In particular, we wish to calculate relative risk coefficients for progression to overt diabetes at different time points of life. So far, we have applied decision trees and other classification algorithms, such as random forest and support vector machines, to rather small data sets. In addition, feature selection was used to corroborate the genes and biomarkers identified by the decision trees.

Materials and Methods: In a first experiment, only twelve rats were monitored. New rats were bred for a second experiment. Because different measurement facilities were used, the data from the two experiments could not be merged. The second experiment started with 35 rats, more than the first, but the number decreased over time because some animals died during the experiment. There are not just two classes (animals that developed diabetes and animals that did not) but also a third: a background strain of animals that are resistant to diabetes because of specific treatment. In both experiments, gene expression in blood immune cells was measured for functional gene clusters on days 30, 35, 40, 45, 50, 55, 60, 65, 70, 80, and 90 of life. However, only days 45 to 60 are assumed to be important for predicting whether a rat will develop diabetes. The WEKA environment [1] was used to apply classification methods (decision trees, random forest, support vector machines, naïve Bayes, and nearest neighbour) and feature selection measures (information gain, gain ratio, and Relief).
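To illustrate two of the feature selection measures named above, the following is a minimal sketch of information gain and gain ratio for a nominal attribute. It is not the WEKA implementation and not the authors' code; the function names, the "high"/"low" discretisation of expression values, and the eight toy animals are all assumptions made for illustration. The functions work for any number of classes, including the three described in this section.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of nominal values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature_values, labels):
    """Reduction in class entropy obtained by splitting on a nominal feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature_values, labels):
    """Information gain normalised by the entropy of the split itself."""
    split_info = entropy(feature_values)
    if split_info == 0.0:
        return 0.0  # a feature with a single value cannot split the data
    return information_gain(feature_values, labels) / split_info


# Hypothetical toy data: NOT taken from the experiments described here.
labels = ["diabetic"] * 4 + ["non-diabetic"] * 4
marker_a = ["high"] * 4 + ["low"] * 4   # separates the classes perfectly
marker_b = ["high", "low"] * 4          # carries no class information

for name, marker in [("marker_a", marker_a), ("marker_b", marker_b)]:
    print(name, information_gain(marker, labels), gain_ratio(marker, labels))
```

In this toy setting the perfectly separating marker receives an information gain of one bit and the uninformative marker receives zero, which is the kind of ranking a feature selection step produces.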

Results: In the first experiment, classification accuracies under 10-fold cross-validation ranged only from 58% to 75% across the methods. Nevertheless, the decision trees [2] clearly indicate that at an early prediabetic stage (after 45 days of life), the RT6 T cell proliferation gene is most decisive for diabetes onset in the IDDM rat, followed by selectin and neuropilin at the stage of islet infiltration (after 50 days), and IL-4 during progression of beta cell destruction (after 55 days). The biologists involved in the project consider these results plausible and can explain them well. The findings were also supported by the feature selection methods. Although the second experiment included many more animals, its classification accuracies were only slightly better than those of the first. There, feature selection partly supported our previous findings, especially the importance of selectin and neuropilin.
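For readers unfamiliar with how a k-fold cross-validated accuracy is obtained, the following sketch shows the procedure with a simple 1-nearest-neighbour classifier. It is a stand-in for illustration only: the study used WEKA, and the fold assignment, the classifier, the function names, and the twelve toy samples below are assumptions, not the authors' setup. Note that with twelve samples and ten folds, each held-out fold contains only one or two samples.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Assign each sample index to one of k folds, keeping class proportions similar."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for index in indices:
            folds[position % k].append(index)  # deal samples out round-robin
            position += 1
    return folds


def nearest_neighbour(train_x, train_y, query):
    """Predict the label of the training sample closest to the query (1-NN)."""
    distances = [sum((a - b) ** 2 for a, b in zip(x, query)) for x in train_x]
    return train_y[distances.index(min(distances))]


def cross_validated_accuracy(features, labels, k=10, seed=0):
    """Fraction of samples classified correctly when each fold is held out once."""
    correct = 0
    for fold in stratified_folds(labels, k, seed):
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        train_x = [features[i] for i in train]
        train_y = [labels[i] for i in train]
        correct += sum(
            nearest_neighbour(train_x, train_y, features[i]) == labels[i]
            for i in fold
        )
    return correct / len(labels)


# Hypothetical toy data: twelve well-separated samples, NOT measured values.
features = [(0.1 * i, 0.0) for i in range(6)] + [(5.0 + 0.1 * i, 1.0) for i in range(6)]
labels = ["diabetic"] * 6 + ["non-diabetic"] * 6

print(cross_validated_accuracy(features, labels, k=10))
```

Because every sample is tested exactly once, the accuracy on a data set of this size can only move in steps of one twelfth, which is one reason estimates on such small data sets are coarse.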

Discussion: The main aim of our project is to calculate relative risk coefficients for progression to overt diabetes at different time points of life. The classification accuracies are modest, but the domain experts are satisfied with the trees generated by the decision tree algorithm, and these findings are supported by feature selection methods.
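The abstract does not define how the relative risk coefficients are to be computed. One standard reading is the epidemiological risk ratio: the proportion of animals that develop diabetes in one group divided by the proportion in a comparison group, evaluated separately at each time point. The sketch below assumes that reading; the function name, the grouping by marker expression, and all counts are hypothetical and serve only to show the arithmetic.

```python
def relative_risk(exposed_cases, exposed_total, control_cases, control_total):
    """Risk of the outcome in the exposed group divided by the risk in the control group."""
    if exposed_total <= 0 or control_total <= 0:
        raise ValueError("group sizes must be positive")
    if control_cases == 0:
        raise ValueError("relative risk is undefined when the control group has no cases")
    return (exposed_cases / exposed_total) / (control_cases / control_total)


# Hypothetical counts per day of life, NOT results from the study:
# (diabetic with high marker, total with high marker,
#  diabetic with low marker,  total with low marker)
counts_by_day = {
    45: (6, 8, 2, 8),
    50: (5, 8, 3, 8),
}

risk_by_day = {day: relative_risk(*counts) for day, counts in counts_by_day.items()}
for day, risk in sorted(risk_by_day.items()):
    print(f"day {day}: relative risk = {risk:.2f}")
```

A value above 1 means the outcome is more frequent in the first group than in the comparison group; with group sizes as small as those described here, such a ratio would carry wide uncertainty.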


References

1. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. SIGKDD Explorations. 2009; 11(1): 10-18.
2. Quinlan JR. C4.5: Programs for Machine Learning. San Mateo: Morgan Kaufmann; 1993.