gms | German Medical Science

62nd Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS)

German Association for Medical Informatics, Biometry and Epidemiology

17.09. - 21.09.2017, Oldenburg

zlibsvm: An object-oriented Java-binding for Support Vector Machines in the medical domain

Meeting Abstract

  • Richard Zowalla - Hochschule Heilbronn, Heilbronn, Germany
  • Martin Wiesner - Hochschule Heilbronn, Heilbronn, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 044

doi: 10.3205/17gmds164, urn:nbn:de:0183-17gmds1645

Published: 29 August 2017

© 2017 Zowalla et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Scalable software (SW) is increasingly important in medical research. Machine learning has become more popular and is frequently used as a research methodology by clinicians [1], [2]. However, the robustness and reusability of academic software libraries "lag far behind those in the commercial sector" [3]. This might arise from "ad-hoc or casual development techniques" applied by research institutions' SW developers, often within a project-based time frame. For (medical) classification tasks, for instance, Support Vector Machines (SVMs) are a widely accepted research methodology [4], [5], [6]. In this context, other researchers can benefit from a generic, robust, non-project-based SVM library for multi-purpose classification tasks.

State of the art: LIBSVM is a well-accepted library for SVMs, originally written in C [7]. It has been adapted for and is used in other frameworks, e.g. WEKA. However, LIBSVM does not follow modern object-oriented (OO) programming techniques. The existing Java implementation is merely a cross-compiled wrapper with several deficiencies. For this reason, some SW developers have individually implemented OO ports of LIBSVM. Yet, these ports are infrequently maintained, stalled in development, or based on old versions of the core library.

Concept: The principal ideas of modern SW systems are (a) abstraction, (b) modularity, and (c) inversion of control. In the SW industry, as well as in the medical research community, Java is a widely adopted OO programming language for building large-scale information systems. Designing a Java library that meets all of these requirements, however, is not a trivial task, especially in the domain of SVMs, where deep knowledge of the mathematical background is required. To ensure consistency with the core library, building a lightweight OO wrapper is a valid choice.
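As a rough illustration of abstraction and inversion of control in Java, the following sketch lets a caller depend on an interface and receive the concrete classifier from outside. All type names (Classifier, ThresholdClassifier, DocumentLabeler) are hypothetical and belong neither to zlibsvm nor to LIBSVM; the simple threshold rule merely stands in for a trained SVM.

```java
/** Illustrative sketch only; none of these types belong to zlibsvm or LIBSVM. */
final class IocSketch {

    /** Abstraction: callers program against this interface, not a concrete engine. */
    interface Classifier {
        double predict(double[] features);
    }

    /** Stand-in module; a real one would delegate to a trained SVM model. */
    static final class ThresholdClassifier implements Classifier {
        private final double threshold;

        ThresholdClassifier(double threshold) {
            this.threshold = threshold;
        }

        @Override
        public double predict(double[] features) {
            double sum = 0.0;
            for (double f : features) {
                sum += f;
            }
            return sum >= threshold ? 1.0 : -1.0;
        }
    }

    /** Inversion of control: the classifier is handed in, not created here. */
    static final class DocumentLabeler {
        private final Classifier classifier;

        DocumentLabeler(Classifier classifier) {
            this.classifier = classifier;
        }

        String label(double[] features) {
            return classifier.predict(features) > 0 ? "medical" : "other";
        }
    }

    public static void main(String[] args) {
        DocumentLabeler labeler = new DocumentLabeler(new ThresholdClassifier(1.0));
        System.out.println(labeler.label(new double[] {0.7, 0.6})); // sum >= 1.0
        System.out.println(labeler.label(new double[] {0.1, 0.2})); // sum < 1.0
    }
}
```

Because DocumentLabeler only knows the interface, the stand-in could be swapped for a different module without touching the calling code, which is the modularity argument made above.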

Implementation: Before 2016, no such library existed. This gap in tooling motivated the creation of »zlibsvm«. Several OO design patterns contribute to an extensible and flexible wrapper: (i) inheritance/composition, (ii) the builder pattern for the configuration of SVM training/classification, and (iii) the facade pattern to mask the complex mathematical processing.
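The following minimal sketch shows how a builder and a facade of this kind could fit together. It is not the zlibsvm API: the names SvmConfig, SvmFacade and Kernel, the default parameter values, and the single "support vector" used as a stand-in for a trained model are all assumptions made for illustration.

```java
import java.util.Objects;

/** Illustrative sketch only; the names are hypothetical and not the zlibsvm API. */
final class BuilderFacadeSketch {

    enum Kernel { LINEAR, RBF }

    /** Immutable SVM configuration, assembled step by step through a builder. */
    static final class SvmConfig {
        final Kernel kernel;
        final double cost;
        final double gamma;

        private SvmConfig(Builder b) {
            this.kernel = b.kernel;
            this.cost = b.cost;
            this.gamma = b.gamma;
        }

        static Builder builder() {
            return new Builder();
        }

        static final class Builder {
            // Defaults are assumptions chosen for this sketch.
            private Kernel kernel = Kernel.RBF;
            private double cost = 1.0;
            private double gamma = 0.5;

            Builder kernel(Kernel kernel) {
                this.kernel = Objects.requireNonNull(kernel);
                return this;
            }

            Builder cost(double cost) {
                if (cost <= 0) throw new IllegalArgumentException("cost must be > 0");
                this.cost = cost;
                return this;
            }

            Builder gamma(double gamma) {
                if (gamma <= 0) throw new IllegalArgumentException("gamma must be > 0");
                this.gamma = gamma;
                return this;
            }

            SvmConfig build() {
                return new SvmConfig(this);
            }
        }
    }

    /** Facade: one method hides the kernel arithmetic from the caller. */
    static final class SvmFacade {
        private final SvmConfig config;
        private final double[] supportVector; // toy stand-in for a trained model
        private final double bias;

        SvmFacade(SvmConfig config, double[] supportVector, double bias) {
            this.config = Objects.requireNonNull(config);
            this.supportVector = supportVector.clone();
            this.bias = bias;
        }

        String classify(double[] features) {
            double score = kernel(supportVector, features) + bias;
            return score >= 0 ? "positive" : "negative";
        }

        private double kernel(double[] a, double[] b) {
            double dot = 0.0;
            double squaredDistance = 0.0;
            for (int i = 0; i < a.length; i++) {
                dot += a[i] * b[i];
                squaredDistance += (a[i] - b[i]) * (a[i] - b[i]);
            }
            return config.kernel == Kernel.LINEAR
                    ? dot
                    : Math.exp(-config.gamma * squaredDistance);
        }
    }

    public static void main(String[] args) {
        SvmConfig config = SvmConfig.builder().kernel(Kernel.LINEAR).cost(10.0).build();
        SvmFacade svm = new SvmFacade(config, new double[] {1.0, -1.0}, 0.0);
        System.out.println(svm.classify(new double[] {2.0, 1.0})); // dot product  1.0
        System.out.println(svm.classify(new double[] {0.0, 1.0})); // dot product -1.0
    }
}
```

The builder keeps the configuration object immutable and validates parameters in one place, while the facade composes that configuration with the model data, so callers never touch the kernel computation directly.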

zlibsvm was open-sourced in 2016 and is freely available on GitHub [8]. It is distributed via Maven Central and is thus easily obtainable by the research community. The implementation was verified against the accepted and published “mushroom” dataset [9]. For quality assurance, this verification is run via JUnit on every build, and in particular before each public release.
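A build-time verification of this kind can be thought of as an accuracy check against known labels. The sketch below uses plain Java rather than JUnit so that it stays self-contained; the class name VerificationSketch, the label arrays, and the 0.7 threshold are invented for illustration and are not the project's actual test data or acceptance criterion.

```java
/** Illustrative sketch only; labels and threshold are made up, not project data. */
final class VerificationSketch {

    /** Fraction of predictions that match the expected labels. */
    static double accuracy(double[] expected, double[] predicted) {
        if (expected.length == 0 || expected.length != predicted.length) {
            throw new IllegalArgumentException("label arrays must be non-empty and of equal length");
        }
        int hits = 0;
        for (int i = 0; i < expected.length; i++) {
            if (expected[i] == predicted[i]) {
                hits++;
            }
        }
        return (double) hits / expected.length;
    }

    /** Fails (as a unit test would) if accuracy drops below the required minimum. */
    static void verify(double[] expected, double[] predicted, double minAccuracy) {
        double acc = accuracy(expected, predicted);
        if (acc < minAccuracy) {
            throw new AssertionError("accuracy " + acc + " is below " + minAccuracy);
        }
    }

    public static void main(String[] args) {
        double[] expected = {1, -1, 1, 1};
        double[] predicted = {1, -1, 1, -1};
        System.out.println(accuracy(expected, predicted)); // 3 of 4 labels match
        verify(expected, predicted, 0.7);                  // passes: 0.75 >= 0.7
    }
}
```

In a JUnit setup, the call to verify would typically sit inside a test method, with the predictions produced by a model trained on the reference dataset.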

In release »1.1«, the library binds against LIBSVM »3.22« and has a size of only 28.7 KB. It is licensed under the Apache License v2.0 [10], which permits use in both research and commercial projects.

Lessons Learned: So far, zlibsvm has been used successfully for text classification of medical documents and health-related webpages. Moreover, there is considerable potential to use this library for other medical research questions (see Introduction). Therefore, the authors encourage other researchers to contribute feedback or enhancements to the library presented. A software demonstration of zlibsvm will include (a) a live presentation of its capabilities and (b) at least two medical use cases from a researcher’s perspective.



The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Deo RC. Machine Learning in Medicine. Circulation. 2015;132(20):1920-30. DOI: 10.1161/CIRCULATIONAHA.115.001593
2. Cruz JA, Wishart DS. Applications of machine learning in cancer prediction and prognosis. Cancer Inform. 2007;11(2):59-77.
3. Baxter R, Chue Hong N, Gorissen D, Hetherington J, Todorov I. The Research Software Engineer. Digital Research Conference; Oxford; September, 2012.
4. Guyon I, Weston J, Barnhill S, Vapnik V. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning. 2002;46(1/3):389-422. DOI: 10.1023/A:1012487302797
5. Rahman MM, Desai B, Bhattacharya P. Medical image retrieval with probabilistic multi-class support vector machine classifiers and adaptive similarity fusion. Computerized Medical Imaging and Graphics. 2008;32(2):95-108. DOI: 10.1016/j.compmedimag.2007.10.001
6. Keinki C, Zowalla R, Wiesner M, Koester MJ, Huebner J. Understandability of Patient Information Booklets for Patients with Cancer. Journal of Cancer Education. epub, 2016. DOI: 10.1007/s13187-016-1121-3
7. Chang CC, Lin CJ. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology. 2011;2(3):27:1-27:27. DOI: 10.1145/1961189.1961199
8. Zowalla R, Wiesner M. Project page of zlibsvm code-repository. 2017, last accessed: 2017-03-09. https://github.com/rzo1/zlibsvm
9. Lichman M. UCI Machine Learning Repository - Mushroom Dataset. School of Information and Computer Science. last accessed: 2017-03-09. http://archive.ics.uci.edu/ml/datasets/Mushroom
10. Apache Foundation. The Apache License, version 2.0. last accessed: 2017-03-09. http://www.apache.org/licenses/LICENSE-2.0