gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

mlguide – first concept of a machine learning guidance toolkit

Meeting Abstract

  • Sonja Jäckle - Fraunhofer Institute for Digital Medicine MEVIS, Lübeck, Germany
  • Rieke Alpers - Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany
  • Max Westphal - Fraunhofer Institute for Digital Medicine MEVIS, Bremen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 82

doi: 10.3205/23gmds120, urn:nbn:de:0183-23gmds1202

Published: September 15, 2023

© 2023 Jäckle et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Machine learning (ML) has become a commonly used tool for medical research questions [1]. Because of various methodological pitfalls, appropriate development and validation of ML models can be difficult for non-experts [2], [3]. Interactive guidance based on expert knowledge may assist practitioners with these tasks but is not readily available so far. We aim to fill this gap with our ML guidance toolkit mlguide, a novel expert system that provides methodological support in applied ML [4]. Our goal is to recommend suitable methods that are tailored to the user's ML task and based on the current (published) evidence. The development of the mlguide toolkit is part of the KI-FDZ project, which aims at facilitating the usage of artificial intelligence at the Health Data Lab [5].

Methods: First, we conducted a literature review to derive a workflow suitable for supervised ML models. Second, we created an online survey to determine which problems are most common and important during ML model development. Third, we developed a first software concept of our mlguide toolkit and its architecture.

Results: Our ML workflow consists of four parts: problem definition, data processing, model development, and deployment. The mlguide toolkit will support the second and third parts interactively by providing methodological recommendations based on the user-specified problem description.
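As an illustration only, the four-part workflow and the two parts covered by the toolkit could be encoded as follows. This is a minimal sketch in Python, not the toolkit's actual code (mlguide is described as an R package); the names `Stage`, `SUPPORTED`, and `is_supported` are hypothetical.

```python
from enum import Enum


class Stage(Enum):
    """The four workflow parts named in the abstract, in order."""
    PROBLEM_DEFINITION = 1
    DATA_PROCESSING = 2
    MODEL_DEVELOPMENT = 3
    DEPLOYMENT = 4


# Parts for which the abstract states that interactive recommendations
# will be provided (the second and third parts).
SUPPORTED = {Stage.DATA_PROCESSING, Stage.MODEL_DEVELOPMENT}


def is_supported(stage: Stage) -> bool:
    """Return True if the (hypothetical) toolkit gives guidance for this stage."""
    return stage in SUPPORTED


def supported_in_order() -> list:
    """Names of the supported stages, in workflow order."""
    return [s.name for s in Stage if s in SUPPORTED]
```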

The survey targeting ML practitioners, data scientists and students, among others, has been completed and is now accessible online (http://websites.fraunhofer.de/mlguide-user-survey/index.php/578345).

Based on a literature review about expert systems, we designed the architecture of the mlguide toolkit with three components. The web application mlguide.app provides an interface for users to interact with the whole system. Its backend is connected to the R package mlguide, which contains all development tools and the guidance engine for managing user requests for evidence-based recommendations. For this purpose, mlguide interacts with mlguide.core, which contains and manages the expert knowledge.
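The separation between a knowledge store and a guidance engine can be sketched as below. This is a hedged illustration in Python and does not reproduce the real mlguide or mlguide.core interfaces, which the abstract does not specify. The classes `Rule`, `KnowledgeBase`, and `GuidanceEngine`, the dictionary-based problem description, and the placeholder entries are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Rule:
    """One hypothetical piece of expert knowledge."""
    conditions: Dict[str, str]  # required key/value pairs in the problem description
    recommendation: str         # placeholder text, not a real recommendation
    source: str                 # placeholder for the supporting publication


class KnowledgeBase:
    """Stands in for the role of mlguide.core: stores and manages rules."""

    def __init__(self) -> None:
        self._rules: List[Rule] = []

    def add(self, rule: Rule) -> None:
        self._rules.append(rule)

    def rules(self) -> List[Rule]:
        return list(self._rules)


class GuidanceEngine:
    """Stands in for the guidance engine: matches a request against the rules."""

    def __init__(self, kb: KnowledgeBase) -> None:
        self._kb = kb

    def recommend(self, problem: Dict[str, str]) -> List[Tuple[str, str]]:
        """Return (recommendation, source) pairs whose conditions all hold."""
        return [
            (r.recommendation, r.source)
            for r in self._kb.rules()
            if all(problem.get(k) == v for k, v in r.conditions.items())
        ]


if __name__ == "__main__":
    kb = KnowledgeBase()
    kb.add(Rule({"task": "classification"}, "placeholder recommendation A", "placeholder source A"))
    kb.add(Rule({"data_type": "tabular"}, "placeholder recommendation B", "placeholder source B"))
    engine = GuidanceEngine(kb)
    # A web front end (the role of mlguide.app) would collect this description from the user.
    print(engine.recommend({"task": "classification", "data_type": "tabular"}))
```

Keeping each recommendation paired with its source mirrors the stated aim of evidence-based guidance; a real rule format would likely need richer conditions than exact key/value matches.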

Discussion: The current toolkit version addresses only supervised ML models and tabular data. Several extensions to other ML models or data types are thus possible. The results of our open survey will guide our future developments of the supported ML workflow and our web application. An early prototype of the mlguide toolkit is currently running and being tested internally.

Conclusion: We presented the current state of our mlguide toolkit, which guides ML practitioners during the development of an ML model. It supports the user in the data preparation and model development process. In future work, the results of the first user survey will be used to improve our current ML workflow and our current mlguide toolkit implementation. One important next implementation step will be to optimize the process by which ML experts capture evidence in the system and to explore how unstructured knowledge from publications can be extracted into a suitable structured format. Furthermore, we will conduct user interviews with more specific questions to determine further potential improvements.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Shehab M, Abualigah L, Shambour Q, Abu-Hashem MA, Shambour MKY, Alsalibi AI, Gandomi AH. Machine learning in medical applications: A review of state-of-the-art methods. Comput Biol Med. 2022;145:105458. DOI: 10.1016/j.compbiomed.2022.105458
2. Haller MC, Aschauer C, Wallisch C, Leffondré K, van Smeden M, Oberbauer R, Heinze G. Prediction models for living organ transplantation are poorly developed, reported and validated: a systematic review. J Clin Epidemiol. 2022;145:126-135. DOI: 10.1016/j.jclinepi.2022.01.025
3. Lones MA. How to avoid machine learning pitfalls: a guide for academic researchers [preprint]. arXiv. 2021. arXiv:2108.02497.
4. Tan CF, Wahidin LS, Khalil SN, Tamaldin N, Hu J, Rauterberg GWM. The application of expert system: A review of research and applications. ARPN J Eng Appl Sci. 2016;11(4):2448-2453.
5. Bundesministerium für Gesundheit. Künstliche Intelligenz am Forschungsdatenzentrum im BfArM zur Erforschung von Anonymisierungsmöglichkeiten und AI-readiness (KI-FDZ) [Internet]. [cited 2023 Jan 27]. Available from: https://www.bundesgesundheitsministerium.de/ministerium/ressortforschung-1/handlungsfelder/digitalisierung/ki-fdz.html