mlguide – Methodological Guidance for Applied Machine Learning
Published: September 6, 2024
Text
Introduction: Finding the best workflow to prepare data and to train, select, and evaluate a machine learning (ML) model properly is complex and depends heavily on the concrete research question at hand. Novice users in particular can easily be overwhelmed by the variety of methods available for individual workflow steps such as data splitting, metric selection, or model evaluation [1].
State of the Art: Many tools have been created either to teach general ML concepts to diverse users or to support specific workflow steps, often through automation. However, the current landscape lacks a comprehensive tool that efficiently guides researchers with an individual research problem through the whole workflow of ML model creation [2], [3].
Concept: Our tool mlguide aims to fill this gap by providing a user-friendly, interactive platform that facilitates the selection of suitable methods at each stage of the supervised ML process, tailored to individual research questions. An early concept of mlguide was presented last year [4]. The general idea is that users enter metadata (e.g., sample size and task type) and their research goals (e.g., whether they want to compare algorithms, evaluate a selected model, or both). Based on the resulting requirements, mlguide gives evidence-based recommendations for methods that can be combined into a consistent training pipeline.
Implementation: Over the last year, we improved mlguide in several ways. The guidance engine was restructured and now shares many properties with a classical belief rule-based expert system [5]. Evidence pieces are specified in the general schema “Method M has property P under condition C (with belief B).” To derive a concrete method recommendation, the evidence is evaluated against the user-specified problem description and then summarized. We extended our knowledge base mlguide.core with new ML workflow steps and more evidence from the scientific literature. We also made user input in our R Shiny-based web application mlguide.app clearer and improved the presentation of the guidance output. A demonstrator version of the web application will be available during the poster session.
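To illustrate the evidence schema, the following minimal Python sketch encodes a few evidence pieces and evaluates them against a problem description. It is not the mlguide implementation (which is R-based). The class `Evidence`, the function `recommend`, the property name, the example conditions, and all belief values are hypothetical placeholders rather than entries from mlguide.core. Beliefs for the same method are combined with a simple noisy-OR rule, which is an assumption made here for brevity and stands in for the more elaborate inference used in belief rule-based systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Problem = Dict[str, object]  # user-specified problem description


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence: method M has property P under condition C with belief B."""
    method: str
    prop: str
    condition: Callable[[Problem], bool]
    belief: float  # degree of belief in [0, 1]


# Hypothetical evidence items, invented for illustration only.
EVIDENCE: List[Evidence] = [
    Evidence("nested cross-validation", "suitable_for_evaluation",
             lambda p: p["sample_size"] < 1000, 0.8),
    Evidence("nested cross-validation", "suitable_for_evaluation",
             lambda p: p["goal"] == "compare", 0.5),
    Evidence("hold-out split", "suitable_for_evaluation",
             lambda p: p["sample_size"] >= 1000, 0.7),
]


def recommend(evidence: List[Evidence], problem: Problem,
              prop: str) -> List[Tuple[str, float]]:
    """Rank methods by the combined belief that they have property `prop`."""
    scores: Dict[str, float] = {}
    for item in evidence:
        # Only evidence whose condition holds for this problem is used.
        if item.prop == prop and item.condition(problem):
            previous = scores.get(item.method, 0.0)
            # Noisy-OR combination (an assumption of this sketch).
            scores[item.method] = 1.0 - (1.0 - previous) * (1.0 - item.belief)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    problem = {"sample_size": 200, "goal": "compare"}
    for method, belief in recommend(EVIDENCE, problem, "suitable_for_evaluation"):
        print(f"{method}: combined belief {belief:.2f}")
```

In this toy example, both evidence pieces for nested cross-validation apply to a problem with 200 samples and the goal of comparing algorithms, so their beliefs are combined, while the hold-out evidence is ignored because its condition is not met.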
Lessons learned: Representing evidence from scientific publications in a structured way is a very complex task. We had to refactor the guidance engine and evidence structure several times to adapt to new requirements emerging from new evidence. While we have now integrated some basic evidence for each workflow step, the methodological literature is rich and growing fast. We envision that the knowledge base will be extended by expert users, supported by large language models for evidence annotation, so that mlguide covers the whole ML landscape, recommends state-of-the-art methods, and keeps its evidence up to date.
Another challenge is adapting our recommendations to different levels of expertise, as the current GUI offers too many options for novice users. We therefore plan to implement a simplified mode for beginners, informed by a planned evaluation study. To further improve the user experience, we also plan a dynamic guidance mode in which users initially enter only a minimal problem description and provide more information later, as mlguide intelligently requests it.
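One conceivable way to decide which piece of information to request next in such a dynamic mode is sketched below in Python. This is a speculative illustration of the planned feature, not a description of mlguide: the field names, the list `REQUIRED_FIELDS`, and the function `next_question` are hypothetical, and the heuristic (ask for the missing field that the largest number of evidence conditions depends on) is only one simple option.

```python
from typing import Dict, FrozenSet, List, Optional

# Hypothetical: for each evidence item, the problem fields its condition needs.
REQUIRED_FIELDS: List[FrozenSet[str]] = [
    frozenset({"task_type"}),
    frozenset({"task_type", "sample_size"}),
    frozenset({"sample_size", "class_balance"}),
    frozenset({"goal"}),
]


def next_question(problem: Dict[str, object],
                  required: List[FrozenSet[str]] = REQUIRED_FIELDS) -> Optional[str]:
    """Return the missing field needed by the most evidence items, or None."""
    counts: Dict[str, int] = {}
    for fields in required:
        for name in fields:
            if name not in problem:
                counts[name] = counts.get(name, 0) + 1
    if not counts:
        return None  # every evidence condition can already be evaluated
    # Highest count first; ties are broken alphabetically for determinism.
    return min(counts, key=lambda name: (-counts[name], name))


if __name__ == "__main__":
    problem: Dict[str, object] = {"task_type": "classification"}
    print(next_question(problem))
```

Starting from a description that contains only the task type, this heuristic would first ask for the sample size, because two of the placeholder evidence items depend on it.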
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Lones MA. How to avoid machine learning pitfalls: a guide for academic researchers [Preprint]. arXiv. 2024. arXiv:2108.02497v4. DOI: 10.48550/arXiv.2108.02497
2. Esposito A, Calvano M, Curci A, Desolda G, Lanzilotti R, Lorusso C, Piccinno A. End-User Development for Artificial Intelligence: A Systematic Literature Review. In: Spano LD, Schmidt A, Santoro C, Stumpf S, editors. End-User Development. IS-EUD 2023. Cham: Springer; 2023. (Lecture Notes in Computer Science; 13917). p. 19-34. DOI: 10.1007/978-3-031-34433-6_2
3. Detjen H, et al. Designing Machine Learning Workflows and Experiments with Ease: A Scoping Review of Interactive Tools. 2024. Under preparation.
4. Jäckle S, Alpers R, Westphal M. mlguide – first concept of a machine learning guidance toolkit. In: Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie, editor. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 82. DOI: 10.3205/23gmds120
5. Zhou ZJ, Hu GY, Hu CH, Wen CL, Chang LL. A Survey of Belief Rule-Base Expert System. IEEE Transactions on Systems, Man, and Cybernetics: Systems. 2021;51(8):4944-4958. DOI: 10.1109/TSMC.2019.2944893