LeukoExpert: Early detection of leukodystrophy patients
Published: September 6, 2024
Introduction: Leukodystrophies (LD) are a group of rare neurological diseases with a genetic predisposition. Depending on the subtype, of which approximately 60 are known, the prevalence of LD ranges from 1:40,000 to 1:1,100,000 [1]. Because the disease is rare and, in its early stages, resembles other neurological diseases such as multiple sclerosis, which must be considered as a differential diagnosis (DD), misdiagnosis occurs frequently [2]. As a result, LD patients often face a long journey before the correct diagnosis is made. Several gene therapies are available for early disease stages; they require a clear and precise diagnosis as early as possible. In Germany, there are two medical expert centers (Leipzig and Tübingen) where LD patients are examined, diagnosed, and treated. The goal of this work is to create a classification model that distinguishes between LD and DD in order to guide all potential LD patients to these expert centers.
Methods: Within the LeukoExpert project, we designed and established a distributed LD registry that allows us to collect data on LD patients separately at both expert centers. We added a third instance at UK Aachen (University Hospital Aachen) for all DD patients. For interoperability, all registry instances use a nearly identical schema and are implemented with Research Electronic Data Capture (REDCap) [3]. The registries contain structured data such as basic demographic data, medical history, and examination data from different time points for each patient. In total, the registry contains more than 850 patients: 500 with LD, covering 28 LD subtypes, and 350 with DD, covering 4 different diagnoses. The Personal Health Train (PHT) is used to analyze the captured data in a distributed mode, i.e., the analysis algorithms are shipped to the data integration centers of all three sites that manage the registry instances. The incremental distributed analysis focuses on differentiating between LD and DD within the first 5 years after symptom onset. For the binary classification, we applied Naive Bayes (NB), Linear Regression (LR), Gradient Boosting Classifier (GBC), Random Forest (RF), and a multilayer perceptron (MLP). We split the data ten times into 80% training data and 20% test data.
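The Methods combine two ideas: under the Personal Health Train, only the analysis travels to each site, and models are evaluated over ten random 80/20 splits. The sketch below illustrates both with a Bernoulli Naive Bayes classifier, chosen here because it can be fitted from per-site counts alone. It is not the project's implementation: the symptom names, the probabilities in `PROFILE`, the station sizes, the choice to split within each station, and all function names (`local_counts`, `merge`, `repeated_holdout`, ...) are hypothetical, and the patient data are synthetic.

```python
import math
import random
from collections import Counter

# Hypothetical binary symptom features; the real registry variables are not listed here.
SYMPTOMS = ("spasticity", "gait_disturbance", "optic_neuritis", "family_history")

# Hypothetical per-class probability of each symptom, used only to generate synthetic patients.
PROFILE = {"LD": (0.7, 0.8, 0.2, 0.4), "DD": (0.3, 0.4, 0.6, 0.1)}


def synthetic_patients(label, n, rng):
    """Return n synthetic (symptom set, label) records for one class."""
    probs = PROFILE[label]
    return [
        (frozenset(s for s, p in zip(SYMPTOMS, probs) if rng.random() < p), label)
        for _ in range(n)
    ]


def local_counts(patients):
    """Intended to run inside a station: only aggregate counts leave the site."""
    class_counts, symptom_counts = Counter(), Counter()
    for symptoms, label in patients:
        class_counts[label] += 1
        for s in symptoms:
            symptom_counts[(label, s)] += 1
    return class_counts, symptom_counts


def merge(results):
    """Runs on the requester side: sums the count tables returned by all stations."""
    class_counts, symptom_counts = Counter(), Counter()
    for c, s in results:
        class_counts.update(c)
        symptom_counts.update(s)
    return class_counts, symptom_counts


def predict(model, symptoms):
    """Bernoulli Naive Bayes decision with Laplace smoothing."""
    class_counts, symptom_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)
        for s in SYMPTOMS:
            p = (symptom_counts[(label, s)] + 1) / (n + 2)
            score += math.log(p if s in symptoms else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def repeated_holdout(stations, repeats=10, test_fraction=0.2, seed=0):
    """Mean accuracy over `repeats` random 80/20 splits, each drawn within every station."""
    rng = random.Random(seed)
    accuracies = []
    for _ in range(repeats):
        splits = []
        for data in stations:
            shuffled = list(data)
            rng.shuffle(shuffled)
            cut = int(len(shuffled) * (1 - test_fraction))
            splits.append((shuffled[:cut], shuffled[cut:]))
        # First visit: each station returns counts from its training part only.
        model = merge(local_counts(train) for train, _ in splits)
        # Second visit: the model is shipped back; each station reports only hit counts.
        correct = sum(sum(predict(model, x) == y for x, y in test) for _, test in splits)
        n_test = sum(len(test) for _, test in splits)
        accuracies.append(correct / n_test)
    return sum(accuracies) / len(accuracies)


rng = random.Random(42)
# Mirrors the described setup: two LD stations and one DD station (sizes are hypothetical).
stations = [
    synthetic_patients("LD", 250, rng),
    synthetic_patients("LD", 250, rng),
    synthetic_patients("DD", 350, rng),
]
mean_acc = repeated_holdout(stations)
print(f"mean accuracy over 10 splits: {mean_acc:.2f}")
```

Because a Bernoulli Naive Bayes model depends only on class and symptom counts, summing the per-station tables reproduces exactly the model that pooling all records centrally would give. Tree ensembles such as RF or GBC cannot be merged this simply. The printed accuracy reflects only the synthetic data, not the registry results.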
Results: For the binary classification (LD vs. DD), we achieved a mean accuracy of 81% for RF, 80% for GBC, 79% for LR, 61% for NB, and 57% for MLP. Furthermore, the analysis yields a ranked list of features relevant to the decision, e.g., symptoms such as spasticity and gait disturbance, as well as information about family medical history.
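The abstract does not state how the feature ranking was derived. As a hedged stand-in, the sketch below ranks binary features by their mutual information with the LD/DD label, a simple model-free criterion; it is not necessarily the method used in LeukoExpert. The feature names and the eight toy records are hypothetical.

```python
import math
from collections import Counter


def mutual_information(values, labels):
    """Mutual information in bits between a discrete feature and the class label."""
    n = len(labels)
    joint = Counter(zip(values, labels))
    p_x, p_y = Counter(values), Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi


def rank_features(rows, labels, names):
    """Return (name, score) pairs sorted from most to least informative."""
    scores = {
        name: mutual_information([row[i] for row in rows], labels)
        for i, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy records: (spasticity, gait_disturbance, headache), 1 = present.
names = ["spasticity", "gait_disturbance", "headache"]
rows = [
    (1, 1, 1), (1, 1, 0), (1, 1, 1), (1, 0, 0),  # LD
    (0, 0, 1), (0, 0, 0), (0, 0, 1), (0, 1, 0),  # DD
]
labels = ["LD"] * 4 + ["DD"] * 4
ranking = rank_features(rows, labels, names)
for name, score in ranking:
    print(f"{name}: {score:.3f} bits")
# prints spasticity: 1.000 bits, gait_disturbance: 0.189 bits, headache: 0.000 bits
```

In the toy data, spasticity separates the classes perfectly (1 bit), whereas headache carries no information. Unlike model-based importances such as the impurity-based importances of a random forest, mutual information scores each feature in isolation and ignores interactions between symptoms.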
Discussion: The distributed registry approach is innovative and opens up new opportunities, but it also comes with challenges. Compared with other studies using MRI data [4], the accuracy is lower, whereas the number of patients is much higher. The first classification results show good performance, and the extracted features are in line with medical expertise. Further improvements, such as generalizing symptoms, should be made to raise the accuracy to a level at which the model could be used in daily routine.
Conclusion: We established a distributed registry and analyzed the captured data therein to support clinicians in their diagnostic procedures to distinguish between LD and DD.
The authors declare that they have no competing interests.
The authors declare that a positive ethics committee vote has been obtained.
References
1. Wasserstein MP, Andriola M, Arnold G, Aron A, Duffner P, Erbe RW, et al. Clinical outcomes of children with abnormal newborn screening results for Krabbe disease in New York State. Genetics in Medicine. 2016;18(12):1235–43.
2. Costello DJ, Eichler AF, Eichler FS. Leukodystrophies: Classification, Diagnosis, and Treatment. The Neurologist. 2009;15(6):319–28. DOI: 10.1097/NRL.0b013e3181b287c8
3. Harris PA, Taylor R, Minor BL, Elliott V, Fernandez M, O’Neal L, et al. The REDCap consortium: Building an international community of software platform partners. Journal of Biomedical Informatics. 2019;95:103208. DOI: 10.1016/j.jbi.2019.103208
4. Mangeat G, Ouellette R, Wabartha M, De Leener B, Plattén M, Danylaité Karrenbauer V, et al. Machine Learning and Multiparametric Brain MRI to Differentiate Hereditary Diffuse Leukodystrophy with Spheroids from Multiple Sclerosis. Journal of Neuroimaging. 2020;30(5):674–82. DOI: 10.1111/jon.12725