Extending the Biosignal and Imaging Data Managing Platform XNAT by High Performance Computing for Reproducible Processing
Published: September 6, 2024
Introduction: Computationally expensive methods, such as training deep learning models on large datasets, require increasing computing power in medical research [1]. Processing on small local machines is often insufficient with respect to resource requirements. High Performance Computing (HPC) clusters enable extensive analyses on powerful interconnected hardware.
To ensure reproducibility, data can be stored in a central research data management system. Subsequently, it can be transferred to the cluster through a bridge between the two systems.
In this work, we present a method to connect the research data management platform eXtensible Neuroimaging Archive Toolkit (XNAT) [2] to an HPC system to establish an HPC data analysis pipeline. As a use case, we used the annual George B. Moody Challenges (GBMC), which address current medical research questions. Typically, a prediction task has to be solved on training data within a fixed time period, which necessitates efficient workflows.
Methods: The established, open-source XNAT enables storage of medical data and file transfer via a standardized Representational State Transfer (REST) interface. Both systems, HPC and XNAT, are hosted by our local infrastructure provider. As an extension, we implemented a so-called HPC Jobscript and a Python data transfer script. The HPC Jobscript executes four steps (see the sketch after the list):
1. starting an XNAT workflow,
2. file exchange via the data transfer script,
3. executing the user analysis script, and
4. ending the workflow.
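To illustrate the sequence of these four steps, the following minimal Python sketch mimics the jobscript logic. It is not the published code: the script names, paths, and the workflow helper are hypothetical placeholders, and the real extension is implemented as an HPC Jobscript together with the XN-based data transfer script.

```python
"""Illustrative Python sketch of the four jobscript steps.

The real extension consists of an HPC Jobscript plus the XN-based
data transfer script; the script names, paths, and the workflow
helper below are hypothetical placeholders, not the published code.
"""
import subprocess


def set_workflow_status(status: str) -> None:
    # Placeholder for the XNAT workflow call (steps 1 and 4); the
    # concrete REST endpoint depends on the XNAT version in use.
    print(f"[xnat workflow] {status}")


def main() -> None:
    set_workflow_status("Running")                         # 1. start XNAT workflow
    subprocess.run(["python", "transfer.py", "--pull"],    # 2. file exchange via the
                   check=True)                             #    data transfer script
    subprocess.run(["python", "analysis.py"], check=True)  # 3. user analysis script
    set_workflow_status("Complete")                        # 4. end the workflow


if __name__ == "__main__":
    main()
```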
To exchange project files via REST, the data transfer script extends our command line tool XN.
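The following sketch illustrates the kind of REST calls such a transfer involves, using the `requests` library; it is not the XN extension itself. The host name, credentials, project ID, and the assumption that data are attached as subject-level resources are illustrative and depend on how a project is organized in XNAT.

```python
"""Illustrative REST download of an XNAT project with `requests`.

Host, credentials, and project ID are hypothetical; the assumption
that files are attached as subject-level resources may differ from
the layout used by the published XN extension.
"""
from pathlib import Path

import requests

XNAT_HOST = "https://xnat.example.org"  # hypothetical host
PROJECT = "GBMC2019"                    # hypothetical project ID

session = requests.Session()
session.auth = ("user", "token")        # prefer a session token in practice


def pull_project(target: Path) -> None:
    # List all subjects of the project (standard XNAT REST endpoint).
    subjects = session.get(
        f"{XNAT_HOST}/data/projects/{PROJECT}/subjects",
        params={"format": "json"}, timeout=60,
    ).json()["ResultSet"]["Result"]

    for subj in subjects:
        subj_dir = target / subj["label"]
        subj_dir.mkdir(parents=True, exist_ok=True)

        # List subject-level resources and fetch each one as a ZIP archive.
        resources = session.get(
            f"{XNAT_HOST}{subj['URI']}/resources",
            params={"format": "json"}, timeout=60,
        ).json()["ResultSet"]["Result"]
        for res in resources:
            resp = session.get(
                f"{XNAT_HOST}{subj['URI']}/resources/{res['label']}/files",
                params={"format": "zip"}, timeout=600,
            )
            resp.raise_for_status()
            (subj_dir / f"{res['label']}.zip").write_bytes(resp.content)


if __name__ == "__main__":
    pull_project(Path("/scratch/gbmc2019"))
```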
We evaluated the infrastructure by transferring five recent GBMC datasets (Experiment 1) and self-generated data with predefined project sizes (Experiment 2) from XNAT to the HPC system, capturing the transfer speed and determining the impact of the three indicators project size, file size, and number of subjects.
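For illustration, throughput indicators of this kind can be derived from a timed transfer as in the following sketch; all values in the example are made up and not taken from the experiments.

```python
"""Sketch of how throughput indicators could be computed from a timed
transfer; the example values are made up and not from the experiments."""


def throughput(n_subjects: int, n_files: int, total_bytes: int,
               elapsed_s: float) -> dict:
    """Return subjects/min, files/min, and MiB/min for one transfer run."""
    minutes = elapsed_s / 60.0
    return {
        "subjects/min": n_subjects / minutes,
        "files/min": n_files / minutes,
        "MiB/min": total_bytes / (1024 ** 2) / minutes,
    }


if __name__ == "__main__":
    # Hypothetical run: 100 subjects, 400 files, 2 GiB transferred in 5 min.
    print(throughput(n_subjects=100, n_files=400,
                     total_bytes=2 * 1024 ** 3, elapsed_s=300.0))
```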
Results: We transferred between (4.307 ± 0.622) subjects/min for the GBMC 2023 data and (108.817 ± 2.427) subjects/min for GBMC 2019. In terms of transferred files/min, we achieved (108.817 ± 2.427) (GBMC 2019) to (346.784 ± 50.047) (GBMC 2023). For the indicator MiB/min, values from (0.701 ± 0.016) (GBMC 2019) to (180.214 ± 26.008) (GBMC 2023) were observed.
Experiment 2 showed positive correlations of the elapsed time (0.09 h to 9.73 h) with both the total project size and the total number of files.
Discussion: The number of subjects stored in XNAT plays the most significant role in the transfer behavior, but the combination of the number of project files and the project size may have an even greater impact. Nevertheless, all parameters showed an influence on the transfer speed. The combined overhead of the REST interface and the servers, varying system load, and the total number of bytes transferred could explain this behavior. The infrastructure will be used in our BMBF-funded project Somnolink. Due to the standardized XNAT REST interface, the bridge is portable to other XNAT systems and SLURM-enabled HPC clusters. We are currently extending the infrastructure with a secure RESTful authorization method that would allow starting and ending the workflow directly from XNAT [3]. With our approved secure HPC workflow [4], processing of health data in the care context is possible.
Code availability: Published code: https://gitlab.gwdg.de/medinfpub/mi-xnat/extensions/clients/XNAT-HPC
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine. 2019 Jan;25(1):44-56. DOI: 10.1038/s41591-018-0300-7
2. Marcus DS, Olsen TR, Ramaratnam M, Buckner RL. The Extensible Neuroimaging Archive Toolkit: an informatics platform for managing, exploring, and sharing neuroimaging data. Neuroinformatics. 2007;5(1):11-34. DOI: 10.1385/ni:5:1:11
3. Köhler C, Biniaz MH, Bingert S, Nolte H, Kunkel J. Secure Authorization for RESTful HPC Access with FaaS Support. International Journal on Advances in Security. 2022 Dec;15(3 and 4):119-31. Available from: https://resolver.sub.uni-goettingen.de/purl?gro-2/119762
4. Nolte H, Spicher N, Russel A, Ehlers T, Krey S, Krefting D, et al. Secure HPC: A workflow providing a secure partition on an HPC system. Future Generation Computer Systems. 2023 Apr;141:677-91. DOI: 10.1016/j.future.2022.12.019