gms | German Medical Science

GMDS 2013: 58. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

01. - 05.09.2013, Lübeck

Parallelization of FSL-Fast segmentation of MRI brain data

Meeting Abstract

  • Joachim Weber - Regensburg University of Applied Sciences, Regensburg, DE
  • Alexander Brawanski - Regensburg University, Regensburg, DE
  • Christoph Palm - Regensburg University of Applied Sciences, Regensburg, DE

GMDS 2013. 58. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Lübeck, 01.-05.09.2013. Düsseldorf: German Medical Science GMS Publishing House; 2013. DocAbstr.329

doi: 10.3205/13gmds261, urn:nbn:de:0183-13gmds2611

Published: August 27, 2013

© 2013 Weber et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License. You are free to share (copy, distribute and transmit) the work, provided the original author and source are credited.



Introduction: Segmentation of MRI brain images into different tissue types is an important preprocessing step in brain image analysis and neurosurgery planning. The FSL-Fast segmentation software [1] is widely distributed and used for this purpose, e.g. [2]. Versions 3 and 4 provide a sequential implementation. The goal of this work is to parallelize the original algorithm using graphics processing units (GPUs).

Material and Methods: In general, the FSL-Fast algorithm consists of an initial k-means clustering, windowing and contrast enhancement, followed by iterative execution of lowpass filtering and hidden Markov random field model estimation. The original code was re-implemented using the Insight Segmentation and Registration Toolkit (ITK). This improves I/O handling, e.g. it enables input of NIfTI data instead of Analyze 7.5. Additionally, ITK supports the integration of OpenCL code starting with version 4. OpenCL [3] enables execution on GPUs, which offer massively parallel arithmetic processing power compared with central processing units (CPUs). Therefore, a CPU and a GPU version were implemented on the basis of the same toolkit, which allows a direct comparison of results and performance. Because of memory synchronization, communication overhead and data transfer, parallel programming yields a reasonable speed-up only for modules that perform repeated operations on the same data. We therefore started the parallelization of FSL-Fast with the k-means clustering and lowpass filtering modules.
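To illustrate why k-means clustering is a natural candidate for parallelization, the following is a minimal sketch in plain Python, not the FSL-Fast or ITK code. The function name `kmeans_1d`, the sample intensities and the initial centers are assumptions made for illustration only. The assignment step treats every voxel independently, which is the data-parallel part; the update step is a reduction over all voxels.

```python
def kmeans_1d(values, centers, iterations=10):
    """Lloyd's algorithm on scalar intensities; returns (centers, labels)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each voxel picks its nearest center.
        # Voxels are independent, so this loop could run one voxel per thread.
        labels = [
            min(range(len(centers)), key=lambda k: abs(v - centers[k]))
            for v in values
        ]
        # Update step: per-class mean, i.e. a reduction over all voxels.
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:  # keep the old center if a class is empty
                centers[k] = sum(members) / len(members)
    return centers, labels


# Hypothetical intensities forming three well-separated tissue classes.
intensities = [10, 12, 11, 50, 52, 49, 90, 91, 89]
centers, labels = kmeans_1d(intensities, centers=[0.0, 45.0, 100.0])
```

On a GPU, the assignment step maps directly onto one work-item per voxel, whereas the update step needs a parallel reduction and therefore some synchronization.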

Results: We evaluated the CPU and the GPU approach on the basis of four T1-weighted data sets. The parallel implementations of k-means clustering and lowpass filtering achieved speed-up factors of ~1.4 and ~6, respectively, compared with the sequential implementation. Regarding the consistency of results between the two approaches, k-means clustering yielded exactly the same results. In contrast, lowpass filtering showed differences on the order of 10^-4, depending on the number of iterations and on the data itself. The reason is rounding errors in floating-point operations on GPUs. Although the floating-point execution is fully IEEE 754 compliant, the OpenCL specification permits rounding errors on certain operations [3]. The original software uses x87 floating-point execution, which is not IEEE 754 compliant by default [4]. However, changing the internal precision to IEEE 754 compliant floating-point execution resulted in only minor differences. A further reason for differing CPU and GPU results is the compilation process itself: the OpenCL compiler is GPU-architecture dependent and tries to optimize the code for the underlying hardware, yielding an unpredictable order of operations [5].
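The effect of operation order on rounding can be reproduced without a GPU. The sketch below is an illustration only, not the code used in this work: it emulates single precision in plain Python by rounding through `struct`, and compares a left-to-right sum with a tree-shaped (pairwise) reduction as commonly used on parallel hardware. The helper names and the input values are assumptions, and the values were chosen to exaggerate the effect far beyond the ~10^-4 differences reported above.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def sequential_sum(values):
    """Left-to-right accumulation, as a simple sequential loop would do it."""
    acc = 0.0
    for v in values:
        acc = f32(acc + f32(v))
    return acc


def pairwise_sum(values):
    """Tree-shaped reduction, a common pattern on parallel hardware."""
    xs = [f32(v) for v in values]
    while len(xs) > 1:
        nxt = [f32(xs[i] + xs[i + 1]) for i in range(0, len(xs) - 1, 2)]
        if len(xs) % 2:
            nxt.append(xs[-1])  # an odd element is carried to the next level
        xs = nxt
    return xs[0]


values = [1e8, 1.0, -1e8, 1.0]  # the exact sum is 2.0
seq = sequential_sum(values)    # 1.0 in single precision
par = pairwise_sum(values)      # 0.0 in single precision
```

Both orders are valid IEEE 754 single-precision computations, yet they disagree, because near 1e8 the spacing between representable single-precision numbers is 8 and the added 1.0 is lost at different points in the two orders.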

Discussion: We implemented and compared a sequential and a parallel version of FSL-Fast segmentation. We found a reasonable speed-up, but differing results for the lowpass filtering, caused by rounding errors. In application areas where exact results are mandatory, such rounding errors have to be taken into account. For FSL-Fast, we are trying to avoid these errors by switching to double-precision numbers and by applying specific OpenCL compiler options.
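How double precision can narrow the gap between two evaluation orders may be sketched as follows. This is a hedged illustration, not the FSL-Fast lowpass filter: it assumes a 1D three-tap mean filter with clamped borders, made-up intensities, and two mathematically equivalent summation orders standing in for the CPU and GPU code paths. Single precision is emulated by rounding through `struct`; all names are hypothetical.

```python
import struct


def f32(x):
    """Round a Python float (double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lowpass(signal, iterations, rnd, left_first):
    """Iterated 3-tap mean filter; `rnd` rounds every intermediate result."""
    x = [rnd(v) for v in signal]
    for _ in range(iterations):
        y = []
        for i in range(len(x)):
            a = x[max(i - 1, 0)]            # clamp at the left border
            b = x[i]
            c = x[min(i + 1, len(x) - 1)]   # clamp at the right border
            # Two mathematically equal orders that may round differently.
            s = rnd(rnd(a + b) + c) if left_first else rnd(a + rnd(b + c))
            y.append(rnd(s / 3.0))
        x = y
    return x


def max_gap(signal, iterations, rnd):
    """Largest absolute difference between the two summation orders."""
    p = lowpass(signal, iterations, rnd, left_first=True)
    q = lowpass(signal, iterations, rnd, left_first=False)
    return max(abs(u - v) for u, v in zip(p, q))


signal = [37.2, 81.5, 12.9, 64.3, 99.1, 5.7, 48.8, 73.4]  # made-up intensities
gap_single = max_gap(signal, 20, f32)    # every step rounded to single precision
gap_double = max_gap(signal, 20, float)  # native double precision
print(gap_single, gap_double)
```

For inputs of this magnitude, the double-precision gap is bounded far below the single-precision one, so any remaining discrepancy would be expected to be several orders of magnitude smaller. Double precision does not make the two orders bit-identical in general; it only shrinks the difference, and the exact values depend on the data and the number of iterations.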


References

[1] Zhang Y, Brady M, Smith S. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Trans Med Imag. 2001; 20(1):45-57.
[2] Newlander SM, Chu A, Sinha US, Lu PH, Bartzokis G. Methodological improvements in voxel-based analysis of diffusion tensor images: Applications to study the impact of apolipoprotein E on white matter integrity. Journal of Magnetic Resonance Imaging. 2013.
[3] Munshi A, Khronos Group. The OpenCL Specification, Version 1.2, Document Revision 19. 2012.
[4] Intel Corporation. IA-32 Intel Architecture Software Developer's Manual. 2004.
[5] Owens JD, Luebke D, Govindaraju N, Harris M, Krüger J, Lefohn AE, Purcell TJ. A Survey of General-Purpose Computation on Graphics Hardware. Eurographics. 2005: 21-51.