Article
MISm: A medical image segmentation metric for evaluation of weak labeled data
Published: August 19, 2022
Introduction: Performance measures are an important tool for assessing and comparing medical image segmentation algorithms. Unfortunately, current measures exhibit weaknesses when assessing certain edge cases, such as a very small region of interest or no region of interest at all [1], [2], [3], [4], [5].
Methods: As a solution to these limitations, we propose a new medical image segmentation metric: MISm. MISm is based on the Dice similarity coefficient (DSC) when actual positive conditions are present in the segmentation. If there are no positive conditions, MISm corresponds to the weighted specificity (wSpec). The wSpec is the specificity, also called the true negative rate, extended with additional weights α and (1 − α) to balance true negatives and false positives.
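The case distinction above can be sketched in code. This is a minimal illustration, not the reference implementation from MISeval: the exact placement of the weights α and (1 − α) in wSpec and the default value of α are assumptions inferred from the description, so consult the MISeval repository for the authoritative formula.

```python
import numpy as np


def misim(truth, pred, alpha=0.1):
    """Sketch of MISm for binary masks (1 = region of interest).

    If the ground truth contains positive pixels, return the Dice
    similarity coefficient (DSC); otherwise return the weighted
    specificity (wSpec). The weighting alpha*TN vs. (1 - alpha)*FP
    and the default alpha=0.1 are assumptions for illustration.
    """
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)

    # Confusion-matrix counts for the binary masks
    tp = np.sum(truth & pred)
    fp = np.sum(~truth & pred)
    fn = np.sum(truth & ~pred)
    tn = np.sum(~truth & ~pred)

    if truth.any():
        # Actual positive conditions present -> DSC
        return 2 * tp / (2 * tp + fp + fn)
    # No positive conditions -> weighted specificity (wSpec)
    return alpha * tn / (alpha * tn + (1 - alpha) * fp)
```

For an empty ground truth and an empty prediction, wSpec yields a perfect score of 1, while any false positive pixels reduce it; this is the edge case in which the plain DSC would be undefined.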
Results: MISm has no definition gaps. The DSC is undefined if there are no actual positive conditions in the segmentation, but this case is covered by wSpec in MISm. The wSpec is undefined only if no true negative and no false positive predictions were made; this can only occur when actual positive conditions are present in the segmentation, in which case MISm uses the DSC instead.
Discussion: In this work, we developed the metric MISm, which addresses the limitations identified in current gold-standard metrics popular in the field of MIS. By utilizing wSpec, MISm enables the evaluation of datasets with weak label annotations. Through theoretical analysis and application, we showed that MISm is an always-applicable metric suitable for appropriate prediction scoring.
Conclusion: In this paper, we identified the limitations of several popular MIS metrics. This is why we proposed our novel metric MISm, which calculates meaningful values for common as well as edge cases arising in MIS. To allow application in the community and reproducibility of experimental results, we included MISm in the publicly available evaluation framework MISeval: https://github.com/frankkramer-lab/miseval/tree/master/miseval
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020 Jan 2;21(1):6. DOI: 10.1186/s12864-019-6413-7
2. Müller D, Soto-Rey I, Kramer F. Towards a Guideline for Evaluation Metrics in Medical Image Segmentation [Preprint]. ArXiv. 2022. arXiv:2202.05273. DOI: 10.48550/arXiv.2202.05273
3. Taha AA, Hanbury A. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med Imaging. 2015 Aug 12;15:29. DOI: 10.1186/s12880-015-0068-x
4. Powers DMW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation [Preprint]. ArXiv. 2020. arXiv:2010.16061. DOI: 10.48550/arXiv.2010.16061
5. Zhang Y, Mehta S, Caspi A. Rethinking Semantic Segmentation Evaluation for Explainability and Model Selection [Preprint]. ArXiv. 2021. arXiv:2101.08418. DOI: 10.48550/arXiv.2101.08418