gms | German Medical Science

67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

MISm: A medical image segmentation metric for evaluation of weak labeled data

Meeting Abstract

  • Dennis Hartmann - IT-Infrastructure for Translational Medical Research, University of Augsburg, Augsburg, Germany
  • Verena Schmid - IT-Infrastructure for Translational Medical Research, University of Augsburg, Augsburg, Germany; Medical Data Integration Center, Institute for Digital Medicine, University Hospital Augsburg, Augsburg, Germany
  • Philip Meyer - Medical Data Integration Center, Institute for Digital Medicine, University Hospital Augsburg, Augsburg, Germany
  • Iñaki Soto Rey - Medical Data Integration Center, Institute for Digital Medicine, University Hospital Augsburg, Augsburg, Germany
  • Dominik Müller - University of Augsburg, Augsburg, Germany; Medical Data Integration Center, Institute for Digital Medicine, University Hospital Augsburg, Augsburg, Germany
  • Frank Kramer - University of Augsburg, Augsburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 21

doi: 10.3205/22gmds031, urn:nbn:de:0183-22gmds0312

Published: August 19, 2022

© 2022 Hartmann et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Performance measures are an important tool for assessing and comparing medical image segmentation algorithms. However, current measures are weak in certain edge cases, such as a very small region of interest or no region of interest at all [1], [2], [3], [4], [5].

Methods: To address these limitations, we propose a new medical image segmentation metric: MISm. When the ground truth contains actual positive pixels, MISm equals the Dice similarity coefficient (DSC). When it contains none, MISm equals the weighted Specificity (wSpec). The wSpec is the Specificity, also called True Negative Rate, extended with the weights α and (1 - α) to balance true negatives against false positives.
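The two-branch definition above can be sketched in a few lines of Python. This is a minimal illustration, not the MISeval implementation: the function name `mism`, the closed form wSpec = α·TN / ((1 - α)·FP + α·TN), and the default α = 0.1 are assumptions, because the abstract names the weights but does not print the formula or a value for α.

```python
def mism(tp: int, fp: int, tn: int, fn: int, alpha: float = 0.1) -> float:
    """Sketch of MISm from confusion-matrix pixel counts.

    Assumptions (not stated in the abstract): the closed form of wSpec
    and the default alpha are illustrative choices.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")

    # Branch 1: the ground truth contains positive pixels -> Dice coefficient.
    if tp + fn > 0:
        return 2 * tp / (2 * tp + fp + fn)

    # Branch 2: no positive pixels in the ground truth -> weighted Specificity.
    denominator = (1 - alpha) * fp + alpha * tn
    if denominator == 0:
        # Only reachable for an image with zero pixels.
        raise ValueError("confusion matrix is empty")
    return alpha * tn / denominator


if __name__ == "__main__":
    print(mism(tp=8, fp=2, tn=85, fn=5))    # DSC branch
    print(mism(tp=0, fp=0, tn=100, fn=0))   # empty truth, empty prediction
    print(mism(tp=0, fp=10, tn=90, fn=0))   # empty truth, some false positives
```

With a small α, each false positive on an image without a region of interest lowers the score far more than under plain Specificity, where the many true negatives would dominate.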

Results: MISm has no definition gaps. When the ground truth contains no actual positive pixels, the DSC is either undefined (no false positives) or zero regardless of how many false positives occur; MISm covers this case with wSpec. The wSpec is undefined only if there are neither true negatives nor false positives, that is, if the ground truth contains no negative pixels. Every pixel is then an actual positive, so MISm uses the DSC and the undefined wSpec is never evaluated.
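The case analysis can be written out explicitly. The closed form of wSpec below is an assumption, since the abstract names the weights α and (1 - α) but does not print the formula; TP, FP, TN and FN denote pixel counts of a non-empty image.

```latex
\[
\mathrm{MISm} =
\begin{cases}
\dfrac{2\,TP}{2\,TP + FP + FN}, & P > 0,\\[2ex]
\dfrac{\alpha\,TN}{(1-\alpha)\,FP + \alpha\,TN}, & P = 0,
\end{cases}
\qquad P = TP + FN,\quad 0 < \alpha < 1 .
\]
% Case P > 0:  2 TP + FP + FN >= TP + FN = P > 0, so the DSC is defined.
% Case P = 0:  every pixel is an actual negative, so FP + TN > 0; with
%              0 < alpha < 1 this gives (1 - alpha) FP + alpha TN > 0.
```

Under these assumptions, both denominators are strictly positive on their own branch, which is the sense in which the metric has no definition gaps.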

Discussion: We developed MISm to address limitations identified in the current gold-standard metrics for medical image segmentation (MIS). By falling back on wSpec, MISm also allows the evaluation of datasets with weak label annotations. Theoretical analysis and application showed that MISm is always applicable and yields appropriate prediction scores.

Conclusion: We identified limitations of several popular MIS metrics and proposed the novel metric MISm, which yields meaningful values for common cases as well as for the edge cases that arise in MIS. To support use in the community and reproducibility of experimental results, MISm is included in the publicly available evaluation framework MISeval: https://github.com/frankkramer-lab/miseval/tree/master/miseval

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Chicco D, Jurman G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics. 2020 Jan 2;21(1):6. DOI: 10.1186/s12864-019-6413-7
2. Müller D, Soto-Rey I, Kramer F. Towards a Guideline for Evaluation Metrics in Medical Image Segmentation [Preprint]. ArXiv. 2022. arXiv:2202.05273. DOI: 10.48550/arXiv.2202.05273
3. Taha AA, Hanbury A. Metrics for evaluating 3D medical image segmentation: analysis, selection, and tool. BMC Med Imaging. 2015 Aug 12;15:29. DOI: 10.1186/s12880-015-0068-x
4. Powers DMW. Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation [Preprint]. ArXiv. 2020. arXiv:2010.16061. DOI: 10.48550/arXiv.2010.16061
5. Zhang Y, Mehta S, Caspi A. Rethinking Semantic Segmentation Evaluation for Explainability and Model Selection [Preprint]. ArXiv. 2021. arXiv:2101.08418. DOI: 10.48550/arXiv.2101.08418