gms | German Medical Science

67th Annual Meeting of the Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13th Annual Congress of the Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

21.08. - 25.08.2022, online

Analyzing a Deep Learning Model for 12-Lead ECG Classification with Explainable AI

Meeting Abstract

  • Theresa Bender - Institut für Medizinische Informatik, Universitätsmedizin Göttingen, Göttingen, Germany
  • Jacqueline Beinecke - Institut für Medizinische Informatik, Universitätsmedizin Göttingen, Göttingen, Germany
  • Anne-Christin Hauschild - Institut für Medizinische Informatik, Universitätsmedizin Göttingen, Göttingen, Germany
  • Dagmar Krefting - Institut für Medizinische Informatik, Universitätsmedizin Göttingen, Göttingen, Germany
  • Nicolai Spicher - Institut für Medizinische Informatik, Universitätsmedizin Göttingen, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 67. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 13. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 21.-25.08.2022. Düsseldorf: German Medical Science GMS Publishing House; 2022. DocAbstr. 129

doi: 10.3205/22gmds050, urn:nbn:de:0183-22gmds0503

Published: 19 August 2022

© 2022 Bender et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License (attribution). For license information, see http://creativecommons.org/licenses/by/4.0/.



Introduction: An increasing number of algorithms for biosignal classification are being developed, and deep neural networks (DNNs) account for a significant share of them [1]. In contrast to traditional signal processing methods based on handcrafted features, DNNs take a data-driven approach and learn relevant features from large amounts of training data. However, understanding the reasoning of these black-box models is challenging: even when the network architecture is known, the sheer number of trained parameters makes it impossible to relate a DNN's classifications to a physiological interpretation. This makes their application in clinical settings particularly challenging [2].

Previously [3], we applied the pre-trained DNN for 12-lead ECG classification by Ribeiro et al. [4] to local clinical recordings of either left bundle branch block (LBBB) or atrial fibrillation (AF). We reproduced the reported performance on our data but observed some misclassifications, raising the question of what led the DNN to its predictions. In this work, we evaluate the feasibility of a state-of-the-art attribution method, Integrated Gradients (IG) [5], for explaining the DNN's classifications.
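For reference, IG as defined in [5] attributes a model output $F$ to each input feature $x_i$ by integrating the gradient along the straight path from a baseline $x'$ to the input $x$. The attributions sum to the difference between the outputs at the input and at the baseline (the completeness property); the choice of baseline $x'$ is a free parameter of the method.

```latex
\mathrm{IG}_i(x) = (x_i - x'_i) \int_0^1
  \frac{\partial F\big(x' + \alpha\,(x - x')\big)}{\partial x_i}\, d\alpha,
\qquad
\sum_i \mathrm{IG}_i(x) = F(x) - F(x').
```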

Methods: We applied the pre-trained model by Ribeiro et al. [6] to the CPSC2018 dataset [7], which contains 6,877 signals of varying length (6-60 s). For each signal, the model outputs six probabilities for ECG abnormalities, including LBBB and AF. After changing the activation of the last layer to linear, we applied IG as implemented in iNNvestigate [8] to the first 200 signals of each class: AF, LBBB, and sinus rhythm (SR). IG assigns each sample of a classified signal either i) no, ii) a positive, or iii) a negative relevance, which makes it possible to detect clusters of samples that explain the DNN's decision. Afterwards, we summed the relevances of each signal per lead and label and depicted the aggregated results for all signals as boxplots.
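The attribution step described above can be sketched in plain Python. This is neither iNNvestigate's implementation nor the authors' code: it approximates the IG integral with a midpoint Riemann sum on a hypothetical one-unit toy model (`model`, `grad_model`, `W`, the toy signal `X`, and the all-zero `BASELINE` are all assumptions). The toy output is used as a raw score without a sigmoid, loosely mirroring the switch of the last layer to a linear activation; a plausible motivation, not stated in the abstract, is that a saturated sigmoid yields near-zero gradients. The example also shows the three relevance outcomes named above: samples that equal the baseline or that the model ignores get zero relevance, the others get positive or negative relevance.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def integrated_gradients(
    grad_f: Callable[[Sequence[float]], Vector],
    x: Sequence[float],
    baseline: Sequence[float],
    steps: int = 64,
) -> Vector:
    """Approximate Integrated Gradients with a midpoint Riemann sum along the
    straight path from `baseline` to `x`."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(grad_f(point)):
            total[i] += g
    # Scale the averaged gradient by the input-baseline difference, per sample.
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


# Hypothetical toy "network": one tanh unit whose output is used as a raw
# score (no sigmoid), standing in for one output of the real ECG classifier.
W = [0.5, -0.25, 1.0, 0.0, 0.75, -0.5]


def model(x: Sequence[float]) -> float:
    return math.tanh(sum(w * v for w, v in zip(W, x)))


def grad_model(x: Sequence[float]) -> Vector:
    # d/dx_i tanh(w . x) = (1 - tanh(w . x)^2) * w_i
    t = model(x)
    return [(1.0 - t * t) * w for w in W]


X = [0.2, 0.4, 1.0, 0.3, 0.0, -0.6]  # toy single-lead "signal"
BASELINE = [0.0] * len(X)  # all-zero baseline (an assumption)

attr = integrated_gradients(grad_model, X, BASELINE)
print([round(a, 3) for a in attr])
print(sum(attr), model(X) - model(BASELINE))  # completeness: nearly equal
```

Sample 3 is ignored by the toy model (weight 0) and sample 4 equals the baseline, so both receive exactly zero relevance. More integration steps tighten the completeness check at the cost of more gradient evaluations.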

Results: For AF classification, the summed relevances had medians of 0.64 (AF) and -0.47 (SR) and ranges of [-1.61, 4.01] (AF) and [-5.92, 4.72] (SR). For LBBB classification, the medians were 0.41 (LBBB) and -1.01 (SR), and the ranges were [-3.11, 6.53] (LBBB) and [-3.72, 8.85] (SR). For every lead, the summed relevances were significantly higher for both abnormalities than for SR (Wilcoxon rank-sum test, p < 0.01). Lead V1 showed the largest mean differences from SR: 2.87 for AF and 3.23 for LBBB.
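The per-lead aggregation and comparison reported above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: relevances are assumed to be stored as one row per lead (a hypothetical layout), the rank-sum test uses the normal approximation without tie correction of the variance, and the numbers in the usage lines are made up. In the study's setting, the test would be run once per lead and abnormality, comparing the 200 abnormal signals against the 200 SR signals.

```python
import math
from statistics import mean, median
from typing import List, Sequence, Tuple


def summed_relevance(relevance: Sequence[Sequence[float]]) -> List[float]:
    """Sum one signal's relevances over time, separately per lead (rows = leads)."""
    return [sum(lead) for lead in relevance]


def summarize(values: Sequence[float]) -> Tuple[float, Tuple[float, float]]:
    """Median and (min, max) range of summed relevances for one class."""
    return median(values), (min(values), max(values))


def rank_sum_test(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test using the normal approximation.

    Tied values receive their average rank; the variance is not tie-corrected.
    Returns (z, p); z > 0 means values in `a` tend to be larger than in `b`.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank of the tie run
        i = j + 1
    n1, n2 = len(a), len(b)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    expected = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - expected) / sd
    return z, math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value


# Made-up summed relevances for a single lead, for illustration only.
abnormal = [5.0, 6.0, 7.0, 8.0, 9.0]
sinus = [0.0, 1.0, 2.0, 3.0, 4.0]
z, p = rank_sum_test(abnormal, sinus)
print(summarize(abnormal), summarize(sinus))
print(f"z = {z:.3f}, p = {p:.4f}, mean difference = {mean(abnormal) - mean(sinus):.2f}")
```

SciPy's `scipy.stats.ranksums` computes the same normal-approximation statistic and could replace the hand-written test in a real analysis.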

During visual inspection of randomly chosen ECGs, we observed clusters of relevance in the area of the QRS complexes. For LBBB, IG focused on negative S-waves and prolonged ST segments in lead V1; occasionally, broad and notched R-waves were also marked as relevant. In contrast, for AF ECGs, the relevant parts were usually R-waves and only rarely areas with missing P-waves.
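The clusters above were found by visual inspection. One simple way to flag such clusters automatically, sketched here under our own assumptions (the threshold and the minimum run length are hypothetical parameters, not taken from the study), is to report contiguous runs of samples whose relevance exceeds a threshold:

```python
from typing import List, Optional, Sequence, Tuple


def relevance_clusters(
    relevance: Sequence[float], threshold: float, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Half-open [start, end) sample ranges where relevance stays above
    `threshold` for at least `min_len` consecutive samples."""
    clusters: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, r in enumerate(relevance):
        if r > threshold and start is None:
            start = i  # a run of relevant samples begins
        elif r <= threshold and start is not None:
            if i - start >= min_len:
                clusters.append((start, i))
            start = None
    # Close a run that extends to the end of the signal.
    if start is not None and len(relevance) - start >= min_len:
        clusters.append((start, len(relevance)))
    return clusters


trace = [0.0, 0.5, 0.6, 0.0, 0.7, 0.0, 0.9, 0.8, 0.95]  # toy relevance trace
print(relevance_clusters(trace, threshold=0.4, min_len=2))  # → [(1, 3), (6, 9)]
```

Clusters of negative relevance can be found the same way by passing the negated trace.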

Discussion: The significant differences in relevance between AF/LBBB and SR ECGs suggest that the applied DNN distinguishes these classes well. Moreover, the parts considered relevant by the model appear to match clinical criteria, similar to the results reported by Bodini et al. [9]. These findings need to be verified with more signals and data sources. Furthermore, the clustering of relevance on morphological features (P-QRS-T) could be quantified using annotated data.
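One possible way to quantify this clustering, assuming beat annotations that give QRS onset and offset as sample indices (a hypothetical format), is to compute the share of positive relevance that falls inside the annotated windows and compare it with the fraction of the signal those windows cover:

```python
from typing import Sequence, Tuple


def relevance_share_in_windows(
    relevance: Sequence[float], windows: Sequence[Tuple[int, int]]
) -> float:
    """Fraction of one lead's total positive relevance that lies inside
    annotated half-open [onset, offset) sample windows (e.g., QRS complexes)."""
    inside = [False] * len(relevance)
    for onset, offset in windows:
        for i in range(max(onset, 0), min(offset, len(relevance))):
            inside[i] = True
    positive = [max(r, 0.0) for r in relevance]  # ignore negative relevance
    total = sum(positive)
    if total == 0.0:
        return 0.0
    return sum(p for p, m in zip(positive, inside) if m) / total


def window_coverage(n_samples: int, windows: Sequence[Tuple[int, int]]) -> float:
    """Fraction of samples covered by the windows (the share expected by chance)."""
    covered = set()
    for onset, offset in windows:
        covered.update(range(max(onset, 0), min(offset, n_samples)))
    return len(covered) / n_samples


rel = [0.1, 0.5, 0.4, -0.3, 0.0, 0.2, 0.8]  # toy relevance trace
qrs = [(1, 3), (6, 7)]  # hypothetical QRS annotations
print(relevance_share_in_windows(rel, qrs), window_coverage(len(rel), qrs))
```

A relevance share well above the windows' coverage would indicate that relevance concentrates on the annotated waves rather than being spread uniformly over the signal.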

Conclusion: Integrated Gradients has the potential to explain DNN classifications of ECGs.

Funding: This project was financially supported by the German Federal Ministry of Education and Research (BMBF) under grant agreement No 16TTP073 11 and the Lower Saxony “Vorab” of the Volkswagen Foundation and the Ministry for Science and Culture of Lower Saxony (grant no. 76211-12-1/21).

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1. Yoon D, Jang JH, Choi BJ, Kim TY, Han CH. Discovering hidden information in biosignals from patients using artificial intelligence. Korean J Anesthesiol. 2020;73(4):275–84. DOI: 10.4097/kja.19475
2. Hong S, Zhou Y, Shang J, Xiao C, Sun J. Opportunities and challenges of deep learning methods for electrocardiogram data: A systematic review. Comput Biol Med. 2020;122:103801. DOI: 10.1016/j.compbiomed.2020.103801
3. Bender T, Seidler T, Bengel P, Sax U, Krefting D. Application of Pre-Trained Deep Learning Models for Clinical ECGs. Stud Health Technol Inform. 2021;283:39–45. DOI: 10.3233/SHTI210539
4. Ribeiro AH, Ribeiro MH, Paixão GMM, Oliveira DM, Gomes PR, Canazart JA, et al. Automatic diagnosis of the 12-lead ECG using a deep neural network. Nat Commun. 2020;11(1):1760. DOI: 10.1038/s41467-020-15432-4
5. Sundararajan M, Taly A, Yan Q. Axiomatic Attribution for Deep Networks. In: Proceedings of the 34th International Conference on Machine Learning. PMLR; 2017. (Proceedings of Machine Learning Research). p. 3319–28.
6. Ribeiro AH, Ribeiro MH, Paixão GM, Oliveira DM, Gomes PR, Canazart JA, et al. Pre-trained deep neural network models for ECG automatic abnormality detection. 2020. DOI: 10.5281/zenodo.3625018
7. Liu F, Liu C, Zhao L, Zhang X, Wu X, Xu X, et al. An Open Access Database for Evaluating the Algorithms of Electrocardiogram Rhythm and Morphology Abnormality Detection. J Med Imaging Health Inform. 2018;8(7):1368–73. DOI: 10.1166/jmihi.2018.2442
8. Alber M, Lapuschkin S, Seegerer P, Hägele M, Schütt KT, Montavon G, et al. iNNvestigate Neural Networks! Journal of Machine Learning Research. 2019;20(93):1–8.
9. Bodini M, Rivolta MW, Sassi R. Opening the black box: interpretability of machine learning algorithms in electrocardiography. Philos Trans A Math Phys Eng Sci. 2021;379(2212):20200253. DOI: 10.1098/rsta.2020.0253