gms | German Medical Science

German Congress of Orthopaedics and Traumatology (DKOU 2023)

24. - 27.10.2023, Berlin

Expert accuracy and reliability of artificial intelligence for fully automated analysis of the mechanical alignment of the lower extremity – results from a multi-centric validation study

Meeting Abstract

  • Marco-Christopher Rupp (presenting author) - Steadman Philippon Research Institute, The Steadman Clinic, Vail, United States; Abteilung für Sportorthopädie, TU München, München, Germany
  • Felix J. Lindner - Sektion Sportorthopädie, Technische Universität München, Klinikum rechts der Isar, München, Germany
  • Yannick Ehmann - Sektion Sportorthopädie, Technische Universität München, Klinikum rechts der Isar, München, Germany
  • Claudio E. von Schacky - Institut für Diagnostische und Interventionelle Radiologie, Klinikum rechts der Isar, Technische Universität München, München, Germany
  • Matthias Jung - Institut für Radiologie, Universitätsklinikum Freiburg, Albert Ludwig Universität Freiburg, Freiburg, Germany
  • Jonas Pogorzelski - Sektion Sportorthopädie, Technische Universität München, Klinikum rechts der Isar, München, Germany
  • Matthias Feucht - Orthopaedische Klinik Paulinenhilfe, Diakonie-Klinikum Stuttgart, Stuttgart, Germany; Albert Ludwig Universität Freiburg, Freiburg, Germany
  • Sebastian Siebenlist - Sektion Sportorthopädie, Technische Universität München, Klinikum rechts der Isar, München, Germany
  • Rainer Burgkart - Abteilung für Orthopädie, Technische Universität München, Klinikum rechts der Isar, München, Germany
  • Nikolas Wilhelm - Abteilung für Orthopädie, Klinikum rechts der Isar, Munich School of Machine Intelligence, Technische Universität München, München, Germany

Deutscher Kongress für Orthopädie und Unfallchirurgie (DKOU 2023). Berlin, 24.-27.10.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAB62-3041

doi: 10.3205/23dkou307, urn:nbn:de:0183-23dkou3076

Published: October 23, 2023

© 2023 Rupp et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Text

Objectives: Although the public expects artificial intelligence (AI) to change healthcare, few AI applications to date truly live up to this promise and relieve orthopedic surgeons (OS) of complex, repetitive clinical workflows. Analysis of leg alignment is a time-consuming task performed in high volume in clinical practice. The purpose of this study was to develop a deep learning (DL) model for automated assessment of leg alignment on anteroposterior (a.p.) long leg radiographs (LLR) and to compare its performance with that of OS in a multicentric validation study.

Methods: In an industry-independent development project, high-performance AI software capable of automatically analyzing leg alignment on a.p. LLRs without any user input was developed. A radiographic dataset of 458 patients from a single institution, with all relevant landmarks annotated by orthopedic experts, was used for training (n=399) and validation (n=59). A state-of-the-art deep learning network composed of 12 expert networks based on a COCO-pretrained Mask R-CNN with a ResNeXt-101 backbone was built to automatically measure the following angles on a.p. LLRs: mechanical lateral proximal femoral angle (mLPFA), mechanical lateral distal femoral angle (mLDFA), mechanical medial proximal tibial angle (mMPTA), mechanical lateral distal tibial angle (mLDTA), mechanical femorotibial angle (mFA-mTA, hereafter mFTA), joint line convergence angle (JLCA), and anatomic-mechanical femoral angle (AMA). On an internal test dataset (n=136) and an external test dataset from an independent institution (n=143), the accuracy, reliability, and processing time of the AI were compared with those of three expert OS.
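
The abstract does not describe how the angles are derived once the landmarks are detected. The following is a minimal sketch, assuming the network outputs 2D landmark coordinates (hip, knee and ankle centers plus the medial and lateral endpoints of the femoral and tibial joint lines) and using the conventional geometric definitions of the angles. All function and landmark names are hypothetical, it is assumed that the medial/lateral side of each joint-line endpoint is already known, and mLPFA, mLDTA and AMA are omitted for brevity.

```python
import math

# Hypothetical landmark type: (x, y) position in image pixel coordinates.
Point = tuple[float, float]


def direction(a: Point, b: Point) -> tuple[float, float]:
    """Vector pointing from landmark a to landmark b."""
    return (b[0] - a[0], b[1] - a[1])


def vector_angle_deg(u: tuple[float, float], v: tuple[float, float]) -> float:
    """Unsigned angle between two direction vectors, in degrees (0 to 180)."""
    dot = u[0] * v[0] + u[1] * v[1]
    cross = u[0] * v[1] - u[1] * v[0]
    return abs(math.degrees(math.atan2(cross, dot)))


def mfta(hip: Point, knee: Point, ankle: Point) -> float:
    """Deviation of the tibial (knee->ankle) from the femoral (hip->knee)
    mechanical axis; 0 means the two axes are collinear."""
    return vector_angle_deg(direction(hip, knee), direction(knee, ankle))


def mldfa(hip: Point, knee: Point, fem_med: Point, fem_lat: Point) -> float:
    """Angle between the proximal ray of the femoral mechanical axis and
    the lateral ray of the distal femoral joint line."""
    return vector_angle_deg(direction(knee, hip), direction(fem_med, fem_lat))


def mmpta(knee: Point, ankle: Point, tib_med: Point, tib_lat: Point) -> float:
    """Angle between the distal ray of the tibial mechanical axis and
    the medial ray of the proximal tibial joint line."""
    return vector_angle_deg(direction(knee, ankle), direction(tib_lat, tib_med))


def jlca(fem_med: Point, fem_lat: Point, tib_med: Point, tib_lat: Point) -> float:
    """Acute angle between the femoral and tibial joint lines."""
    a = vector_angle_deg(direction(fem_med, fem_lat), direction(tib_med, tib_lat))
    return min(a, 180.0 - a)


if __name__ == "__main__":
    # Made-up landmark positions for one leg (image y axis points down).
    hip, knee, ankle = (0.0, 0.0), (0.0, 400.0), (7.0, 800.0)
    fem_med, fem_lat = (-40.0, 402.0), (40.0, 398.0)
    tib_med, tib_lat = (-40.0, 410.0), (40.0, 410.0)
    print(f"mFTA  = {mfta(hip, knee, ankle):.2f} deg")
    print(f"mLDFA = {mldfa(hip, knee, fem_med, fem_lat):.2f} deg")
    print(f"mMPTA = {mmpta(knee, ankle, tib_med, tib_lat):.2f} deg")
    print(f"JLCA  = {jlca(fem_med, fem_lat, tib_med, tib_lat):.2f} deg")
```

Working with direction vectors rather than explicit line intersections keeps each angle a single `atan2` call; which quadrant is measured (lateral for mLDFA, medial for mMPTA) is encoded purely by the order of the joint-line endpoints passed in.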

Results and conclusion: The accuracy of the AI measurements ranged from 0.16°±0.14° (mFTA) to 1.06°±1.3° (mLPFA), compared with 0.13°±0.14° (mFTA) to 1.72°±1.96° (mLPFA) for the human experts. The interreader reliability (IRR) of the AI measurements ranged from moderate (0.73, JLCA) to excellent (1.0, mFTA); human expert IRR ranged from moderate (0.79, JLCA) to excellent (1.0, mFTA). The AI achieved measurements within a clinically acceptable safety margin in 87.0% (mLPFA) to 100% (mFTA) of cases; for the human expert OS, this proportion ranged from 13.6% (mLPFA) to 100% (mFTA). Intrarater reliability was 1.0 for the AI and ranged from 0.83 to 1.0 for the expert OS. The mean processing time for a fully comprehensive alignment analysis was 101±7 to 105±7 seconds for the expert OS using manual software, compared with 22±0.6 seconds for the DL model (p=0.01).
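
The abstract does not state how accuracy, reliability, or the safety margin were computed. The following is a minimal sketch under stated assumptions: accuracy is taken as the mean ± SD of absolute deviations from a reference measurement, reliability as an intraclass correlation coefficient of the two-way random, absolute-agreement, single-rater form (ICC(2,1) after Shrout and Fleiss), and the safety margin as a hypothetical threshold `margin_deg`. The study may have used different definitions; all function names are illustrative only.

```python
import numpy as np


def accuracy_deg(pred, ref) -> tuple[float, float]:
    """Mean and SD of absolute deviations (degrees) from the reference angles."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float))
    return float(err.mean()), float(err.std(ddof=1))


def pct_within_margin(pred, ref, margin_deg: float) -> float:
    """Percentage of cases whose absolute deviation is at most margin_deg
    (margin_deg is a hypothetical, user-chosen threshold)."""
    err = np.abs(np.asarray(pred, dtype=float) - np.asarray(ref, dtype=float))
    return float(100.0 * np.mean(err <= margin_deg))


def icc_2_1(ratings) -> float:
    """ICC(2,1) after Shrout & Fleiss: two-way random effects, absolute
    agreement, single rater. ratings has shape (n_subjects, k_raters)."""
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * np.sum((x.mean(axis=1) - grand) ** 2)  # between subjects
    ss_cols = n * np.sum((x.mean(axis=0) - grand) ** 2)  # between raters
    ss_err = np.sum((x - grand) ** 2) - ss_rows - ss_cols  # residual
    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_err / ((n - 1) * (k - 1))
    return float((msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n))


if __name__ == "__main__":
    ai = [87.1, 90.4, 85.9, 88.0]    # made-up mLDFA values (degrees)
    ref = [87.5, 90.0, 86.8, 88.2]   # made-up reference values (degrees)
    mean, sd = accuracy_deg(ai, ref)
    print(f"accuracy: {mean:.2f} deg +/- {sd:.2f} deg")
    print(f"within 1.0 deg: {pct_within_margin(ai, ref, 1.0):.1f}%")
    print(f"ICC(2,1): {icc_2_1(np.column_stack([ai, ref])):.3f}")
```

ICC(2,1) is used here because, unlike consistency-type ICCs, it penalizes a systematic offset between raters, which matters when a fixed angular error would change a surgical plan.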

The developed DL model enabled a comprehensive analysis of leg alignment on a.p. LLRs with precision, reliability, and robustness comparable to those of expert OS, and it did not fail on a single image during validation. Because it substantially and significantly outperformed human raters in processing time and in repeated-measurement reliability, the AI has the potential to accelerate and enhance clinical practice.