Automating Crohn’s Disease Phenotyping: Comparing Natural Language Processing Approaches
Published: September 6, 2024
Introduction: The Montreal Classification (MC) captures the heterogeneity of Crohn's disease (CD) [1]. While the MC is an important tool for characterizing CD, ascertaining it for real-world studies requires manual chart review, which is labor-intensive and scales poorly. We therefore used electronic health records (EHR) to automate MC phenotyping and, using this information, to identify incident CD cases.
Methods: We defined CD patients (n=7,624) from the Mount Sinai Health System EHR based on CD diagnosis codes and medications [2]. We then developed a pipeline for automated extraction of MC disease behavior and age at diagnosis from EHR narrative texts, using two approaches: a rule-based approach built on the spaCy framework [3] and in-context learning with GPT-4. Two reviewers labeled randomly selected clinical notes (n=150) and radiology reports (n=50) at the sentence level (n=15,390). The algorithms were evaluated for recall, precision, and F1-score. For each CD patient, the first coded CD diagnosis was taken as the disease index date. We compared the index date with the prior encounter history and the extracted age at diagnosis to identify incident cases. To confirm the validity of the extracted incident case cohort and index date, we conducted a manual chart review of 50 randomly selected cases and controls from the resulting cohorts.
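The evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary sentence-level labels (for example, "sentence mentions a stricturing or penetrating complication" versus not). The function name `precision_recall_f1` and the toy labels are hypothetical and are not taken from the study's pipeline or data.

```python
from typing import Sequence


def precision_recall_f1(
    gold: Sequence[bool], pred: Sequence[bool]
) -> tuple[float, float, float]:
    """Return (precision, recall, F1) for binary sentence-level labels."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return precision, recall, f1


# Toy labels invented for illustration only (not study data).
gold = [True, True, False, False, True, False]
pred = [True, True, True, False, True, False]
precision, recall, f1 = precision_recall_f1(gold, pred)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Because F1 is the harmonic mean of precision and recall, a method with perfect recall and a few false positives, as in this toy case, still scores below 1.00.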
Results: For the labeled data, inter-annotator agreement (Cohen's kappa) was 0.84. For the detection of a stricturing or penetrating disease complication in clinical notes, the rule-based and GPT-4-based approaches performed similarly, with high recall, precision, and F1-score (rule-based: 1.00, 0.84, and 0.92; GPT-4: 0.95, 0.86, and 0.90, respectively). Perianal disease was extracted with a recall of 1.00, precision of 0.86, and F1-score of 0.93 using the rule-based approach, and with an F1-score of 0.92 using GPT-4. For age at diagnosis, GPT-4 (recall 1.00, precision 0.87, F1-score 0.93) performed slightly better than the rule-based approach (recall 0.81, precision 0.88, F1-score 0.85). Given this performance, we extracted the age at diagnosis from the clinical text of 4,344 CD patients in the Mount Sinai Health System and compared it with the first coded patient encounters and CD diagnosis in the patients' EHR, resulting in a sub-cohort of 229 incident CD cases. Our phenotyping algorithm identified cases and controls with high accuracy (0.96 and 0.95, respectively). In 83% of cases, the automatically identified first date of CD diagnosis was at most 180 days before the first date of diagnosis determined by manual chart review.
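For readers unfamiliar with the agreement statistic reported above, the following is a minimal sketch of Cohen's kappa for two annotators, assuming one categorical label per sentence from each reviewer. The function name `cohens_kappa`, the label strings, and the toy annotations are hypothetical and do not reproduce the study's labels.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two annotators."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Expected agreement under independence, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy annotations invented for illustration only (not study data).
reviewer_1 = ["complication"] * 5 + ["none"] * 5
reviewer_2 = ["complication"] * 4 + ["none"] * 5 + ["complication"]
kappa = cohens_kappa(reviewer_1, reviewer_2)
print(f"kappa={kappa:.2f}")
```

In this toy case the reviewers agree on 8 of 10 sentences, but half of that agreement would be expected by chance, so kappa is noticeably lower than the raw agreement rate.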
Discussion and conclusion: We demonstrate the feasibility of automatically extracting CD diagnosis and MC from clinical texts with good precision using EHR data. This approach can facilitate data extraction for real-world research at large scale and proved useful for identifying newly diagnosed patients with CD. The evaluated approaches were based on rules and on a general-purpose large language model, GPT-4. The performance of domain-specific large language models such as MEDITRON [4] or BioMistral [5] may be of interest.
Competing interests: RCU has served as a consultant and/or advisory board member for AbbVie, Bristol Myers Squibb, Celltrion, Inotrem, Lilly, Janssen, Pfizer, Roivant, Takeda. The remaining authors declare no conflict of interest.
The authors declare that a positive ethics committee vote has been obtained.
The contribution has already been published: S. Ibing, L. Schmidt, F. Borchert, J. Hugo, C. Benson, A. Marshall, J. Peraza, B.Y. Renard, J.H. Cho, E.P. Böttinger, R.C. Ungaro: Automating Crohn’s disease phenotyping: a natural language processing approach. Digestive Disease Week 2024, Gastroenterology.
References
1. Silverberg MS, Satsangi J, Ahmad T, Arnott IDR, Bernstein CN, Brant SR, et al. Toward an integrated clinical, molecular and serological classification of inflammatory bowel disease: report of a Working Party of the 2005 Montreal World Congress of Gastroenterology. Canadian Journal of Gastroenterology. 2005;19 Suppl A:5A-36A.
2. Ibing S, Cho JH, Böttinger EP, Ungaro RC. Second-Line Biologic Therapy Following Tumor Necrosis Factor Antagonist Failure: A Real-World Propensity Score-Weighted Analysis. Clinical Gastroenterology and Hepatology. 2023 Sep 1;21(10):2629–38.
3. Schmidt L, Ibing S, Borchert F, Hugo J, Marshall A, Peraza J, et al. Extraction of Crohn’s Disease Clinical Phenotypes from Clinical Text Using Natural Language Processing [Preprint]. medRxiv. 2023. DOI: 10.1101/2023.10.16.23297099
4. Chen Z, Cano AH, Romanou A, Bonnet A, Matoba K, Salvi F, et al. MEDITRON-70B: Scaling Medical Pretraining for Large Language Models [Preprint]. arXiv. 2023. DOI: 10.48550/arXiv.2311.16079
5. Labrak Y, Bazoge A, Morin E, Gourraud PA, Rouvier M, Dufour R. BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains [Preprint]. arXiv. 2024. DOI: 10.48550/arXiv.2402.10373