gms | German Medical Science

63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

02. - 06.09.2018, Osnabrück

Extracting single parameters versus the diagnosis of complex syndrome from discharge letters

Meeting Abstract

  • Mathias Kaspar - Comprehensive Heart Failure Center and Department of Internal Medicine I, Würzburg University Hospital, Würzburg, Germany
  • Georg Fette - Comprehensive Heart Failure Center and Department of Internal Medicine I, Würzburg University Hospital, Würzburg, Germany
  • Gülmisal Güder - Comprehensive Heart Failure Center and Department of Internal Medicine I, Würzburg University Hospital, Würzburg, Germany
  • Lea Seidlmayer - Comprehensive Heart Failure Center and Department of Internal Medicine I, Würzburg University Hospital, Würzburg, Germany
  • Maximilian Ertl - Service Center Medical Informatics, Würzburg University Hospital, Würzburg, Germany
  • Georg Dietrich - Chair of Computer Science VI, University of Würzburg, Würzburg, Germany
  • Helmut Greger - Service Center Medical Informatics, Würzburg University Hospital, Würzburg, Germany
  • Frank Puppe - Chair of Computer Science VI, University of Würzburg, Würzburg, Germany
  • Stefan Störk - Comprehensive Heart Failure Center and Department of Internal Medicine I, Würzburg University Hospital, Würzburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 84

doi: 10.3205/18gmds130, urn:nbn:de:0183-18gmds1303

Published: August 27, 2018

© 2018 Kaspar et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: A large part of routine medical documentation is of great interest for secondary use but is still often provided primarily as free text. Besides the interest in capturing more documentation in structured form from the outset, there are also efforts to extract information from free text in order to unlock already existing data. These efforts concentrate on extracting single parameters that are more or less clearly defined within the text, e.g. single echocardiographic parameters [1], [2]. Such approaches usually fail when the target is not an individual parameter but a complex syndrome, e.g. heart failure (HF). The definition of HF depends on a very specific combination of different parameters, which are only selectively reported in discharge letters. Here we describe our effort to extract the HF diagnosis by combining our previous echocardiographic information extraction, which achieved high accuracy scores, with other strategies, e.g. searches in discharge letters [3].

Methods: We utilized the clinical data warehouse (DWH) deployed at our hospital (including data from multiple domains and information systems and an ad-hoc information extraction feature [4]) to select and extract data from 1042 inpatients of the department of internal medicine. A cardiologist with long-standing experience provided a gold standard definition of the HF diagnosis. Eighteen queries were defined using the DWH query interface, with any hit indicating the presence of HF. Each query represents a specific medical concept relevant to the HF diagnosis and draws on one of three sources: structured data (3 queries, e.g. all HF-related ICD diagnoses), information extracted from the echocardiographic report (2 queries, e.g. left ventricular ejection fraction ≤45%), and text passages within the discharge letter, including synonyms (13 queries, e.g. the presence of “reduced left ventricular function”). One algorithm was defined via permutation testing on all queries to optimize for a high F1-score. Another algorithm was defined manually using ICD codes only. The resulting scores are compared with those of the parameter extraction reported earlier [1].
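The query-combination step can be illustrated with a minimal sketch. The abstract does not specify how the permutation testing was carried out, so the code below assumes one plausible reading: every subset of queries is combined with a logical OR ("any hit means HF") and the subset with the highest F1-score against the gold standard is kept. The query names, the toy hit lists, and the helper functions are hypothetical and serve illustration only.

```python
from itertools import combinations

def precision_recall_f1(predicted, gold):
    """Return (precision, recall, F1) for two equally long lists of booleans."""
    tp = sum(p and g for p, g in zip(predicted, gold))
    fp = sum(p and not g for p, g in zip(predicted, gold))
    fn = sum(g and not p for p, g in zip(predicted, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def any_hit(hits, subset):
    """OR-combine the per-patient hits of all queries in `subset`."""
    n_patients = len(next(iter(hits.values())))
    return [any(hits[q][i] for q in subset) for i in range(n_patients)]

def best_subset(hits, gold):
    """Exhaustively search all non-empty query subsets for the highest F1.

    Assumption: exhaustive search stands in for the unspecified
    optimization procedure; with 18 queries this is 2**18 - 1 subsets.
    """
    names = sorted(hits)
    best = ((), 0.0)
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            f1 = precision_recall_f1(any_hit(hits, subset), gold)[2]
            if f1 > best[1]:  # strict '>' keeps the smallest subset on ties
                best = (subset, f1)
    return best

# Hypothetical toy data: six patients, three queries (not study data).
gold = [True, True, True, False, False, False]
hits = {
    "icd_hf": [True, False, False, False, False, False],
    "lvef_le_45": [False, True, True, False, False, False],
    "text_reduced_lv_function": [True, True, False, True, True, False],
}

subset, f1 = best_subset(hits, gold)
print(subset, f1)
```

In this toy setting the structured ICD query and the echocardiographic query complement each other, while the free-text query adds false positives; that trade-off is what an F1-driven selection has to balance.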

Results: Of the 1042 patients, 222 were labeled by the cardiologist as having heart failure. The ICD-only definition resulted in a precision of 94%, a recall of 50% and an F1-score of 65%. The F1-score optimized algorithm resulted in a precision of 89%, a recall of 84% and an F1-score of 86%. The previous extraction of the 29 single parameters from echocardiography resulted in micro averages of 99% (precision), 99% (recall), and 99% (F1-score) [1].
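As a quick consistency check, the reported F1-scores can be recomputed from the reported precision and recall, since F1 is their harmonic mean. The sketch below uses only the rounded percentages given above; the function name and the dictionary layout are illustrative choices, not part of the study.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# (precision, recall, reported F1 in percent), taken from the Results text.
reported = {
    "ICD-only": (0.94, 0.50, 65),
    "F1-optimized": (0.89, 0.84, 86),
    "echo parameters [1]": (0.99, 0.99, 99),
}

for name, (precision, recall, f1_pct) in reported.items():
    computed = round(100 * f1_score(precision, recall))
    print(f"{name}: computed F1 = {computed}%, reported F1 = {f1_pct}%")
```

The recomputed values agree with the reported ones after rounding, which also makes the gap between the two HF algorithms easy to trace: it comes almost entirely from recall (50% versus 84%).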

Discussion: While it is rather simple to extract parameters with high F1-scores from texts that are documented consistently and uniformly, it is much harder to achieve high F1-scores with a direct extraction approach for medical concepts that are less clearly documented, as shown in this analysis. The literature on HF-identification algorithms usually reports even lower F1-scores than our 86% (e.g. 82% in [5]). One approach that we are considering for the future is the use of machine-learning techniques, which might improve F1-scores for medical concepts that are only fuzzily defined within discharge letters.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Toepfer M, Corovic H, Fette G, Klügl P, Störk S, Puppe F. Fine-grained information extraction from German transthoracic echocardiography reports. BMC Med Inform Decis Mak. 2015;15:91.
2.
Meystre SM, Kim Y, Gobbel GT, Matheny ME, Redd A, Bray BE, Garvin JH. Congestive heart failure information extraction framework for automated treatment performance measures assessment. J Am Med Inform Assoc. 2017;24(e1):e40-6.
3.
Saczynski JS, Andrade SE, Harrold LR, Tjia J, Cutrona SL, Dodd KS, Goldberg RJ, Gurwitz JH. A Systematic Review of Validated Methods for Identifying Heart Failure Using Administrative Data. Pharmacoepidemiology and Drug Safety. 2012;21:129–40. DOI: 10.1002/pds.2313
4.
Dietrich G, Krebs J, Fette G, Ertl M, Kaspar M, Störk S, Puppe F. Ad Hoc Information Extraction for Clinical Data Warehouses. Methods Inf Med. 2018 May;57(1):e22-e29. DOI: 10.3414/ME17-02-0010
5.
Schultz SE, Rothwell DM, Chen Z, Tu K. Identifying cases of congestive heart failure from administrative data: a validation study using primary care patient records. Chronic Dis Inj Can. 2013;33(3):160–6.