gms | German Medical Science

GMDS 2015: 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

06.09. - 09.09.2015, Krefeld

Extracting Activities and Observations from Surgical Reports

Meeting Abstract


  • Kerstin Denecke - Universität Leipzig, Leipzig, Germany

GMDS 2015. 60. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Krefeld, 06.-09.09.2015. Düsseldorf: German Medical Science GMS Publishing House; 2015. DocAbstr. 042

doi: 10.3205/15gmds022, urn:nbn:de:0183-15gmds0222

Published: August 27, 2015

© 2015 Denecke.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.



Text

Introduction and Objective: A surgical workflow or process is defined as "a set of one or more linked procedures or activities that collectively realize a surgical objective within the context of an organizational structure defining functional roles and relationships" [1]. Each activity is described by the following perspectives: participant, surgical instrument, anatomical structure, and the start and stop time of the activity. Collecting descriptions of the workflows and associated activities of different interventions makes it possible to identify and study clinically relevant deviations. A typical means of collecting surgical workflows is to register activities with sensors. However, sensors cannot capture the surgeon's observations. We propose an approach to extract relevant information on activities and observations from surgical reports.
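The activity perspectives listed above can be pictured as a simple record. The following Python sketch is only an illustration and not part of the described approach; the class name `SurgicalActivity`, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SurgicalActivity:
    """One workflow activity with the perspectives named in the text.

    Class and field names are hypothetical illustrations."""
    participant: str
    instrument: str
    anatomical_structure: str
    start: datetime
    stop: datetime

    def duration_seconds(self) -> float:
        # Length of the activity, derived from its start and stop time.
        return (self.stop - self.start).total_seconds()


# Hypothetical example values, not taken from the study data.
incision = SurgicalActivity(
    participant="surgeon",
    instrument="scalpel",
    anatomical_structure="skin",
    start=datetime(2015, 9, 6, 10, 0, 0),
    stop=datetime(2015, 9, 6, 10, 2, 30),
)
print(incision.duration_seconds())  # 150.0
```

A record like this has no slot for what the surgeon saw, which is the gap that observations extracted from reports are meant to fill.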

Most work on relation extraction from clinical text has concentrated on extracting treatment relationships, e.g. the semantic relations cure, prevent and side effect between diseases and treatments mentioned in biomedical texts [2], or on extracting protein-protein interactions. SemRep is a program that extracts semantic predications (subject-relation-object triples) from biomedical free text [3]. Furthermore, relation extraction was one task of the 2010 i2b2/VA NLP Challenge (https://www.i2b2.org/NLP/, accessed: 05.07.2014). The challenge concentrated on extraction from discharge summaries and on recognizing three types of relations: treatment-problem, test-problem and problem-problem. Surgical reports have so far not been considered in medical language processing.

Material and Methods: Surgical reports summarise the observations made and the actions performed in the context of patient treatment in general and of a surgery in particular. They contain knowledge that can be valuable, for example, for workflow analysis, comparison or quality assessment. In contrast to other clinical documents such as finding reports, surgical reports are characterized by complex sentences and a highly varied vocabulary. Knowledge about surgical activities is expressed in surgical reports by means of semantic relations among medical concepts. A (semantic) relation is defined as a predicate ranging over two arguments, where an argument represents concepts, objects or people in the real world and the relation predicate describes the association or interaction that holds between the concepts represented by the arguments. Consider, for example, the sentence “Frozen section exam showed a fibroadenoma with some proliferative hyperplasia within the fibroadenoma”. It contains several relations, e.g. the relation showed: “Frozen section exam - showed - a fibroadenoma”. Another relation, which could be labelled “accompanied by”, exists between “fibroadenoma” and “some proliferative hyperplasia”. Relations semantically connect entities, so identifying them is crucial for understanding the content of a text. Only with extracted relations is a deeper text understanding possible (e.g., recognising the who, when and where of a medical or clinical event and the temporal and causal relationships between events). We introduce a concept that combines concept and relation extraction with relation classification to automatically collect and interpret information on the activities and observations described in surgical reports.
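The relation definition maps naturally onto a triple of (argument 1, predicate, argument 2). A minimal Python sketch that encodes by hand the two relations identified in the example sentence; the `Relation` type is a hypothetical illustration, not the representation used by ReVerb or by the system described here.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """A binary semantic relation: a predicate over two arguments."""
    arg1: str
    predicate: str
    arg2: str


sentence = ("Frozen section exam showed a fibroadenoma with some "
            "proliferative hyperplasia within the fibroadenoma")

# The two relations discussed in the text, written down by hand.
# Both arguments are spans of the sentence; the second predicate
# ("accompanied by") is a label, not a span of the sentence.
relations = [
    Relation("Frozen section exam", "showed", "a fibroadenoma"),
    Relation("fibroadenoma", "accompanied by", "some proliferative hyperplasia"),
]

for r in relations:
    print(f"{r.arg1} - {r.predicate} - {r.arg2}")
# Frozen section exam - showed - a fibroadenoma
# fibroadenoma - accompanied by - some proliferative hyperplasia
```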

Results: To extract surgical workflow activities and observations from texts, we developed an approach comprising three steps:

1.
Each sentence of a document is analysed using a relation extraction method. Since the possible relations and the language used in these texts cannot be specified in advance, we decided to apply open information extraction. More specifically, we use ReVerb [4] for our experiments. ReVerb is designed for Web-scale information extraction, where the target relations cannot be specified in advance and speed is important [4]. We selected ReVerb because it has been reported to achieve higher quality than other open information extraction systems. ReVerb takes as input a sentence annotated with part-of-speech tags and noun phrase chunks, and returns relation triples of the form [argument 1, relation phrase, argument 2]. Relation phrases are normalized by stemming.
2.
Each argument is mapped to UMLS concepts using DragonToolkit, which takes a document or sentence as input and identifies the medical entities in it that are part of the UMLS. Each UMLS concept is assigned one or more of the 133 semantic types (e.g. Body Part), and each semantic type belongs to one of 15 semantic groups (e.g. Anatomy).
3.
Finally, the extracted relations, i.e. the arguments mapped to UMLS concepts together with the normalized relation type, are classified as activity or observation using machine learning. For each relation, a feature vector is generated from the semantic types of the concepts representing the arguments and from the relation type. For classification, we applied several machine learning algorithms implemented in the WEKA library, including Support Vector Machines, Naive Bayes and Logistic classifiers.
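Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch under explicit assumptions: a hand-made `LEXICON` (hypothetical) replaces DragonToolkit and the UMLS, the training triples and labels are invented for illustration, and a small hand-written multinomial Naive Bayes classifier stands in for WEKA's NaiveBayesMultinomial. Step 1 (ReVerb) is assumed to have already produced the triples with stemmed relation phrases.

```python
import math
from collections import Counter

# Step 2 stand-in (hypothetical): a tiny lexicon from argument text to
# UMLS semantic type names. The described system uses DragonToolkit and
# the full UMLS instead.
LEXICON = {
    "frozen section exam": "Laboratory Procedure",
    "fibroadenoma": "Neoplastic Process",
    "hyperplasia": "Pathologic Function",
    "incision": "Therapeutic or Preventive Procedure",
    "skin": "Body Part, Organ, or Organ Component",
    "breast": "Body Part, Organ, or Organ Component",
}


def semantic_types(argument):
    """Semantic types of all lexicon terms occurring in an argument."""
    text = argument.lower()
    return [stype for term, stype in LEXICON.items() if term in text]


def features(arg1, relation, arg2):
    """Step 3 features: semantic types of both arguments plus the
    (already stemmed) relation phrase, as feature counts."""
    feats = Counter()
    for stype in semantic_types(arg1):
        feats["arg1=" + stype] += 1
    for stype in semantic_types(arg2):
        feats["arg2=" + stype] += 1
    feats["rel=" + relation] += 1
    return feats


class MultinomialNB:
    """Minimal multinomial Naive Bayes with Laplace smoothing, standing in
    for WEKA's NaiveBayesMultinomial (not a port of it)."""

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.log_prior = {c: math.log(y.count(c) / len(y)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for feats, label in zip(X, y):
            self.counts[label].update(feats)
        self.vocab = set().union(*X)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, feats):
        def log_score(c):
            denom = self.totals[c] + len(self.vocab)
            return self.log_prior[c] + sum(
                n * math.log((self.counts[c][f] + 1) / denom)
                for f, n in feats.items() if f in self.vocab)
        return max(self.classes, key=log_score)


# Invented training triples (arg1, stemmed relation, arg2) with labels.
train = [
    (("Frozen section exam", "show", "a fibroadenoma"), "observation"),
    (("The breast", "reveal", "some hyperplasia"), "observation"),
    (("An incision", "make", "in the skin"), "activity"),
    (("The incision", "close", "with sutures"), "activity"),
]
X = [features(*triple) for triple, _ in train]
y = [label for _, label in train]
model = MultinomialNB().fit(X, y)

print(model.predict(features("Frozen section exam", "show", "some hyperplasia")))  # observation
print(model.predict(features("The incision", "make", "in the breast")))  # activity
```

The argument "in the breast" never occurs in the training triples, yet it yields the same semantic-type feature as "in the skin". Semantic-type features generalise across surface words in a way that a bag of words does not.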

We applied the relation extraction approach to a set of 312 sentences randomly collected from 24 surgical reports. The reports cover different surgical fields (e.g. mammoplasty, sterilization, nasal surgery) and were originally provided as coding exercises. Relation detection identified 252 of them, which corresponds to a recall of about 80% (252/312). One likely reason for missed extractions is ReVerb's extraction method: it requires a verb in the sentence, and when the verb is missing or is not recognized, nothing is extracted. Furthermore, its syntactic constraint requires the relation phrase to be a verb or to begin with a verb and end with a preposition. Of the detected relations, 86.1% were correct and 13.9% were incorrect. Errors in relation extraction had three causes: (1) sentence complexity, (2) word-class ambiguity, and (3) errors in sentence boundary detection.
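The verb requirement can be made concrete with a simplified re-creation of ReVerb's syntactic constraint, which restricts relation phrases to the part-of-speech pattern V | VP | VW*P (a verb, optionally followed by a preposition, possibly with nouns, adjectives or adverbs in between). This Python sketch is an illustration under stated assumptions, not ReVerb's code: the Penn Treebank tag-to-class mapping and `relation_phrases` are hypothetical, tags are supplied by hand, and ReVerb's lexical constraint and argument detection are omitted.

```python
import re

# Coarse word classes over Penn Treebank tags (a simplifying assumption):
# V = verb; P = preposition, particle or infinitive marker;
# W = noun, adjective, adverb, pronoun or determiner; O = anything else.
TAG_CLASS = {
    "VB": "V", "VBD": "V", "VBG": "V", "VBN": "V", "VBP": "V", "VBZ": "V",
    "IN": "P", "RP": "P", "TO": "P",
    "NN": "W", "NNS": "W", "NNP": "W", "JJ": "W", "RB": "W",
    "PRP": "W", "DT": "W",
}

# Simplified form of the pattern V | VP | VW*P: one or more verbs,
# optionally followed by W words and a closing P.
RELATION_PATTERN = re.compile(r"V+(?:W*P)?")


def relation_phrases(tagged):
    """Return word spans whose tag classes match RELATION_PATTERN.

    `tagged` is a list of (word, Penn Treebank tag) pairs; each token
    contributes exactly one class character, so match offsets are
    token offsets."""
    classes = "".join(TAG_CLASS.get(tag, "O") for _, tag in tagged)
    return [" ".join(word for word, _ in tagged[m.start():m.end()])
            for m in RELATION_PATTERN.finditer(classes)]


# Hand-tagged examples (tags supplied manually, not by a tagger).
with_verb = [("The", "DT"), ("incision", "NN"), ("was", "VBD"),
             ("closed", "VBN"), ("with", "IN"), ("sutures", "NNS"), (".", ".")]
without_verb = [("Specimen", "NN"), ("to", "TO"), ("pathology", "NN"), (".", ".")]

print(relation_phrases(with_verb))     # ['was closed with']
print(relation_phrases(without_verb))  # []
```

A fragment such as "Specimen to pathology." yields nothing because no token is classed as a verb; the same happens when a tagger mislabels the verb of a sentence.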

Applying a Naive Bayes Multinomial classifier to the feature set described in step 3 achieved an accuracy of 85.2%, which is significantly higher than the 80.9% achieved by the same classifier with a bag-of-words feature set on the ground truth. Classification with a Logistic classifier and with LibSVM resulted in accuracies of 78.1% and 70.7%, respectively.

Discussion: This paper addressed the task of extracting relations from surgical reports and of classifying those relations as activities and observations. We applied open information extraction to extract relations from the reports. The results show that this approach can be successfully applied to extract relation arguments and types. The extracted relation information is also well suited to distinguish observation relations from activity relations.


References

1.
Liebmann P, Bohn S, Neumuth T. Design and validation of a robust surgical workflow management system. In: 2nd Workshop on Modeling and Monitoring of Computer Assisted Interventions. Toronto: 2011. p. 20-28.
2.
Frunza O, Inkpen D. Extraction of disease-treatment semantic relations from biomedical sentences. In: Proceedings of the 2010 Workshop on Biomedical Natural Language Processing, BioNLP '10; Stroudsburg, PA, USA: Association for Computational Linguistics; 2010. p. 91-98.
3.
Rindflesch TC, Fiszman M. The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text. Journal of Biomedical Informatics. 2003;36(6):462-477.
4.
Etzioni O, Fader A, Christensen J, Soderland S, Mausam. Open information extraction: the second generation. In: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume One. AAAI Press; 2011. p. 3-10.