gms | German Medical Science

62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

17.09. - 21.09.2017, Oldenburg

Information extraction from German clinical documents in neurodegenerative diseases

Meeting Abstract

  • Sebastian Schaaf - Fraunhofer Institute for Scientific Computing and Algorithms SCAI, Sankt Augustin, Germany
  • Mischa Übachs - University Hospital Bonn (UKB), Bonn, Germany
  • Lisa Langnickel - Fraunhofer Institute for Scientific Computing and Algorithms SCAI, Sankt Augustin, Germany
  • Philipp Köppen - University Hospital Bonn (UKB), Bonn, Germany
  • Thomas Klockgether - University Hospital Bonn (UKB), Bonn, Germany; German Center for Neurodegenerative Diseases (DZNE), Bonn, Germany
  • Juliane Fluck - Fraunhofer Institute for Scientific Computing and Algorithms SCAI, Sankt Augustin, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 62. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Oldenburg, 17.-21.09.2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 268

doi: 10.3205/17gmds161, urn:nbn:de:0183-17gmds1612

Published: 29 August 2017

© 2017 Schaaf et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.


Text

Background: In hospitals, despite the existence of hospital information systems, a large amount of valuable patient information is documented as free text only. From a data mining point of view, it cannot be exploited until it is transformed into standardized, machine-readable, structured information. We adapted a generic text mining environment, implemented as virtual machines, so that it can easily be set up as part of a data integration architecture within the University Hospital Bonn (UKB). As an application, we focus on structuring data in the area of neurodegenerative diseases (NDD).

Aim of the study: The IDSN consortium (Integrative Datensemantik für die Neurodegenerative Forschung) aims to integrate NDD-related data from diverse sources, including clinical routine documents processed by text mining.

Here, we present two information extraction pipelines that retrieve defined items from German discharge letters and neuropsychological test reports (NPTs). The end point of the extraction processes is a mapping into the DZNE-DESCRIBE database schema, which is designed to collect anamnestic data via structured interviews.
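As a rough illustration of the end point of the NPT pipeline, the following Python sketch parses score rows of a plain-text test table into flat records. The row layout ("test name, score/maximum"), the record fields, and the sample report are assumptions made for illustration; they are not taken from the actual NPT format or from the DESCRIBE model.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical row layout: "<test name>   <score>/<maximum>", e.g. "MMSE   27/30".
ROW_PATTERN = re.compile(
    r"^\s*(?P<test>[^\d/]+?)\s+(?P<score>\d+)\s*/\s*(?P<maximum>\d+)\s*$"
)


@dataclass
class BatteryScore:
    # Field names are illustrative only, not the DESCRIBE data model.
    patient_id: str
    test_name: str
    score: int
    maximum: int


def extract_battery_scores(patient_id: str, report_text: str) -> List[BatteryScore]:
    """Collect one record per line that looks like a score row."""
    records = []
    for line in report_text.splitlines():
        match = ROW_PATTERN.match(line)
        if match is None:
            continue  # headings and free-text lines are skipped
        records.append(
            BatteryScore(
                patient_id=patient_id,
                test_name=match.group("test").strip(),
                score=int(match.group("score")),
                maximum=int(match.group("maximum")),
            )
        )
    return records


# Invented sample report, not real patient data.
sample_report = """Neuropsychologische Testung
MMSE            27/30
Uhrentest       5/7
Befund: leichte kognitive Einschränkung
"""
records = extract_battery_scores("patient-001", sample_report)
```

Collecting such records across reports from different dates for the same patient would give the kind of longitudinal view the abstract refers to.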

Proposed methods: TM Infrastructure: The text mining service is deployed as a set of virtual machines (VMs) in the clinic infrastructure. In short, a Broker VM queues messages, which are then forwarded to the specific Worker VMs for processing. In the NDD use case, two Worker VMs have been set up. In the first VM, tables with cognitive test battery information are extracted from the NPTs and directly transformed into the DESCRIBE model. The second VM extracts five classes of common disturbances in dementia as well as temporal information. It contains generic NLP processes involving segmentation (tokens, word decomposition, sentences, paragraphs), stemming, and assertion recognition such as negation. For named entity recognition, a terminology was developed based on the training corpus and expert knowledge. For information extraction, rules are written in RUTA syntax. Document readers and an ODM writer connect the workflow to the hospital IT systems.
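The interplay of terminology lookup and negation recognition in the second Worker VM can be pictured with the following minimal Python sketch. It is a stand-in for illustration, not the RUTA rule set used in the project: the term list, class labels, negation cues, and the fixed token window are all hypothetical, and stemming and word decomposition are omitted.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical terminology: lowercase surface form -> class label.
TERMINOLOGY: Dict[str, str] = {
    "gedächtnisstörung": "memory_disturbance",
    "sprachstörung": "language_disturbance",
    "orientierungsstörung": "orientation_disturbance",
}

# Hypothetical negation cues that may precede a term within a sentence.
NEGATION_CUES = {"keine", "kein", "ohne", "nicht"}
NEGATION_WINDOW = 3  # number of preceding tokens inspected (an assumption)


def split_sentences(text: str) -> List[str]:
    # Naive splitter: break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    # Lowercased word tokens; \w matches umlauts on Python 3 strings.
    return re.findall(r"\w+", sentence.lower())


def extract_findings(text: str) -> List[Tuple[str, str, bool]]:
    """Return (term, class label, negated) for each terminology hit."""
    findings = []
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        for index, token in enumerate(tokens):
            label = TERMINOLOGY.get(token)
            if label is None:
                continue
            # Look for a negation cue in the tokens just before the term.
            window = tokens[max(0, index - NEGATION_WINDOW):index]
            negated = any(cue in NEGATION_CUES for cue in window)
            findings.append((token, label, negated))
    return findings


# Invented example sentences, not real patient data.
letter = (
    "Der Patient berichtet über eine Gedächtnisstörung. "
    "Es besteht keine Sprachstörung."
)
findings = extract_findings(letter)
```

In this toy input, the first term is reported as affirmed and the second as negated. A rule-based system would typically express the same idea declaratively and handle scope more carefully than a fixed window does.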

Training corpus and annotation: For the NPTs, which have a very structured format, only 10 records from different creation dates were used as the initial corpus. To generate training data from discharge letters, the documents are anonymised and presented to the user in a web-based annotation interface (BRAT). In BRAT, the user annotates the classes to be extracted. Furthermore, the anonymisation results can be inspected and corrected.
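BRAT stores annotations in a standoff format, separate from the document text, with one tab-separated line per annotation. The following Python sketch shows how text-bound annotations (lines starting with "T") might be read back for use as training data. The class labels and the sample sentence are hypothetical, and discontinuous spans are deliberately skipped to keep the example short.

```python
from typing import List, NamedTuple


class TextBoundAnnotation(NamedTuple):
    identifier: str
    label: str
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    text: str


def parse_brat_annotations(ann_content: str) -> List[TextBoundAnnotation]:
    """Parse contiguous text-bound annotations from BRAT standoff content."""
    annotations = []
    for line in ann_content.splitlines():
        if not line.startswith("T"):
            continue  # attributes, relations, notes etc. are ignored here
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        identifier, span_info, covered_text = parts
        if ";" in span_info:
            continue  # discontinuous span, not handled in this sketch
        label, start, end = span_info.split(" ")
        annotations.append(
            TextBoundAnnotation(identifier, label, int(start), int(end), covered_text)
        )
    return annotations


# Invented example; labels are placeholders, not the project's classes.
document_text = "Es besteht keine Sprachstörung seit 2015."
ann_content = (
    "T1\tLanguageDisturbance 17 30\tSprachstörung\n"
    "T2\tTemporal 36 40\t2015\n"
    "A1\tNegated T1\n"
)
annotations = parse_brat_annotations(ann_content)

# Sanity check: offsets must reproduce the covered text.
offsets_consistent = all(
    document_text[a.start:a.end] == a.text for a in annotations
)
```

Checking that the offsets reproduce the covered text is a cheap way to detect documents that were modified after annotation, for example by a corrected anonymisation step.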

Points for discussion: The effort and degree of success of information extraction approaches are highly use case-dependent. For information that is already available in a structured way in text documents (such as our NPTs), the investment in setting up extraction pipelines is very low, while the gain is high-quality structured information (e.g. longitudinal patient data from cognitive test batteries). In other cases, where anamnesis information and temporal data have to be extracted from discharge letters, the effort for setting up a corresponding workflow is higher. The main investments are the annotation of training data, terminology enrichment, and adaptation/training of the extraction methods.

Taken together, our semantic information integration approach narrows the gap between unstructured resources and their automated use, ultimately providing longitudinal data on dementia patients.



The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

