gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

SAVIE-NLP: Semi-automated validation pipeline for NLP-based information extraction from openEHR repositories

Meeting Abstract

  • Nektarios Ladas - Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Hannover, Germany
  • Stefan Franz - Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Hannover, Germany
  • Natalia Strauch - Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Hannover, Germany
  • Alina Rehberg - Department of Gastroenterology, Hepatology and Endocrinology, Hannover Medical School, Hannover, Germany
  • Mazyar Behnahad - Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Hannover, Germany
  • Michael Marschollek - Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Hannover, Germany
  • Matthias Gietzelt - Peter L. Reichertz Institute for Medical Informatics, TU Braunschweig and Hannover Medical School, Hannover, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 16

doi: 10.3205/21gmds042, urn:nbn:de:0183-21gmds0428

Published: 24 September 2021

© 2021 Ladas et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Applying natural language processing (NLP) methods in information extraction (IE) pipelines for semi-structured and unstructured medical documents in health information systems (HIS) helps to achieve an integrable and interoperable health information exchange (HIE) platform [1]. To ensure that the extracted data are reliable, the annotated results are manually compared with the original medical texts [2].

State of the art: Most studies that apply NLP methods for IE from unstructured medical documents use predefined validation sets that domain experts have created manually. Accuracy is measured through precision, recall, and F1 score [3], [4].
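As an illustration of the three metrics named above, the following minimal sketch compares a set of extracted items with an expert-created validation set. The function name, the tuple layout (document, field, value), and the example values are assumptions made for this illustration and are not taken from SAVIE-NLP.

```python
def precision_recall_f1(predicted, gold):
    """Compare extracted items with a manually created validation set.

    Both arguments are collections of hashable items; here each item is a
    hypothetical (document_id, field, value) tuple.
    """
    predicted, gold = set(predicted), set(gold)
    tp = len(predicted & gold)   # extracted and confirmed by the experts
    fp = len(predicted - gold)   # extracted but not in the validation set
    fn = len(gold - predicted)   # in the validation set but not extracted

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Hypothetical validation set (5 items) and pipeline output (4 items).
gold = {
    ("doc1", "diagnosis", "hepatitis C"),
    ("doc1", "medication", "ribavirin"),
    ("doc2", "diagnosis", "cirrhosis"),
    ("doc2", "medication", "propranolol"),
    ("doc3", "diagnosis", "pancreatitis"),
}
predicted = {
    ("doc1", "diagnosis", "hepatitis C"),
    ("doc1", "medication", "ribavirin"),
    ("doc2", "diagnosis", "cirrhosis"),
    ("doc3", "diagnosis", "gastritis"),   # wrong value: counts as a false positive
}

p, r, f = precision_recall_f1(predicted, gold)
print(f"{p:.2f} {r:.2f} {f:.2f}")  # prints "0.75 0.60 0.67"
```

Exact matching on whole tuples is the simplest possible criterion; a real evaluation may need partial or normalized matching of values.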

Concept: To validate the extracted medical data values, a web-based solution is suitable: it visualizes the results of the NLP pipeline from openEHR clinical data repositories through a user-friendly interface that is independent of the operating system.

Implementation: After the composition is updated through the user interface, the annotation errors are stored in a SQLite relational database. The developer can view the results of the validated document either through a database tool or through a Representational State Transfer (REST) application programming interface (API) web service.

Storing both inaccurately and accurately annotated items in a database allows the results of the NLP pipelines to be placed under version control, which supports fine-tuning as well as immediate error recognition and correction in production environments.
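The storage idea described here can be sketched with Python's built-in sqlite3 module. The abstract does not give the actual schema, so the table name, the column names, the helper functions, and the item paths below are all assumptions chosen for illustration; only the use of SQLite and the idea of keeping correct and incorrect items per pipeline version come from the text.

```python
import sqlite3

# Hypothetical schema: one row per validated item, tagged with the
# pipeline version that produced it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS validation_result (
    id               INTEGER PRIMARY KEY AUTOINCREMENT,
    composition_id   TEXT NOT NULL,
    pipeline_version TEXT NOT NULL,
    item_path        TEXT NOT NULL,
    extracted_value  TEXT,
    corrected_value  TEXT,
    is_correct       INTEGER NOT NULL CHECK (is_correct IN (0, 1))
)
"""


def open_store(path=":memory:"):
    """Open (or create) the validation store; in-memory by default."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record_validation(conn, composition_id, pipeline_version, item_path,
                      extracted_value, corrected_value=None):
    """Store one validated item; a differing correction marks it as an error."""
    is_correct = corrected_value is None or corrected_value == extracted_value
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO validation_result (composition_id, pipeline_version,"
            " item_path, extracted_value, corrected_value, is_correct)"
            " VALUES (?, ?, ?, ?, ?, ?)",
            (composition_id, pipeline_version, item_path,
             extracted_value, corrected_value, int(is_correct)),
        )


def errors_by_version(conn):
    """Return {pipeline_version: (error_count, total_count)}."""
    rows = conn.execute(
        "SELECT pipeline_version, SUM(1 - is_correct), COUNT(*)"
        " FROM validation_result GROUP BY pipeline_version"
        " ORDER BY pipeline_version"
    ).fetchall()
    return {version: (errors, total) for version, errors, total in rows}


conn = open_store()
record_validation(conn, "comp-1", "v1", "diagnosis/name", "hepatitis C")
record_validation(conn, "comp-1", "v1", "medication/dose", "10 mg", "100 mg")
record_validation(conn, "comp-1", "v2", "diagnosis/name", "hepatitis C")
record_validation(conn, "comp-1", "v2", "medication/dose", "100 mg")

summary = errors_by_version(conn)
print(summary)
```

Keeping the pipeline version on every row is one simple way to compare error counts across releases; a REST service, as mentioned in the text, could expose the same query.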

Lessons learned (Discussion): Validating data from openEHR repositories that NLP pipelines process and extract from medical texts requires not only a statistical measurement of performance in terms of specificity and sensitivity, but also continual testing and manual validation of the extracted results until the data are fully accurate.

Conclusion: SAVIE-NLP provides a web interface that allows manual validation procedures to be created dynamically in IE pipelines. It also closes the gap in organizing communication between domain experts, testers, and application developers for quality control in IE pipelines.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Haarbrandt B, Schreiweis B, Rey S, Sax U, Scheithauer S, Rienhoff O, Knaup-Gregori P, Bavendiek U, Dieterich C, Brors B, Kraus I, Thoms CM, Jäger D, Ellenrieder V, Bergh B, Yahyapour R, Eils R, Consortium H, Marschollek M. HiGHmed - An Open Platform Approach to Enhance Care and Research across Institutional Boundaries. Methods Inf Med. 2018 Jul;57(S 01):e66-e81. DOI: 10.3414/ME18-02-0002
2.
Singh S. Natural Language Processing for Information Extraction [Preprint]. arXiv. 2018 Jul. arXiv:1807.02383[cs]. Available from: http://arxiv.org/abs/1807.02383
3.
Beale T, Heard S. What is openEHR? 2021 [cited 18 February 2021]. Available from: https://www.openehr.org/about/what_is_openehr
4.
Luo L, Li L, Hu J, Wang X, Hou B, Zhang T, Zhao L. A hybrid solution for extracting structured medical information from unstructured data in medical records via a double-reading/entry system. BMC Medical Informatics and Decision Making. 2016;16(1):114. DOI: 10.1186/s12911-016-0357-5
5.
Fonferko-Shadrach B, Lacey AS, Roberts A, et al. Using natural language processing to extract structured epilepsy data from unstructured clinic letters: development and validation of the ExECT (extraction of epilepsy clinical text) system. BMJ Open. 2019;9:e023232. DOI: 10.1136/bmjopen-2018-023232
6.
Helping medical teams improve patient care Better care [Internet]. 2021 [cited 18 February 2021]. Available from: https://www.better.care/
7.
Divyabharathi DN, Cholli NG. A Review on Identity and Access Management Server (KeyCloak). International Journal of Electrical and Power Engineering. 2020;14:17-22.