gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Real-Time Process Monitoring Hospital Data

Meeting Abstract

  • Md Mostafa Kamal - Medical Data Integration Center (MeDIC), Institute for Biomedical Informatics, University of Cologne, Faculty of Medicine and University Hospital Cologne, Cologne, Germany
  • Ekaterina Kutafina
  • Oya Beyan - Universität zu Köln, Köln, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 945

doi: 10.3205/24gmds046, urn:nbn:de:0183-24gmds0467

Published: September 6, 2024

© 2024 Kamal et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Introduction: Hospital data integration centers must manually analyze the extensive logs generated by ETL (Extract, Transform, Load) processes to ensure data quality. This work is time-consuming and error-prone, which creates an urgent need for more efficient, automated solutions.

State of the art: Previous studies proposed various solutions for ETL process monitoring. Ai et al. developed a CEP (complex event processing) engine for rapid and efficient data processing in active distribution networks. Seenivasan highlighted best practices for ETL processes, focusing on data quality checks and performance optimization. Trajkovska et al. emphasized the automation benefits of tools like SnapLogic in data integration. A. and M. introduced a real-time data warehouse framework leveraging Business Activity Monitoring (BAM) for up-to-date data processing.

Despite these advances, there remains a need for specialized solutions tailored to the healthcare domain. The work presented here addresses this need with a real-time, automated monitoring system for hospital ETL processes.

Concept: The proposed framework focuses on improving data ingestion within the Medical Data Integration Center (MeDIC) at the University Hospital Cologne by leveraging the ELK Stack (Elasticsearch, Logstash, Kibana) to automate ETL log analysis and identify data quality issues. The architecture collects data from various primary systems, processes it through ETL mapping engines, and uses Filebeat to ship the resulting logs for real-time analysis and visualization. Logstash processes and enriches the log data, Elasticsearch supports rapid searches and complex queries, and Kibana provides interactive visualizations. The log processing workflow involves identifying major events, configuring the ELK Stack, and iteratively optimizing the log parsing algorithm based on team feedback. Continuous updates and refinements keep the dashboards accurate and informative.
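The log parsing step described above can be illustrated with a minimal Python sketch. The abstract does not give the actual log layout or the Logstash configuration, so the line format, the field names, and the `parse_line` helper below are all assumptions made for illustration; the regular expression stands in for the kind of pattern matching a Logstash filter would perform.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class EtlEvent:
    """One structured event extracted from a raw ETL log line."""
    timestamp: datetime
    level: str
    file_type: str
    status: str
    message: str


# Hypothetical log layout (not taken from the abstract), e.g.:
# 2024-03-01T12:00:00 ERROR file_type=lab status=error msg="missing patient resource"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"file_type=(?P<file_type>\S+)\s+"
    r"status=(?P<status>success|error)"
    r"(?:\s+msg=\"(?P<message>[^\"]*)\")?$"
)


def parse_line(line: str) -> Optional[EtlEvent]:
    """Return a structured event, or None if the line does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return EtlEvent(
        timestamp=datetime.strptime(match["timestamp"], "%Y-%m-%dT%H:%M:%S"),
        level=match["level"],
        file_type=match["file_type"],
        status=match["status"],
        message=match["message"] or "",  # msg is optional in the assumed layout
    )
```

Lines that do not match return `None` instead of raising, so unmatched lines can be collected and reviewed when the pattern is refined iteratively.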

Implementation: The real-time dashboard for MeDIC's ETL processes includes several visualizations that provide insight into data ingestion quality. It features a line chart of success and error rates, a summary of processed and failed files, a table of success and error counts per file type, and counts for error types known from past experience. Additionally, the data behind the visualizations can be downloaded as CSV files.
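The per-file-type table and the CSV download can be approximated outside Kibana with a short Python sketch. This is not the authors' dashboard logic: the `(file_type, status)` event tuples, the function names, and the column names are hypothetical, and in the described system the aggregation would be done by Elasticsearch and Kibana.

```python
import csv
import io
from collections import Counter
from typing import Dict, Iterable, Tuple


def counts_per_file_type(events: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count 'success' and 'error' events for each file type."""
    table: Dict[str, Counter] = {}
    for file_type, status in events:
        table.setdefault(file_type, Counter())[status] += 1
    return table


def to_csv(table: Dict[str, Counter]) -> str:
    """Render the count table as CSV text, one row per file type."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file_type", "success", "error"])
    for file_type in sorted(table):
        counts = table[file_type]
        # A Counter returns 0 for statuses that never occurred.
        writer.writerow([file_type, counts["success"], counts["error"]])
    return buffer.getvalue()
```

For example, passing `[("lab", "success"), ("lab", "error"), ("adt", "error")]` to `counts_per_file_type` and the result to `to_csv` yields one CSV row each for `adt` and `lab`.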

The automated ELK architecture has been monitoring the ETL processes for six months. In that period it handled around 2.46 million files, of which 2.37 million were processed successfully and 91.39 thousand failed. Errors accounted for 62.89 thousand of the failed files; known issues include missing patient or composition resources and validation failures. The system reduces manual work, provides instant insights, and helps ensure that only high-quality data enters the ingestion process, because problematic ETL processes can be shut down immediately until fixes are implemented.
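The reported figures can be turned into rates with simple arithmetic, sketched below in Python. The counts are the rounded values from the abstract, so the results are approximate (a failure rate of roughly 3.7%). Treating the 62.89 thousand figure as a subset of the failed files is an interpretation, not something the abstract states explicitly. The `should_halt` rule and its 5% threshold are purely hypothetical: the abstract says that problematic ETL processes can be shut down, but not by what criterion.

```python
# Rounded figures as reported in the abstract.
TOTAL_FILES = 2_460_000
SUCCESSFUL_FILES = 2_370_000
FAILED_FILES = 91_390
ERROR_FILES = 62_890  # assumed here to be a subset of FAILED_FILES


def share(part: int, whole: int) -> float:
    """Return part / whole, or 0.0 when there is nothing to divide by."""
    if whole == 0:
        return 0.0
    return part / whole


failure_rate = share(FAILED_FILES, TOTAL_FILES)
success_rate = share(SUCCESSFUL_FILES, TOTAL_FILES)
error_share_of_failures = share(ERROR_FILES, FAILED_FILES)


def should_halt(failed: int, processed: int, max_failure_rate: float = 0.05) -> bool:
    """Hypothetical stop rule: halt an ETL process whose failure rate
    exceeds the given threshold (the 5% default is an assumption)."""
    return share(failed, processed) > max_failure_rate


if __name__ == "__main__":
    print(f"failure rate: {failure_rate:.2%}")
    print(f"success rate: {success_rate:.2%}")
    print(f"error share of failed files: {error_share_of_failures:.2%}")
```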

Lessons learned: The developed monitoring solution integrates real-time data analysis and visualization specifically for medical data ingestion processes, offering stakeholders valuable insights for informed decision-making and timely interventions. The system is transparent, scalable, easy to integrate with additional ETL engines, and reduces manual intervention, enhancing the reliability, accuracy, and speed of data integration. While setting up the ELK Stack requires initial investment and technical expertise, the benefits of efficiency, data integrity, and stakeholder engagement are substantial.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.