gms | German Medical Science

62nd Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS)

German Association for Medical Informatics, Biometry and Epidemiology

17-21 September 2017, Oldenburg

Migration of intensive care unit (ICU) data into a medical research data mart

Meeting Abstract


  • Benjamin Baum - University Medical Center Göttingen, Georg-August-Universität, Department of Medical Informatics, Göttingen, Germany
  • Christian R Bauer - University Medical Center Göttingen, Georg-August-Universität, Department of Medical Informatics, Göttingen, Germany
  • Gunnar Nußbeck - University Medical Center Göttingen, Georg-August-Universität, Division 3-7 Information Technology, Göttingen, Germany
  • Ulrich Sax - University Medical Center Göttingen, Georg-August-Universität, Department of Medical Informatics, Göttingen, Germany

German Association for Medical Informatics, Biometry and Epidemiology. 62nd Annual Meeting of the German Association for Medical Informatics, Biometry and Epidemiology (GMDS). Oldenburg, 17-21 September 2017. Düsseldorf: German Medical Science GMS Publishing House; 2017. DocAbstr. 261

doi: 10.3205/17gmds152, urn:nbn:de:0183-17gmds1525

Published: 29 August 2017

© 2017 Baum et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: The secondary use of medical data is becoming increasingly important, not only for optimal patient treatment but also for answering current research questions [1]. Owing to the diversity of data types and the volume of the data, analyzing and evaluating secondary data is becoming increasingly difficult [2].

One secondary use case at the University Medical Center Göttingen is a retrospective examination [3] of ICU patients from 2015 and 2016. These data comprise roughly 200 million facts, such as demographics, medication, and sensor data, from more than 7,000 patients and are stored in a patient data management system (PDMS). However, the PDMS does not support cross-patient querying, selection, filtering, or quick analysis. The data therefore had to be made available to clinicians in a research database.

Methods: The open-source medical research database tranSMART allows researchers to select and filter patient groups based on inclusion and exclusion criteria [4]. The resulting patient sets can be analyzed with extensible R scripts; basic analysis methods such as correlation analysis, box plots, and heatmaps are included. In addition, custom R scripts can be integrated to build a tailored analysis workflow.
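Conceptually, selecting a patient group by inclusion and exclusion criteria reduces to set operations over patients. The following is a minimal sketch, assuming a toy in-memory fact store with hypothetical concept codes; it follows the common convention of OR within one criteria group, AND across groups, and subtraction of exclusions, and is not tranSMART's own implementation.

```python
# Hypothetical toy fact store: patient id -> concept codes recorded for that patient.
FACTS = {
    1: {"DX:SEPSIS", "MED:NORADRENALIN"},
    2: {"DX:SEPSIS"},
    3: {"MED:NORADRENALIN"},
    4: {"DX:SEPSIS", "MED:NORADRENALIN", "DX:PREGNANCY"},
}


def patients_with(concept: str) -> set[int]:
    """All patients that have at least one fact for the given concept."""
    return {pid for pid, concepts in FACTS.items() if concept in concepts}


def select_cohort(include: list[list[str]], exclude: list[str]) -> set[int]:
    """Concepts inside one inclusion group are OR-ed, groups are AND-ed,
    and patients matching any exclusion concept are removed."""
    cohort = set(FACTS)
    for group in include:
        cohort &= set().union(*(patients_with(c) for c in group))
    for concept in exclude:
        cohort -= patients_with(concept)
    return cohort


# Septic patients receiving noradrenaline, excluding pregnancies.
print(select_cohort([["DX:SEPSIS"], ["MED:NORADRENALIN"]], exclude=["DX:PREGNANCY"]))  # prints {1}
```

The resulting set of patient IDs is what a downstream analysis step (in tranSMART, the R scripts) would then operate on.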

Since tranSMART does not provide suitable import tools for large amounts of data, we developed an ETL (extract, transform, load) workflow in Talend Open Studio.
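The three ETL stages can be illustrated with a short Python sketch. It is a stand-in, not the authors' Talend job: the PDMS export format, the simplified `observation_fact` table (modeled loosely on the i2b2 star schema that tranSMART builds on), the `PDMS:` concept prefix, and the use of `sqlite3` in place of PostgreSQL are all assumptions for illustration. Streaming rows through generators and inserting in batches keeps memory bounded, which matters at the scale of hundreds of millions of facts.

```python
import csv
import io
import sqlite3
from collections.abc import Iterable, Iterator
from datetime import datetime
from itertools import islice

# Hypothetical PDMS export format (an assumption): one observation per row.
PDMS_EXPORT = """patient_id;parameter;timestamp;value
P001;HR;2015-03-01 08:00;92
P001;NORADRENALIN;2015-03-01 08:05;0.1
P002;HR;2016-07-12 14:30;77
P002;COMMENT;2016-07-12 14:31;stable
"""


def extract(text: str) -> Iterator[dict]:
    """Extract: stream raw rows from the (hypothetical) PDMS export."""
    yield from csv.DictReader(io.StringIO(text), delimiter=";")


def transform(rows: Iterable[dict], patient_map: dict[str, int]) -> Iterator[tuple]:
    """Transform: map each row to a simplified i2b2-style fact tuple
    (patient_num, concept_cd, start_date, nval_num, tval_char)."""
    for row in rows:
        # Assign a surrogate integer patient number the first time a patient is seen.
        patient_num = patient_map.setdefault(row["patient_id"], len(patient_map) + 1)
        start = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M").isoformat()
        try:  # numeric values go to nval_num, everything else to tval_char
            nval, tval = float(row["value"]), None
        except ValueError:
            nval, tval = None, row["value"]
        yield (patient_num, "PDMS:" + row["parameter"], start, nval, tval)


def load(conn: sqlite3.Connection, facts: Iterable[tuple], batch_size: int = 10_000) -> int:
    """Load: insert facts in batches instead of row by row; returns the row count."""
    it = iter(facts)
    total = 0
    while batch := list(islice(it, batch_size)):
        conn.executemany("INSERT INTO observation_fact VALUES (?, ?, ?, ?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total


conn = sqlite3.connect(":memory:")  # stand-in for the tranSMART PostgreSQL database
conn.execute("CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT, "
             "start_date TEXT, nval_num REAL, tval_char TEXT)")
loaded = load(conn, transform(extract(PDMS_EXPORT), {}), batch_size=2)
print(loaded, "facts loaded")
```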

Results: The medical data were successfully imported into tranSMART with a total runtime of roughly 15 hours. Privacy aspects were addressed together with the local data protection officer; access is restricted to medical staff who are already permitted to see fully identified patient data in the treatment context.

When tranSMART was used with this large dataset, query performance turned out to be slow. We therefore examined the underlying database schema, indexes, and queries.

One cause of the performance issues was that large database tables were searched by string comparison. These tables were indexed with regular PostgreSQL B-tree indexes, which generally cannot be used for string comparisons containing wildcards, in particular a leading wildcard as in LIKE '%term%'. We therefore replaced them with trigram indexes, which support fast wildcard string matching. Next, we partitioned several tables to obtain smaller indexes and faster lookups. A further issue only became apparent when patients were filtered with ‘OR’ queries: here, we replaced the very slow ‘WHERE IN’ query generated by tranSMART with a concatenated ‘OR’ clause.
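Why trigrams help with wildcard searches can be shown with a simplified in-memory sketch: every stored string is split into three-character substrings, an inverted index maps each trigram to the rows containing it, and a query such as `ILIKE '%term%'` only has to recheck rows that contain all trigrams of the search term. This is an illustration of the principle, not PostgreSQL's implementation; in PostgreSQL the `pg_trgm` extension provides trigram operator classes (`gin_trgm_ops`, `gist_trgm_ops`) for this, and pg_trgm additionally splits text into words and pads them, which this sketch omits. The concept paths are hypothetical, and the abstract does not state which index type was used.

```python
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """All 3-character substrings of the lower-cased text (simplified extraction)."""
    s = text.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


class TrigramIndex:
    """Inverted index: trigram -> ids of the stored values that contain it."""

    def __init__(self, values: list[str]):
        self.values = values
        self.postings: dict[str, set[int]] = defaultdict(set)
        for row_id, value in enumerate(values):
            for tri in trigrams(value):
                self.postings[tri].add(row_id)

    def like_contains(self, term: str) -> list[str]:
        """Emulates  value ILIKE '%term%'  using the index, then rechecks candidates."""
        needle = term.lower()
        grams = trigrams(needle)
        if not grams:  # term shorter than 3 characters: the index cannot help, scan all
            candidates = set(range(len(self.values)))
        else:  # only rows containing every trigram of the term can match
            candidates = set.intersection(*(self.postings.get(g, set()) for g in grams))
        # Recheck: containing all trigrams is necessary but not sufficient for a match.
        return [self.values[i] for i in sorted(candidates) if needle in self.values[i].lower()]


paths = [
    "\\ICU\\Medication\\Noradrenalin\\",
    "\\ICU\\Vitals\\Heart rate\\",
    "\\ICU\\Medication\\Adrenalin\\",
]
index = TrigramIndex(paths)
print(index.like_contains("adrenalin"))
```

A B-tree, by contrast, orders whole strings, so it can narrow down a prefix but offers no shortcut for a substring that may occur anywhere in the value.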

Together, these changes to database features and SQL queries reduced query runtimes substantially.
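The rewrite of the ‘WHERE IN’ query into a concatenated ‘OR’ clause can be sketched as two small query builders that produce logically equivalent SQL. The abstract does not show the SQL that tranSMART actually generates, so the query shape, the table `observation_fact`, and the columns `patient_num` and `concept_cd` are assumptions; `sqlite3` with `?` placeholders stands in for PostgreSQL (a driver such as psycopg uses `%s` instead). The abstract reports that the ‘OR’ form was faster in the authors' setup; whether that holds elsewhere depends on the database version and query planner.

```python
import sqlite3


def where_in_query(concepts: list[str]) -> tuple[str, list[str]]:
    """Assumed original shape: one IN list over all selected concepts."""
    if not concepts:
        raise ValueError("at least one concept is required")
    placeholders = ", ".join("?" for _ in concepts)
    sql = ("SELECT DISTINCT patient_num FROM observation_fact "
           f"WHERE concept_cd IN ({placeholders})")
    return sql, list(concepts)


def where_or_query(concepts: list[str]) -> tuple[str, list[str]]:
    """Rewritten form: the same condition as a concatenated OR clause."""
    if not concepts:
        raise ValueError("at least one concept is required")
    condition = " OR ".join("concept_cd = ?" for _ in concepts)
    sql = f"SELECT DISTINCT patient_num FROM observation_fact WHERE {condition}"
    return sql, list(concepts)


# Usage against a tiny in-memory table (stand-in for the tranSMART database).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observation_fact (patient_num INTEGER, concept_cd TEXT)")
conn.executemany("INSERT INTO observation_fact VALUES (?, ?)",
                 [(1, "MED:A"), (2, "MED:B"), (3, "MED:C"), (3, "MED:A")])
for build in (where_in_query, where_or_query):
    sql, params = build(["MED:A", "MED:B"])
    print(sorted(row[0] for row in conn.execute(sql, params)))  # both print [1, 2, 3]
```

Both builders use bound parameters rather than string-formatting the concept codes into the SQL, so the rewrite does not open the door to SQL injection.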

Discussion: tranSMART is a well-suited medical research database. After the performance optimizations, tranSMART became a well-accepted solution for querying the ICU data. However, tranSMART has some shortcomings that are to be addressed by the tranSMART Foundation in a future version [5]. First, tranSMART does not support modifiers, which results in many separate concepts that could otherwise be merged into one concept with a single modifier. Second, tranSMART does not yet support temporal queries, e.g. selecting patients who received a certain medication before an operation. Further work is needed to avoid treatment-specific micro data marts, which are difficult to maintain.
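To make the missing temporal query concrete, the following sketch evaluates a constraint of the form "medication before operation" over time-stamped facts. It is a minimal in-memory illustration with hypothetical concept codes and data, not a tranSMART feature. It relies on the fact that some event of the first kind precedes some event of the second kind exactly when the earliest first event is earlier than the latest second event.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical time-stamped facts: (patient id, concept code, timestamp).
FACTS = [
    (1, "MED:HEPARIN", datetime(2015, 5, 1, 8, 0)),
    (1, "PROC:OPERATION", datetime(2015, 5, 2, 10, 0)),
    (2, "PROC:OPERATION", datetime(2016, 1, 3, 9, 0)),
    (2, "MED:HEPARIN", datetime(2016, 1, 4, 12, 0)),
    (3, "MED:HEPARIN", datetime(2016, 2, 1, 7, 0)),
]


def event_times(facts, concept):
    """Timestamps of all facts with the given concept, grouped by patient."""
    times = defaultdict(list)
    for pid, code, ts in facts:
        if code == concept:
            times[pid].append(ts)
    return times


def occurs_before(facts, first_concept, second_concept):
    """Patients with at least one first_concept fact strictly before a second_concept fact."""
    firsts = event_times(facts, first_concept)
    seconds = event_times(facts, second_concept)
    # Some first event precedes some second event iff the earliest first event
    # is earlier than the latest second event.
    return {pid for pid in firsts.keys() & seconds.keys()
            if min(firsts[pid]) < max(seconds[pid])}


print(occurs_before(FACTS, "MED:HEPARIN", "PROC:OPERATION"))  # prints {1}
```

Patient 2 received the medication only after the operation and patient 3 had no operation, so neither qualifies; a plain co-occurrence filter without timestamps would have returned patients 1 and 2.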

Acknowledgements: This work was supported by the German Federal Ministry of Education and Research (BMBF) within the framework of the research and funding concepts e:Med (01ZX1306C/sysINFLAME) and i:DSem (031L0024A/MyPathSem).

The authors declare that they have no competing interests.

The authors declare that no ethics committee approval is required.


[1] Cooke CR, Iwashyna TJ. Using existing data to address important clinical questions in critical care. Crit Care Med. 2013;41(3):886.
[2] Bui AA, Aberle DR, Kangarloo H. TimeLine: visualizing integrated patient records. IEEE Trans Inf Technol Biomed. 2007;11:462-473. DOI: 10.1109/TITB.2006.884365
[3] Bauer C, Umbach N, Baum B, Buckow K, Franke T, Grütz R, et al. Architecture of a Biomedical Informatics Research Data Management Pipeline. Stud Health Technol Inform. 2016;228:262.
[4] Canuel V, Rance B, Avillach P, Degoulet P, Burgun A. Translational research platforms integrating clinical and omics data: a review of publicly available solutions. Brief Bioinform. 2015;16(2):280-290.
[5] The i2b2 Foundation and the tranSMART Foundation Announce Their Intention to Merge. [last accessed 24 April 2017]