gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

Technical architecture of a tool for interoperable data characterization

Meeting Abstract


  • Erik Tute - Peter L. Reichertz Institut für Medizinische Informatik der TU Braunschweig und der Medizinischen Hochschule Hannover, Hannover, Germany
  • Michael Marschollek - Peter L. Reichertz Institut für Medizinische Informatik der TU Braunschweig und der Medizinischen Hochschule Hannover, Hannover, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 175

doi: 10.3205/19gmds088, urn:nbn:de:0183-19gmds0886

Published: September 6, 2019

© 2019 Tute et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Reuse of electronic patient data for medical research is an important research topic (cf. [1]). Data quality (DQ) and lack of knowledge about datasets are common challenges mentioned in the literature. Data characterization, i.e. the calculation of descriptive measures (e.g. simple statistical measures, plausibility checks etc.) without directly including assessments of the results in the calculation process, is a common method to help analysts gain insight into datasets and find DQ issues. The literature on DQ assessment identifies limited comparability of measurement results as an unsolved problem. The objective of this contribution is to present the technical architecture of a tool for interoperable data characterization. Interoperability in this case implies two things: First, the tool shall be portable between institutions sharing the standardized technical infrastructure of the HiGHmed [2] consortium. Second, measurement results shall be comparable between organizations using the same (or similarly structured) clinical information models (CIM).

Methods: The architecture uses three features of the openEHR specification [3] to achieve interoperability. CIMs (1) provide shared machine-readable definitions of the clinical concepts the data represent, thus enabling deterministic generation of comparable measurement methods (MM) for data characterization. The Archetype Query Language (AQL) (2) and the openEHR REST API specification (3) for openEHR-based data repositories provide the means for CIM-based data retrieval, making the tool portable to all data repositories that conform to the openEHR REST API specification. The generation of MMs based on shared CIMs, in combination with deferred aggregation and visualization of measurement results (cf. [4]), enables comparability of results.
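As an illustration of CIM-based retrieval via AQL and the REST API, the following minimal Python sketch builds (but does not send) an AQL query request and reshapes a result set into columns. It is not the tool's implementation, which uses Node.js on the server side. The base URL, the example archetype ID and path, and the function names are assumptions made for illustration; the `/query/aql` endpoint with a JSON body containing `q`, and the `columns`/`rows` layout of the result, follow the openEHR REST API specification as we understand it.

```python
import json
import urllib.request

# Hypothetical base URL of an openEHR REST API compliant repository.
BASE_URL = "https://repository.example.org/openehr/v1"

# Illustrative AQL query; archetype ID and path serve only as an example.
AQL = (
    "SELECT o/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude "
    "AS temperature "
    "FROM EHR e CONTAINS OBSERVATION o[openEHR-EHR-OBSERVATION.body_temperature.v2]"
)


def build_aql_request(base_url: str, aql: str, fetch: int = 1000) -> urllib.request.Request:
    """Build a POST request for an ad-hoc AQL query (the request is not sent here)."""
    body = json.dumps({"q": aql, "fetch": fetch}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/query/aql",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def rows_by_column(result_set: dict) -> dict:
    """Turn a result set with 'columns' and 'rows' into {column name: list of values}."""
    names = [col["name"] for col in result_set["columns"]]
    return {name: [row[i] for row in result_set["rows"]] for i, name in enumerate(names)}
```

Because the query addresses data through archetype paths rather than repository-specific table names, the same request could in principle be sent to any conforming repository that holds data for the same CIM.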

Results: The architecture consists of four components: a client web application for user interaction, a Node.js server application for data processing, an instance of an openEHR REST API compliant data repository, and a server-side instance of R (statistical computing). The user defines the dataset to retrieve from the repository using the client and AQL. The client forwards the AQL query to the server-side application, which retrieves the data from the repository via the REST API. For performance and security reasons, the data stays within the server (cluster) and is not transferred to the client. Since data retrieval is based on CIMs, the returned dataset can be aligned with standard openEHR CIMs that express range, format, value set and cardinality constraints (cf. [5]). The client application uses these constraints to generate standardized R scripts as MMs. Further MMs aligned with CIM paths can be added in the client, e.g. for multi-value constraints that cannot be expressed in openEHR CIMs. The client sends the MMs to the server application, which executes the R scripts on the corresponding data using R. Results are returned to the client for aggregation and visualization.
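The generation of an MM from a CIM constraint can be sketched as follows. This is a hypothetical Python illustration, not the tool's client code: the `RangeConstraint` class, the `generate_range_mm` function, the example limits, and the assumption that the retrieved dataset is available in R as a data frame named `data` are all invented for this example. The generated R script returns raw counts rather than a verdict, which leaves aggregation and assessment to a later step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeConstraint:
    """Hypothetical representation of a range constraint taken from a CIM."""
    path: str     # CIM path the constraint belongs to
    column: str   # column name of that path in the retrieved dataset
    lower: float  # smallest allowed value
    upper: float  # largest allowed value


def generate_range_mm(c: RangeConstraint) -> str:
    """Return R source code that counts non-missing values outside [lower, upper].

    The same constraint always yields the same script text, so two sites
    sharing a CIM would obtain identical measurement methods.
    """
    return "\n".join([
        f"# MM for CIM path: {c.path}",
        # Assumption: the dataset is available in R as a data frame called 'data'.
        f'values <- data[["{c.column}"]]',
        "checked <- values[!is.na(values)]",
        f"violations <- sum(checked < {c.lower} | checked > {c.upper})",
        "list(n = length(checked), violations = violations)",
    ])


# Example constraint with illustrative limits (not taken from a real CIM).
example = RangeConstraint(
    path="/data[at0002]/events[at0003]/data[at0001]/items[at0004]/value/magnitude",
    column="temperature",
    lower=30.0,
    upper=45.0,
)
print(generate_range_mm(example))
```

Returning the pair of counts instead of a ratio or a pass/fail flag keeps the results of different sites combinable, in the spirit of the deferred aggregation described in the Methods.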

Discussion: This contribution describes the architecture of a tool for interoperable data characterization that is in an early implementation phase. The feasibility of the described steps has been tested. A real-world application is planned. Although the tool is intended for data characterization, the architecture could also be applied to other frequently discussed purposes, e.g. cohort identification.

The presented work was partly done within the project "HiGHmed" (German MI-Initiative), funded by BMBF (grant: 01ZZ1802C).

The authors declare that an ethics committee vote is not required.


References

1.
Martin-Sanchez FJ, Aguiar-Pulido V, Lopez-Campos GH, Peek N, Sacchi L. Secondary use and analysis of big data collected for patient care. Yearbook of medical informatics. 2017 Aug;26(01):28-37.
2.
HiGHmed. [Accessed 2019 Jul 15]. Available from: http://highmed.org/
3.
openEHR. [Accessed 2019 Apr 05]. Available from: https://www.openehr.org/about/what_is_openehr
4.
Tute E. Maschinelles Lernen zur optimierten zweckabhängigen Messung von Datenqualität. In: Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS), Herausgeber. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 116.
5.
Tute E, Wulff A, Marschollek M, Gietzelt M. Clinical Information Model Based Data Quality Checks: Theory and Example. Stud Health Technol Inform. 2019;258:80-84.