gms | German Medical Science

63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

02. - 06.09.2018, Osnabrück

Reference model agnostic querying clinical information systems purely based on terminology codes – proposal of an ideal query language

Meeting Abstract

  • Georg Fette - Universitätsklinikum Würzburg, Würzburg, Germany
  • Mathias Kaspar - Universitätsklinikum Würzburg, Würzburg, Germany
  • Leon Liman - Universität Würzburg, Würzburg, Germany
  • Georg Dietrich - Universität Würzburg, Würzburg, Germany
  • Maximilian Ertl - Universitätsklinikum Würzburg, Würzburg, Germany
  • Jonathan Krebs - Universität Würzburg, Würzburg, Germany
  • Stefan Störk - Universitätsklinikum Würzburg, Würzburg, Germany
  • Frank Puppe - Universität Würzburg, Würzburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 63. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Osnabrück, 02.-06.09.2018. Düsseldorf: German Medical Science GMS Publishing House; 2018. DocAbstr. 14

doi: 10.3205/18gmds021, urn:nbn:de:0183-18gmds0217

Published: 27 August 2018

© 2018 Fette et al.
This article is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License (attribution). For license information see



Introduction: Data exchange between clinical institutions is currently very limited. Most hospital information systems (HIS) use proprietary data models, which prevents direct transfer or comparison of data between institutions with different data models. Ideally, all clinical fact data and all elements of the data's reference models would be annotated with codes from international standardized terminologies, which would facilitate the comparison of data elements across the HIS of different institutions. A query for patient samples should then be parametrizable with the same terminology IDs that are defined in the value sets of the reference models. However, in all major existing clinical query languages known to the authors (AQL [1], CQL [2], i2b2 [3], GELLO [4] and Arden Syntax [5]), the attributes used in queries always have to be identified by reference model IDs, so queries are directly bound to the models. Moreover, even if reference model elements were specified by terminology IDs rather than reference model IDs, there are often multiple ways to model the same data objects in local reference models.

Methods: Clinical data objects can be regarded as semantic networks in which the nodes are tagged with terminology codes. A new ideal query language (IQL) should allow the same query to be run against all possible graph representations, returning semantically equivalent results each time. Because the topological structure of models can differ in the number of nodes and the type of interrelations, the query must not prescribe any ontological relationships between its elements. Given the reference model of a concrete query system, a graph-matching algorithm has to find a minimum spanning tree of elements that includes all given terminology IDs. Once a tree has been identified, a query in the local query language has to be formulated. This new query has to encompass the reference model elements contained in the spanning tree. Constraints can be added to the new query's attributes, provided they are also part of the original IQL query.
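The matching step described in the Methods can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: the reference model elements, their links and the attached codes are hypothetical, each queried code is assumed to match exactly one element, and the tree is built with a simple greedy shortest-path heuristic that connects the required elements one by one instead of performing an exact minimum search.

```python
from collections import deque

# Hypothetical reference model: element id -> terminology code (or None),
# plus undirected links between elements. Ids and codes are illustrative.
CODES = {
    "patient": None,
    "encounter": None,
    "lab_result": "LOINC:2160-0",
    "lab_value": None,
    "diagnosis": "ICD10:I50",
}
EDGES = [
    ("patient", "encounter"),
    ("encounter", "lab_result"),
    ("lab_result", "lab_value"),
    ("encounter", "diagnosis"),
]


def build_adjacency(edges):
    """Turn an edge list into an undirected adjacency mapping."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path_to_tree(adjacency, start, tree_nodes):
    """Breadth-first search from start until a node of the tree is reached.

    Returns the path from that tree node back to start, or None.
    """
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node in tree_nodes:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return None


def connecting_tree(codes, edges, query_codes):
    """Greedily connect the elements tagged with the queried codes."""
    adjacency = build_adjacency(edges)
    terminals = []
    for code in query_codes:
        matches = sorted(n for n, c in codes.items() if c == code)
        if len(matches) != 1:
            # Assumption of this sketch: one element per code.
            raise ValueError(f"code {code!r} matches {len(matches)} elements")
        terminals.append(matches[0])
    tree_nodes = {terminals[0]}
    tree_edges = set()
    for terminal in terminals[1:]:
        path = shortest_path_to_tree(adjacency, terminal, tree_nodes)
        if path is None:
            raise ValueError(f"{terminal!r} is not connected to the tree")
        tree_nodes.update(path)
        tree_edges.update(frozenset(pair) for pair in zip(path, path[1:]))
    return tree_nodes, tree_edges


nodes, tree = connecting_tree(CODES, EDGES, ["LOINC:2160-0", "ICD10:I50"])
```

In this toy model the resulting tree also contains the untagged `encounter` element, because it links the two coded elements. Such intermediate elements are the ones a generated local query would additionally have to reference.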

Discussion: One problem in searching for the correct minimum spanning tree in a given reference model is potential ambiguity, which arises when the set of given terminology IDs matches multiple, semantically different subgraphs:

The first type of ambiguity, different subgraphs bearing the same semantics, is a problem of the local HIS rather than of the proposed method. Different structures that represent semantically equal content should be merged in the HIS into harmonized data structures [6].

A second type of ambiguity is caused by an underspecified set of terminology IDs in the IQL query. It is legitimate for the user to resolve this ambiguity manually by adding further discriminating terminology IDs.

The third type of ambiguity comprises situations in which multiple subgraphs can only be distinguished semantically by their topology. These cannot be resolved automatically with the proposed mechanisms.
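The second and third types of ambiguity can be illustrated with a small Python sketch. It is a coarse, hypothetical simplification: candidate subgraphs are assumed to be already enumerated and are described only by the set of terminology codes on their nodes, and all names and codes are invented for illustration.

```python
# Hypothetical candidate subgraphs of a local reference model, each reduced
# to the set of terminology codes found on its nodes. The two "drug"
# subgraphs carry identical codes and differ only in their topology, which
# this simplified representation deliberately does not capture.
SUBGRAPHS = {
    "admission_weight": {"CODE:weight", "CODE:admission"},
    "discharge_weight": {"CODE:weight", "CODE:discharge"},
    "drug_treats": {"CODE:drug", "CODE:symptom"},
    "drug_side_effect": {"CODE:drug", "CODE:symptom"},
}


def classify(query_codes, subgraphs):
    """Return a coarse ambiguity label and the matching subgraph names."""
    matches = sorted(
        name for name, codes in subgraphs.items() if query_codes <= codes
    )
    if not matches:
        return "no match", matches
    if len(matches) == 1:
        return "unique", matches
    distinct_code_sets = {frozenset(subgraphs[name]) for name in matches}
    if len(distinct_code_sets) == 1:
        # All candidates carry the same codes: no additional code can
        # separate them, only their topology differs.
        return "topology-only", matches
    # Candidates differ in their codes: adding a discriminating code to
    # the query may narrow the match.
    return "underspecified", matches


print(classify({"CODE:weight"}, SUBGRAPHS))
print(classify({"CODE:weight", "CODE:admission"}, SUBGRAPHS))
print(classify({"CODE:drug", "CODE:symptom"}, SUBGRAPHS))
```

The first call reports an underspecified query, the second shows how an additional code narrows the match to a single subgraph, and the third shows a case in which no further code can help because the candidates differ only in structure.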

The presented IQL properties, as well as the workflow by which an IQL interacts with a local query engine, may stimulate projects aiming to define a concrete IQL.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


[1] OpenEHR. A semantically-enabled, vendor-independent health computing platform. [cited 2018 Feb 19]. Available from: external link
[2] HL7 Cross-Paradigm Specification. Clinical Quality Language, Release 1 Standard for Trial Use. [cited 2018 Feb 19]. Available from: external link
[3] Murphy S, Weber G, Mendis M, et al. Serving the Enterprise and beyond with Informatics for Integrating Biology and the Bedside (i2b2). J Am Med Inform Assoc. 2010 Mar-Apr;17(2):124-30.
[4] Sordo M, Ogunyemi O, Boxwala AA, Greenes RA. GELLO: an object-oriented query and expression language for clinical decision support. AMIA Annu Symp Proc. 2003;1012.
[5] Hripcsak G. Arden Syntax for Medical Logic Modules. MD Comput. 1991;8(2):76-8.
[6] Bellary S, Krishnankutty B, Latha MS. Basics of case report form designing in clinical research. Perspect Clin Res. 2014;5(4):159-66.