gms | German Medical Science

64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie

08. - 11.09.2019, Dortmund

Simple Batch Record Linkage System (SimBa) – A generic Tool for Record Linkage of special categories of personal data in small networked research projects with distributed data sources: Lessons learned from the Inno_RD project

Meeting Abstract


  • Hauke Fischer - University of Oldenburg, Oldenburg, Germany
  • Rainer Röhrig - Carl von Ossietzky Universität, Oldenburg, Germany
  • Volker Sebastian Thiemann - Carl von Ossietzky Universität, Oldenburg, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 64. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e.V. (GMDS). Dortmund, 08.-11.09.2019. Düsseldorf: German Medical Science GMS Publishing House; 2019. DocAbstr. 42

doi: 10.3205/19gmds118, urn:nbn:de:0183-19gmds1188

Published: September 6, 2019

© 2019 Fischer et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at



Introduction: Millions of emergency medical service runs take place in Germany every year, yet there is little systematic quality assurance (QA) to measure and improve the quality of these services. QA should include medical outcomes and reflect the patient's course of treatment [1].

To evaluate the medical outcomes of emergency runs, medical data from different sources must be linked. In Germany, the health insurance number can be used to link data sets from multiple health service providers. This approach allows the patient's course to be tracked beyond single contacts with a specific service provider.

The recent changes in EU legislation introduced by the General Data Protection Regulation (GDPR) must be considered, especially regarding sensitive information such as health data. Medical confidentiality must also be observed in this context (Section 203, StGB*).

State of the Art: The TMF**, the German umbrella organization for networked medical research, offers a generic data protection concept. This concept integrates different modules, each of which defines a domain-specific data protection concept [2]. An overview of established solutions for the German health care system is given by the ToolPool Gesundheitsforschung [3]. The ToolPool contains approaches such as the Mainzelliste [4] and MOSAIC [5].

The latter approaches provide identity management in the form of federated IT infrastructures. They are often used in medical research registries that have a continuous data flow and process a variety of personal data for data linkage.

Concept: The data quality of the health insurance number can be assumed to be high because the number is used in the billing process. Since the health insurance number cannot be used directly, we cryptographically derive a pseudonym from it.

Building on this basic concept for data linkage, SimBa processes the data using a two-level pseudonymization approach. With this approach, none of the three parties holds enough information to single-handedly trace the research data back to a patient or vice versa. The concept follows Pommerening et al. [2].

Implementation: The server side of SimBa is built on Java Enterprise Edition. The client side is a Java desktop application.

The first level of the pseudonymization process, which includes the local PID generation, is based on the patient's health insurance number. It uses a peppered hash computed with the SHA-512 algorithm.
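A peppered SHA-512 hash of this kind can be illustrated with a minimal Java sketch. This is not the SimBa code: the class name `LocalPidSketch`, the method `localPid`, the normalization step (trimming and upper-casing), the order in which pepper and number are fed to the hash, and the hex output encoding are all assumptions made for illustration. The insurance number and pepper are made-up values.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

public class LocalPidSketch {

    /**
     * Hypothetical helper: derives a local PID from a health insurance number
     * and a secret pepper using SHA-512.
     */
    public static String localPid(String insuranceNumber, byte[] pepper) {
        // Assumed normalization, so that formatting differences between
        // data sources do not lead to different pseudonyms.
        String normalized = insuranceNumber.trim().toUpperCase(Locale.ROOT);
        try {
            MessageDigest sha512 = MessageDigest.getInstance("SHA-512");
            // Assumption: the pepper is hashed first, followed by the number.
            sha512.update(pepper);
            sha512.update(normalized.getBytes(StandardCharsets.UTF_8));
            return toHex(sha512.digest());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this Java runtime", e);
        }
    }

    // Encodes the digest as a lower-case hexadecimal string.
    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up pepper; a real pepper would be a secret shared by the data sources.
        byte[] pepper = "example-pepper-not-for-production".getBytes(StandardCharsets.UTF_8);

        // The same (made-up) insurance number recorded at two different sources.
        String pidAtSourceA = localPid("A123456780", pepper);
        String pidAtSourceB = localPid(" a123456780 ", pepper);

        System.out.println(pidAtSourceA.equals(pidAtSourceB)); // prints "true"
        System.out.println(pidAtSourceA.length());             // prints "128"
    }
}
```

Because all data sources would use the same pepper, the same insurance number yields the same PID everywhere, which is what makes linkage possible without exchanging the number itself. A party that does not know the pepper cannot simply recompute the PID from a known insurance number. A keyed construction such as HMAC would be a common alternative to plain concatenation, but the abstract does not say which variant SimBa uses.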

Lessons Learned: During implementation, we had to go through several software development iterations with different project partners. We learned how important a detailed requirements engineering process involving all stakeholders is for efficient development.

In Germany, data linkage is increasingly used in research projects. Many projects with similar requirements develop the necessary tools independently of each other [6]. There is therefore a need for a generic toolbox that combines multiple methods and reduces the effort spent on repeatedly developing similar tools.

Acknowledgement: Funding by Innovationsfonds, No. 01VSF17032.

*Criminal Code in the version published on 13 November 1998 (BGBl. I p. 3322), as last amended by Article 14 of the Act of 18 December 2018 (BGBl. I p. 2639)

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


1. Piedmont S, Brammen D, Branse D, Focke K, Kast W, Robra BP. Auf dem Weg zur integrierten Qualitätssicherung im Rettungsdienst. Notfall Rettungsmed. 2018;4:261.
2. Pommerening K, Müller T. Leitfaden zum Datenschutz in medizinischen Forschungsprojekten: Generische Lösungen der TMF 2.0. Berlin: MWV; 2014.
3. TMF. [Accessed 20 October 2018]. Available from: External link
4. Lablans M, Borg A, Ückert F. A RESTful interface to pseudonymization services in modern web applications. BMC Medical Informatics and Decision Making. 2015 Dec;15(1):2.
5. Bialke M, Bahls T, Havemann C, Piegsa J, Weitmann K, Wegner T, Hoffmann W. MOSAIC – a modular approach to data management in epidemiological studies. Methods of Information in Medicine. 2015;54(04):364-71.
6. March S, Antoni M, Kieschke J, Kollhorst B, Maier B, Müller G, Sariyar M, Schulz M, Swart E, Zeidler J, Hoffmann F. Quo vadis Datenlinkage in Deutschland? Eine erste Bestandsaufnahme. Das Gesundheitswesen. 2018 Jun;57(03):e20-31.