gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26. - 30.09.2021, online

The FAIR Record Linkage Challenge in NFDI4Health

Meeting Abstract

  • Toralf Kirsten - LIFE Research Centre for Civilization Diseases, University of Leipzig, Leipzig, Germany
  • Adrian Richter - Institut für Community Medicine, Universitätsmedizin Greifswald, Greifswald, Germany
  • Carsten Oliver Schmidt - Universität Greifswald, Greifswald, Germany
  • Johannes Drepper - TMF e.V., Berlin, Germany
  • Alessandra Simone Kuntz - Universitätsmedizin Göttingen, Medizinische Informatik, Göttingen, Germany
  • Harald Kusch - Universitätsmedizin Göttingen, Georg-August-Universität, Institut für Medizinische Informatik, Göttingen, Germany
  • Timm Intemann - Leibniz Institute for Prevention Research and Epidemiology (BIPS), University of Bremen, Bremen, Germany
  • Sebastian Claudius Semler - TMF e.V., Berlin, Germany
  • Wolfgang Ahrens - Leibniz Institute for Prevention Research and Epidemiology (BIPS), University of Bremen, Bremen, Germany
  • Ulrich Sax - Institut für Medizinische Informatik, Universitätsmedizin Göttingen, Göttingen, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 166

doi: 10.3205/21gmds005, urn:nbn:de:0183-21gmds0053

Published: 24 September 2021

© 2021 Kirsten et al.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information see



Introduction: Linking different data sets that refer to the same observational unit is imperative in many research endeavors. Beyond standardization of the collected data, current COVID-19 projects require the enrichment of single databases via record linkage (RL) with other data sources. For example, adding claims data to studies of COVID-19 patients would make it easier to identify concomitant chronic diseases and their role in severe courses of COVID-19 and its treatment. Additional scientific value would result from adding prescription data, imaging and sequencing data, and sociodemographic and psychosocial data, which would help to compile useful research data corpora in many COVID-19 projects. Several barriers commonly impede such endeavors, including insufficient informed consent [1], re-identification risks (especially when using administrative data), overly restrictive interpretations of data protection regulations, and technical hurdles in conducting RL.

Methods: In this project, we reviewed the applicability of the Good Practice Data Linkage (GPD) [2], which provides guidance for epidemiological projects on conducting RL, to COVID-19 studies and the German Medical Informatics Initiative (MII). Representatives and coordinators of observational cohort studies were interviewed regarding their use and implementation of RL. Moreover, we studied which linking techniques these studies applied, given their different prerequisites for RL.

Results: The GPD recommends methods that use direct identifiers to ensure correct RL. However, in most studies of the targeted field, common identifiers are absent. It is therefore necessary to rely on RL involving Trusted Third Parties that manage the linkage via person-identifying data. Studies differ considerably in their implementation of RL. Some studies use custom pseudonyms under which third parties capture or associate data about study participants, while person-identifying data are kept under lock and key. Other studies try to match data from third parties and thus need to deal with heterogeneous identifiers. This is also true for the federated patient data available at the data integration centers of all university hospitals involved in the MII [3], [4].
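The pseudonym-based approach described above can be illustrated with a minimal sketch. It assumes that a Trusted Third Party derives pseudonyms from normalized person-identifying data with a keyed hash (HMAC-SHA256); the function names, the key, the normalization rules, and the sample records are hypothetical and are not taken from the GPD or from any of the reviewed studies.

```python
import hashlib
import hmac
import unicodedata


def normalize(value: str) -> str:
    """Trim, lowercase, and strip accents (illustrative normalization rules)."""
    decomposed = unicodedata.normalize("NFKD", value.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def make_pseudonym(secret_key: bytes, first_name: str, last_name: str,
                   birth_date: str) -> str:
    """Derive a pseudonym from person-identifying data with a keyed hash."""
    message = "|".join(normalize(v) for v in (first_name, last_name, birth_date))
    return hmac.new(secret_key, message.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key that only the Trusted Third Party holds.
TTP_KEY = b"example-key-held-by-the-trusted-third-party"

# The same person reported by two sources with different spelling conventions.
p_study = make_pseudonym(TTP_KEY, "Jörg", "Müller", "1970-01-31")
p_claims = make_pseudonym(TTP_KEY, " jorg ", "MULLER", "1970-01-31")

# A record whose birth date differs by one day.
p_other = make_pseudonym(TTP_KEY, "Jörg", "Müller", "1970-02-01")

print(p_study == p_claims)  # same pseudonym after normalization
print(p_study == p_other)   # any remaining difference yields another pseudonym
```

In this sketch, data holders only see the pseudonym, while the key and the person-identifying data stay with the Trusted Third Party. A keyed hash links only records whose normalized attributes agree exactly, so typing errors or heterogeneous identifiers prevent a match.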

In the absence of a national unique identifier dedicated to medical research, we depend on other linking approaches [5]. Several prototypes implement error-tolerant (e.g., probabilistic) linking techniques, which require nearly identical attribute sets for patients in the data sources to be linked. However, some short-term studies, mostly from the beginning of the COVID-19 pandemic, did not capture such attribute sets, and thus their data cannot be linked to ongoing longitudinal studies.
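As an illustration of error-tolerant linkage, the following minimal sketch scores a record pair with Fellegi-Sunter-style agreement weights. The m- and u-probabilities, the similarity threshold, the decision threshold, and the sample records are assumptions chosen for illustration; they do not describe any of the prototypes mentioned above.

```python
import math
from difflib import SequenceMatcher

# Hypothetical parameters per attribute: (m, u, fuzzy)
# m = P(attribute agrees | same person), u = P(attribute agrees | different persons)
PARAMS = {
    "first_name": (0.95, 0.010, True),
    "last_name": (0.95, 0.005, True),
    "birth_date": (0.98, 0.003, False),
}


def agrees(a: str, b: str, fuzzy: bool, min_ratio: float = 0.8) -> bool:
    """Exact comparison, or approximate string comparison for names."""
    a, b = a.strip().lower(), b.strip().lower()
    if not fuzzy:
        return a == b
    return SequenceMatcher(None, a, b).ratio() >= min_ratio


def match_weight(rec_a: dict, rec_b: dict) -> float:
    """Sum log2 agreement or disagreement weights over all shared attributes."""
    total = 0.0
    for field, (m, u, fuzzy) in PARAMS.items():
        if agrees(rec_a[field], rec_b[field], fuzzy):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def is_link(rec_a: dict, rec_b: dict, threshold: float = 10.0) -> bool:
    """Classify a pair as a link if its weight exceeds an assumed threshold."""
    return match_weight(rec_a, rec_b) > threshold


cohort = {"first_name": "Katharina", "last_name": "Schmidt", "birth_date": "1980-05-17"}
clinic = {"first_name": "Katarina", "last_name": "Schmitt", "birth_date": "1980-05-17"}
other = {"first_name": "Peter", "last_name": "Meyer", "birth_date": "1975-11-02"}

print(is_link(cohort, clinic))  # spelling variants still yield a high weight
print(is_link(cohort, other))   # disagreeing attributes yield a negative weight
```

The sketch also shows why nearly identical attribute sets are required: every attribute in `PARAMS` must be present in both records, so a study that did not capture names or birth dates cannot be scored at all.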

Discussion: Currently, the absence of a national unique identifier and of broad consent for RL are the largest obstacles to successful RL in COVID-19 research and beyond. Therefore, studies have to rely on more complex and error-tolerant linking techniques. While the GPD provides a template for privacy-preserving data linkage, recommendations are still needed on which identifying data should be captured in medical studies. Moreover, different trust services are necessary to manage the person-identifying data needed for medical studies and trials, the MII, and epidemiological projects.

Conclusion: In this work, we reviewed the current practices and methodological options to link partitioned personal health data. There is an urgent need to improve the basis for successful and swift data linkages to generate the empirical evidence needed to deal with a pandemic crisis like COVID-19 in a more effective and much faster way.

Funding: This project was funded by the Deutsche Forschungsgemeinschaft (DFG): NFDI4Health Task Force COVID-19, project number 451265285, and NFDI4Health, project number 442326535.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

[1] da Silva MEM, Coeli CM, Ventura M, Palacios M, Magnanini MMF, Camargo TMCR, et al. Informed consent for record linkage: a systematic review. J Med Ethics. 2012 Oct;38(10):639–42.
[2] March S, Andrich S, Drepper J, Horenkamp-Sonntag D, Icks A, Ihle P, et al. Good Practice Data Linkage (GPD): A Translation of the German Version. Int J Environ Res Public Health. 2020 Oct 27;17(21):7852.
[3] Semler SC, Wissing F, Heyder R. German Medical Informatics Initiative. Methods Inf Med. 2018 Jul;57(S 1):e50–6.
[4] Data Integration Centers [Internet]. [last visited: 2021-05-07]. Available from: External link
[5] Christen P, Ranbaduge T, Schnell R. Linking sensitive data. Springer; 2020.