gms | German Medical Science

66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e. V. (TMF)

26-30 September 2021, online

Deduplication and Linkage of Clinical Trials in Germany

Meeting Abstract

  • Christian Thiele - Fachhochschule Bielefeld, Bielefeld, Germany
  • Gerrit Hirschfeld - Fachhochschule Bielefeld, Bielefeld, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 66. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS), 12. Jahreskongress der Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). sine loco [digital], 26.-30.09.2021. Düsseldorf: German Medical Science GMS Publishing House; 2021. DocAbstr. 140

doi: 10.3205/21gmds014, urn:nbn:de:0183-21gmds0147

Published: 24 September 2021

© 2021 Thiele et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License.



Introduction: Clinical trial registries are an important source of information for physicians, researchers, and patients. However, the entries of clinical trials are scattered across many trial registries. Meta-registries such as the ICTRP exist, but it is unclear how large the overlap between the available databases is and whether they contain internal duplicates. This makes it difficult, for example, to get an overview of a specific country's clinical research activities.

Methods: We downloaded all trial data for trials with a recruiting location in Germany from ClinicalTrials.gov (via the Aggregate Analysis of ClinicalTrials.gov, AACT), the German Clinical Trials Register (DRKS), and the meta-register International Clinical Trials Registry Platform (ICTRP). We linked these databases to estimate the overlap between the primary registries and the meta-registry. To achieve a robust linkage and to check the reliability of the primary and secondary IDs contained in the registries, we employed a Random Forest model to search for links that could not be made using primary IDs. Predictor variables included similarities between trial titles, start dates, and sample sizes, as well as secondary ID matches.
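As an illustration of the kind of predictor variables named above, the following is a minimal sketch of pairwise feature computation for two registry records. It is not the authors' implementation: the `Trial` fields, the feature names, the use of `difflib` for title similarity, and the example records (including their placeholder IDs) are all assumptions made for this sketch. The resulting feature dictionaries are the sort of input one could pass to a classifier such as a Random Forest.

```python
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Trial:
    """Hypothetical minimal registry record; field names are assumptions."""
    primary_id: str
    title: str
    start_date: Optional[date] = None
    sample_size: Optional[int] = None
    secondary_ids: frozenset = frozenset()


def title_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between two titles, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def pair_features(a: Trial, b: Trial) -> dict:
    """Compute similarity features for one candidate pair of records."""
    # Absolute difference between start dates in days (None if either is missing).
    if a.start_date is not None and b.start_date is not None:
        day_gap = abs((a.start_date - b.start_date).days)
    else:
        day_gap = None

    # Ratio of smaller to larger sample size (None if either is missing).
    if a.sample_size is not None and b.sample_size is not None:
        larger = max(a.sample_size, b.sample_size)
        size_ratio = min(a.sample_size, b.sample_size) / larger if larger else 1.0
    else:
        size_ratio = None

    # Does any ID (primary or secondary) of one record appear on the other?
    ids_a = a.secondary_ids | {a.primary_id}
    ids_b = b.secondary_ids | {b.primary_id}

    return {
        "title_similarity": title_similarity(a.title, b.title),
        "start_date_gap_days": day_gap,
        "sample_size_ratio": size_ratio,
        "shared_id": bool(ids_a & ids_b),
    }


# Made-up example records with placeholder IDs, for illustration only.
rec_a = Trial("NCT00000000", "Effect of Drug X on Pain", date(2020, 1, 1), 100,
              frozenset({"DRKS00000000"}))
rec_b = Trial("DRKS00000000", "effect of drug x on pain", date(2020, 1, 11), 80)
print(pair_features(rec_a, rec_b))
```

In a setup like this, labelled pairs (known links and known non-links) would supply the training data, and missing values such as an absent start date would need explicit handling before model fitting.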

Results: The vast majority of over 32,000 trials could be linked using primary IDs because the ICTRP uses the primary IDs of its partner registries. Most of them were registered in multiple registries. Only a small number of additional links were made using the Random Forest model, but that model found over 10,000 internal duplicates, some with multiple internal matches. We found 35,912 trials conducted in Germany. Of all trials, 28% were registered in the German DRKS. However, among trials that started in 2020 with a recruiting location in Germany, the DRKS was the most frequently used register.
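When some records have multiple internal matches, pairwise links need to be grouped so that each trial is counted once. One common way to do this, shown here as a sketch and not necessarily the approach used in this work, is to treat links as edges and collect connected components with a union-find structure. The function name `cluster_links` and the record labels are hypothetical.

```python
from collections import defaultdict


def cluster_links(links):
    """Group records connected by pairwise links into clusters.

    `links` is an iterable of (record_id, record_id) pairs.
    Returns a list of sorted clusters, ordered by their first member.
    """
    parent = {}

    def find(x):
        # Register unseen records as their own root, then walk to the root
        # while shortening the path (path halving).
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    groups = defaultdict(set)
    for record in list(parent):
        groups[find(record)].add(record)

    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Made-up links: A-B and B-C chain into one cluster, D-E form another.
print(cluster_links([("A", "B"), ("B", "C"), ("D", "E")]))
```

Note that transitive grouping can chain weak links together, so in practice a threshold on link confidence would matter before clustering.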

Discussion: The Random Forest model is useful when there are many duplicates, as was the case here when identifying internal duplicates in the ICTRP. When linking ClinicalTrials.gov and DRKS, the secondary IDs were already quite reliable.

Conclusion: Individual registries by themselves provide an incomplete picture of clinical trials in Germany, as overall only a minority of trials is registered in the DRKS, although this has changed lately. Obtaining a comprehensive overview by linking multiple databases remains challenging.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


Banno M, Tsujimoto Y, Kataoka Y. Studies registered in non-ClinicalTrials.gov accounted for an increasing proportion of protocol registrations in medical research. Journal of Clinical Epidemiology. 2019;116:106–13.
van Valkenhoef G, Loane RF, Zarin DA. Previously unidentified duplicate registrations of clinical trials: an exploratory analysis of registry data worldwide. Syst Rev. 2016;5(1):116.
Wieschowski S, Riedel N, Wollmann K, Kahrass H, Müller-Ohlraun S, Schürmann C, et al. Result dissemination from clinical trials conducted at German university medical centers was delayed and incomplete. Journal of Clinical Epidemiology. 2019;115:37–45.