gms | German Medical Science

68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS)

17.09. - 21.09.23, Heilbronn

Unambiguously Re-Identify a Patient – a Case Study

Meeting Abstract

  • Dennis Brosch - Institute of Medical Informatics, University of Münster, Münster, Germany
  • Tobias Brix - Institute of Medical Informatics, University of Münster, Münster, Germany
  • Simone Melnik - Institute of Medical Informatics, University of Münster, Münster, Germany
  • Achim Beule - Department of Otorhinolaryngology - Head and Neck Surgery, University Hospital of Münster, Münster, Germany
  • Armands Riders - Department of Otorhinolaryngology - Head and Neck Surgery, University Hospital of Münster, Münster, Germany
  • Claudia Rudack - Department of Otorhinolaryngology - Head and Neck Surgery, University Hospital of Münster, Münster, Germany
  • Julian Varghese - Institute of Medical Informatics, University of Münster, Münster, Germany

Deutsche Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie. 68. Jahrestagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie e. V. (GMDS). Heilbronn, 17.-21.09.2023. Düsseldorf: German Medical Science GMS Publishing House; 2023. DocAbstr. 147

doi: 10.3205/23gmds107, urn:nbn:de:0183-23gmds1077

Published: 15 September 2023

© 2023 Brosch et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. For license information, see http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: The ear, nose and throat clinic of the University Hospital Münster conducts retrospective studies in the areas of cancer treatment, rhinoplasty and ear surgery [1], [2]. Patient data for research purposes are stored in different study-specific databases. In the terminology of Polonetsky et al. [3], the data were stored as “explicitly personal”, meaning that they contain direct identifiers. They include patients’ administrative data such as name and date of birth, but the unique patient identifier (PID) of the clinical information system (CIS) has been removed. For centralisation, all data sets are to be merged into a single new research structure. This merging requires record linkage to combine data belonging to the same patient [4]. The objective of this work is to re-identify patients in the CIS using a combination of identifiers, so that the PID can be used during the record linkage process.

Methods: To achieve this goal, a combination of direct and indirect identifiers from the research database was used [3]. The combination of forename, surname and birthdate was used to identify the patient in the CIS. This process was automated by querying the PID directly from the CIS database using SQL.
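The lookup described in the Methods can be illustrated with a minimal sketch. The abstract does not give the CIS schema or the actual SQL, so the table name `patient`, its columns, the demo records and the helper names `lookup_pids` and `build_demo_cis` are all hypothetical. An in-memory SQLite database stands in for the CIS database.

```python
import sqlite3


def lookup_pids(conn, forename, surname, birthdate):
    """Return all PIDs whose administrative data match the three identifiers.

    Assumption: a table `patient(pid, forename, surname, birthdate)`;
    the real CIS schema is not described in the abstract.
    """
    rows = conn.execute(
        "SELECT pid FROM patient "
        "WHERE forename = ? AND surname = ? AND birthdate = ? "
        "ORDER BY pid",
        (forename, surname, birthdate),  # bound parameters, no string concatenation
    ).fetchall()
    return [row[0] for row in rows]


def build_demo_cis():
    """Create a small in-memory stand-in for the CIS with invented records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient ("
        "pid INTEGER PRIMARY KEY, forename TEXT, surname TEXT, birthdate TEXT)"
    )
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?, ?)",
        [
            (1001, "Anna", "Muster", "1970-01-31"),
            (1002, "Bernd", "Beispiel", "1955-06-15"),
            (1003, "Bernd", "Beispiel", "1955-06-15"),  # same person registered twice
        ],
    )
    return conn
```

Under these assumptions, a query returns a list with one PID for an unambiguous match, several PIDs for a duplicate registration, and an empty list when no record matches.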

Results: The result of the query is a list of the PIDs found for a given combination of forename, surname and birthdate. Ideally, there is exactly one PID for every combination of variables. This was true for the majority of cases. Of the 1081 patients, the PIDs of 86.9% could be unambiguously reconstructed. 66 cases (6.1%) had multiple PIDs and 76 cases (7.0%) were not found.
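The three outcome categories and their percentages can be reproduced with a short sketch. The function names `classify` and `summarize` are illustrative. The count of 939 unambiguous cases is not stated in the abstract; it is derived here as 1081 − 66 − 76, assuming the three categories partition all patients.

```python
from collections import Counter


def classify(pids):
    """Map the list of PIDs returned for one patient to an outcome category."""
    if len(pids) == 1:
        return "unique"
    if len(pids) > 1:
        return "multiple"
    return "not_found"


def summarize(results):
    """Return {category: (count, percentage rounded to one decimal)}."""
    counts = Counter(classify(pids) for pids in results)
    total = len(results)
    return {
        category: (counts[category], round(100 * counts[category] / total, 1))
        for category in ("unique", "multiple", "not_found")
    }


# Placeholder PID lists that only reproduce the reported category sizes.
# 939 is derived (1081 - 66 - 76), not taken from the abstract.
reported = [[1]] * 939 + [[1, 2]] * 66 + [[]] * 76
summary = summarize(reported)
```

With these counts, the rounded percentages agree with the reported 86.9%, 6.1% and 7.0%.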

Discussion: Most of the cases with multiple PIDs were later manually assigned to the correct patient by checking the PIDs in the CIS. Records that could not be found in the CIS were excluded. The main reason for duplicates was the existence of multiple instances of the same person in the CIS: patients who had changed their address or health insurance were erroneously registered as new patients. The number of duplicates can be reduced by applying automated data quality controls in routine care or by further training employees to raise awareness of the issue. The main reason why no PID could be identified for some cases was that the naming conventions in the research database and the CIS differed. Examples are middle names, titles of nobility and academic degrees. Middle names of foreign patients in particular were handled incorrectly. Applying more advanced linkage algorithms could solve this issue [4]. After re-identification, the data are stored in a joint database that is coordinated by a trusted third party.
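One simple way to reduce mismatches caused by differing naming conventions is to normalise names before comparison. The following is a minimal sketch, not the approach used in this study and not one of the probabilistic algorithms referred to in [4]. The token list `IGNORED_TOKENS` and the function names are hypothetical, and the list is deliberately short and incomplete.

```python
# Hypothetical, incomplete list of academic degrees and titles of nobility.
IGNORED_TOKENS = {"dr", "prof", "med", "freiherr", "freifrau", "graf", "baron"}


def _tokens(text):
    """Split on whitespace, strip surrounding dots, compare case-insensitively."""
    stripped = (token.strip(".") for token in text.split())
    return [token.casefold() for token in stripped if token]


def normalize_forename(forename):
    """Drop titles and keep only the first given name (middle names removed)."""
    tokens = [t for t in _tokens(forename) if t not in IGNORED_TOKENS]
    return tokens[0] if tokens else ""


def normalize_surname(surname):
    """Drop titles but keep every remaining part of the surname."""
    tokens = [t for t in _tokens(surname) if t not in IGNORED_TOKENS]
    return " ".join(tokens)


def match_key(forename, surname, birthdate):
    """Build a comparison key from the three identifiers."""
    return (normalize_forename(forename), normalize_surname(surname), birthdate)
```

For example, `match_key("Dr. Anna Maria", "Muster", "1970-01-31")` and `match_key("Anna", "Muster", "1970-01-31")` yield the same key. Discarding middle names makes the key less specific, so this kind of normalisation may increase the number of cases with multiple PIDs that need manual review.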

Conclusion: Using only forename, surname and birthdate, 86.9% of PIDs could be reconstructed. After manual adjustment, 99.4% were re-identified. It can be concluded that forename, surname and birthdate are sufficient to unambiguously identify patients in the CIS. This is consistent with record linkage software that uses these three variables as the basis for linkage [5]. The main sources of error are naming conventions and duplicates in the CIS. Other variables such as address or operation dates can be used to make the results more reliable.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.


References

1.
Oberste M, Riders A, Abbaspour B, Kerschke L, Beule A, Rudack C. Improvement of patient stratification in human papilloma virus-associated oropharyngeal squamous cell carcinoma by defining a multivariable risk score. Head Neck. 2021 Nov;43(11):3314-3323. DOI: 10.1002/hed.26822
2.
Savvas E, Heslinga K, Spiekermann CO, Stenner M, Rudack C. Anamnesis as a Prognostic Factor in Cochlear Implantation in Adults. ORL J Otorhinolaryngol Relat Spec. 2021;83(1):14-24. DOI: 10.1159/000509562
3.
Polonetsky J, Tene O, Finch K. Shades of Gray: Seeing the Full Spectrum of Practical Data De-identification. Santa Clara Law Review. 2016;56(3):593. Available from: https://digitalcommons.law.scu.edu/lawreview/vol56/iss3/3
4.
Asher J, Resnick D, Brite J, Brackbill R, Cone J. An Introduction to Probabilistic Record Linkage with a Focus on Linkage Processing for WTC Registries. Int J Environ Res Public Health. 2020 Sep 22;17(18):6937. DOI: 10.3390/ijerph17186937
5.
Rohde F, Franke M, Sehili Z, Lablans M, Rahm E. Optimization of the Mainzelliste software for fast privacy-preserving record linkage. J Transl Med. 2021 Jan 15;19(1):33. DOI: 10.1186/s12967-020-02678-1