Unambiguously Re-Identify a Patient – a Case Study
Published: 15 September 2023
Introduction: The ear, nose and throat clinic of the University Hospital Münster conducts retrospective studies in the areas of cancer treatment, rhinoplasty and ear surgery [1], [2]. Patient data for research purposes are stored in different study-dependent databases. In the terminology of Polonetsky et al. [3], the data are stored as “explicitly personal”, meaning that they contain direct identifiers. They include patients’ administrative data such as name and date of birth, but the unique patient identifier (PID) of the clinical information system (CIS) has been removed. For centralisation, all data sets are to be merged into a single new research structure. This merging requires record linkage to combine data from the same patients [4]. The objective of this work is to re-identify patients in the CIS by using a combination of identifiers, so that the PID can be used during the record linkage process.
Methods: To achieve this goal, a combination of direct and indirect identifiers in the research database was used [3]. The combination of forename, surname and date of birth was used to identify the patient in the CIS. This process was automated by querying the PID directly from the CIS database using SQL.
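The lookup described above can be sketched as follows. This is a minimal illustration, not the query used in the study: the table `patient`, its columns, the ISO date format and the sample rows are all hypothetical, and an in-memory SQLite database stands in for the CIS database, whose actual schema and SQL dialect are not given here.

```python
import sqlite3


def build_demo_cis() -> sqlite3.Connection:
    """Create an in-memory stand-in for the CIS (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE patient ("
        "pid INTEGER PRIMARY KEY, forename TEXT, surname TEXT, birthdate TEXT)"
    )
    conn.executemany(
        "INSERT INTO patient VALUES (?, ?, ?, ?)",
        [
            (1001, "Anna", "Muster", "1970-01-31"),
            (1002, "Anna", "Muster", "1970-01-31"),  # same person registered twice
            (1003, "Bernd", "Beispiel", "1955-06-12"),
        ],
    )
    return conn


def lookup_pids(conn: sqlite3.Connection, forename: str, surname: str,
                birthdate: str) -> list[int]:
    """Return all PIDs whose forename, surname and birthdate match exactly."""
    rows = conn.execute(
        "SELECT pid FROM patient "
        "WHERE forename = ? AND surname = ? AND birthdate = ? "
        "ORDER BY pid",
        (forename, surname, birthdate),  # parameterised to avoid SQL injection
    ).fetchall()
    return [row[0] for row in rows]
```

A call such as `lookup_pids(conn, "Bernd", "Beispiel", "1955-06-12")` returns a list with zero, one or several PIDs, which corresponds to the three outcomes reported in the results.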
Results: The result of the query is a list of the PIDs found for each combination of forename, surname and date of birth. Ideally, exactly one PID is returned for every combination. This was true for the majority of cases. Of the 1081 patients, the PIDs of 86.9% could be unambiguously reconstructed. 66 cases (6.1%) had multiple PIDs and 76 cases (7%) were not found.
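The three outcome categories and their shares can be tallied as in the sketch below. The function names and category labels are illustrative assumptions. The count of 939 uniquely matched patients used in the usage note is derived here as 1081 − 66 − 76 and is not stated explicitly in the abstract.

```python
from collections import Counter


def classify(pids: list[int]) -> str:
    """Label a lookup result by the number of PIDs it returned."""
    if len(pids) == 1:
        return "unique"
    if len(pids) > 1:
        return "multiple"
    return "not_found"


def summarise(results: list[list[int]]) -> dict[str, tuple[int, float]]:
    """Return (count, percentage rounded to one decimal) per category."""
    total = len(results)
    if total == 0:
        raise ValueError("no lookup results to summarise")
    counts = Counter(classify(pids) for pids in results)
    return {
        label: (counts[label], round(100 * counts[label] / total, 1))
        for label in ("unique", "multiple", "not_found")
    }
```

Feeding 939 single-PID results, 66 multi-PID results and 76 empty results into `summarise` yields shares of 86.9%, 6.1% and 7.0%, which matches the reported figures.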
Discussion: Most of the cases with multiple PIDs were later manually assigned to the right patient by checking the PIDs in the CIS. Records that could not be found in the CIS were excluded. The main reason for duplicates was the existence of multiple instances of the same person in the CIS. Patients who had changed their address or health insurance were erroneously registered as new patients. The number of duplicates can be reduced by applying automated data quality controls in routine care or by further training of employees to raise awareness of the issue. The main reason why no PID could be identified for some cases was that the naming conventions in the research database and the CIS differed. Examples are middle names, titles of nobility and academic degrees. The middle names of foreign patients in particular were handled incorrectly. Applying more advanced linkage algorithms could solve this issue [4]. After re-identification, the data are stored in a joint database that is coordinated by a trusted third party.
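One simple way to reduce mismatches caused by differing naming conventions is to normalise names on both sides before comparing them. The sketch below is an assumption for illustration only and was not part of the study: the title and particle lists are hypothetical and incomplete, and dropping nobility particles or middle names loses information, so such rules would need to be checked against the conventions actually used in the CIS.

```python
# Hypothetical, incomplete token lists; real data would need a curated set.
ACADEMIC_TITLES = {"dr", "prof", "med"}
NOBILITY_TOKENS = {"von", "zu", "freiherr", "graf", "baron"}


def _tokens(raw: str) -> list[str]:
    """Split on whitespace, strip trailing punctuation, and case-fold."""
    cleaned = (token.strip(".,").casefold() for token in raw.split())
    return [token for token in cleaned if token]


def normalise_forename(raw: str) -> str:
    """Drop academic titles and keep only the first given name."""
    names = [t for t in _tokens(raw) if t not in ACADEMIC_TITLES]
    return names[0] if names else ""


def normalise_surname(raw: str) -> str:
    """Drop academic titles and nobility tokens from a surname."""
    ignored = ACADEMIC_TITLES | NOBILITY_TOKENS
    return " ".join(t for t in _tokens(raw) if t not in ignored)
```

With these rules, "Dr. med. Anna Maria" and "Anna" normalise to the same forename, and "Freiherr von Beispiel" and "Beispiel" to the same surname. Probabilistic record linkage as described in [4] goes further by weighting partial agreement instead of requiring exact matches.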
Conclusion: Using only forename, surname and date of birth, 86.9% of PIDs could be reconstructed. After manual adjustment, as many as 99.4% were re-identified. It can be concluded that forename, surname and date of birth are sufficient to unambiguously identify patients in the CIS. This is consistent with record linkage software that uses these three variables as the source for linkage [5]. The main sources of error are naming conventions and duplicates in the CIS. Other variables, such as address or operation dates, can be used to make the results more reliable.
The authors declare that they have no competing interests.
The authors declare that an ethics committee vote is not required.
References
1. Oberste M, Riders A, Abbaspour B, Kerschke L, Beule A, Rudack C. Improvement of patient stratification in human papilloma virus-associated oropharyngeal squamous cell carcinoma by defining a multivariable risk score. Head Neck. 2021 Nov;43(11):3314-3323. DOI: 10.1002/hed.26822
2. Savvas E, Heslinga K, Spiekermann CO, Stenner M, Rudack C. Anamnesis as a Prognostic Factor in Cochlear Implantation in Adults. ORL J Otorhinolaryngol Relat Spec. 2021;83(1):14-24. DOI: 10.1159/000509562
3. Polonetsky J, Tene O, Finch K. Shades of Gray: Seeing the Full Spectrum of Practical Data De-identification. Santa Clara Law Review. 2016;56(3):593. Available from: https://digitalcommons.law.scu.edu/lawreview/vol56/iss3/3
4. Asher J, Resnick D, Brite J, Brackbill R, Cone J. An Introduction to Probabilistic Record Linkage with a Focus on Linkage Processing for WTC Registries. Int J Environ Res Public Health. 2020 Sep 22;17(18):6937. DOI: 10.3390/ijerph17186937
5. Rohde F, Franke M, Sehili Z, Lablans M, Rahm E. Optimization of the Mainzelliste software for fast privacy-preserving record linkage. J Transl Med. 2021 Jan 15;19(1):33. DOI: 10.1186/s12967-020-02678-1