GMS | Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH) | Which properties need protection? An application to identify vulnerable patterns in health datasets

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Which properties need protection? An application to identify vulnerable patterns in health datasets

Meeting Abstract

Anna Pasquier - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
Karen Otte - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
Mehmed Halilovic - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
Thierry Meurers - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
Fabian Prasser - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 287

doi: 10.3205/24gmds148, urn:nbn:de:0183-24gmds1481

Published: September 6, 2024

© 2024 Pasquier et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.

Introduction: When sharing medical data, anonymization is one of the key strategies for addressing privacy concerns and is often legally mandated, for example for the Center for Cancer Registry Data at the Robert Koch Institute and the Health Data Lab (FDZ Gesundheit) at the Federal Institute for Drugs and Medical Devices. However, good anonymization strategies depend heavily on the context in which data is shared, including the recipients involved, their potential background knowledge and their technical capabilities. Only when this context has been accurately understood can suitable anonymization strategies be implemented and the corresponding tools set up correctly.

State of the art: Several methodologies have been suggested for qualitatively and quantitatively analyzing the properties of data that increase re-identification risks, the likely background knowledge of anticipated adversaries, and contextual information such as the security controls implemented by the recipients. However, these methods can be complex to apply, in particular if multiple aspects are to be studied in combination, for example to develop anonymization concepts.
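As an illustration of what a quantitative analysis of re-identification risk can look like, the following minimal sketch groups records by their quasi-identifier values and derives simple risk figures from the group sizes. It is not taken from the abstract: the function names, the example variables (`age_group`, `sex`, `zip3`) and the toy records are assumptions made purely for illustration.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """For each record, return how many records share its quasi-identifier values."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def risk_summary(records, quasi_identifiers):
    """Summarize re-identification risk as 1 / group size per record."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return {
        # Risk of the most exposed record (smallest group).
        "highest_risk": 1 / min(sizes),
        # Mean risk over all records.
        "average_risk": sum(1 / size for size in sizes) / len(sizes),
        # Share of records that are unique on the quasi-identifiers.
        "unique_fraction": sum(1 for size in sizes if size == 1) / len(sizes),
    }


# Hypothetical toy data, not from any real dataset.
records = [
    {"age_group": "30-39", "sex": "f", "zip3": "101"},
    {"age_group": "30-39", "sex": "f", "zip3": "101"},
    {"age_group": "40-49", "sex": "m", "zip3": "101"},
    {"age_group": "70-79", "sex": "f", "zip3": "120"},
]

summary = risk_summary(records, ["age_group", "sex", "zip3"])
print(summary)
```

Which variables count as quasi-identifiers depends on the assumed background knowledge of the adversary, which is why such quantitative measures are usually combined with a qualitative analysis of the sharing context.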

Concept: Our aim was to develop a graphical application that supports a range of common risk assessment methods in a form in which they can be combined with each other to support integrated modelling of re-identification risks for different threat actors.

Implementation: We have developed a web application, using PostgreSQL, Spring Boot and Angular.js, which allows users to perform a structured risk assessment to identify sensitive and potentially identifying variables within a specific data sharing context. Based on a method proposed by Malin et al., several anticipated adversaries can be modelled. The results of this analysis can then be used to derive anonymization methods that protect the data against multiple threats.
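The idea of modelling several adversaries and combining their assessments could be sketched as follows. This is a minimal sketch under stated assumptions and not the application's actual logic: the three score dimensions, the 1 to 3 scale, the mean and maximum aggregation, the threshold and all names (`Adversary`, `combined_risk`, `needs_protection`, the example variables) are hypothetical and chosen only to make the combination step concrete.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed scoring: (replicability, availability, distinguishability), each 1 (low) to 3 (high).
Scores = Tuple[int, int, int]


@dataclass(frozen=True)
class Adversary:
    """A hypothetical threat actor with per-variable risk scores."""
    name: str
    scores: Dict[str, Scores]


def variable_risk(scores: Scores) -> float:
    """Aggregate one variable's scores for one adversary (assumption: plain mean)."""
    return sum(scores) / len(scores)


def combined_risk(adversaries: List[Adversary]) -> Dict[str, float]:
    """Per variable, keep the worst case over all adversaries (assumption: maximum)."""
    combined: Dict[str, float] = {}
    for adversary in adversaries:
        for variable, scores in adversary.scores.items():
            risk = variable_risk(scores)
            combined[variable] = max(combined.get(variable, 0.0), risk)
    return combined


def needs_protection(adversaries: List[Adversary], threshold: float = 2.0) -> List[str]:
    """Return the variables whose combined risk reaches the threshold, sorted by name."""
    risks = combined_risk(adversaries)
    return sorted(v for v, risk in risks.items() if risk >= threshold)


# Hypothetical adversaries and scores, for illustration only.
adversaries = [
    Adversary("external researcher", {
        "birth_date": (3, 2, 3),
        "zip_code": (3, 2, 2),
        "lab_value": (1, 1, 2),
    }),
    Adversary("hospital insider", {
        "birth_date": (3, 3, 3),
        "admission_date": (2, 3, 2),
        "lab_value": (2, 3, 2),
    }),
]

print(needs_protection(adversaries, threshold=2.5))
```

Taking the maximum over adversaries reflects a worst-case view, so that a variable is protected as soon as any modelled threat actor could exploit it; other aggregation rules are equally conceivable.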

Lessons learned: While the developed application is useful for several steps in developing data anonymization processes, it currently requires specific expertise to establish the risk scores associated with each variable. In future work, we aim to extend the application with questionnaires that assist users in specifying the properties of the data sharing context, to meet the growing need to make anonymization tools available to a wider audience.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.

References

1. Haber AC, Sax U, Prasser F; NFDI4Health Consortium. Open tools for quantitative anonymization of tabular phenotype data: literature review. Brief Bioinform. 2022;23(6):bbac440. DOI: 10.1093/bib/bbac440
2. Malin B, Loukides G, Benitez K, Clayton EW. Identifiability in biobanks: models, measures, and mitigation strategies. Hum Genet. 2011;130(3):383-92. DOI: 10.1007/s00439-011-1042-5
3. Jakob CE, Kohlmayer F, Meurers T, Vehreschild JJ, Prasser F. Design and evaluation of a data anonymization pipeline to promote Open Science on COVID-19. Sci Data. 2020;7(1):435. DOI: 10.1038/s41597-020-00722-x