gms | German Medical Science

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH)

08.09. - 13.09.2024, Dresden

Privacy risk assessment for synthetic longitudinal health data

Meeting Abstract

  • Julian Schneider - ZB MED – Information Centre for Life Sciences, Köln, Germany
  • Marvin Walter
  • Karen Otte - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
  • Thierry Meurers - Medical Informatics Group, Berlin Institute of Health at Charité – Universitätsmedizin Berlin, Berlin, Germany
  • Ines Perrar - Institute of Nutritional and Food Sciences – Nutritional Epidemiology, University of Bonn, Bonn, Germany
  • Ute Nöthlings - University of Bonn, Institute of Nutritional and Food Sciences (IEL), Nutritional Epidemiology, Bonn, Germany
  • Tim Adams - Fraunhofer SCAI, Sankt Augustin, Germany
  • Holger Fröhlich - Fraunhofer SCAI, Sankt Augustin, Germany
  • Fabian Prasser - Charité / BIH, Berlin, Germany
  • Juliane Fluck - ZB MED - Information Centre for Life Sciences, Bonn, Germany
  • Lisa Kühnel - Knowledge Management, ZB MED – Information Centre for Life Sciences, Köln, Germany

Gesundheit – gemeinsam. Kooperationstagung der Deutschen Gesellschaft für Medizinische Informatik, Biometrie und Epidemiologie (GMDS), Deutschen Gesellschaft für Sozialmedizin und Prävention (DGSMP), Deutschen Gesellschaft für Epidemiologie (DGEpi), Deutschen Gesellschaft für Medizinische Soziologie (DGMS) und der Deutschen Gesellschaft für Public Health (DGPH). Dresden, 08.-13.09.2024. Düsseldorf: German Medical Science GMS Publishing House; 2024. DocAbstr. 1068

doi: 10.3205/24gmds023, urn:nbn:de:0183-24gmds0231

Published: September 6, 2024

© 2024 Schneider et al.
This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 License. See license information at http://creativecommons.org/licenses/by/4.0/.


Text

Introduction: Synthetic data generation is a modern approach to preserving privacy when sharing datasets, and such methods are often claimed to outperform classic anonymization techniques in the trade-off between data utility and privacy. Recent work has demonstrated that various deep learning-based approaches can generate useful synthetic datasets, with utility often assessed through domain-specific analyses. However, evaluating the privacy implications of releasing synthetic data remains challenging, especially when the goal is to comply with data protection guidelines.

Methods: The recently published privacy risk quantification framework Anonymeter evaluates multiple possible vulnerabilities that correspond to the privacy risks considered by the European Data Protection Board, namely singling out, linkability, and attribute inference. This framework was applied to a synthetic data generation study from the epidemiological domain, in which the synthetic data replicate time and age trends previously found in data collected during the DONALD cohort study (1312 participants, 16 time points). The privacy analyses conducted are presented, with a focus on the vulnerability of outliers.
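
To make the idea of an attack-based risk score concrete, the following is a minimal, standard-library-only sketch of an attribute inference check. It is not the Anonymeter API and not the analysis code of this study. All names, the toy records, and the tolerance are hypothetical. It assumes the general principle that an attack's success on records used to train the generator is compared with its success on held-out control records, and that the excess is rescaled as (attack rate - control rate) / (1 - control rate).

```python
from typing import Sequence, Tuple

Record = Tuple[float, ...]  # hypothetical layout: (age, bmi, secret_value)


def nearest_synthetic(target: Record, synthetic: Sequence[Record],
                      known_idx: Sequence[int]) -> Record:
    """Return the synthetic record closest to the target on the known attributes."""
    return min(synthetic,
               key=lambda s: sum((s[i] - target[i]) ** 2 for i in known_idx))


def success_rate(targets: Sequence[Record], synthetic: Sequence[Record],
                 known_idx: Sequence[int], secret_idx: int, tol: float) -> float:
    """Fraction of targets whose secret is guessed within `tol` from the nearest synthetic record."""
    hits = 0
    for t in targets:
        guess = nearest_synthetic(t, synthetic, known_idx)[secret_idx]
        if abs(guess - t[secret_idx]) <= tol:
            hits += 1
    return hits / len(targets)


def normalized_risk(attack_rate: float, control_rate: float) -> float:
    """Excess success on training records over control records, clipped to [0, 1]."""
    if control_rate >= 1.0:
        return 0.0
    return max(0.0, (attack_rate - control_rate) / (1.0 - control_rate))


# Toy data (invented for illustration only). The third training record is an outlier.
training = [(10.0, 17.0, 5.0), (12.0, 18.0, 6.0), (15.0, 30.0, 20.0)]
control = [(11.0, 17.5, 5.0), (14.0, 19.0, 2.0), (16.0, 21.0, 12.0)]
synthetic = [(10.0, 17.1, 5.1), (12.0, 18.2, 8.0), (15.0, 29.8, 19.9)]

known, secret, tolerance = (0, 1), 2, 0.5

attack_rate = success_rate(training, synthetic, known, secret, tolerance)
control_rate = success_rate(control, synthetic, known, secret, tolerance)
risk = normalized_risk(attack_rate, control_rate)

# Restricting the targets to the outlier shows how such records can be more exposed.
outlier_rate = success_rate(training[2:], synthetic, known, secret, tolerance)

print(f"attack={attack_rate:.2f} control={control_rate:.2f} "
      f"risk={risk:.2f} outlier_attack={outlier_rate:.2f}")
```

The control set matters because some guesses succeed simply from general patterns in the population. Only the success beyond that baseline suggests that the synthetic data leak information about individual training records.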

Results: The resulting privacy scores, which vary greatly between the different types of attack, are discussed.

Conclusion: Challenges encountered while implementing the privacy analyses and interpreting their results are highlighted. It is concluded that privacy risk assessment for synthetic data remains an open problem.

The authors declare that they have no competing interests.

The authors declare that an ethics committee vote is not required.