Empirical evaluation of synthetic data created by generative models via attribute inference attack
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-0368-8037
2024 (English). In: Privacy and identity management: sharing in a digital world / [ed] Felix Bieker; Silvia de Conca; Nils Gruschka; Meiko Jensen; Ina Schiering, Springer, 2024, pp. 282-291. Conference paper, published paper (refereed)
Abstract [en]

The disclosure risk of synthetic (artificial) data is still not well understood. Studies show that synthetic data generation techniques produce data similar to the original data, and sometimes even exact copies of original records; publishing synthetic datasets can therefore endanger the privacy of users. In our work, we study synthetic data generated by several synthetic data generation techniques, including the most recent diffusion models. We perform a disclosure risk assessment of synthetic datasets via an attribute inference attack, in which an attacker has access to a subset of publicly available features and at least one synthesized dataset, and aims to infer the sensitive features unknown to the attacker. We also compute the predictive accuracy and F1 score of a random forest classifier trained on several synthetic datasets. For sensitive categorical features, we show that the attribute inference attack is not highly feasible or successful. In contrast, for continuous attributes, an approximate inference is possible. This holds for synthetic datasets derived from diffusion models, GANs, and DPGANs, showing that only approximate, not exact, attribute inference is possible.
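As a minimal illustration of the attack setting described above, an attribute inference attack can be sketched as a nearest-neighbour lookup in a released synthetic dataset. The data, field names, and the 1-NN strategy here are all hypothetical simplifications; the paper's actual evaluation trains a random forest classifier on the synthetic data.

```python
# Hypothetical sketch: the attacker knows a target's public features and
# uses the released synthetic data to guess the sensitive one by finding
# the closest synthetic record on the public features alone.

def infer_sensitive(target_public, synthetic, public_keys, sensitive_key):
    """Return the sensitive value of the synthetic record closest to the
    target on the publicly known features (squared Euclidean distance)."""
    def dist(rec):
        return sum((rec[k] - target_public[k]) ** 2 for k in public_keys)
    closest = min(synthetic, key=dist)
    return closest[sensitive_key]

# Toy synthetic release: age and zip are public, income is sensitive.
synthetic = [
    {"age": 34, "zip": 12, "income": 52000},
    {"age": 61, "zip": 45, "income": 31000},
    {"age": 29, "zip": 12, "income": 48000},
]
guess = infer_sensitive({"age": 33, "zip": 12}, synthetic,
                        ["age", "zip"], "income")
print(guess)  # closest record is age 34, zip 12 -> 52000
```

For a continuous attribute like income this yields the "approximate inference" the abstract describes: the guess is close to, but not exactly, the target's true value.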

Place, publisher, year, edition, pages
Springer, 2024. pp. 282-291
Series
IFIP Advances in Information and Communication Technology (IFIPAICT), ISSN 1868-4238, E-ISSN 1868-422X ; 695
Keywords [en]
Attribute Inference Attack, Differentially Private Generative Adversarial Networks, Diffusion Models, Generative Adversarial Networks, Privacy
National subject category
Computer Sciences; Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-224381. DOI: 10.1007/978-3-031-57978-3_18. Scopus ID: 2-s2.0-85192354341. ISBN: 9783031579776 (print). ISBN: 9783031579783 (electronic). OAI: oai:DiVA.org:umu-224381. DiVA id: diva2:1860712
Conference
18th IFIP WG 9.2, 9.6/11.7, 11.6/SIG 9.2.2 International Summer School on Privacy and Identity Management, Privacy and Identity 2023. Oslo, Norway, August 8–11, 2023
Research funder
Wallenberg AI, Autonomous Systems and Software Program (WASP). Available from: 2024-05-24. Created: 2024-05-24. Last updated: 2024-10-09. Bibliographically approved
Part of thesis
1. Navigating data privacy and utility: a strategic perspective
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Navigera i datasekretess och verktyg : ett strategiskt perspektiv
Abstract [en]

Privacy in machine learning should not merely be viewed as an afterthought; rather, it must serve as the foundation upon which machine learning systems are designed. In this thesis, alongside centralized machine learning, we also consider distributed environments for training machine learning models, particularly federated learning. Federated learning lets multiple clients or organizations train a machine learning model collaboratively without moving their data. Each client participating in the federation shares the model parameters learnt by training a machine learning model on its own data. Even though federated learning keeps the data local, there is still a risk of sensitive information leaking through the model updates. For instance, attackers could use the updates of the model parameters to infer details about the data held by clients. So, while federated learning is designed to protect privacy, it still faces challenges in ensuring that the data remains secure throughout the training process.
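The parameter-sharing step described above can be sketched as a weighted server-side average in the style of federated averaging. This is an illustrative toy, not the thesis's decision-tree algorithms; the parameter vectors and client sizes are made up.

```python
# Hypothetical sketch: each client trains locally and shares only its
# parameter vector; the server combines them, weighting each client by
# the size of its local dataset.

def federated_average(client_weights, client_sizes):
    """Combine per-client parameter vectors into a global model via a
    dataset-size-weighted average (FedAvg-style aggregation)."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]

# Two clients with locally trained parameters; client 1 has twice the data.
global_model = federated_average([[1.0, 4.0], [4.0, 1.0]], [200, 100])
print(global_model)  # [2.0, 3.0]
```

Note that the raw data never leaves the clients, yet the shared vectors themselves are exactly the attack surface the paragraph above warns about.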

Originally, federated learning was introduced in the context of deep learning models. This thesis, however, focuses on federated learning for decision trees. Decision trees are intuitive, interpretable models, making them popular in a wide range of applications, especially where explainability of the decisions made by the model is important. However, decision trees are vulnerable to inference attacks, particularly when the structure of the tree is exposed. To mitigate these vulnerabilities, a key contribution of this thesis is the development of novel federated learning algorithms that incorporate privacy-preserving techniques, such as k-anonymity and differential privacy, into the construction of decision trees. By doing so, we seek to ensure user privacy without significantly compromising the performance of the model.

Machine learning models learn patterns from data, and in doing so they may leak sensitive information. Each step of the machine learning pipeline presents unique vulnerabilities, making it essential to assess and quantify the privacy risks involved. One focus of this thesis is the quantification of privacy by devising a data reconstruction attack tailored to Principal Component Analysis (PCA), a widely used dimensionality reduction technique. Furthermore, various protection mechanisms are evaluated in terms of their effectiveness in preserving privacy against such reconstruction attacks while maintaining the utility of the model.

In addition to federated learning, this thesis also addresses the privacy concerns associated with synthetic datasets generated by models such as generative networks. Specifically, we perform an attribute inference attack on synthetic datasets and quantify privacy by calculating the inference accuracy, a metric that reflects the attacker's success in estimating sensitive attributes of target individuals.
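The differential-privacy ingredient mentioned above can be sketched with the standard Laplace mechanism on a count query, the kind of statistic a tree-building algorithm computes at each split. This is a generic textbook sketch under assumed data, not the thesis's actual federated decision-tree construction.

```python
import math
import random

# Hedged sketch of the Laplace mechanism: a count query has sensitivity 1,
# so adding Laplace(0, 1/epsilon) noise makes the released count
# epsilon-differentially private.

def laplace_sample(scale, rng=random):
    """Draw from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(records, predicate, epsilon):
    """Release a differentially private count of records matching predicate."""
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_sample(1.0 / epsilon)

# Hypothetical split statistic: how many records fall in the right branch?
random.seed(0)
ages = [23, 35, 41, 52, 67, 29, 38]
noisy = dp_count(ages, lambda a: a >= 40, epsilon=1.0)
```

A smaller epsilon means a larger noise scale, i.e. stronger privacy at the cost of less accurate split statistics, which is exactly the privacy/utility trade-off the thesis navigates.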

Overall, this thesis contributes to the development of privacy-preserving algorithms for decision trees in federated learning and introduces methods to quantify privacy in machine learning systems. The findings also lay the groundwork for further research at the intersection of privacy and machine learning.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024. p. 103
Series
UMINF, ISSN 0348-0542 ; 24.10
Keywords
Privacy, Data Reconstruction Attacks, k-anonymity, Differential Privacy, Federated Learning, Decision Trees, Principal Component Analysis
National subject category
Computer Sciences; Other Engineering and Technologies
Identifiers
urn:nbn:se:umu:diva-230616 (URN). 978-91-8070-481-6 (ISBN). 978-91-8070-482-3 (ISBN)
Public defence
2024-11-04, BIO.A.206 Aula Anatomica, Biologihuset, 09:15 (English)
Opponent
Supervisors
Research funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-10-14. Created: 2024-10-08. Last updated: 2024-10-24. Bibliographically approved

Open Access in DiVA

fulltext (392 kB), 122 downloads
File information
File name: FULLTEXT01.pdf. File size: 392 kB. Checksum (SHA-512):
160603fc6eb97f2cf59215553dbcfec0983b64e3378b71aacb251ecef2d78594e4648f71e7d5a7904d921bf68dbca1c1d37ebf1a315fa21001348fe5c99a1db2
Type: fulltext. Mimetype: application/pdf

Other links
Publisher's full text; Scopus

Person
Kwatra, Saloni; Torra, Vicenç
