Balancing act: navigating the privacy-utility spectrum in principal component analysis
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Department of Computer Science, University of Pisa, Italy.
Department of Computer Science, University of Pisa, Italy.
2024 (English). In: Proceedings of the International Conference on Security and Cryptography / [ed] Sabrina De Capitani Di Vimercati; Pierangela Samarati, Science and Technology Publications, Lda, 2024, p. 850-857. Conference paper, published paper (refereed).
Abstract [en]

Research on federated learning has been extensive ever since it was proposed. Federated learning allows collaborative learning among distributed clients without sharing their raw data with a central aggregator (if one is present) or with other clients in a peer-to-peer architecture. However, each client participating in the federation shares the model information learned from its data with the other participating clients or with the central aggregator. This sharing of information makes the approach vulnerable to various attacks, including data reconstruction attacks. Our research focuses specifically on Principal Component Analysis (PCA), a widely used dimensionality reduction technique. To perform PCA in a federated setting, distributed clients share local eigenvectors computed from their respective data with the aggregator, which then combines them and returns global eigenvectors. Previous studies on attacks against PCA have demonstrated that revealing eigenvectors can lead to membership inference and, when coupled with knowledge of the data distribution, to data reconstruction attacks. Consequently, our objective in this work is to strengthen the privacy of eigenvectors while sustaining their utility. To obtain protected eigenvectors, we use k-anonymity and generative networks. Through our experiments, we perform a complete privacy and utility analysis of the original and protected eigenvectors. For the utility analysis, we apply hierarchical clustering, a random forest regressor, and a random forest classifier to the protected and original eigenvectors. Applying hierarchical clustering to the original and protected datasets and eigenvectors yields interesting results: the height at which the clusters are merged declines from 250 for the original California Housing data to 150 for its synthetic version, while for the k-anonymous version the height lies between 150 and 250. To evaluate the privacy risks of the federated PCA system, we act as an attacker and conduct a data reconstruction attack.
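
The abstract does not spell out the aggregation step; as a rough illustration of the federated PCA pipeline it describes, the sketch below has each client release the top-k eigenvectors of its local covariance and an aggregator merge the stacked bases with an SVD. The function names and the SVD merging rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

def local_eigenvectors(X, k):
    """Client side: top-k eigenvectors of the local covariance matrix."""
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (len(X) - 1)
    vals, vecs = np.linalg.eigh(cov)        # eigh: for symmetric matrices
    order = np.argsort(vals)[::-1][:k]      # sort eigenvalues descending
    return vecs[:, order]                   # shape (d, k)

def aggregate(client_bases, k):
    """Aggregator side: merge local subspaces via an SVD of the stacked
    bases (an assumed merging rule; the paper does not prescribe one here)."""
    stacked = np.hstack(client_bases)       # shape (d, k * n_clients)
    U, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return U[:, :k]                         # global top-k directions

# Toy run: three clients, 5 features each, never sharing raw rows.
rng = np.random.default_rng(0)
clients = [rng.normal(size=(100, 5)) for _ in range(3)]
global_pcs = aggregate([local_eigenvectors(X, 2) for X in clients], k=2)
print(global_pcs.shape)                     # (5, 2)
```

The point of the sketch is that only the (d, k) eigenvector matrices travel between parties; the membership inference and reconstruction attacks discussed in the abstract start from exactly these released matrices.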

Place, publisher, year, edition, pages
Science and Technology Publications, Lda, 2024. p. 850-857
Series
International Conference on Security and Cryptography, ISSN 2184-7711
Keywords [en]
Data Reconstruction Attack, Federated Learning, Generative Networks, k-anonymity, Membership Inference Attack, Principal Component Analysis
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-229365
DOI: 10.5220/0012855000003767
Scopus ID: 2-s2.0-85202804524
ISBN: 9789897587092 (electronic)
OAI: oai:DiVA.org:umu-229365
DiVA, id: diva2:1900094
Conference
21st International Conference on Security and Cryptography, SECRYPT 2024, Dijon, 8 July 2024 to 10 July 2024
Funder
EU, Horizon 2020, INFRAIA-01-2018-2019
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-09-23. Created: 2024-09-23. Last updated: 2024-10-09. Bibliographically approved.
In thesis
1. Navigating data privacy and utility: a strategic perspective
2024 (English). Doctoral thesis, comprehensive summary (Other academic).
Alternative title [sv]
Navigera i datasekretess och verktyg : ett strategiskt perspektiv
Abstract [en]

Privacy in machine learning should not merely be an afterthought; rather, it must serve as the foundation upon which machine learning systems are designed. In this thesis, alongside centralized machine learning, we also consider distributed environments for training machine learning models, particularly federated learning. Federated learning lets multiple clients or organizations train a machine learning model collaboratively without moving their data. Each client participating in the federation shares the model parameters learned by training a model on its own data. Even though this setup keeps the data local, there is still a risk of sensitive information leaking through the model updates. For instance, attackers could potentially use the updates of the model parameters to infer details about the data held by clients. So, while federated learning is designed to protect privacy, it still faces challenges in ensuring that the data remains secure throughout the training process.
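
As a concrete picture of the parameter-sharing loop sketched above, here is a minimal federated-averaging round for a linear model. The model, loss, and uniform averaging are simplifying assumptions for illustration, not the algorithms developed in the thesis.

```python
import numpy as np

def client_update(w, X, y, lr=0.1, epochs=5):
    """Local training on private data (linear model, squared loss);
    only the resulting parameters leave the client."""
    for _ in range(epochs):
        grad = X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
        w = w - lr * grad
    return w

def fed_avg(w, clients, n_rounds=10):
    """Server: average the parameters returned by each client.
    The raw (X, y) never move, but the shared updates can still leak."""
    for _ in range(n_rounds):
        w = np.mean([client_update(w.copy(), X, y) for X, y in clients], axis=0)
    return w

# Toy run: four clients whose private data follow the same linear rule.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))
w = fed_avg(np.zeros(3), clients)
print(w)                                    # approaches true_w
```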

Originally, federated learning was introduced in the context of deep learning models. This thesis, however, focuses on federated learning for decision trees. Decision trees are intuitive and interpretable models, which makes them popular in a wide range of applications, especially where the explainability of the model's decisions is important. However, decision trees are vulnerable to inference attacks, particularly when the structure of the tree is exposed. To mitigate these vulnerabilities, a key contribution of this thesis is the development of novel federated learning algorithms that incorporate privacy-preserving techniques, such as k-anonymity and differential privacy, into the construction of decision trees. By doing so, we seek to ensure user privacy without significantly compromising the performance of the model. Machine learning models learn patterns from data, and during this process they might leak sensitive information. Each step of the machine learning pipeline presents unique vulnerabilities, making it essential to assess and quantify the privacy risks involved. One focus of this thesis is the quantification of privacy by devising a data reconstruction attack tailored to Principal Component Analysis (PCA), a widely used dimensionality reduction technique. Furthermore, various protection mechanisms are evaluated in terms of their effectiveness in preserving privacy against such reconstruction attacks while maintaining the utility of the model. In addition to federated learning, this thesis also addresses the privacy concerns associated with synthetic datasets generated by models such as generative networks. Specifically, we perform an attribute inference attack on synthetic datasets and quantify privacy by calculating the inference accuracy, a metric that reflects the attacker's success in estimating sensitive attributes of target individuals.
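
To make the reconstruction risk concrete, the sketch below shows why released eigenvectors are sensitive: an attacker who also obtains the per-record projections, or can approximate them from a known data distribution, simply inverts the linear map. This is a minimal illustration under assumed attacker knowledge, not the attack devised in the thesis.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))               # private records, 8 features
mean = X.mean(axis=0)

# Released, e.g. as part of federated PCA: the top-k eigenvectors.
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
V = Vt[:3].T                                # shape (d, k) = (8, 3)

# Attacker view: projections plus released eigenvectors invert the map.
Z = (X - mean) @ V                          # per-record projections
X_hat = Z @ V.T + mean                      # approximate reconstruction
print(np.mean((X - X_hat) ** 2))            # error shrinks as k grows
```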

Overall, this thesis contributes to the development of privacy-preserving algorithms for decision trees in federated learning and introduces methods to quantify privacy in machine learning systems. The findings also lay the groundwork for further research at the intersection of privacy and machine learning.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2024. p. 103
Series
UMINF, ISSN 0348-0542 ; 24.10
Keywords
Privacy, Data Reconstruction Attacks, k-anonymity, Differential Privacy, Federated Learning, Decision Trees, Principal Component Analysis
National Category
Computer Sciences
Other Engineering and Technologies
Identifiers
urn:nbn:se:umu:diva-230616 (URN)
978-91-8070-481-6 (ISBN)
978-91-8070-482-3 (ISBN)
Public defence
2024-11-04, BIO.A.206 Aula Anatomica, Biologihuset, 09:15 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-10-14. Created: 2024-10-08. Last updated: 2024-10-24. Bibliographically approved.

Open Access in DiVA

fulltext (1151 kB), 76 downloads
File information
File name: FULLTEXT01.pdf
File size: 1151 kB
Checksum (SHA-512): ec38fe0f40dc7fad24f1bc7e928d45354390f8600dedd1f0dd1ed2ae3654aa4f72da95b195d83aa0118c4aebecfe36b8eeb14476cb7bc7855b90573708500529
Type: fulltext
Mimetype: application/pdf

Other links

Publisher's full text
Scopus

Authority records

Kwatra, Saloni
