Bridging AI and privacy: solutions for high-dimensional data and foundation models
Umeå University, Faculty of Science and Technology, Department of Computing Science. (NAUSICA Research Group)
ORCID iD: 0000-0002-7204-8228
2025 (English)
Doctoral thesis, monograph (Other academic)
Alternative title
Överbrygga AI och datasekretess : lösningar för högdimensionell data och grundmodeller (Swedish)
Abstract [en]

The widespread adoption of machine learning (ML) across many domains has made it possible to extract meaningful insights from complex, large-scale datasets. However, recent research has shown that ML models are vulnerable to a range of privacy attacks that can expose sensitive information about individuals in the training data. With regulatory frameworks such as the General Data Protection Regulation (GDPR) imposing strict requirements on data sharing, privacy-preserving solutions have become increasingly important. As the world becomes more digital, massive volumes of data are generated, often in high-dimensional spaces where the number of attributes matches or exceeds the number of samples. Because ML models are used extensively to process such data, it is critical to protect both the data and the models from privacy attacks.

Traditional anonymization techniques such as k-anonymity and differential privacy often fall short when applied to high-dimensional datasets: as the dimensionality of the data increases, data points become spread out across sparse regions of the feature space, making it difficult to find groups of similar records. This thesis therefore proposes a set of privacy-preserving methodologies tailored to high-dimensional data and large-scale foundation models.
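
One well-known symptom of this sparsity is distance concentration: in high dimensions, a record's nearest and farthest neighbours lie at almost the same distance, so "similar" records are hard to single out. The following minimal sketch is an illustration only, not code from the thesis. It assumes points drawn uniformly at random from the unit hypercube and reports the relative contrast (d_max - d_min) / d_min of the distances from one query point; the function name `relative_contrast` and its default parameters are hypothetical.

```python
import math
import random


def relative_contrast(dim: int, n_points: int = 500, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for Euclidean distances from one random
    query point to n_points random points, all uniform in [0, 1]^dim."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        point = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, point))  # math.dist needs Python 3.8+
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


if __name__ == "__main__":
    # Illustrative dimensions only; the thesis datasets are not reproduced here.
    for dim in (2, 10, 100, 1000):
        print(f"dim={dim:5d}  relative contrast={relative_contrast(dim):.3f}")
```

Running the script should show the contrast shrinking steadily as `dim` grows; the exact values depend on the seed and the sample size.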

In this thesis, we begin by exploring manifold learning techniques that project high-dimensional data into a lower-dimensional latent space while preserving the intrinsic geometric structure of the original data. This transformation makes anonymization more effective while maintaining data utility. Building on this, we present a novel hybrid privacy method that combines the strengths of k-anonymity and differential privacy, enabling robust anonymization that protects privacy while preserving the underlying data structure. We further investigate synthetic data generation as a privacy-preserving alternative to using sensitive data, leveraging generative models such as generative adversarial networks (GANs) and variational autoencoders (VAEs) to produce high-quality synthetic datasets. To improve the quality of the generated data, we propose techniques that preserve the intrinsic structure of the original high-dimensional data and incorporate prior domain knowledge to guide the generation process. We rigorously evaluate the synthetic data in terms of statistical fidelity, privacy risk, ML utility, and distributional properties, supported by detailed visualizations. Finally, we address high dimensionality and privacy concerns in the context of large-scale foundation models, proposing two model compression strategies, based on knowledge distillation and pruning, that reduce the number of model parameters while preserving performance and enhancing the privacy of the system.
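
The abstract does not detail the hybrid method, so the sketch below uses a well-known stand-in rather than the thesis's own algorithm: microaggregation-based k-anonymity (a simplified MDAV-style grouping that places every record in a group of at least k similar records) followed by Laplace noise on each group's centroid. It assumes the records already live in a low-dimensional latent space (for example, after manifold learning) and that every attribute is bounded in [lo, hi]. The names `microaggregate`, `laplace`, and `hybrid_anonymize` are hypothetical. The noise scale is calibrated as if the grouping were fixed; because the groups here are computed from the data, this naive pipeline does not by itself give a formal ε-differential-privacy guarantee.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Tuple

Record = Tuple[float, ...]


def centroid(points: Sequence[Record]) -> Record:
    """Coordinate-wise mean of a non-empty sequence of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def microaggregate(records: Sequence[Record], k: int) -> List[List[int]]:
    """Simplified MDAV-style grouping (hypothetical helper): partitions record
    indices into groups that each contain at least k records."""
    if k < 1 or len(records) < k:
        raise ValueError("need k >= 1 and at least k records")
    remaining = list(range(len(records)))
    groups: List[List[int]] = []
    while len(remaining) >= 2 * k:
        c = centroid([records[i] for i in remaining])
        # Start from the remaining record farthest from the current centroid...
        far = max(remaining, key=lambda i: math.dist(records[i], c))
        # ...and group it with its k - 1 nearest remaining neighbours.
        group = sorted(remaining, key=lambda i: math.dist(records[i], records[far]))[:k]
        groups.append(group)
        chosen = set(group)
        remaining = [i for i in remaining if i not in chosen]
    groups.append(remaining)  # leftover group holds between k and 2k - 1 records
    return groups


def laplace(rng: random.Random, scale: float) -> float:
    """Laplace(0, scale) sample: difference of two exponential draws with rate 1/scale."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def hybrid_anonymize(records: Sequence[Record], k: int, epsilon: float,
                     lo: float, hi: float, seed: Optional[int] = None) -> List[Record]:
    """Replace each record by its group's centroid plus Laplace noise.
    Assumes every attribute lies in [lo, hi]."""
    rng = random.Random(seed)
    d = len(records[0])
    released: Dict[int, Record] = {}
    for group in microaggregate(records, k):
        c = centroid([records[i] for i in group])
        # L1 sensitivity of a group mean when one record is replaced,
        # treating the partition as fixed: d * (hi - lo) / |group|.
        scale = d * (hi - lo) / (len(group) * epsilon)
        # Clipping to [lo, hi] is post-processing of the noisy value.
        noisy = tuple(min(hi, max(lo, x + laplace(rng, scale))) for x in c)
        for i in group:
            released[i] = noisy
    return [released[i] for i in range(len(records))]


if __name__ == "__main__":
    data = [(0.1, 0.2), (0.15, 0.22), (0.9, 0.8), (0.85, 0.95), (0.5, 0.5), (0.52, 0.48)]
    for row in hybrid_anonymize(data, k=2, epsilon=1.0, lo=0.0, hi=1.0, seed=42):
        print(row)
```

Releasing one noisy centroid per group means each released value is shared by at least k records (the k-anonymity side), while larger groups need less noise because one record's influence on a group mean shrinks as 1/|group| (the differential-privacy side).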

Collectively, this thesis contributes to building privacy-aware AI systems by developing practical solutions that address the complex interplay between high dimensionality and privacy models.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 152
Series
Report / UMINF, ISSN 0348-0542 ; 25.08
Keywords [en]
Privacy, Manifold Learning, k-Anonymity, Differential Privacy, Synthetic Data Generation, Language Models, Model Compression
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-238299
ISBN: 978-91-8070-686-5 (print)
ISBN: 978-91-8070-687-2 (electronic)
OAI: oai:DiVA.org:umu-238299
DiVA, id: diva2:1955416
Public defence
2025-06-02, Hörsal HUM.D.210, Humanisthuset, Umeå, 09:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-05-12 Created: 2025-04-30 Last updated: 2025-05-08 Bibliographically approved

Open Access in DiVA

fulltext (FULLTEXT01.pdf, 28361 kB, application/pdf)
spikblad (SPIKBLAD01.pdf, 174 kB, application/pdf)

Authority records

Garg, Sonakshi
