Graph-based Interactive Data Federation System for Heterogeneous Data Retrieval and Analytics
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Database and Data Mining Group) ORCID iD: 0000-0001-8820-2405
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Database and Data Mining Group)
Umeå University, Faculty of Science and Technology, Department of Computing Science.
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Database and Data Mining Group)
2019 (English) In: Proceedings of The World Wide Web Conference WWW 2019, New York, NY, USA: ACM Digital Library, 2019, p. 3595-3599. Conference paper, Published paper (Refereed)
Abstract [en]

Given the increasing amount of heterogeneous data stored in relational databases, file systems, or cloud environments, such data needs to be easily accessed and semantically connected for further data analytics. Because the potential of data federation is largely untapped, this paper presents an interactive data federation system (https://vimeo.com/319473546) that applies large-scale techniques, including heterogeneous data federation, natural language processing, association rules, and the semantic web, to perform data retrieval and analytics on social network data. The system first creates a Virtual Database (VDB) to virtually integrate data from multiple data sources. Next, an RDF generator is built to unify the data and, together with SPARQL queries, to support semantic search over text data processed with natural language processing (NLP). Association rule analysis is used to discover patterns and recognize the most important co-occurrences of variables from multiple data sources. The system demonstrates how it facilitates interactive data analytics for different application scenarios (e.g., sentiment analysis, privacy-concern analysis, community detection).
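
The association rule step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `association_rules`, the thresholds, and the toy records are hypothetical, and only pairwise rules are considered. It shows how support and confidence identify frequent co-occurrences of variables.

```python
from itertools import combinations


def association_rules(transactions, min_support=0.5, min_confidence=0.7):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Illustrative only: a real system would mine larger itemsets,
    e.g. with Apriori or FP-Growth.
    """
    n = len(transactions)
    sets = [frozenset(t) for t in transactions]

    def support(items):
        # Fraction of transactions that contain every item in `items`.
        return sum(1 for t in sets if items <= t) / n

    items = sorted({i for t in sets for i in t})
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(frozenset((a, b)))
        if pair_support < min_support:
            continue
        # Try the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(frozenset((lhs,)))
            if confidence >= min_confidence:
                rules.append((lhs, rhs, pair_support, confidence))
    return rules


# Hypothetical toy records standing in for variables merged from several sources.
records = [
    {"negative_sentiment", "privacy_concern"},
    {"negative_sentiment", "privacy_concern", "location_shared"},
    {"positive_sentiment", "location_shared"},
    {"negative_sentiment", "privacy_concern"},
]
rules = association_rules(records)
for lhs, rhs, sup, conf in rules:
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```

With these toy records, only the pair of negative sentiment and privacy concern passes both thresholds, so the sketch reports that rule in each direction.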

Place, publisher, year, edition, pages
New York, NY, USA: ACM Digital Library, 2019. p. 3595-3599
Keywords [en]
heterogeneous data federation, RDF, interactive data analysis
National subject category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:umu:diva-160892; DOI: 10.1145/3308558.3314138; ISI: 000483508403101; Scopus ID: 2-s2.0-85066893934; ISBN: 978-1-4503-6674-8 (print); OAI: oai:DiVA.org:umu-160892; DiVA, id: diva2:1330478
Conference
WWW '19, The World Wide Web Conference, San Francisco, CA, USA, May 13–17, 2019
Available from: 2019-06-25 Created: 2019-06-25 Last updated: 2019-11-14 Bibliographically approved
In thesis
1. Privacy-awareness in the era of Big Data and machine learning
2019 (English) Licentiate thesis, comprehensive summary (Other academic)
Alternative title [sv]
Integritetsmedvetenhet i eran av Big Data och maskininlärning
Abstract [en]

Social Network Sites (SNS), such as Facebook and Twitter, play a major role in our lives. On the one hand, they help connect people who would not otherwise be connected. Many recent breakthroughs in AI, such as facial recognition [49], were achieved thanks to the amount of data available on the Internet via SNS (hereafter Big Data). On the other hand, many people have tried to avoid SNS in order to protect their privacy. Similar to the security issue of the Internet protocol, Machine Learning (ML), as the core of AI, was not designed with privacy in mind. For instance, Support Vector Machines (SVMs) solve a quadratic optimization problem by deciding which instances of the training dataset are support vectors. This means that the data of people involved in the training process will also be published within the SVM models. Thus, privacy guarantees must be applied to the worst-case outliers, while data utility must also be guaranteed.

For the above reasons, this thesis studies: (1) how to construct a data federation infrastructure with privacy guarantees in the big data era; and (2) how to protect privacy while learning ML models with a good trade-off between data utility and privacy. Regarding (1), we proposed different frameworks empowered by privacy-aware algorithms that satisfy the definition of differential privacy, the state-of-the-art privacy guarantee. Regarding (2), we proposed different neural network architectures that capture the sensitivity of user data, from which the algorithm itself decides how much it should learn from user data to protect their privacy while achieving good performance on a downstream task. The current outcomes of the thesis are: (1) a privacy-guarantee data federation infrastructure for data analysis on sensitive data; (2) privacy-guarantee algorithms for data sharing; and (3) privacy-concern data analysis on social network data. The research methods used in this thesis include experiments on real-life social network datasets to evaluate aspects of the proposed approaches.

Insights and outcomes from this thesis can be used by both academia and industry to guarantee privacy for data analysis and data sharing involving personal data. They also have the potential to facilitate relevant research in privacy-aware representation learning and related evaluation methods.
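
To make the differential privacy guarantee referred to in the abstract more concrete, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is not taken from the thesis: the helper names `laplace_noise` and `private_count`, the value of `epsilon`, and the sample data are all assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng=None):
    """Return an epsilon-differentially private count of matching values."""
    rng = rng or random.Random()
    true_count = sum(1 for v in values if predicate(v))
    # Adding or removing one person's record changes a count by at most 1,
    # so the sensitivity of the query is 1.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


# Hypothetical data: ages of users in a social network sample.
ages = [23, 35, 41, 29, 52, 47, 31]
noisy = private_count(ages, lambda a: a >= 40, epsilon=1.0, rng=random.Random(0))
print(f"noisy count of users aged 40 or over: {noisy:.2f}")
```

A smaller `epsilon` gives stronger privacy but adds more noise, which is the utility and privacy trade-off the abstract describes.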

Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2019. p. 42
Series
Report / UMINF, ISSN 0348-0542 ; 19.06
Keywords
Differential Privacy, Machine Learning, Deep Learning, Big Data
National subject category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-162182 (URN); 9789178551101 (ISBN)
Presentation
2019-09-09, 23:40 (English)
Supervisors
Available from: 2019-08-22 Created: 2019-08-15 Last updated: 2019-08-26 Bibliographically approved

Open Access in DiVA

fulltext (3165 kB), 9 downloads
File information
File name: FULLTEXT01.pdf; File size: 3165 kB; Checksum: SHA-512
d60a50d54ad6531ebbd8913633a038dc34c2441c125ff85c0b0945512fb66e3bc39ff53cfe52f6e0441876db502fed7cb64a8f4c26a1934e66be8a0c2f16dc25
Type: fulltext; Mimetype: application/pdf

Other links

Publisher's full text; Scopus

Authority records BETA

Vu, Xuan-Son; Addi, Ait-Mlouk; Elmroth, Erik; Lili, Jiang
