Lili, Jiang
Publications (9 of 9)
Vu, X.-S., Tran, S. N. & Lili, J. (2019). dpUGC: Learn Differentially Private Representation for User Generated Contents. In: Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019. Paper presented at 20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019.
2019 (English). In: Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

This paper first proposes a simple yet efficient generalized approach to applying differential privacy to text representation (i.e., word embedding). Based on it, we propose a user-level approach to learning a personalized differentially private word embedding model on user-generated content (UGC). To the best of our knowledge, this is the first work on learning a user-level differentially private word embedding model from text for sharing. The proposed approaches protect the privacy of individuals against re-identification and, in particular, provide a better trade-off between privacy and data utility on UGC data for sharing. The experimental results show that the trained embedding models are applicable to classic text analysis tasks (e.g., regression). Moreover, the proposed approaches to learning differentially private embedding models are both framework- and data-independent, which facilitates deployment and sharing. The source code is available at https://github.com/sonvx/dpText.

National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-160887 (URN)
Conference
20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019
Available from: 2019-06-25 Created: 2019-06-25 Last updated: 2019-08-22
Vu, X.-S., Santra, A., Chakravarthy, S. & Lili, J. (2019). Generic Multilayer Network Data Analysis with the Fusion of Content and Structure. In: Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019. Paper presented at 20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019.
2019 (English). In: Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Multi-feature data analysis (e.g., on Facebook, LinkedIn) is challenging, especially if one wants to do it efficiently and retain the flexibility to choose features of interest for analysis. Features (e.g., age, gender, relationship, political view) can be given explicitly in datasets, but can also be derived from content (e.g., political view based on Facebook posts). Analysis from multiple perspectives is needed to understand the datasets (or subsets of them) and to infer meaningful knowledge. For example, the influence of age, location, and marital status on political views may need to be inferred separately (or in combination). In this paper, we adapt multilayer network (MLN) analysis, a nontraditional approach, to model the Facebook datasets, integrate content analysis, and conduct analysis driven by a list of desired application-based queries. Our experimental analysis shows the flexibility and efficiency of the proposed approach when modeling and analyzing datasets with multiple features.

Keywords
Social network analysis, Multilayer networks, Content analysis
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-162572 (URN)
Conference
20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019
Available from: 2019-08-22 Created: 2019-08-22 Last updated: 2019-08-22
Vu, X.-S., Addi, A.-M., Elmroth, E. & Lili, J. (2019). Graph-based Interactive Data Federation System for Heterogeneous Data Retrieval and Analytics. In: Proceedings of The 30th TheWebConf'19 (formerly WWW), USA. Paper presented at The Web Conference, San Francisco, USA, May 13-17, 2019 (pp. 3595-3599). New York, NY, USA: ACM Digital Library
2019 (English). In: Proceedings of The 30th TheWebConf'19 (formerly WWW), USA, New York, NY, USA: ACM Digital Library, 2019, p. 3595-3599. Conference paper, Published paper (Refereed)
Abstract [en]

Given the increasing amount of heterogeneous data stored in relational databases, file systems, or cloud environments, such data needs to be easily accessed and semantically connected for further data analytics. The potential of data federation is largely untapped. This paper presents an interactive data federation system (https://vimeo.com/319473546) that applies large-scale techniques, including heterogeneous data federation, natural language processing, association rules, and the semantic web, to perform data retrieval and analytics on social network data. The system first creates a Virtual Database (VDB) to virtually integrate data from multiple data sources. Next, an RDF generator is built to unify data and, together with SPARQL queries, to support semantic data search over text data processed by natural language processing (NLP). Association rule analysis is used to discover patterns and recognize the most important co-occurrences of variables from multiple data sources. The system demonstrates how it facilitates interactive data analytics for different application scenarios (e.g., sentiment analysis, privacy-concern analysis, community detection).
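The association rule step mentioned in this abstract can be illustrated with a minimal sketch. It is not the system's implementation: the function name `pair_rules`, the thresholds, and the restriction to rules between two single items are assumptions made for illustration. Each record is treated as a set of attribute values, and a rule x -> y is kept when the pair's support and the rule's confidence clear the thresholds.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    Hypothetical helper: only single-item antecedents and consequents
    are considered, which keeps the example short.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for t in transactions:
        items = sorted(set(t))          # de-duplicate and fix pair order
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n             # fraction of records containing both
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]   # estimate of P(y | x)
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return rules


records = [
    {"negative", "privacy"},
    {"negative", "privacy"},
    {"positive"},
    {"negative"},
]
for rule in pair_rules(records):
    print(rule)
```

Counting co-occurrences once and deriving both rule directions from the same count avoids a second pass over the records.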

Place, publisher, year, edition, pages
New York, NY, USA: ACM Digital Library, 2019
Keywords
heterogeneous data federation, RDF, interactive data analysis
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-160892 (URN)
10.1145/3308558.3314138 (DOI)
978-1-4503-6674-8 (ISBN)
Conference
The Web Conference, San Francisco, USA, May 13-17, 2019
Available from: 2019-06-25 Created: 2019-06-25 Last updated: 2019-08-22. Bibliographically approved
Wang, A., Chen, Y., An, N., Yang, J., Li, L. & Lili, J. (2019). Microarray Missing Value Imputation: A Regularized Local Learning Method. IEEE/ACM Transactions on Computational Biology & Bioinformatics, 16(3), 980-993
2019 (English). In: IEEE/ACM Transactions on Computational Biology & Bioinformatics, ISSN 1545-5963, E-ISSN 1557-9964, Vol. 16, no 3, p. 980-993. Article in journal (Refereed), Published
Abstract [en]

Microarray experiments on gene expression inevitably generate missing values, which impede downstream biological analysis. Therefore, it is key to estimate the missing values accurately. Most existing imputation methods tend to suffer from over-fitting. In this study, we propose two regularized local learning methods for microarray missing value imputation. Motivated by the grouping effect of L2 regularization, after selecting the target gene, we train an L2 Regularized Local Least Squares imputation model (RLLSimpute_L2) on the target gene and its neighbors to estimate the missing values of the target gene. Furthermore, RLLSimpute_L2 imputes the missing values in ascending order of the missing rate associated with each target gene, which makes full use of previously estimated values. Besides L2, we further explore L1 regularization and propose an L1 Regularized Local Least Squares imputation model (RLLSimpute_L1). To evaluate their effectiveness, we conducted extensive experimental studies on six benchmark datasets covering both time series and non-time series cases. Nine state-of-the-art imputation methods are compared with RLLSimpute_L2 and RLLSimpute_L1 in terms of three performance metrics. The comparative experimental results indicate that RLLSimpute_L2 outperforms its competitors by achieving smaller imputation errors and better structure preservation of differentially expressed genes.
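The general idea of L2-regularized local least squares imputation can be sketched as follows. This is an illustration under stated assumptions, not the authors' RLLSimpute_L2 code: the function names, the Euclidean neighbor selection, the restriction of neighbors to complete rows, and the parameters `k` and `lam` are all hypothetical choices. Genes are rows, and rows are processed in ascending order of missing rate so that earlier estimates can serve as neighbors for later ones.

```python
import numpy as np


def impute_gene_l2(data, target, k=3, lam=1.0):
    """Fill NaNs in row `target` by ridge regression on its k nearest complete rows."""
    row = data[target]
    missing = np.isnan(row)
    observed = ~missing
    # Assumption: only rows without missing values may act as neighbors.
    candidates = [i for i in range(data.shape[0])
                  if i != target and not np.isnan(data[i]).any()]
    # Rank candidates by Euclidean distance over the target's observed columns.
    dists = [np.linalg.norm(data[i, observed] - row[observed]) for i in candidates]
    nearest = [candidates[j] for j in np.argsort(dists)[:k]]

    A = data[nearest][:, observed].T          # shape (n_observed, n_neighbors)
    b = row[observed]
    # Ridge solution: w = (A^T A + lam * I)^(-1) A^T b
    w = np.linalg.solve(A.T @ A + lam * np.eye(len(nearest)), A.T @ b)

    filled = row.copy()
    filled[missing] = data[nearest][:, missing].T @ w
    return filled


def impute_all(data, k=3, lam=1.0):
    """Impute rows in ascending order of missing rate, reusing earlier estimates."""
    out = data.copy()
    rates = np.isnan(data).mean(axis=1)
    for g in np.argsort(rates):
        if rates[g] > 0:
            out[g] = impute_gene_l2(out, g, k, lam)
    return out


expr = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [1.0, 2.0, 3.0, np.nan],
])
print(impute_all(expr, k=2, lam=1e-3))
```

The ridge term `lam * I` keeps the linear system solvable even when neighbor rows are collinear, as they are in this toy matrix.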

Place, publisher, year, edition, pages
IEEE, 2019
Keywords
Microarray data, missing value imputation, regularized model, local learning, similarity measurement
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:umu:diva-161474 (URN)
10.1109/TCBB.2018.2810205 (DOI)
000471070600028 ()
29994588 (PubMedID)
Available from: 2019-07-09 Created: 2019-07-09 Last updated: 2019-07-09. Bibliographically approved
Vu, X.-S., Flekova, L., Lili, J. & Gurevych, I. (2018). Lexical-semantic resources: yet powerful resources for automatic personality classification. In: Francis Bond, Takayuki Kuribayashi, Christiane Fellbaum, Piek Vossen (Ed.), Proceedings of the 9th Global WordNet Conference (GWC 2018). Paper presented at The 9th Global WordNet Conference GWC2018, Singapore, January 8-12, 2018 (pp. 173-182). Singapore: Nanyang Technological University (NTU)
2018 (English). In: Proceedings of the 9th Global WordNet Conference (GWC 2018) / [ed] Francis Bond, Takayuki Kuribayashi, Christiane Fellbaum, Piek Vossen, Singapore: Nanyang Technological University (NTU), 2018, p. 173-182. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we aim to reveal the impact of lexical-semantic resources, used in particular for word sense disambiguation and sense-level semantic categorization, on the automatic personality classification task. While stylistic features (e.g., part-of-speech counts) have shown their power in this task, the impact of semantics beyond targeted word lists is relatively unexplored. We propose and extract three types of lexical-semantic features, which capture high-level concepts and emotions, overcoming the lexical gap of word n-grams. Our experimental results are comparable to state-of-the-art methods, while no personality-specific resources are required.

Place, publisher, year, edition, pages
Singapore: Nanyang Technological University (NTU), 2018. p. 10
Keywords
Personality Profiling
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-144672 (URN)
978-981-11-7087-4 (ISBN)
Conference
The 9th Global WordNet Conference GWC2018, Singapore, January 8-12, 2018
Projects
Privacy-aware Data Federation
Available from: 2018-02-09 Created: 2018-02-09 Last updated: 2019-08-22. Bibliographically approved
Vu, X.-S. & Lili, J. (2018). Self-adaptive Privacy Concern Detection for User-generated Content. In: Proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2018. Paper presented at 19th International Conference on Computational Linguistics and Intelligent Text Processing, Hanoi, Vietnam, March 18-24, 2018.
2018 (English). In: Proceedings of the 19th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2018. Conference paper (Other academic)
Abstract [en]

To protect user privacy in data analysis, a state-of-the-art strategy is differential privacy, in which scientific noise is injected into the real analysis output. The noise masks individuals' sensitive information contained in the dataset. However, determining the amount of noise is a key challenge, since too much noise will destroy data utility while too little noise will increase privacy risk. Though previous research has designed mechanisms to protect data privacy in different scenarios, most existing studies assume uniform privacy concerns for all individuals. Consequently, applying an equal amount of noise to all individuals leads to insufficient privacy protection for some users, while over-protecting others. To address this issue, we propose a self-adaptive approach for privacy concern detection based on user personality. Our experimental studies demonstrate the effectiveness of the approach in providing suitable personalized privacy protection for cold-start users (i.e., users without privacy-concern information in the training data).
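The noise trade-off described in this abstract can be illustrated with the standard Laplace mechanism, shown here as a minimal sketch rather than the paper's method. The function name `laplace_mechanism` and the example values are assumptions. Noise is drawn with scale sensitivity / epsilon, so a smaller privacy budget epsilon means more noise (stronger privacy, lower utility) and a larger epsilon means less noise.

```python
import numpy as np


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Return true_value plus Laplace noise with scale sensitivity / epsilon."""
    scale = sensitivity / epsilon
    return true_value + rng.laplace(loc=0.0, scale=scale)


rng = np.random.default_rng(42)
true_count = 120.0   # hypothetical query answer, e.g. a count (sensitivity 1)

for epsilon in (0.1, 1.0, 10.0):
    noisy = [laplace_mechanism(true_count, 1.0, epsilon, rng) for _ in range(1000)]
    mean_abs_error = np.mean(np.abs(np.array(noisy) - true_count))
    print(f"epsilon={epsilon}: mean absolute error ~ {mean_abs_error:.2f}")
```

The expected absolute error of Laplace noise equals its scale, so the printed errors should shrink roughly in proportion to 1 / epsilon.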

Series
Lecture Notes in Computer Science (LNCS)
Keywords
privacy-guaranteed data analysis, deep learning, multi-layer perceptron
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-146470 (URN)
Conference
19th International Conference on Computational Linguistics and Intelligent Text Processing, Hanoi, Vietnam, March 18-24, 2018
Projects
Privacy-aware Data Federation
Available from: 2018-04-10 Created: 2018-04-10 Last updated: 2019-08-22
Vu, X.-S., Jiang, L., Brändström, A. & Elmroth, E. (2017). Personality-based Knowledge Extraction for Privacy-preserving Data Analysis. In: K-CAP 2017 - Proceedings of the Knowledge Capture Conference. Paper presented at K-CAP 2017: The 9th International Conference on Knowledge Capture, Austin, Texas, December 4-6, 2017. Austin, TX, USA: ACM Digital Library, Article ID 45.
2017 (English). In: K-CAP 2017 - Proceedings of the Knowledge Capture Conference, Austin, TX, USA: ACM Digital Library, 2017, article id 45. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we present a differential-privacy-preserving approach, which extracts personality-based knowledge to support privacy-guaranteed data analysis on sensitive personal data. Based on the approach, we further implement an end-to-end privacy-guaranteed system, KaPPA, to provide researchers with iterative data analysis on sensitive data. The key challenge for differential privacy is determining a reasonable privacy budget to balance privacy preservation and data utility. Most previous work applies a uniform privacy budget to all individual data, which leads to insufficient privacy protection for some individuals while over-protecting others. In KaPPA, the proposed personality-based privacy-preserving approach automatically calculates a privacy budget for each individual. Our experimental evaluations show a significant trade-off between sufficient privacy protection and data utility.
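The contrast between a uniform and a per-individual privacy budget can be sketched as below. This is a hypothetical illustration, not KaPPA: the function `privatize_per_user`, the value bounds, and the example budgets are assumptions, and how a budget would be derived from personality is outside the sketch. Each user's bounded value receives Laplace noise scaled by that user's own epsilon, so users with a small budget get more noise than users with a large one.

```python
import numpy as np


def privatize_per_user(values, epsilons, lower=0.0, upper=1.0, seed=0):
    """Add Laplace noise to each value using that user's own epsilon.

    Values are clipped to [lower, upper], so one user's value can change
    by at most (upper - lower); that range is used as the sensitivity.
    """
    rng = np.random.default_rng(seed)
    values = np.clip(np.asarray(values, dtype=float), lower, upper)
    epsilons = np.asarray(epsilons, dtype=float)
    scales = (upper - lower) / epsilons     # one noise scale per user
    return values + rng.laplace(0.0, scales)


values = [0.2, 0.9, 0.5, 0.7]
uniform = privatize_per_user(values, [1.0, 1.0, 1.0, 1.0])
personal = privatize_per_user(values, [0.2, 5.0, 1.0, 0.5])   # assumed budgets
print("uniform budget: ", np.round(uniform, 2))
print("per-user budget:", np.round(personal, 2))
```

Passing an array of scales to `rng.laplace` draws one independent sample per user, which keeps the per-user logic in a single vectorized call.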

Place, publisher, year, edition, pages
Austin, TX, USA: ACM Digital Library, 2017
Keywords
Differential Privacy, Privacy-preserving Data Analysis
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-143228 (URN)
10.1145/3148011.3154479 (DOI)
978-1-4503-5553-7 (ISBN)
Conference
K-CAP 2017: The 9th International Conference on Knowledge Capture, Austin, Texas, December 4-6, 2017
Projects
Privacy-aware Data Federation
Available from: 2017-12-19 Created: 2017-12-19 Last updated: 2019-08-22. Bibliographically approved
Gonzalez, R., Jiang, L., Ahmed, M., Marciel, M., Cuevas, R., Metwalley, H. & Niccolini, S. (2017). The cookie recipe: Untangling the use of cookies in the wild. In: TMA Conference 2017: Proceedings of the 1st Network Traffic Measurement and Analysis Conference. Paper presented at 2017 Network Traffic Measurement and Analysis Conference (TMA), Dublin, Ireland, June 21-23, 2017. IEEE
2017 (English). In: TMA Conference 2017: Proceedings of the 1st Network Traffic Measurement and Analysis Conference, IEEE, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

Users are commonly tracked using HTTP cookies when browsing the web. To protect their privacy, users tend to use simple tools to block the activity of HTTP cookies. However, the "block all" design of these tools breaks critical web services or severely limits the online advertising ecosystem. Therefore, to ease this tension, a more nuanced strategy that better discerns the intended functionality of the HTTP cookies users encounter is required. We present the first large-scale study of the use of HTTP cookies in the wild using network traces containing more than 5.6 billion HTTP requests from real users over a period of two and a half months. We first present a statistical analysis of how cookies are used. We then analyze the structure of cookies and observe that HTTP cookies are significantly more sophisticated than the name=value format defined by the standard and assumed by researchers and developers. Based on our findings, we present an algorithm that is able to extract the information included in 86% of the cookies in our dataset with an accuracy of 91.7%. Finally, we discuss the implications of our findings and provide solutions that can be used to improve the most promising privacy-preserving tools.
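The observation that cookie values often carry more structure than a flat name=value pair can be illustrated with a small sketch. It is not the extraction algorithm from the paper: the helper names, the delimiter list, and the sample header are assumptions chosen for illustration. The first function splits a Cookie header into name=value pairs; the second percent-decodes a value and looks for sub-fields inside it.

```python
from urllib.parse import unquote


def parse_cookie_header(header):
    """Split a Cookie header into a {name: raw_value} dictionary."""
    pairs = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")   # split on first "=" only
        if sep:
            pairs[name] = value
    return pairs


def split_nested(value, delimiters=("&", "|", ":")):
    """Percent-decode a cookie value and split it on the first delimiter found.

    The delimiter list is a guess; real cookies use many other encodings.
    """
    value = unquote(value)
    for d in delimiters:
        if d in value:
            fields = value.split(d)
            return [tuple(f.split("=", 1)) if "=" in f else (f,) for f in fields]
    return [(value,)]


header = "sid=abc123; prefs=lang%3Den%26tz%3DUTC"   # made-up example header
for name, raw in parse_cookie_header(header).items():
    print(name, "->", split_nested(raw))
```

Here `sid` holds a single opaque token, while `prefs` decodes to two nested key-value fields that a plain name=value reading would miss.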

Place, publisher, year, edition, pages
IEEE, 2017
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-146182 (URN)
10.23919/TMA.2017.8002896 (DOI)
000426454700001 ()
978-3-901882-95-1 (ISBN)
Conference
2017 Network Traffic Measurement and Analysis Conference (TMA), Dublin, Ireland, June 21-23, 2017
Available from: 2018-04-03 Created: 2018-04-03 Last updated: 2018-06-09. Bibliographically approved
Chen, Y., Wang, A., Ding, H., Que, X., Li, Y., An, N. & Jiang, L. (2016). A global learning with local preservation method for microarray data imputation. Computers in Biology and Medicine, 77, 76-89
2016 (English). In: Computers in Biology and Medicine, ISSN 0010-4825, E-ISSN 1879-0534, Vol. 77, p. 76-89. Article in journal (Refereed), Published
Abstract [en]

Microarray data suffer from missing values for various reasons, including insufficient resolution, image noise, and experimental errors. Because missing values can hinder downstream analysis steps that require complete data as input, it is crucial to be able to estimate the missing values. In this study, we propose a Global Learning with Local Preservation method (GL2P) for imputation of missing values in microarray data. GL2P consists of two components: a local similarity measurement module and a global weighted imputation module. The former uses a local structure preservation scheme to exploit as much information as possible from the observable data, and the latter is responsible for estimating the missing values of a target gene by considering all of its neighbors rather than a subset of them. Furthermore, GL2P imputes the missing values in ascending order according to the rate of missing data for each target gene to fully utilize previously estimated values. To validate the proposed method, we conducted extensive experiments on six benchmark microarray datasets. We compared GL2P with eight state-of-the-art imputation methods in terms of four performance metrics. The experimental results indicate that GL2P outperforms its competitors in terms of imputation accuracy and better preserves the structure of differentially expressed genes. In addition, GL2P is less sensitive to the number of neighbors than other local learning-based imputation methods.

Place, publisher, year, edition, pages
Elsevier, 2016
Keywords
Missing value imputation, Microarray data, Global learning, Local preservation, Regression model
National Category
Computer Sciences; Other Medical Biotechnology; Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:umu:diva-127236 (URN)
10.1016/j.compbiomed.2016.08.005 (DOI)
000384866000009 ()
Available from: 2016-11-14 Created: 2016-11-03 Last updated: 2018-06-09. Bibliographically approved