Publications (10 of 33)
Ming, H., Yang, J., Gui, F., Jiang, L. & An, N. (2024). Few-shot nested named entity recognition. Knowledge-Based Systems, 293, Article ID 111688.
2024 (English). In: Knowledge-Based Systems, ISSN 0950-7051, E-ISSN 1872-7409, Vol. 293, article id 111688. Article in journal (Refereed) Published
Abstract [en]

While Named Entity Recognition (NER) is a widely studied task, making inferences of entities with only a few labeled data has been challenging, especially for entities with nested structures commonly existing in NER datasets. Unlike flat entities, entities and their nested entities are more likely to have similar semantic feature representations, drastically increasing difficulties in classifying different entity categories. This paper posits that the few-shot nested NER task warrants its own dedicated attention and proposes a Global-Biaffine Positive-Enhanced (GBPE) framework for this new task. Within the GBPE framework, we first develop the new Global-Biaffine span representation to capture the span global dependency information for each entity span to distinguish nested entities. We then formulate a unique positive-enhanced contrastive loss function to enhance the utility of specific positive samples in contrastive learning for larger margins. Lastly, by using these enlarged margins, we obtain better margin constraints and incorporate them into the nearest neighbor inference to predict the unlabeled entities. Extensive experiments on three nested NER datasets in English, German, and Russian show that GBPE outperforms baseline models on the 1-shot and 5-shot tasks in terms of F1 score.

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Few-shot, Nested named entity recognition, Positive-enhanced contrastive loss
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-223235 (URN); 10.1016/j.knosys.2024.111688 (DOI); 2-s2.0-85189309268 (Scopus ID)
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2024-04-19 Created: 2024-04-19 Last updated: 2024-04-19 Bibliographically approved
Jiang, L. & Torra, V. (2023). Data protection and multi-database data-driven models. Future Internet, 15(3), Article ID 93.
2023 (English). In: Future Internet, E-ISSN 1999-5903, Vol. 15, no 3, article id 93. Article in journal (Refereed) Published
Abstract [en]

Anonymization and data masking have effects on data-driven models. Different anonymization methods have been developed to provide a good trade-off between privacy guarantees and data utility. Nevertheless, the effects of data protection (e.g., data microaggregation and noise addition) on data integration and on data-driven models (e.g., machine learning models) built from these data are not known. In this paper, we study how data protection affects data integration, and the corresponding effects on the results of machine learning models built from the outcome of the data integration process. The experimental results show that the levels of protection that prevent proper database integration do not affect machine learning models that learn from the integrated database to the same degree. Concretely, our preliminary analysis and experiments show that data protection techniques have a lower level of impact on data integration than on machine learning models.

Place, publisher, year, edition, pages
MDPI, 2023
Keywords
anonymization, data integration, data protection, masking
National Category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:umu:diva-206361 (URN); 10.3390/fi15030093 (DOI); 000956593800001 (); 2-s2.0-85150888833 (Scopus ID)
Available from: 2023-04-26 Created: 2023-04-26 Last updated: 2023-08-03 Bibliographically approved
Vu, X.-S., Tran, S. N. & Jiang, L. (2023). dpUGC: learn differentially private representation for user generated contents. In: Alexander Gelbukh (Ed.), Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I. Paper presented at 20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019. (pp. 316-331). Springer, 13451
2023 (English). In: Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I / [ed] Alexander Gelbukh, Springer, 2023, Vol. 13451, p. 316-331. Conference paper, Published paper (Refereed)
Abstract [en]

This paper first proposes a simple yet efficient generalized approach to apply differential privacy to text representation (i.e., word embedding). Based on it, we propose a user-level approach to learn a personalized differentially private word embedding model on user generated contents (UGC). To the best of our knowledge, this is the first work on learning a user-level differentially private word embedding model from text for sharing. The proposed approaches protect the privacy of the individual from re-identification and, in particular, provide a better trade-off between privacy and data utility on UGC data for sharing. The experimental results show that the trained embedding models are applicable to classic text analysis tasks (e.g., regression). Moreover, the proposed approaches of learning differentially private embedding models are both framework- and data-independent, which facilitates deployment and sharing. The source code is available at https://github.com/sonvx/dpText.

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13451
Keywords
Private word embedding, Differential privacy, UGC
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-160887 (URN); 10.1007/978-3-031-24337-0_23 (DOI); 2-s2.0-85149907226 (Scopus ID); 978-3-031-24336-3 (ISBN); 978-3-031-24337-0 (ISBN)
Conference
20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019.
Note

Originally included in thesis in manuscript form. 

Available from: 2019-06-25 Created: 2019-06-25 Last updated: 2023-03-28 Bibliographically approved
Pan, Y., Yang, J., Ming, H., Jiang, L. & An, N. (2023). Few-shot named entity recognition via Label-Attention Mechanism. In: ICCAI '23: proceedings of the 2023 9th international conference on computing and artificial intelligence. Paper presented at 9th International Conference on Computing and Artificial Intelligence, ICCAI 2023, Tianjin, China, March 17-20, 2023 (pp. 466-471). Association for Computing Machinery (ACM)
2023 (English). In: ICCAI '23: proceedings of the 2023 9th international conference on computing and artificial intelligence, Association for Computing Machinery (ACM), 2023, p. 466-471. Conference paper, Published paper (Refereed)
Abstract [en]

Few-shot named entity recognition aims to identify specific words with the support of very few labeled entities. Existing transfer-learning-based methods learn the semantic features of words in the source domain and migrate them to the target domain but ignore the different label-specific information. We propose a novel Label-Attention Mechanism (LAM) to utilize the overlooked label-specific information. LAM can separate label information from semantic features and learn how to obtain label information from a few samples through the meta-learning strategy. When transferring to the target domain, LAM replaces the source label information with the knowledge extracted from the target domain, thus improving the migration ability of the model. We conducted extensive experiments on multiple datasets, including OntoNotes, CoNLL'03, WNUT'17, GUM, and Few-Nerd, with two experimental settings. The results show that LAM is 7% better than the state-of-the-art baseline models by the absolute F1 scores.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Series
ACM International Conference Proceeding Series
Keywords
Few shot learning, Label-Attention, Named Entity Recognition
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-213934 (URN); 10.1145/3594315.3594358 (DOI); 2-s2.0-85168240049 (Scopus ID); 9781450399029 (ISBN)
Conference
9th International Conference on Computing and Artificial Intelligence, ICCAI 2023, Tianjin, China, March 17-20, 2023
Available from: 2023-09-11 Created: 2023-09-11 Last updated: 2023-09-11 Bibliographically approved
Brännström, M., Jiang, L., Aler Tubella, A. & Dignum, V. (2023). Impact based fairness framework for socio-technical decision making. In: Roberta Calegari; Andrea Aler Tubella; Gabriel González Castañe; Virginia Dignum; Michela Milano (Ed.), Proceedings of the 1st workshop on fairness and bias in AI co-located with 26th european conference on artificial intelligence (ECAI 2023). Paper presented at 1st Workshop on Fairness and Bias in AI, AEQUITAS 2023, Krakow, 1 October, 2023. CEUR-WS
2023 (English). In: Proceedings of the 1st workshop on fairness and bias in AI co-located with 26th european conference on artificial intelligence (ECAI 2023) / [ed] Roberta Calegari; Andrea Aler Tubella; Gabriel González Castañe; Virginia Dignum; Michela Milano, CEUR-WS, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Avoiding bias and understanding the consequences of artificial intelligence used in decision making is of high importance to avoid mistreatment and unintended harm. This paper aims to present an impact focused approach to model the information flow of a socio-technical decision system for analysis of bias and fairness. The framework roots otherwise abstract technical accuracy and bias measures in stakeholder effects and forms a scaffold around which further analysis of the socio-technical system and its components can be coordinated. Two example use-cases are presented and analysed.

Place, publisher, year, edition, pages
CEUR-WS, 2023
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 3523
Keywords
decision-making system, Fairness, information-flow, socio-technical factors
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-217267 (URN); 2-s2.0-85177071301 (Scopus ID)
Conference
1st Workshop on Fairness and Bias in AI, AEQUITAS 2023, Krakow, 1 October, 2023.
Funder
EU, Horizon 2020, 101070363
Available from: 2023-11-29 Created: 2023-11-29 Last updated: 2023-11-30 Bibliographically approved
Vu, X.-S. & Jiang, L. (2023). Self-adaptive privacy concern detection for user-generated content. In: Alexander Gelbukh (Ed.), Computational linguistics and intelligent text processing: 19th International Conference on CiCLing 2018, Hanoi, Vietnam, March 18-24, 2018, Revised selected papers, part 1. Paper presented at 19th International Conference on Computational Linguistics and Intelligent Text Processing, Hanoi, Vietnam, March 18-24, 2018. (pp. 153-167). Springer Science+Business Media B.V.
2023 (English). In: Computational linguistics and intelligent text processing: 19th International Conference on CiCLing 2018, Hanoi, Vietnam, March 18-24, 2018, Revised selected papers, part 1 / [ed] Alexander Gelbukh, Springer Science+Business Media B.V., 2023, p. 153-167. Conference paper, Published paper (Refereed)
Abstract [en]

To protect user privacy in data analysis, a state-of-the-art strategy is differential privacy, in which scientific noise is injected into the real analysis output. The noise masks individuals' sensitive information contained in the dataset. However, determining the amount of noise is a key challenge, since too much noise will destroy data utility while too little noise will increase privacy risk. Though previous research has designed mechanisms to protect data privacy in different scenarios, most existing studies assume uniform privacy concerns for all individuals. Consequently, applying an equal amount of noise to all individuals leads to insufficient privacy protection for some users while over-protecting others. To address this issue, we propose a self-adaptive approach for privacy concern detection based on user personality. Our experimental studies demonstrate its effectiveness in providing suitable personalized privacy protection for cold-start users (i.e., users without privacy-concern information in the training data).

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13396
Keywords
privacy-guaranteed data analysis, deep learning, multi-layer perceptron
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-146470 (URN); 10.1007/978-3-031-23793-5_14 (DOI); 2-s2.0-85149699287 (Scopus ID); 978-3-031-23792-8 (ISBN)
Conference
19th International Conference on Computational Linguistics and Intelligent Text Processing, Hanoi, Vietnam, March 18-24, 2018.
Projects
Privacy-aware Data Federation
Note

Preprint published 2018 at arXiv.org.

Available from: 2018-04-10 Created: 2018-04-10 Last updated: 2023-03-22 Bibliographically approved
Luan, S., Gu, Z., Saremi, A., Freidovich, L. B., Jiang, L. & Wan, S. (2023). Timing performance benchmarking of out-of-distribution detection algorithms. Journal of Signal Processing Systems, 95(12), 1355-1370
2023 (English). In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 95, no 12, p. 1355-1370. Article in journal (Refereed) Published
Abstract [en]

In an open world with a long-tail distribution of input samples, Deep Neural Networks (DNNs) may make unpredictable mistakes for Out-of-Distribution (OOD) inputs at test time, despite high levels of accuracy obtained during model training. OOD detection can be an effective runtime assurance mechanism for safe deployment of machine learning algorithms in safety–critical applications such as medical imaging and autonomous driving. A large number of OOD detection algorithms have been proposed in recent years, with a wide range of performance metrics in terms of accuracy and execution time. For real-time safety–critical applications, e.g., autonomous driving, timing performance is of great importance in addition to accuracy. We perform a comprehensive and systematic benchmark study of multiple OOD detection algorithms in terms of both accuracy and execution time on different hardware platforms, including a powerful workstation and a resource-constrained embedded device, equipped with both CPU and GPU. We also profile and analyze the internal details of each algorithm to identify the performance bottlenecks and potential for GPU acceleration. This paper aims to provide a useful reference for the practical deployment of OOD detection algorithms for real-time safety–critical applications.

Place, publisher, year, edition, pages
Springer-Verlag New York, 2023
Keywords
Deep Learning, Embedded systems, Machine Learning, Out-of-Distribution detection, Real-time systems
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-206357 (URN); 10.1007/s11265-023-01852-0 (DOI); 000955519800001 (); 2-s2.0-85150652364 (Scopus ID)
Available from: 2023-04-26 Created: 2023-04-26 Last updated: 2024-05-10 Bibliographically approved
Jiang, L. & Torra, V. (2022). On the Effects of Data Protection on Multi-database Data-Driven Models. In: Katsuhiro Honda; Tomoe Entani; Seiki Ubukata; Van-Nam Huynh; Masahiro Inuiguchi (Ed.), Integrated Uncertainty in Knowledge Modelling and Decision Making: 9th International Symposium, IUKM 2022, Ishikawa, Japan, March 18–19, 2022, Proceedings. Paper presented at 9th International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making, IUKM 2022 (pp. 226-238). Springer
2022 (English). In: Integrated Uncertainty in Knowledge Modelling and Decision Making: 9th International Symposium, IUKM 2022, Ishikawa, Japan, March 18–19, 2022, Proceedings / [ed] Katsuhiro Honda; Tomoe Entani; Seiki Ubukata; Van-Nam Huynh; Masahiro Inuiguchi, Springer, 2022, p. 226-238. Conference paper, Published paper (Refereed)
Abstract [en]

This paper analyses the effects of masking mechanisms for privacy preservation in data-driven models (regression) with respect to database integration. In particular, two data masking methods (microaggregation and rank swapping) are applied to two public datasets to evaluate the linear regression model in terms of privacy protection and prediction performance. Our preliminary experimental results show that both methods achieve a good trade-off between privacy protection and information loss. We also show that, for some experiments, although data integration produces some incorrect links, the linear regression model is still comparable, with respect to prediction error, to the one inferred from the original data.

Place, publisher, year, edition, pages
Springer, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13199
Keywords
Data protection, Masking methods, Microaggregation, Multidatabase integration, Rank swapping, Reidentification
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-193356 (URN); 10.1007/978-3-030-98018-4_19 (DOI); 000786448900019 (); 2-s2.0-85126526408 (Scopus ID); 978-3-030-98017-7 (ISBN); 978-3-030-98018-4 (ISBN)
Conference
9th International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making, IUKM 2022
Note

Also part of the Lecture Notes in Artificial Intelligence book sub series (LNAI, volume 13199).

Available from: 2022-04-01 Created: 2022-04-01 Last updated: 2023-09-05 Bibliographically approved
Jiang, L., Jonsson, A. & Vanhée, L. (Eds.). (2022). Proceedings of Umeå's 25th Student Conference in Computing Science (USCCS 2022). Paper presented at Umeå's 25th Student Conference in Computing Science (USCCS 2022). Umeå: Umeå University
2022 (English). Conference proceedings (editor) (Other academic)
Abstract [en]

The Umeå Student Conference in Computing Science (USCCS) is organized annually as part of a course given by the Computing Science department at Umeå University. The objective of the course is to give the students a practical introduction to independent research, scientific writing, and oral presentation.

A student who participates in the course first selects a topic and a research question that they are interested in. If the topic is accepted, the student outlines a paper and composes an annotated bibliography to give a survey of the research topic. The main work consists of conducting the actual research that answers the question asked, and convincingly and clearly reporting the results in a scientific paper. Another major part of the course is multiple internal peer review meetings in which groups of students read each other's papers and give feedback to the author. This process gives valuable training in both giving and receiving criticism in a constructive manner. Altogether, the students learn to formulate and develop their own ideas in a scientific manner, in a process involving internal peer reviewing of each other's work and under supervision of the teachers, and incremental development and refinement of a scientific paper.

Each scientific paper is submitted to USCCS through an on-line submission system, and receives reviews written by members of the Computing Science department. Based on the reviews, the editors of the conference proceedings (the teachers of the course) issue a decision of preliminary acceptance of the paper to each author. If, after final revision, a paper is accepted, the student is given the opportunity to present the work at the conference. The review process and the conference format aim to mimic realistic settings for publishing and participating in scientific conferences.

USCCS is the highlight of the course, and this year the conference received 10 submissions, which were carefully reviewed by the teachers of the course. As a result of the reviewing process, 6 submissions were accepted for presentation at the conference.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2022. p. 79
Series
Report / UMINF, ISSN 0348-0542 ; 22.01
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-191144 (URN)
Conference
Umeå's 25th Student Conference in Computing Science (USCCS 2022)
Available from: 2022-01-10 Created: 2022-01-10 Last updated: 2023-03-16 Bibliographically approved
Tran, T. K., Vu, X.-S. & Jiang, L. (2022). SoBigDemicSys: a social media based monitoring system for emerging pandemics with big data. In: Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022. Paper presented at Eighth IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2022, Newark, CA, USA, August 15-18, 2022 (pp. 103-107). IEEE Computer Society
2022 (English). In: Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022, IEEE Computer Society, 2022, p. 103-107. Conference paper, Published paper (Refereed)
Abstract [en]

The outbreak of the Covid-19 pandemic has left millions of people infected or dead, resulting in a global economic depression. A lesson learned for minimizing the damage in an emerging pandemic is that timely tracking and reasonable trend prediction are required to help society (e.g., municipalities, institutions, and industries) with timely planning for efficient resource preparation and allocation. This paper presents a system to monitor the pandemic trends, analyze the correlation and impacts, predict the evolution, and visualize the prediction results to end users as social indicators. The significance lies in the fact that tracing online information for pandemic-related prediction has less time lag, lower cost, and more potential information indicators.

Place, publisher, year, edition, pages
IEEE Computer Society, 2022
Keywords
forecast, monitoring, online big data, pandemic
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-201128 (URN); 10.1109/BigDataService55688.2022.00023 (DOI); 2-s2.0-85141069361 (Scopus ID); 9781665458900 (ISBN)
Conference
Eighth IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2022, Newark, CA, USA, August 15-18, 2022
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2022-11-24 Created: 2022-11-24 Last updated: 2022-11-24 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-7788-3986