Publications (10 of 33)
Ming, H., Yang, J., Gui, F., Jiang, L. & An, N. (2024). Few-shot nested named entity recognition. Knowledge-Based Systems, 293, Article ID 111688.
2024 (English) In: Knowledge-Based Systems, ISSN 0950-7051, E-ISSN 1872-7409, Vol. 293, article id 111688. Article in journal (Refereed) Published
Abstract [en]

While Named Entity Recognition (NER) is a widely studied task, making inferences of entities with only a few labeled data has been challenging, especially for entities with nested structures commonly existing in NER datasets. Unlike flat entities, entities and their nested entities are more likely to have similar semantic feature representations, drastically increasing difficulties in classifying different entity categories. This paper posits that the few-shot nested NER task warrants its own dedicated attention and proposes a Global-Biaffine Positive-Enhanced (GBPE) framework for this new task. Within the GBPE framework, we first develop the new Global-Biaffine span representation to capture the span global dependency information for each entity span to distinguish nested entities. We then formulate a unique positive-enhanced contrastive loss function to enhance the utility of specific positive samples in contrastive learning for larger margins. Lastly, by using these enlarged margins, we obtain better margin constraints and incorporate them into the nearest neighbor inference to predict the unlabeled entities. Extensive experiments on three nested NER datasets in English, German, and Russian show that GBPE outperforms baseline models on the 1-shot and 5-shot tasks in terms of F1 score.

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Few-shot, Nested named entity recognition, Positive-enhanced contrastive loss
National Category
Identifiers
urn:nbn:se:umu:diva-223235 (URN), 10.1016/j.knosys.2024.111688 (DOI), 2-s2.0-85189309268 (Scopus ID)
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2024-04-19 Created: 2024-04-19 Last updated: 2024-04-19 Bibliographically approved
Jiang, L. & Torra, V. (2023). Data protection and multi-database data-driven models. Future Internet, 15(3), Article ID 93.
2023 (English) In: Future Internet, E-ISSN 1999-5903, Vol. 15, no. 3, article id 93. Article in journal (Refereed) Published
Abstract [en]

Anonymization and data masking have effects on data-driven models. Different anonymization methods have been developed to provide a good trade-off between privacy guarantees and data utility. Nevertheless, the effects of data protection (e.g., data microaggregation and noise addition) on data integration and on data-driven models (e.g., machine learning models) built from these data are not known. In this paper, we study how data protection affects data integration, and the corresponding effects on the results of machine learning models built from the outcome of the data integration process. The experimental results show that the levels of protection that prevent proper database integration do not affect machine learning models that learn from the integrated database to the same degree. Concretely, our preliminary analysis and experiments show that data protection techniques have a lower level of impact on data integration than on machine learning models.

Place, publisher, year, edition, pages
MDPI, 2023
Keywords
anonymization, data integration, data protection, masking
National Category
Identifiers
urn:nbn:se:umu:diva-206361 (URN), 10.3390/fi15030093 (DOI), 000956593800001 (), 2-s2.0-85150888833 (Scopus ID)
Available from: 2023-04-26 Created: 2023-04-26 Last updated: 2023-08-03 Bibliographically approved
Vu, X.-S., Tran, S. N. & Jiang, L. (2023). dpUGC: learn differentially private representation for user generated contents. In: Alexander Gelbukh (Ed.), Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I. Paper presented at 20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019. (pp. 316-331). Springer, 13451
2023 (English) In: Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I / [ed] Alexander Gelbukh, Springer, 2023, Vol. 13451, pp. 316-331. Conference paper, Published paper (Refereed)
Abstract [en]

This paper first proposes a simple yet efficient generalized approach to apply differential privacy to text representation (i.e., word embedding). Based on it, we propose a user-level approach to learn a personalized differentially private word embedding model on user generated contents (UGC). To the best of our knowledge, this is the first work on learning a user-level differentially private word embedding model from text for sharing. The proposed approaches protect the privacy of the individual from re-identification and, in particular, provide a better trade-off between privacy and data utility on UGC data for sharing. The experimental results show that the trained embedding models are applicable to classic text analysis tasks (e.g., regression). Moreover, the proposed approaches to learning differentially private embedding models are both framework- and data-independent, which facilitates deployment and sharing. The source code is available at https://github.com/sonvx/dpText.

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13451
Keywords
Private word embedding, Differential privacy, UGC
National Category
Identifiers
urn:nbn:se:umu:diva-160887 (URN), 10.1007/978-3-031-24337-0_23 (DOI), 2-s2.0-85149907226 (Scopus ID), 978-3-031-24336-3 (ISBN), 978-3-031-24337-0 (ISBN)
Conference
20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019.
Note

Originally included in thesis in manuscript form.

Available from: 2019-06-25 Created: 2019-06-25 Last updated: 2023-03-28 Bibliographically approved
Pan, Y., Yang, J., Ming, H., Jiang, L. & An, N. (2023). Few-shot named entity recognition via Label-Attention Mechanism. In: ICCAI '23: proceedings of the 2023 9th international conference on computing and artificial intelligence. Paper presented at 9th International Conference on Computing and Artificial Intelligence, ICCAI 2023, Tianjin, China, March 17-20, 2023 (pp. 466-471). Association for Computing Machinery (ACM)
2023 (English) In: ICCAI '23: proceedings of the 2023 9th international conference on computing and artificial intelligence, Association for Computing Machinery (ACM), 2023, pp. 466-471. Conference paper, Published paper (Refereed)
Abstract [en]

Few-shot named entity recognition aims to identify specific words with the support of very few labeled entities. Existing transfer-learning-based methods learn the semantic features of words in the source domain and migrate them to the target domain but ignore the different label-specific information. We propose a novel Label-Attention Mechanism (LAM) to utilize the overlooked label-specific information. LAM can separate label information from semantic features and learn how to obtain label information from a few samples through the meta-learning strategy. When transferring to the target domain, LAM replaces the source label information with the knowledge extracted from the target domain, thus improving the migration ability of the model. We conducted extensive experiments on multiple datasets, including OntoNotes, CoNLL'03, WNUT'17, GUM, and Few-Nerd, with two experimental settings. The results show that LAM outperforms the state-of-the-art baseline models by 7% in absolute F1 score.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Series
ACM International Conference Proceeding Series
Keywords
Few shot learning, Label-Attention, Named Entity Recognition
National Category
Identifiers
urn:nbn:se:umu:diva-213934 (URN), 10.1145/3594315.3594358 (DOI), 2-s2.0-85168240049 (Scopus ID), 9781450399029 (ISBN)
Conference
9th International Conference on Computing and Artificial Intelligence, ICCAI 2023, Tianjin, China, March 17-20, 2023
Available from: 2023-09-11 Created: 2023-09-11 Last updated: 2023-09-11 Bibliographically approved
Brännström, M., Jiang, L., Aler Tubella, A. & Dignum, V. (2023). Impact based fairness framework for socio-technical decision making. In: Roberta Calegari; Andrea Aler Tubella; Gabriel González Castañe; Virginia Dignum; Michela Milano (Eds.), Proceedings of the 1st workshop on fairness and bias in AI co-located with 26th European conference on artificial intelligence (ECAI 2023). Paper presented at 1st Workshop on Fairness and Bias in AI, AEQUITAS 2023, Krakow, 1 October, 2023. CEUR-WS
2023 (English) In: Proceedings of the 1st workshop on fairness and bias in AI co-located with 26th European conference on artificial intelligence (ECAI 2023) / [ed] Roberta Calegari; Andrea Aler Tubella; Gabriel González Castañe; Virginia Dignum; Michela Milano, CEUR-WS, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Avoiding bias and understanding the consequences of artificial intelligence used in decision making is of high importance to avoid mistreatment and unintended harm. This paper aims to present an impact focused approach to model the information flow of a socio-technical decision system for analysis of bias and fairness. The framework roots otherwise abstract technical accuracy and bias measures in stakeholder effects and forms a scaffold around which further analysis of the socio-technical system and its components can be coordinated. Two example use-cases are presented and analysed.

Place, publisher, year, edition, pages
CEUR-WS, 2023
Series
CEUR Workshop Proceedings, ISSN 1613-0073 ; 3523
Keywords
decision-making system, Fairness, information-flow, socio-technical factors
National Category
Identifiers
urn:nbn:se:umu:diva-217267 (URN), 2-s2.0-85177071301 (Scopus ID)
Conference
1st Workshop on Fairness and Bias in AI, AEQUITAS 2023, Krakow, 1 October, 2023.
Funder
EU, Horizon 2020, 101070363
Available from: 2023-11-29 Created: 2023-11-29 Last updated: 2023-11-30 Bibliographically approved
Vu, X.-S. & Jiang, L. (2023). Self-adaptive privacy concern detection for user-generated content. In: Alexander Gelbukh (Ed.), Computational linguistics and intelligent text processing: 19th International Conference on CiCLing 2018, Hanoi, Vietnam, March 18-24, 2018, revised selected papers, part 1. Paper presented at 19th International Conference on Computational Linguistics and Intelligent Text Processing, Hanoi, Vietnam, March 18-24, 2018 (pp. 153-167). Springer Science+Business Media B.V.
2023 (English) In: Computational linguistics and intelligent text processing: 19th International Conference on CiCLing 2018, Hanoi, Vietnam, March 18-24, 2018, revised selected papers, part 1 / [ed] Alexander Gelbukh, Springer Science+Business Media B.V., 2023, pp. 153-167. Conference paper, Published paper (Refereed)
Abstract [en]

To protect user privacy in data analysis, a state-of-the-art strategy is differential privacy, in which scientific noise is injected into the real analysis output. The noise masks individuals' sensitive information contained in the dataset. However, determining the amount of noise is a key challenge, since too much noise will destroy data utility while too little noise will increase privacy risk. Though previous research has designed some mechanisms to protect data privacy in different scenarios, most existing studies assume uniform privacy concerns for all individuals. Consequently, applying an equal amount of noise to all individuals leads to insufficient privacy protection for some users, while over-protecting others. To address this issue, we propose a self-adaptive approach for privacy concern detection based on user personality. Our experimental studies demonstrate the effectiveness of the approach in providing suitable personalized privacy protection for cold-start users (i.e., users without privacy-concern information in the training data).

Place, publisher, year, edition, pages
Springer Science+Business Media B.V., 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13396
Keywords
privacy-guaranteed data analysis, deep learning, multi-layer perceptron
National Category
Identifiers
urn:nbn:se:umu:diva-146470 (URN), 10.1007/978-3-031-23793-5_14 (DOI), 2-s2.0-85149699287 (Scopus ID), 978-3-031-23792-8 (ISBN)
Conference
19th International Conference on Computational Linguistics and Intelligent Text Processing, Hanoi, Vietnam, March 18-24, 2018.
Projects
Privacy-aware Data Federation
Note

Preprint published 2018 at arXiv.org.

Available from: 2018-04-10 Created: 2018-04-10 Last updated: 2023-03-22 Bibliographically approved
Luan, S., Gu, Z., Saremi, A., Freidovich, L. B., Jiang, L. & Wan, S. (2023). Timing performance benchmarking of out-of-distribution detection algorithms. Journal of Signal Processing Systems, 95(12), 1355-1370
2023 (English) In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 95, no. 12, pp. 1355-1370. Article in journal (Refereed) Published
Abstract [en]

In an open world with a long-tail distribution of input samples, Deep Neural Networks (DNNs) may make unpredictable mistakes for Out-of-Distribution (OOD) inputs at test time, despite high levels of accuracy obtained during model training. OOD detection can be an effective runtime assurance mechanism for safe deployment of machine learning algorithms in safety–critical applications such as medical imaging and autonomous driving. A large number of OOD detection algorithms have been proposed in recent years, with a wide range of performance metrics in terms of accuracy and execution time. For real-time safety–critical applications, e.g., autonomous driving, timing performance is of great importance in addition to accuracy. We perform a comprehensive and systematic benchmark study of multiple OOD detection algorithms in terms of both accuracy and execution time on different hardware platforms, including a powerful workstation and a resource-constrained embedded device, equipped with both CPU and GPU. We also profile and analyze the internal details of each algorithm to identify the performance bottlenecks and potential for GPU acceleration. This paper aims to provide a useful reference for the practical deployment of OOD detection algorithms for real-time safety–critical applications.

Place, publisher, year, edition, pages
Springer-Verlag New York, 2023
Keywords
Deep Learning, Embedded systems, Machine Learning, Out-of-Distribution detection, Real-time systems
National Category
Identifiers
urn:nbn:se:umu:diva-206357 (URN), 10.1007/s11265-023-01852-0 (DOI), 000955519800001 (), 2-s2.0-85150652364 (Scopus ID)
Available from: 2023-04-26 Created: 2023-04-26 Last updated: 2024-05-10 Bibliographically approved
Jiang, L. & Torra, V. (2022). On the Effects of Data Protection on Multi-database Data-Driven Models. In: Katsuhiro Honda; Tomoe Entani; Seiki Ubukata; Van-Nam Huynh; Masahiro Inuiguchi (Ed.), Integrated Uncertainty in Knowledge Modelling and Decision Making: 9th International Symposium, IUKM 2022, Ishikawa, Japan, March 18–19, 2022, Proceedings. Paper presented at 9th International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making, IUKM 2022 (pp. 226-238). Springer
2022 (English) In: Integrated Uncertainty in Knowledge Modelling and Decision Making: 9th International Symposium, IUKM 2022, Ishikawa, Japan, March 18–19, 2022, Proceedings / [ed] Katsuhiro Honda; Tomoe Entani; Seiki Ubukata; Van-Nam Huynh; Masahiro Inuiguchi, Springer, 2022, pp. 226-238. Conference paper, Published paper (Refereed)
Abstract [en]

This paper analyses the effects of masking mechanisms for privacy preservation in data-driven models (regression) with respect to database integration. In particular, two data masking methods (microaggregation and rank swapping) are applied to two public datasets to evaluate the linear regression model in terms of privacy protection and prediction performance. Our preliminary experimental results show that both methods achieve a good trade-off between privacy protection and information loss. We also show that, for some experiments, although data integration produces some incorrect links, the linear regression model is still comparable, with respect to prediction error, to the one inferred from the original data.

Place, publisher, year, edition, pages
Springer, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13199
Keywords
Data protection, Masking methods, Microaggregation, Multidatabase integration, Rank swapping, Reidentification
National Category
Identifiers
urn:nbn:se:umu:diva-193356 (URN), 10.1007/978-3-030-98018-4_19 (DOI), 000786448900019 (), 2-s2.0-85126526408 (Scopus ID), 978-3-030-98017-7 (ISBN), 978-3-030-98018-4 (ISBN)
Conference
9th International Symposium on Integrated Uncertainty in Knowledge Modelling and Decision Making, IUKM 2022
Note

Also part of the Lecture Notes in Artificial Intelligence book sub series (LNAI, volume 13199).

Available from: 2022-04-01 Created: 2022-04-01 Last updated: 2023-09-05 Bibliographically approved
Jiang, L., Jonsson, A. & Vanhée, L. (Eds.). (2022). Proceedings of Umeå's 25th Student Conference in Computing Science (USCCS 2022). Paper presented at Umeå's 25th Student Conference in Computing Science (USCCS 2022). Umeå: Umeå University
2022 (English) Conference proceedings (Other academic)
Abstract [en]

The Umeå Student Conference in Computing Science (USCCS) is organized annually as part of a course given by the Computing Science department at Umeå University. The objective of the course is to give the students a practical introduction to independent research, scientific writing, and oral presentation.

A student who participates in the course first selects a topic and a research question that they are interested in. If the topic is accepted, the student outlines a paper and composes an annotated bibliography to give a survey of the research topic. The main work consists of conducting the actual research that answers the question asked, and convincingly and clearly reporting the results in a scientific paper. Another major part of the course is multiple internal peer review meetings in which groups of students read each other's papers and give feedback to the author. This process gives valuable training in both giving and receiving criticism in a constructive manner. Altogether, the students learn to formulate and develop their own ideas in a scientific manner, in a process involving internal peer reviewing of each other's work and under supervision of the teachers, and incremental development and refinement of a scientific paper.

Each scientific paper is submitted to USCCS through an on-line submission system, and receives reviews written by members of the Computing Science department. Based on the review, the editors of the conference proceedings (the teachers of the course) issue a decision of preliminary acceptance of the paper to each author. If, after final revision, a paper is accepted, the student is given the opportunity to present the work at the conference. The review process and the conference format aim at mimicking realistic settings for publishing and participation at scientific conferences.

USCCS is the highlight of the course, and this year the conference received 10 submissions, which were carefully reviewed by the teachers of the course. As a result of the reviewing process, 6 submissions were accepted for presentation at the conference.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2022. p. 79
Series
Report / UMINF, ISSN 0348-0542 ; 22.01
National Category
Identifiers
urn:nbn:se:umu:diva-191144 (URN)
Conference
Umeå's 25th Student Conference in Computing Science (USCCS 2022)
Available from: 2022-01-10 Created: 2022-01-10 Last updated: 2023-03-16 Bibliographically approved
Tran, T. K., Vu, X.-S. & Jiang, L. (2022). SoBigDemicSys: a social media based monitoring system for emerging pandemics with big data. In: Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022. Paper presented at Eighth IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2022, Newark, CA, USA, August 15-18, 2022 (pp. 103-107). IEEE Computer Society
2022 (English) In: Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022, IEEE Computer Society, 2022, pp. 103-107. Conference paper, Published paper (Refereed)
Abstract [en]

The outbreak of the Covid-19 pandemic has left millions of people infected or dead and has resulted in a global economic depression. A lesson learned for minimizing the damage of an emerging pandemic is that timely tracking and reasonable trend prediction are required to help society (e.g., municipalities, institutions, and industries) plan in time for efficient resource preparation and allocation. This paper presents a system to monitor the pandemic trends, analyze the correlation and impacts, predict the evolution, and visualize the prediction results to end users as social indicators. The significance lies in the fact that tracing online information collection for pandemic-related prediction has less time lag, lower cost, and more potential information indicators.

Place, publisher, year, edition, pages
IEEE Computer Society, 2022
Keywords
forecast, monitoring, online big data, pandemic
National Category
Identifiers
urn:nbn:se:umu:diva-201128 (URN), 10.1109/BigDataService55688.2022.00023 (DOI), 2-s2.0-85141069361 (Scopus ID), 9781665458900 (ISBN)
Conference
Eighth IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2022, Newark, CA, USA, August 15-18, 2022
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2022-11-24 Created: 2022-11-24 Last updated: 2022-11-24 Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-7788-3986