Publications (10 of 42)
Li, Y., Yang, J., Jiang, L., Liu, S. & An, N. (2026). How to quickly select good in-context examples in large language models for data-to-text tasks? Natural Language Processing, 32(1), 1-35
2026 (English). In: Natural Language Processing, ISSN 2977-0424, Vol. 32, no 1, p. 1-35. Article in journal (Refereed), Published
Abstract [en]

In the realm of data-to-text generation tasks, the use of large language models (LLMs) has become common practice, yielding fluent and coherent outputs. Existing literature highlights that the quality of in-context examples significantly influences the empirical performance of these models, making the efficient selection of high-quality examples crucial. We hypothesize that the quality of these examples is primarily determined by two properties: their similarity to the input data and their diversity from one another. Based on this insight, we introduce a novel approach, Double Clustering-based In-Context Example Selection, specifically designed for data-to-text generation tasks. Our method involves two distinct clustering stages. The first stage aims to maximize the similarity between the in-context examples and the input data. The second stage ensures diversity among the selected in-context examples. Additionally, we have developed a batched generation method to enhance the token usage efficiency of LLMs. Experimental results demonstrate that, compared to traditional methods of selecting in-context learning samples, our approach significantly improves both time efficiency and token utilization while maintaining accuracy.

Place, publisher, year, edition, pages
Cambridge University Press, 2026
Keywords
in-context learning, data-to-text, large language models, double clustering, batched generation
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-247384 (URN); 10.1017/nlp.2025.10010 (DOI); 001592473500001 (ISI); 2-s2.0-105019191703 (Scopus ID)
Available from: 2025-12-09 Created: 2025-12-09 Last updated: 2026-02-11 Bibliographically approved
Billis, A., Hering, A., Jiang, L., Meyer, C., Adams, L., Cuocolo, R., . . . Bressem, K. (2025). COMputational Models FOR patienT stratification in urologic cancers: creating robust and trustworthy multimodal AI for health care. In: A. Rodriguez-Gonzalez; R. Sicilia; L. Prieto-Santamaria; G.A. Papadopoulos; V. Guarrasi; M.T. Cazzolato; B. Kane (Eds.), 2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS). Paper presented at 38th International Symposium on Computer Based Medical Systems-CBMS-Annual, June 18-20, 2025, Madrid, Spain (pp. 121-122). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: 2025 IEEE 38th International Symposium on Computer-Based Medical Systems (CBMS) / [ed] A. Rodriguez-Gonzalez; R. Sicilia; L. Prieto-Santamaria; G.A. Papadopoulos; V. Guarrasi; M.T. Cazzolato; B. Kane, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 121-122. Conference paper, Published paper (Refereed)
Abstract [en]

Current clinical approaches fail to fully utilize unstructured data in managing prostate cancer (PCa) and kidney cancer (KC), leading to inefficiencies in patient care and increased costs. Effective diagnostics and treatments depend on integrating multimodal data, yet progress is hampered by limited data accessibility and a lack of collaborative validation between clinicians and computer scientists. To address these challenges, the EU-funded COMFORT project aims to develop commercially viable, data-driven multimodal decision support systems. These systems will improve clinical prognostication, patient stratification, and personalized treatment while also assessing the trust that healthcare professionals and patients place in AI-driven tools.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Series
IEEE International Symposium on Computer-Based Medical Systems, ISSN 2372-9198
Keywords
Multi-Modal, Artificial Intelligence, Patient Stratification, Kidney Cancer, Prostate Cancer
National Category
Cancer and Oncology
Identifiers
urn:nbn:se:umu:diva-247135 (URN); 10.1109/CBMS65348.2025.00034 (DOI); 001544273800024 (ISI); 2-s2.0-105010636684 (Scopus ID); 9798331526115 (ISBN); 9798331526108 (ISBN)
Conference
38th International Symposium on Computer Based Medical Systems-CBMS-Annual, June 18-20, 2025, Madrid, Spain
Available from: 2025-12-02 Created: 2025-12-02 Last updated: 2025-12-02 Bibliographically approved
Ming, H., Yang, J., Liu, S., Jiang, L. & An, N. (2025). Harnessing high-quality pseudo-labels for robust few-shot nested named entity recognition. Engineering applications of artificial intelligence, 156, Article ID 110992.
2025 (English). In: Engineering applications of artificial intelligence, ISSN 0952-1976, E-ISSN 1873-6769, Vol. 156, article id 110992. Article in journal (Refereed), Published
Abstract [en]

Few-shot Named Entity Recognition (NER) methods have shown initial effectiveness in flat NER tasks. However, these methods often prioritize optimizing models with a small annotated support set, neglecting the high-quality data within the unlabeled query set. Furthermore, existing few-shot NER models struggle with nested entity challenges due to linguistic or structural complexities. In this study, we introduce Retrieving high-quality pseudo-label Tuning (RiTNER), a framework designed to address few-shot nested named entity recognition tasks by leveraging high-quality data from the query set. RiTNER comprises two main components: (1) contrastive span classification, which clusters entities into corresponding prototypes and generates high-quality pseudo-labels from the unlabeled data, and (2) masked pseudo-data tuning, which generates a masked pseudo dataset and then uses it to optimize the model and enhance span classification. We train RiTNER on an English dataset and evaluate it on both English nested datasets and cross-lingual nested datasets. The results show that RiTNER outperforms the top-performing baseline models by 1.67% on the English 5-shot task and by 3.04% on the cross-lingual 5-shot tasks.

Place, publisher, year, edition, pages
Elsevier, 2025
Keywords
Cross-lingual, Few-shot, High-quality pseudo-labels, Nested named entity recognition
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-239197 (URN); 10.1016/j.engappai.2025.110992 (DOI); 001498544200001 (ISI); 2-s2.0-105005498894 (Scopus ID)
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2025-06-05 Created: 2025-06-05 Last updated: 2025-06-05 Bibliographically approved
Liu, N., Tang, Y., Yuan, H., Lv, H., Jiang, L., Li, Z., . . . Wang, J. (2025). Incomplete multi-view drug recommendation via multi-level representation learning and curriculum learning. In: Luiza Antonie; Jian Pei; Xiaohui Yu (Eds.), Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paper presented at 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025, Toronto, Canada, 3-7 August, 2025 (pp. 4647-4658). ACM Digital Library
2025 (English). In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining / [ed] Luiza Antonie; Jian Pei; Xiaohui Yu, ACM Digital Library, 2025, p. 4647-4658. Conference paper, Published paper (Refereed)
Abstract [en]

The drug recommendation task aims to provide effective and safe prescription decision support for clinical treatment based on patients' past Electronic Health Records (EHR). However, the prevalent phenomenon of missing views in multi-source heterogeneous EHR data may cause performance degradation. This is due to the lack of sufficient information and increased learning difficulties, which limit the practical effectiveness of drug recommendation models in medical applications. In this paper, we emphasize the problems of incompleteness in practical drug recommendation and propose the Incomplete Multi-View Drug Recommendation model via Multi-Level Representation Learning and Curriculum Learning (IMDR). In particular, IMDR employs a Multi-Level Representation Learning architecture equipped with a Medical Code-Level Drug Knowledge Infusion Module and a Visit-Level Cross-View Information Module for patient representation learning to overcome the information loss caused by incomplete data. A Gaussian-guided curriculum learning strategy with a novel difficulty measure is then proposed to assist the learning process of IMDR and achieve effective progressive learning under missing medical views. Systematic evaluation on two large-scale real-world medical datasets, MIMIC-III and MIMIC-IV, demonstrates that IMDR reduces the Drug-Drug Interaction (DDI) rate by 2.97% compared to existing state-of-the-art drug recommendation baselines, while achieving significant improvements of 3.29% and 1.97% in Jaccard similarity score and F1 score, respectively. Furthermore, compared to advanced incomplete multi-view learning (IML) models, IMDR's advantages in Jaccard similarity score and F1 score further expand to 4.03% and 2.41%, respectively.

Place, publisher, year, edition, pages
ACM Digital Library, 2025
Keywords
drug recommendation, healthcare, incomplete multi-view learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-243883 (URN); 10.1145/3711896.3737236 (DOI); 2-s2.0-105014323344 (Scopus ID); 9798400714542 (ISBN)
Conference
31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025, Toronto, Canada, 3-7 August, 2025
Available from: 2025-09-05 Created: 2025-09-05 Last updated: 2025-09-05 Bibliographically approved
Ming, H., Yang, J., Liu, S., Jiang, L. & An, N. (2025). Mitigating prototype shift: few-shot nested named entity recognition with prototype-attention contrastive learning. Expert systems with applications, 268, Article ID 126293.
2025 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 268, article id 126293. Article in journal (Refereed), Published
Abstract [en]

Nested entities tend to obtain similar representations in pre-trained language models, posing challenges for Named Entity Recognition (NER), especially in the few-shot setting where prototype shifts often occur due to distribution differences between the support and query sets. In this paper, we regard entity representation as the combination of prototype and non-prototype representations. Hypothesizing that using the prototype representation specifically can help mitigate potential prototype shifts, we propose a Prototype-Attention mechanism in the Contrastive Learning framework (PACL) for few-shot nested NER. PACL first generates prototype-enhanced span representations to mitigate the prototype shift by applying a prototype attention mechanism. It then adopts a novel prototype-span contrastive loss to reduce prototype differences further and overcome the O-type's non-unique prototype limitation by comparing prototype-enhanced span representations with prototypes and original semantic representations. Our experiments show that PACL outperformed baseline models on the 1-shot and 5-shot tasks in terms of F1 score. Further analyses indicate that our Prototype-Attention mechanism is a simple but effective method and exhibits good generalizability.

Place, publisher, year, edition, pages
Elsevier, 2025
Keywords
Few-shot, Nested named entity recognition, Prototype shift
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-234004 (URN); 10.1016/j.eswa.2024.126293 (DOI); 001410503300001 (ISI); 2-s2.0-85214193130 (Scopus ID)
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2025-01-15 Created: 2025-01-15 Last updated: 2025-04-24 Bibliographically approved
Ming, H., Yang, J., Liu, S., Jiang, L. & An, N. (2025). Synner: synergizing large and small language models for few-shot nested NER. In: Proceedings of the International Joint Conference on Neural Networks. Paper presented at 2025 International Joint Conference on Neural Networks, IJCNN 2025, Rome, Italy, 30 June - 5 July. Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: Proceedings of the International Joint Conference on Neural Networks, Institute of Electrical and Electronics Engineers (IEEE), 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Large language models (LLMs) encounter challenges when addressing few-shot nested named entity recognition (NER) tasks. Traditional LLM-based approaches typically either prompt the model to generate entity words or types in sentences based on entity categories or word spans, or directly extract all entities of specific types present in the sentences. These methods often suffer from issues such as low query efficiency or suboptimal accuracy. This paper introduces an innovative framework, SynNER, which synergizes small and large language models to address these limitations. Initially, a small language model identifies low-confidence word spans, which are then refined by a large language model. To simultaneously ensure recognition accuracy and improve the query efficiency of the LLM, we propose a Batch-Prompt strategy and an Entity Indexing method. These techniques enable the LLM to process multiple test instances simultaneously while maintaining precise correction results. Experimental results demonstrate that our method achieves significant performance gains on benchmark datasets, offering a cost-effective solution for few-shot nested NER tasks.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Series
International Joint Conference on Neural Networks, ISSN 2161-4393, E-ISSN 2161-4407
Keywords
Few-Shot learning, Large language models, Nested named entity recognition
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-247606 (URN); 10.1109/IJCNN64981.2025.11227406 (DOI); 2-s2.0-105023965016 (Scopus ID); 9798331510428 (ISBN); 979-8-3315-1043-5 (ISBN)
Conference
2025 International Joint Conference on Neural Networks, IJCNN 2025, Rome, Italy, 30 June - 5 July.
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2025-12-22 Created: 2025-12-22 Last updated: 2025-12-22 Bibliographically approved
Wang, D., Jiang, L., Kjellander, M., Weidemann, E., Trygg, J. & Tysklind, M. (2024). A novel data mining framework to investigate causes of boiler failures in waste-to-energy plants. Processes, 12(7), Article ID 1346.
2024 (English). In: Processes, E-ISSN 2227-9717, Vol. 12, no 7, article id 1346. Article in journal (Refereed), Published
Abstract [en]

Examining boiler failure causes is crucial for thermal power plant safety and profitability. However, traditional approaches are complex and expensive, lacking precise operational insights. Although data-driven approaches hold substantial potential in addressing these challenges, there is a gap in systematic approaches for investigating failure root causes with unlabeled data. Therefore, we proffered a novel framework rooted in data mining methodologies to probe the accountable operational variables for boiler failures. The primary objective was to furnish precise guidance for future operations to proactively prevent similar failures. The framework was centered on two data mining approaches, Principal Component Analysis (PCA) + K-means and Deep Embedded Clustering (DEC), with PCA + K-means serving as the baseline against which the performance of DEC was evaluated. To demonstrate the framework’s specifics, a case study was performed using datasets obtained from a waste-to-energy plant in Sweden. The results showed the following: (1) The clustering outcomes of DEC consistently surpass those of PCA + K-means across nearly every dimension. (2) The operational temperature variables T-BSH3rm, T-BSH2l, T-BSH3r, T-BSH1l, T-SbSH3, and T-BSH1r emerged as the most significant contributors to the failures. It is advisable to maintain the operational levels of T-BSH3rm, T-BSH2l, T-BSH3r, T-BSH1l, T-SbSH3, and T-BSH1r around 527 °C, 432 °C, 482 °C, 338 °C, 313 °C, and 343 °C respectively. Moreover, it is crucial to prevent these values from reaching or exceeding 594 °C, 471 °C, 537 °C, 355 °C, 340 °C, and 359 °C for prolonged durations. The findings offer the opportunity to improve future operational conditions, thereby extending the overall service life of the boiler. Consequently, operators can address faulty tubes during scheduled annual maintenance without encountering failures and disrupting production.

Place, publisher, year, edition, pages
MDPI, 2024
Keywords
data mining, deep embedded clustering, failure analysis, power plants
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228513 (URN); 10.3390/pr12071346 (DOI); 001277572100001 (ISI); 2-s2.0-85199646373 (Scopus ID)
Available from: 2024-08-19 Created: 2024-08-19 Last updated: 2025-08-28 Bibliographically approved
Ming, H., Yang, J., Gui, F., Jiang, L. & An, N. (2024). Few-shot nested named entity recognition. Knowledge-Based Systems, 293, Article ID 111688.
2024 (English). In: Knowledge-Based Systems, ISSN 0950-7051, E-ISSN 1872-7409, Vol. 293, article id 111688. Article in journal (Refereed), Published
Abstract [en]

While Named Entity Recognition (NER) is a widely studied task, making inferences of entities with only a few labeled data has been challenging, especially for entities with nested structures commonly existing in NER datasets. Unlike flat entities, entities and their nested entities are more likely to have similar semantic feature representations, drastically increasing difficulties in classifying different entity categories. This paper posits that the few-shot nested NER task warrants its own dedicated attention and proposes a Global-Biaffine Positive-Enhanced (GBPE) framework for this new task. Within the GBPE framework, we first develop the new Global-Biaffine span representation to capture the span global dependency information for each entity span to distinguish nested entities. We then formulate a unique positive-enhanced contrastive loss function to enhance the utility of specific positive samples in contrastive learning for larger margins. Lastly, by using these enlarged margins, we obtain better margin constraints and incorporate them into the nearest neighbor inference to predict the unlabeled entities. Extensive experiments on three nested NER datasets in English, German, and Russian show that GBPE outperforms baseline models on the 1-shot and 5-shot tasks in terms of F1 score.

Place, publisher, year, edition, pages
Elsevier, 2024
Keywords
Few-shot, Nested named entity recognition, Positive-enhanced contrastive loss
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-223235 (URN); 10.1016/j.knosys.2024.111688 (DOI); 001222082800001 (ISI); 2-s2.0-85189309268 (Scopus ID)
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848
Available from: 2024-04-19 Created: 2024-04-19 Last updated: 2025-04-24 Bibliographically approved
Yang, J., Zhu, Z., Ming, H., Jiang, L. & An, N. (2024). LPNER: label prompt for few-shot nested named entity recognition. In: Vu Nguyen; Hsuan-Tien Lin (Eds.), Asian Conference on Machine Learning: 5-8 December 2024, Hanoi, Vietnam. Paper presented at 16th Asian Conference on Machine Learning, ACML 2024, Hanoi, Vietnam, December 5-8, 2024 (pp. 781-796). ML Research Press
2024 (English). In: Asian Conference on Machine Learning: 5-8 December 2024, Hanoi, Vietnam / [ed] Vu Nguyen; Hsuan-Tien Lin, ML Research Press, 2024, p. 781-796. Conference paper, Published paper (Refereed)
Abstract [en]

Few-shot Named Entity Recognition (NER) aims to identify named entities using very little annotated data. Recently, prompt-based few-shot NER methods have demonstrated significant effectiveness. However, most existing methods employ multi-round prompts, which significantly increase time and computational costs. Furthermore, current single-round prompt methods are mainly designed for flat NER tasks and are not effective in handling nested NER tasks. Additionally, these methods do not fully utilize the semantic information of entity labels through prompts. To address these challenges, we propose a novel Label-Prompt-based few-shot nested NER method named LPNER, which not only handles nested NER tasks but also efficiently extracts semantic information of entities through label prompts, thereby achieving more efficient and accurate NER. LPNER first designs a specialized prompt based on a span strategy to enhance label semantics and effectively combines multiple span representations using a special mark to obtain enhanced span representations integrated with label semantics. Then, entity prototypes are constructed through a prototype network for classifying candidate entity spans. We conducted extensive experiments on five nested datasets: ACE04, ACE05, GENIA, GermEval, and NEREL. In 1-shot and 5-shot tasks, LPNER’s F1 scores mostly outperform baseline models.

Place, publisher, year, edition, pages
ML Research Press, 2024
Series
Proceedings of Machine Learning Research, E-ISSN 2640-3498 ; 260
Keywords
Few-shot learning, Label semantics, Nested named entity recognition, Prompt learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-236501 (URN); 2-s2.0-85219527546 (Scopus ID)
Conference
16th Asian Conference on Machine Learning, ACML 2024, Hanoi, Vietnam, December 5-8, 2024
Available from: 2025-03-18 Created: 2025-03-18 Last updated: 2025-03-18 Bibliographically approved
Tran, K.-T., Hy, T. S., Jiang, L. & Vu, X.-S. (2024). MGLEP: multimodal graph learning for modeling emerging pandemics with big data. Scientific Reports, 14(1), Article ID 16377.
2024 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no 1, article id 16377. Article in journal (Refereed), Published
Abstract [en]

Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework, MGLEP, that integrates temporal graph neural networks and multi-modal data for learning and forecasting. We incorporate big data sources, including social media content, by utilizing specific pre-trained language models and discovering the underlying graph structure among users. This integration provides rich indicators of pandemic dynamics through learning with temporal graph neural networks. Extensive experiments demonstrate the effectiveness of our framework in pandemic forecasting and analysis, outperforming baseline methods across different areas, pandemic situations, and prediction horizons. The fusion of temporal graph learning and multi-modal data enables a comprehensive understanding of the pandemic landscape with less time lag, lower cost, and more potential information indicators.

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-227969 (URN); 10.1038/s41598-024-67146-y (DOI); 001337302400019 (ISI); 39013976 (PubMedID); 2-s2.0-85198649048 (Scopus ID)
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT), MG2020-8848; Knut and Alice Wallenberg Foundation
Available from: 2024-07-23 Created: 2024-07-23 Last updated: 2025-04-24 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-7788-3986