Umeå University (umu.se): Publications
Results 1 - 35 of 35
  • 1.
    Ait-Mlouk, Addi
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A Web-Based Platform for Mining and Ranking Association Rules. 2020. In: ECIR 2020: Advances in Information Retrieval, Lecture Notes in Computer Science, vol 12036, Springer, 2020, pp. 443-448. Conference paper (Refereed)
    Abstract [en]

    In this demo, we introduce an interactive system, which effectively applies multiple criteria analysis to rank association rules. We first use association rules techniques to explore the correlations between variables in given data (i.e., database and linked data (LD)), and secondly apply multiple criteria analysis (MCA) to select the most relevant rules according to user preferences. The developed system is flexible and allows intuitive creation and execution of different algorithms for an extensive range of advanced data analysis topics. Furthermore, we demonstrate a case study of association rule mining and ranking on road accident data.
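
    The two steps named in this abstract (mining association rules, then ranking them on several criteria) can be illustrated with a small sketch. It is not the authors' system: the pairwise rule miner, the weighted-sum ranking (a simple stand-in for multiple criteria analysis), the weights, and the toy accident-style transactions are all assumptions made for illustration.

```python
from itertools import combinations


def mine_rules(transactions, min_support=0.3):
    """Return single-item rules as (antecedent, consequent, support, confidence, lift)."""
    n = len(transactions)

    def support(items):
        # Fraction of transactions that contain every item in `items`.
        return sum(1 for t in transactions if items <= t) / n

    items = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = pair_support / support({x})
            lift = confidence / support({y})
            rules.append((x, y, pair_support, confidence, lift))
    return rules


def rank_rules(rules, weights=(0.2, 0.5, 0.3)):
    """Rank rules by a weighted sum of min-max normalised support, confidence and lift.

    The weights are hypothetical user preferences, not values from the paper.
    """
    columns = list(zip(*[r[2:] for r in rules]))

    def normalise(value, column):
        lo, hi = min(column), max(column)
        return 0.0 if hi == lo else (value - lo) / (hi - lo)

    def score(rule):
        return sum(w * normalise(v, c) for w, v, c in zip(weights, rule[2:], columns))

    return sorted(rules, key=score, reverse=True)


# Toy transactions (invented for this sketch).
transactions = [
    {"rain", "night", "accident"},
    {"rain", "accident"},
    {"rain", "accident"},
    {"night"},
    {"night", "accident"},
    {"dry"},
]
ranked = rank_rules(mine_rules(transactions))
for antecedent, consequent, s, c, l in ranked:
    print(f"{antecedent} -> {consequent}: support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

    Changing the weights changes the ordering, which is the point of letting user preferences drive the selection of rules.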

  • 2.
    Ait-Mlouk, Addi
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    KBot: a Knowledge graph based chatBot for natural language understanding over linked data. 2020. In: IEEE Access, E-ISSN 2169-3536, Vol. 8, pp. 149220-149230. Article in journal (Refereed)
    Abstract [en]

    With the rapid progress of the semantic web, a huge amount of structured data has become available on the web in the form of knowledge bases (KBs). Making these data accessible and useful for end-users is one of the main objectives of chatbots over linked data. Building a chatbot over linked data raises several challenges, including understanding user queries, supporting multiple knowledge bases, and handling multiple languages. To address these challenges, we first design and develop an architecture to provide an interactive user interface. Secondly, we propose a machine learning approach based on intent classification and natural language understanding to understand user intents and generate SPARQL queries. In particular, we process a new social network dataset (i.e., myPersonality) and add it to the existing knowledge bases to extend the chatbot's capabilities to analytical queries. The system is flexible, supports multiple knowledge bases and languages, can be extended with a new domain on demand, and allows intuitive creation and execution of different tasks for an extensive range of topics. Furthermore, evaluation and application cases are provided to show how the chatbot facilitates interactive access to semantic data in different real application scenarios and to showcase the proposed approach for a knowledge graph and data-driven chatbot.

  • 3.
    Ait-Mlouk, Addi
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Vu, Xuan-Son
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    WINFRA: A Web-Based Platform for Semantic Data Retrieval and Data Analytics. 2020. In: Mathematics, E-ISSN 2227-7390, Vol. 8, no. 11, article id 2090. Article in journal (Refereed)
    Abstract [en]

    Given the huge amount of heterogeneous data stored in different locations, it needs to be federated and semantically interconnected for further use. This paper introduces WINFRA, a comprehensive open-access platform for semantic web data and advanced analytics based on natural language processing (NLP) and data mining techniques (e.g., association rules, clustering, classification based on associations). The system is designed to facilitate federated data analysis, knowledge discovery, information retrieval, and new techniques to deal with semantic web and knowledge graph representation. The processing step integrates data from multiple sources virtually by creating virtual databases. Afterwards, the developed RDF Generator is built to generate RDF files for different data sources, together with SPARQL queries, to support semantic data search and knowledge graph representation. Furthermore, some application cases are provided to demonstrate how it facilitates advanced data analytics over semantic data and showcase our proposed approach toward semantic association rules.

  • 4.
    Anjomshoae, Sule
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Främling, Kary
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Visual Explanations for DNNs with Contextual Importance. 2021. In: Explainable and Transparent AI and Multi-Agent Systems: Third International Workshop, EXTRAAMAS 2021, Virtual Event, May 3–7, 2021, Revised Selected Papers / [ed] Davide Calvaresi; Amro Najjar; Michael Winikoff; Kary Främling, Springer, 2021, Vol. 12688, pp. 83-96. Conference paper (Refereed)
    Abstract [en]

    Autonomous agents and robots with vision capabilities powered by machine learning algorithms such as Deep Neural Networks (DNNs) are being deployed in many industrial environments. While DNNs have improved the accuracy in many prediction tasks, it has been shown that even modest disturbances in their input produce erroneous results. Such errors have to be detected and dealt with to make the deployment of DNNs secure in real-world applications. Several explanation methods have been proposed to understand the inner workings of these models. In this paper, we present how Contextual Importance (CI) can make DNN results more explainable in an image classification task without peeking inside the network. We produce explanations for individual classifications by perturbing an input image through over-segmentation and evaluating the effect on a prediction score. Then the output highlights the most contributing segments for a prediction. Results are compared with two explanation methods, namely mask perturbation and LIME. The results for the MNIST hand-written digit dataset produced by the three methods show that CI provides better visual explainability.
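
    The perturb-and-score idea in this abstract can be sketched in a few lines. This is a simplified occlusion-style illustration, not the paper's Contextual Importance computation: the "image" is reduced to a list of segment intensities, and `toy_score`, its weights, and the zero baseline are hypothetical stand-ins for a real classifier and a real segmentation.

```python
def segment_importance(segments, score_fn, baseline=0.0):
    """Score drop caused by replacing each segment with a baseline value.

    The model is treated as a black box: only `score_fn` is called.
    """
    full_score = score_fn(segments)
    importance = []
    for i in range(len(segments)):
        perturbed = list(segments)
        perturbed[i] = baseline  # mask one segment, keep the rest
        importance.append(full_score - score_fn(perturbed))
    return importance


def toy_score(segments):
    # Hypothetical prediction score: the middle segment matters most.
    weights = [0.1, 0.7, 0.2]
    return sum(w * s for w, s in zip(weights, segments))


importance = segment_importance([1.0, 1.0, 1.0], toy_score)
top_segment = max(range(len(importance)), key=importance.__getitem__)
print(importance, "most contributing segment:", top_segment)
```

    With a real model, `segments` would come from an over-segmentation of the image and `score_fn` would return the predicted class probability.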

  • 5.
    Anjomshoae, Sule
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Omeiza, Daniel
    Department of Computer Science, University of Oxford, United Kingdom.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Context-based image explanations for deep neural networks. 2021. In: Image and Vision Computing, ISSN 0262-8856, E-ISSN 1872-8138, Vol. 116, article id 104310. Article in journal (Refereed)
    Abstract [en]

    With the increased use of machine learning in decision-making scenarios, there has been a growing interest in explaining and understanding the outcomes of machine learning models. Despite this growing interest, existing works on interpretability and explanations have been mostly intended for expert users. Explanations for general users have been neglected in many usable and practical applications (e.g., image tagging, caption generation). It is important for non-technical users to understand features and how they affect an instance-specific prediction to satisfy the need for justification. In this paper, we propose a model-agnostic method for generating context-based explanations aiming for general users. We implement partial masking on segmented components to identify the contextual importance of each segment in scene classification tasks. We then generate explanations based on feature importance. We present visual and text-based explanations: (i) saliency map presents the pertinent components with a descriptive textual justification, (ii) visual map with a color bar graph showing the relative importance of each feature for a prediction. Evaluating the explanations using a user study (N = 50), we observed that our proposed explanation method visually outperformed existing gradient and occlusion based methods. Hence, our proposed explanation method could be deployed to explain models’ decisions to non-expert users in real-world applications.

  • 6.
    Brännström, Mattias
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Aler Tubella, Andrea
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Dignum, Virginia
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Impact based fairness framework for socio-technical decision making. 2023. In: Proceedings of the 1st workshop on fairness and bias in AI co-located with 26th european conference on artificial intelligence (ECAI 2023) / [ed] Roberta Calegari; Andrea Aler Tubella; Gabriel González Castañe; Virginia Dignum; Michela Milano, CEUR-WS, 2023. Conference paper (Refereed)
    Abstract [en]

    Avoiding bias and understanding the consequences of artificial intelligence used in decision making is of high importance to avoid mistreatment and unintended harm. This paper aims to present an impact focused approach to model the information flow of a socio-technical decision system for analysis of bias and fairness. The framework roots otherwise abstract technical accuracy and bias measures in stakeholder effects and forms a scaffold around which further analysis of the socio-technical system and its components can be coordinated. Two example use-cases are presented and analysed.

  • 7. Chen, Ye
    et al.
    Wang, Aiguo
    Ding, Huitong
    Que, Xia
    Li, Yabo
    An, Ning
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    A global learning with local preservation method for microarray data imputation. 2016. In: Computers in Biology and Medicine, ISSN 0010-4825, E-ISSN 1879-0534, Vol. 77, pp. 76-89. Article in journal (Refereed)
    Abstract [en]

    Microarray data suffer from missing values for various reasons, including insufficient resolution, image noise, and experimental errors. Because missing values can hinder downstream analysis steps that require complete data as input, it is crucial to be able to estimate the missing values. In this study, we propose a Global Learning with Local Preservation method (GL2P) for imputation of missing values in microarray data. GL2P consists of two components: a local similarity measurement module and a global weighted imputation module. The former uses a local structure preservation scheme to exploit as much information as possible from the observable data, and the latter is responsible for estimating the missing values of a target gene by considering all of its neighbors rather than a subset of them. Furthermore, GL2P imputes the missing values in ascending order according to the rate of missing data for each target gene to fully utilize previously estimated values. To validate the proposed method, we conducted extensive experiments on six benchmarked microarray datasets. We compared GL2P with eight state-of-the-art imputation methods in terms of four performance metrics. The experimental results indicate that GL2P outperforms its competitors in terms of imputation accuracy and better preserves the structure of differentially expressed genes. In addition, GL2P is less sensitive to the number of neighbors than other local learning-based imputation methods.

  • 8. Gonzalez, Roberto
    et al.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ahmed, Mohamed
    Marciel, Miriam
    Cuevas, Ruben
    Metwalley, Hassan
    Niccolini, Saverio
    The cookie recipe: Untangling the use of cookies in the wild. 2017. In: TMA Conference 2017: Proceedings of the 1st Network Traffic Measurement and Analysis Conference, IEEE, 2017. Conference paper (Refereed)
    Abstract [en]

    Users online are commonly tracked using HTTP cookies when browsing on the web. To protect their privacy, users tend to use simple tools to block the activity of HTTP cookies. However, the "block all" design of these tools breaks critical web services or severely limits the online advertising ecosystem. Therefore, to ease this tension, a more nuanced strategy that better discerns the intended functionality of the HTTP cookies users encounter is required. We present the first large-scale study of the use of HTTP cookies in the wild using network traces containing more than 5.6 billion HTTP requests from real users for a period of two and a half months. We first present a statistical analysis of how cookies are used. We then analyze the structure of cookies and observe that HTTP cookies are significantly more sophisticated than the name=value defined by the standard and assumed by researchers and developers. Based on our findings we present an algorithm that is able to extract the information included in 86% of the cookies in our dataset with an accuracy of 91.7%. Finally, we discuss the implications of our findings and provide solutions that can be used to improve the most promising privacy preserving tools.

  • 9.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Entity markup for knowledge base population. 2017. In: Big data analytics: 5th international conference, BDA 2017, Hyderabad, India, December 12-15, 2017, proceedings / [ed] P. Krishna Reddy; Ashish Sureka; Sharma Chakravarthy; Subhash Bhalla, Springer, 2017, pp. 71-89. Conference paper (Refereed)
    Abstract [en]

    Entities (e.g., people, places, products) exist in various heterogeneous sources, such as Wikipedia, web pages, and social media. Entity markup, such as entity extraction, coreference resolution, and entity disambiguation, is the essential means of adding semantic value to unstructured web content, thereby enabling linkage between unstructured and structured data and knowledge collections. A major challenge in this endeavor lies in the ambiguity of digital content, whose semantics are context-dependent and dynamic. In this paper, I introduce the main challenges of coreference resolution and named entity disambiguation. In particular, I propose practical strategies to improve entity markup. Furthermore, experimental studies are conducted to perform named entity disambiguation in combination with the optimized entity extraction and coreference resolution. The main goal of this paper is to analyze the significant challenges of entity markup and present insights on the proposed entity markup framework for knowledge base population. The preliminary experimental results demonstrate the significance of improving entity markup.

  • 10.
    Jiang, Lili
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Jonsson, Anna
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Vanhée, Loïs
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Proceedings of Umeå's 25th Student Conference in Computing Science (USCCS 2022). 2022. Conference proceedings (editor) (Other academic)
    Abstract [en]

    The Umeå Student Conference in Computing Science (USCCS) is organized annually as part of a course given by the Computing Science department at Umeå University. The objective of the course is to give the students a practical introduction to independent research, scientific writing, and oral presentation.

    A student who participates in the course first selects a topic and a research question that they are interested in. If the topic is accepted, the student outlines a paper and composes an annotated bibliography to give a survey of the research topic. The main work consists of conducting the actual research that answers the question asked, and convincingly and clearly reporting the results in a scientific paper. Another major part of the course is multiple internal peer review meetings in which groups of students read each other's papers and give feedback to the author. This process gives valuable training in both giving and receiving criticism in a constructive manner. Altogether, the students learn to formulate and develop their own ideas in a scientific manner, in a process involving internal peer reviewing of each other's work and under supervision of the teachers, and incremental development and refinement of a scientific paper.

    Each scientific paper is submitted to USCCS through an online submission system, and receives reviews written by members of the Computing Science department. Based on the reviews, the editors of the conference proceedings (the teachers of the course) issue a decision of preliminary acceptance of the paper to each author. If, after final revision, a paper is accepted, the student is given the opportunity to present the work at the conference. The review process and the conference format aim at mimicking realistic settings for publishing and participation at scientific conferences.

    USCCS is the highlight of the course, and this year the conference received 10 submissions, which were carefully reviewed by the teachers of the course. As a result of the reviewing process, 6 submissions were accepted for presentation at the conference.

  • 11.
    Jiang, Lili
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Torra, Vicenç
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Data protection and multi-database data-driven models. 2023. In: Future Internet, E-ISSN 1999-5903, Vol. 15, no. 3, article id 93. Article in journal (Refereed)
    Abstract [en]

    Anonymization and data masking have effects on data-driven models. Different anonymization methods have been developed to provide a good trade-off between privacy guarantees and data utility. Nevertheless, the effects of data protection (e.g., data microaggregation and noise addition) on data integration and on data-driven models (e.g., machine learning models) built from these data are not known. In this paper, we study how data protection affects data integration, and the corresponding effects on the results of machine learning models built from the outcome of the data integration process. The experimental results show that the levels of protection that prevent proper database integration do not affect machine learning models that learn from the integrated database to the same degree. Concretely, our preliminary analysis and experiments show that data protection techniques have a lower level of impact on data integration than on machine learning models.

  • 12.
    Jiang, Lili
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Torra, Vicenç
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    On the Effects of Data Protection on Multi-database Data-Driven Models. 2022. In: Integrated Uncertainty in Knowledge Modelling and Decision Making: 9th International Symposium, IUKM 2022, Ishikawa, Japan, March 18–19, 2022, Proceedings / [ed] Katsuhiro Honda; Tomoe Entani; Seiki Ubukata; Van-Nam Huynh; Masahiro Inuiguchi, Springer, 2022, pp. 226-238. Conference paper (Refereed)
    Abstract [en]

    This paper analyses the effects of masking mechanisms for privacy preservation in data-driven models (regression) with respect to database integration. In particular, two data masking methods (microaggregation and rank swapping) are applied to two public datasets to evaluate the linear regression model in terms of privacy protection and prediction performance. Our preliminary experimental results show that both methods achieve a good trade-off between privacy protection and information loss. We also show that for some experiments, although data integration produces some incorrect links, the linear regression model is still comparable, with respect to prediction error, to the one inferred from the original data.
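
    As a rough illustration of one of the masking methods named in this abstract, the sketch below applies fixed-size univariate microaggregation: values are sorted, grouped in runs of at least k records, and each value is replaced by its group mean. It is a minimal sketch under stated assumptions, not the procedure used in the paper; the group size, the handling of the leftover group, and the `ages` data are invented for the example.

```python
def microaggregate(values, k=3):
    """Replace each value with the mean of its group of at least k nearest-ranked values."""
    # Indices of the records, ordered by value.
    order = sorted(range(len(values)), key=values.__getitem__)
    groups = [order[i:i + k] for i in range(0, len(order), k)]
    # A trailing group smaller than k is merged into the previous one,
    # so every group keeps at least k records.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    masked = [0.0] * len(values)
    for group in groups:
        mean = sum(values[i] for i in group) / len(group)
        for i in group:
            masked[i] = mean
    return masked


ages = [23, 45, 31, 52, 29, 38, 61]  # toy data, invented for this sketch
masked = microaggregate(ages, k=3)
print(masked)
```

    Each masked value is shared by at least k records, while the overall mean of the column is preserved; larger k gives more protection and more information loss.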

  • 13.
    Luan, Siyu
    et al.
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Gu, Zonghua
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Freidovich, Leonid B.
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics. Department of Information Technologies and AI, Sirius University of Science and Technology, Sochi, Russian Federation.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Zhao, Qingling
    College of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China.
    Out-of-Distribution Detection for Deep Neural Networks with Isolation Forest and Local Outlier Factor. 2021. In: IEEE Access, E-ISSN 2169-3536, Vol. 9, pp. 132980-132989. Article in journal (Refereed)
    Abstract [en]

    Deep Neural Networks (DNNs) are extensively deployed in today's safety-critical autonomous systems thanks to their excellent performance. However, they are known to make mistakes unpredictably, e.g., a DNN may misclassify an object if it is used for perception, or issue unsafe control commands if it is used for planning and control. One common cause for such unpredictable mistakes is Out-of-Distribution (OOD) input samples, i.e., samples that fall outside of the distribution of the training dataset. We present a framework for OOD detection based on outlier detection in one or more hidden layers of a DNN with a runtime monitor based on either Isolation Forest (IF) or Local Outlier Factor (LOF). Performance evaluation indicates that LOF is a promising method in terms of both the Machine Learning metrics of precision, recall, F1 score and accuracy, as well as computational efficiency during testing.
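
    The runtime-monitor idea in this abstract can be sketched with a small pure-Python Local Outlier Factor (LOF) scorer. This is a minimal sketch, not the authors' framework: the 2-D points stand in for hidden-layer activations of training samples, and the neighbourhood size `k` and the 1.5 threshold are assumptions chosen for the toy data.

```python
import math


def _k_nearest(point, data, k, skip=None):
    """Return the k nearest (distance, index) pairs of `point` within `data`."""
    dists = sorted(
        (math.dist(point, q), j) for j, q in enumerate(data) if j != skip
    )
    return dists[:k]


def lof_scores(train, queries, k=3):
    """LOF of each query relative to the training set (about 1 means inlier)."""
    neighbours = [_k_nearest(p, train, k, skip=i) for i, p in enumerate(train)]
    k_dist = [n[-1][0] for n in neighbours]  # distance to the k-th neighbour

    def lrd(neigh):
        # Local reachability density: inverse mean reachability distance.
        reach = [max(k_dist[j], d) for d, j in neigh]
        return len(reach) / sum(reach)

    train_lrd = [lrd(n) for n in neighbours]
    scores = []
    for q in queries:
        nq = _k_nearest(q, train, k)
        mean_neighbour_lrd = sum(train_lrd[j] for _, j in nq) / len(nq)
        scores.append(mean_neighbour_lrd / lrd(nq))
    return scores


# Toy "activations": a 3x3 grid of in-distribution points (invented data).
train = [(float(x), float(y)) for x in range(3) for y in range(3)]
queries = [(1.1, 0.9), (8.0, 8.0)]
scores = lof_scores(train, queries)
flags = [s > 1.5 for s in scores]  # hypothetical OOD threshold
print(scores, flags)
```

    The first query lies inside the grid and gets a score near 1, while the far-away query gets a much larger score and is flagged. In practice the training activations would be extracted once, and only the per-query neighbour search would run at test time.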

  • 14.
    Luan, Siyu
    et al.
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Gu, Zonghua
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Saremi, Amin
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Freidovich, Leonid B.
    Umeå University, Faculty of Science and Technology, Department of Applied Physics and Electronics.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Wan, Shaohua
    Shenzhen Institute for Advanced Study, University of Electronic Science and Technology of China, Shenzhen, China.
    Timing performance benchmarking of out-of-distribution detection algorithms. 2023. In: Journal of Signal Processing Systems, ISSN 1939-8018, E-ISSN 1939-8115, Vol. 95, no. 12, pp. 1355-1370. Article in journal (Refereed)
    Abstract [en]

    In an open world with a long-tail distribution of input samples, Deep Neural Networks (DNNs) may make unpredictable mistakes for Out-of-Distribution (OOD) inputs at test time, despite high levels of accuracy obtained during model training. OOD detection can be an effective runtime assurance mechanism for safe deployment of machine learning algorithms in safety–critical applications such as medical imaging and autonomous driving. A large number of OOD detection algorithms have been proposed in recent years, with a wide range of performance metrics in terms of accuracy and execution time. For real-time safety–critical applications, e.g., autonomous driving, timing performance is of great importance in addition to accuracy. We perform a comprehensive and systematic benchmark study of multiple OOD detection algorithms in terms of both accuracy and execution time on different hardware platforms, including a powerful workstation and a resource-constrained embedded device, equipped with both CPU and GPU. We also profile and analyze the internal details of each algorithm to identify the performance bottlenecks and potential for GPU acceleration. This paper aims to provide a useful reference for the practical deployment of OOD detection algorithms for real-time safety–critical applications.

  • 15.
    Ming, Hong
    et al.
    Hefei University of Technology, 420 Jade Road, Hefei City, Anhui Province, Hefei, China.
    Yang, Jiaoyun
    Hefei University of Technology, 420 Jade Road, Hefei City, Anhui Province, Hefei, China.
    Gui, Fang
    Hefei University of Technology, 420 Jade Road, Hefei City, Anhui Province, Hefei, China.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    An, Ning
    Hefei University of Technology, 420 Jade Road, Hefei City, Anhui Province, Hefei, China.
    Few-shot nested named entity recognition. 2024. In: Knowledge-Based Systems, ISSN 0950-7051, E-ISSN 1872-7409, Vol. 293, article id 111688. Article in journal (Refereed)
    Abstract [en]

    While Named Entity Recognition (NER) is a widely studied task, making inferences of entities with only a few labeled data has been challenging, especially for entities with nested structures commonly existing in NER datasets. Unlike flat entities, entities and their nested entities are more likely to have similar semantic feature representations, drastically increasing difficulties in classifying different entity categories. This paper posits that the few-shot nested NER task warrants its own dedicated attention and proposes a Global-Biaffine Positive-Enhanced (GBPE) framework for this new task. Within the GBPE framework, we first develop the new Global-Biaffine span representation to capture the span global dependency information for each entity span to distinguish nested entities. We then formulate a unique positive-enhanced contrastive loss function to enhance the utility of specific positive samples in contrastive learning for larger margins. Lastly, by using these enlarged margins, we obtain better margin constraints and incorporate them into the nearest neighbor inference to predict the unlabeled entities. Extensive experiments on three nested NER datasets in English, German, and Russian show that GBPE outperforms baseline models on the 1-shot and 5-shot tasks in terms of F1 score.

  • 16.
    Nguyen, Nhu-Van
    et al.
    L3i, La Rochelle University, La Rochelle, France; INSA-Lyon, Lyon, France.
    Vu, Xuan-Son
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Rigaud, Christophe
    L3i, La Rochelle University, La Rochelle, France.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Burie, Jean-Christophe
    L3i, La Rochelle University, La Rochelle, France.
    ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes. 2021. In: Document Analysis and Recognition – ICDAR 2021: 16th International Conference, Lausanne, Switzerland, September 5–10, 2021, Proceedings, Part I / [ed] Josep Lladós; Daniel Lopresti; Seiichi Uchida, Springer, 2021, pp. 767-782. Conference paper (Refereed)
    Abstract [en]

    The paper describes the "Multimodal Emotion Recognition on Comics scenes" competition presented at the ICDAR conference 2021. This competition aims to tackle the problem of emotion recognition of comic scenes (panels). Emotions are assigned manually by multiple annotators for each comic scene of a subset of a public large-scale dataset of golden age American comics. As a multi-modal analysis task, the competition proposes to extract the emotions of comic characters in comic scenes based on visual information, text in speech balloons or captions, and the onomatopoeia. Participants competed on CodaLab.org from December 16th, 2020 to March 31st, 2021. The challenge attracted 145 registrants, 21 teams joined the public test phase, and 7 teams competed in the private test phase. In this paper we present the motivation, dataset preparation, and task definition of the competition, as well as an analysis of the participants' performance and submitted methods. We believe that the competition has drawn attention from the document analysis community in both fields of computer vision and natural language processing to the task of emotion recognition in documents.

  • 17.
    Pan, Yan
    et al.
    Hefei University of Technology, Anhui, Hefei, China.
    Yang, Jiaoyun
    Hefei University of Technology, Anhui, Hefei, China.
    Ming, Hong
    Hefei University of Technology, Anhui, Hefei, China.
    Jiang, Lili
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    An, Ning
    Hefei University of Technology, Anhui, Hefei, China.
    Few-shot named entity recognition via Label-Attention Mechanism. 2023. In: ICCAI '23: proceedings of the 2023 9th international conference on computing and artificial intelligence, Association for Computing Machinery (ACM), 2023, pp. 466-471. Conference paper (Refereed)
    Abstract [en]

    Few-shot named entity recognition aims to identify specific words with the support of very few labeled entities. Existing transfer-learning-based methods learn the semantic features of words in the source domain and migrate them to the target domain but ignore the different label-specific information. We propose a novel Label-Attention Mechanism (LAM) to utilize the overlooked label-specific information. LAM can separate label information from semantic features and learn how to obtain label information from a few samples through the meta-learning strategy. When transferring to the target domain, LAM replaces the source label information with the knowledge extracted from the target domain, thus improving the migration ability of the model. We conducted extensive experiments on multiple datasets, including OntoNotes, CoNLL'03, WNUT'17, GUM, and Few-Nerd, with two experimental settings. The results show that LAM is 7% better than the state-of-the-art baseline models by the absolute F1 scores.

  • 18.
    Tran, Khanh-Tung
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. AI Center, FPT Software, Hanoi, Viet Nam.
    Hy, Truong Son
    Department of Mathematics and Computer Science, Indiana State University, Terre Haute, United States.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Vu, Xuan-Son
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. DeepTensor AB, Umeå, Sweden.
    MGLEP: multimodal graph learning for modeling emerging pandemics with big data. 2024. In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no. 1, article id 16377. Journal article (Refereed)
    Abstract [en]

    Accurate forecasting and analysis of emerging pandemics play a crucial role in effective public health management and decision-making. Traditional approaches primarily rely on epidemiological data, overlooking other valuable sources of information that could act as sensors or indicators of pandemic patterns. In this paper, we propose a novel framework, MGLEP, that integrates temporal graph neural networks and multi-modal data for learning and forecasting. We incorporate big data sources, including social media content, by utilizing specific pre-trained language models and discovering the underlying graph structure among users. This integration provides rich indicators of pandemic dynamics through learning with temporal graph neural networks. Extensive experiments demonstrate the effectiveness of our framework in pandemic forecasting and analysis, outperforming baseline methods across different areas, pandemic situations, and prediction horizons. The fusion of temporal graph learning and multi-modal data enables a comprehensive understanding of the pandemic landscape with less time lag, lower cost, and more potential information indicators.

  • 19.
    Tran, Tung Khanh
    et al.
    FPT Software AI Center, FPT Software Company Limited, Viet Nam.
    Vu, Xuan-Son
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    SoBigDemicSys: a social media based monitoring system for emerging pandemics with big data. 2022. In: Proceedings - IEEE 8th International Conference on Big Data Computing Service and Applications, BigDataService 2022, IEEE Computer Society, 2022, pp. 103-107. Conference paper (Refereed)
    Abstract [en]

    The outbreak of the Covid-19 pandemic has caused millions of infections and deaths, resulting in a global economic depression. A lesson learned for minimizing the damage of an emerging pandemic is that timely tracking and reasonable trend prediction are required to help society (e.g., municipalities, institutions, and industries) plan in time for efficient resource preparation and allocation. This paper presents a system to monitor pandemic trends, analyze correlations and impacts, predict the evolution, and visualize the prediction results to end users as social indicators. The significance lies in the fact that collecting online information for pandemic-related prediction has less time lag, lower cost, and more potential information indicators.

  • 20.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Ait-Mlouk, Addi
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Elmroth, Erik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Graph-based Interactive Data Federation System for Heterogeneous Data Retrieval and Analytics. 2019. In: Proceedings of The World Wide Web Conference WWW 2019, New York, NY, USA: ACM Digital Library, 2019, pp. 3595-3599. Conference paper (Refereed)
    Abstract [en]

    Given the increasing amount of heterogeneous data stored in relational databases, file systems, or cloud environments, such data need to be easily accessed and semantically connected for further data analytics. The potential of data federation is largely untapped; this paper presents an interactive data federation system (https://vimeo.com/319473546) that applies large-scale techniques, including heterogeneous data federation, natural language processing, association rules, and the semantic web, to perform data retrieval and analytics on social network data. The system first creates a Virtual Database (VDB) to virtually integrate data from multiple data sources. Next, an RDF generator is built to unify data, together with SPARQL queries, to support semantic data search over text data processed by natural language processing (NLP). Association rule analysis is used to discover patterns and recognize the most important co-occurrences of variables from multiple data sources. The system demonstrates how it facilitates interactive data analytics for different application scenarios (e.g., sentiment analysis, privacy-concern analysis, community detection).

  • 21.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Flekova, Lucie
    Amazon Research Germany, Aachen, Germany.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Gurevych, Iryna
    UKP Lab, Computer Science Department, Technische Universität Darmstadt, Germany.
    Lexical-semantic resources: yet powerful resources for automatic personality classification. 2018. In: Proceedings of the 9th Global WordNet Conference (GWC 2018) / [ed] Francis Bond; Takayuki Kuribayashi; Christiane Fellbaum; Piek Vossen, Singapore: Nanyang Technological University (NTU), 2018, pp. 173-182 (10 pages). Conference paper (Refereed)
    Abstract [en]

    In this paper, we aim to reveal the impact of lexical-semantic resources, used in particular for word sense disambiguation and sense-level semantic categorization, on the automatic personality classification task. While stylistic features (e.g., part-of-speech counts) have shown their power in this task, the impact of semantics beyond targeted word lists is relatively unexplored. We propose and extract three types of lexical-semantic features, which capture high-level concepts and emotions, overcoming the lexical gap of word n-grams. Our experimental results are comparable to state-of-the-art methods, while no personality-specific resources are required.

  • 22.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Self-adaptive privacy concern detection for user-generated content. 2023. In: Computational linguistics and intelligent text processing: 19th International Conference on CiCLing 2018, Hanoi, Vietnam, March 18-24, 2018, Revised selected papers, part 1 / [ed] Alexander Gelbukh, Springer Science+Business Media B.V., 2023, pp. 153-167. Conference paper (Refereed)
    Abstract [en]

    To protect user privacy in data analysis, a state-of-the-art strategy is differential privacy, in which scientific noise is injected into the real analysis output. The noise masks individuals' sensitive information contained in the dataset. However, determining the amount of noise is a key challenge, since too much noise will destroy data utility while too little noise will increase privacy risk. Though previous research has designed mechanisms to protect data privacy in different scenarios, most existing studies assume uniform privacy concerns for all individuals. Consequently, adding an equal amount of noise for all individuals leads to insufficient privacy protection for some users while over-protecting others. To address this issue, we propose a self-adaptive approach for privacy concern detection based on user personality. Our experimental studies demonstrate its effectiveness in providing suitable personalized privacy protection for cold-start users (i.e., users without privacy-concern information in the training data).

  • 23.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Brändström, Anders
    Umeå universitet, Samhällsvetenskapliga fakulteten, Enheten för demografi och åldrandeforskning (CEDAR).
    Elmroth, Erik
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Personality-based Knowledge Extraction for Privacy-preserving Data Analysis. 2017. In: K-CAP 2017: Proceedings of the Knowledge Capture Conference, Austin, TX, USA: ACM Digital Library, 2017, article id 45. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present a differential-privacy-preserving approach, which extracts personality-based knowledge to support privacy-guaranteed data analysis on sensitive personal data. Based on the approach, we further implement an end-to-end privacy-guarantee system, KaPPA, to provide researchers with iterative data analysis on sensitive data. The key challenge for differential privacy is determining a reasonable privacy budget to balance privacy preservation and data utility. Most previous work applies a unified privacy budget to all individual data, which leads to insufficient privacy protection for some individuals while over-protecting others. In KaPPA, the proposed personality-based privacy-preserving approach automatically calculates a privacy budget for each individual. Our experimental evaluations show a significant trade-off between sufficient privacy protection and data utility.

  • 24.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Nguyen, Thanh-Son
    Le, Duc-Trong
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Multimodal Review Generation with Privacy and Fairness Awareness. Manuscript (preprint) (Other academic)
  • 25.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Nguyen, Thanh-Son
    A*STAR Artificial Intelligence Initiative, Singapore.
    Le, Duc-Trong
    University of Engineering and Technology, VNU, Vietnam.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Multimodal Review Generation with Privacy and Fairness Awareness. 2020. In: Proceedings of the 28th International Conference on Computational Linguistics (COLING), 2020, International Committee on Computational Linguistics, 2020, pp. 414-425. Conference paper (Refereed)
    Abstract [en]

    Users express their opinions towards entities (e.g., restaurants) via online reviews, which can take diverse forms such as text, ratings, and images. Modeling reviews is advantageous for understanding user behavior, which, in turn, supports various user-oriented tasks such as recommendation, sentiment analysis, and review generation. In this paper, we propose MG-PriFair, a multimodal neural-based framework, which generates personalized reviews with privacy and fairness awareness. Motivated by the fact that reviews might contain personal information and sentiment bias, we propose a novel differentially private (dp)-embedding model for training privacy-guaranteed embeddings and an evaluation approach for sentiment fairness in the food-review domain. Experiments on our novel review dataset show that MG-PriFair is capable of generating plausibly long reviews while controlling the amount of exploited user data and using the least sentiment-biased word embeddings. To the best of our knowledge, we are the first to bring user privacy and sentiment fairness into the review generation task. The dataset and source code are available at https://github.com/ReML-AI/MG-PriFair.

  • 26.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Santra, Abhishek
    Chakravarthy, Sharma
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Generic Multilayer Network Data Analysis with the Fusion of Content and Structure. 2019. In: Proceedings of the 20th International Conference on Computational Linguistics and Intelligent Text Processing (CICLing), 2019, Cornell University Library, arXiv.org, 2019. Conference paper (Refereed)
    Abstract [en]

    Multi-feature data analysis (e.g., on Facebook, LinkedIn) is challenging, especially if one wants to do it efficiently and retain flexibility by choosing features of interest for analysis. Features (e.g., age, gender, relationship, political view, etc.) can be given explicitly in datasets but can also be derived from content (e.g., political view based on Facebook posts). Analysis from multiple perspectives is needed to understand the datasets (or subsets of them) and to infer meaningful knowledge. For example, the influence of age, location, and marital status on political views may need to be inferred separately (or in combination). In this paper, we adapt multilayer network (MLN) analysis, a nontraditional approach, to model the Facebook datasets, integrate content analysis, and conduct analysis driven by a list of desired application-based queries. Our experimental analysis shows the flexibility and efficiency of the proposed approach when modeling and analyzing datasets with multiple features.

  • 27.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Tran, Son N.
    ICT Discipline, University of Tasmania, Hobart, Australia.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    dpUGC: learn differentially private representation for user generated contents. 2023. In: Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I / [ed] Alexander Gelbukh, Springer, 2023, Vol. 13451, pp. 316-331. Conference paper (Refereed)
    Abstract [en]

    This paper first proposes a simple yet efficient generalized approach to apply differential privacy to text representation (i.e., word embedding). Based on it, we propose a user-level approach to learn a personalized differentially private word embedding model on user generated contents (UGC). To the best of our knowledge, this is the first work on learning a user-level differentially private word embedding model from text for sharing. The proposed approaches protect the privacy of the individual from re-identification and, in particular, provide a better trade-off between privacy and data utility on UGC data for sharing. The experimental results show that the trained embedding models are applicable to classic text analysis tasks (e.g., regression). Moreover, the proposed approaches of learning differentially private embedding models are both framework- and data-independent, which facilitates deployment and sharing. The source code is available at https://github.com/sonvx/dpText.

  • 28.
    Vu, Xuan-Son
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Vu, Thanh
    Newcastle University; The Australian E-Health Research Centre, CSIRO, Australia.
    Tran, Son N.
    The University of Tasmania, Australia.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    ETNLP: A Visual-Aided Systematic Approach to Select Pre-Trained Embeddings for a Down Stream Task. 2019. In: Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019) / [ed] Galia Angelova, Ruslan Mitkov, Ivelina Nikolova, Irina Temnikova, Incoma Ltd., 2019, pp. 1285-1294. Conference paper (Refereed)
    Abstract [en]

    Given many recent advanced embedding models, selecting the pre-trained word embedding (a.k.a. word representation) models that best fit a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained word embedding models in Vietnamese to select which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve new state-of-the-art results on the task benchmark dataset. We also apply the approach to another downstream task of privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.

  • 29.
    Wang, Aiguo
    et al.
    Chen, Ye
    An, Ning
    Yang, Jing
    Li, Lian
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Microarray Missing Value Imputation: A Regularized Local Learning Method. 2019. In: IEEE/ACM Transactions on Computational Biology & Bioinformatics, ISSN 1545-5963, E-ISSN 1557-9964, Vol. 16, no. 3, pp. 980-993. Journal article (Refereed)
    Abstract [en]

    Microarray experiments on gene expression inevitably generate missing values, which impedes further downstream biological analysis. Therefore, it is key to estimate the missing values accurately. Most of the existing imputation methods tend to suffer from the over-fitting problem. In this study, we propose two regularized local learning methods for microarray missing value imputation. Motivated by the grouping effect of L-2 regularization, after selecting the target gene, we train an L-2 Regularized Local Least Squares imputation model (RLLSimpute_L2) on the target gene and its neighbors to estimate the missing values of the target gene. Furthermore, RLLSimpute_L2 imputes the missing values in an ascending order based on the associated missing rate with each target gene. This contributes to fully utilizing the previously estimated values. Besides L-2, we further explore L-1 regularization and propose an L-1 Regularized Local Least Squares imputation model (RLLSimpute_L1). To evaluate their effectiveness, we conducted extensive experimental studies on six benchmark datasets covering both time series and non-time series cases. Nine state-of-the-art imputation methods are compared with RLLSimpute_L2 and RLLSimpute_L1 in terms of three performance metrics. The comparative experimental results indicate that RLLSimpute_L2 outperforms its competitors by achieving smaller imputation errors and better structure preservation of differentially expressed genes.

  • 30.
    Wang, Dong
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Enlund, Therese
    Mestro AB.
    Fors, Amanda
    Trygg, Johan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Tysklind, Mats
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Towards delicate anomaly detection of energy consumption for buildings: enhance the performance from two levels. Manuscript (preprint) (Other academic)
  • 31.
    Wang, Dong
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Enlund, Therese
    Mestro AB, Stockholm, Sweden.
    Trygg, Johan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Tysklind, Mats
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Toward Delicate Anomaly Detection of Energy Consumption for Buildings: Enhance the Performance From Two Levels. 2022. In: IEEE Access, E-ISSN 2169-3536, Vol. 10, pp. 31649-31659. Journal article (Refereed)
    Abstract [en]

    Buildings are highly energy-consuming and therefore are largely accountable for environmental degradation. Detecting anomalous energy consumption is one of the effective ways to reduce energy consumption. Besides, it can contribute to the safety and robustness of building systems, since anomalies in the energy data usually reflect malfunctions in building systems. As the most flexible and applicable type of anomaly detection approach, unsupervised anomaly detection has been implemented in several studies for building energy data. However, no studies have investigated the joint influence of data structures and algorithms' mechanisms on the performance of unsupervised anomaly detection for building energy data. Thus, we put forward a novel workflow based on two levels, the data structure level and the algorithm mechanism level, to effectively detect imperceptible anomalies in the energy consumption profiles of buildings. The proposed workflow was implemented in a case study for identifying the anomalies in three real-world energy consumption datasets from two types of commercial buildings. Two aims were achieved through the case study. First, it precisely detected the contextual anomalies concealed beneath the time variation of the energy consumption profiles of the three buildings. The areas under the precision-recall curves (AUC_PR) for the three given datasets were 0.989, 0.941, and 0.957, respectively. Second, more broadly, the joint effect of the two levels was examined. On the data level, all four detectors on the contextualized data were superior to their counterparts on the original data. On the algorithm level, there was a consistent ranking of detectors regarding their detection performance on the contextualized data. The consistent ranking suggests that local approaches outperform global approaches in scenarios where the goal is to detect the instances deviating from their contextual neighbors rather than from the rest of the entire data.

  • 32.
    Wang, Dong
    et al.
    Department of Water Management, Faculty of Civil Engineering and Geosciences, Delft University of Technology, Delft, Netherlands.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjellander, Måns
    Umeå Energi, Umeå, Sweden.
    Weidemann, Eva
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen. Umeå Energi, Umeå, Sweden.
    Trygg, Johan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Tysklind, Mats
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    A novel data mining framework to investigate causes of boiler failures in waste-to-energy plants. 2024. In: Processes, ISSN 2227-9717, Vol. 12, no. 7, article id 1346. Journal article (Refereed)
    Abstract [en]

    Examining boiler failure causes is crucial for thermal power plant safety and profitability. However, traditional approaches are complex and expensive, lacking precise operational insights. Although data-driven approaches hold substantial potential in addressing these challenges, there is a gap in systematic approaches for investigating failure root causes with unlabeled data. Therefore, we proffered a novel framework rooted in data mining methodologies to probe the accountable operational variables for boiler failures. The primary objective was to furnish precise guidance for future operations to proactively prevent similar failures. The framework was centered on two data mining approaches, Principal Component Analysis (PCA) + K-means and Deep Embedded Clustering (DEC), with PCA + K-means serving as the baseline against which the performance of DEC was evaluated. To demonstrate the framework’s specifics, a case study was performed using datasets obtained from a waste-to-energy plant in Sweden. The results showed the following: (1) The clustering outcomes of DEC consistently surpass those of PCA + K-means across nearly every dimension. (2) The operational temperature variables T-BSH3rm, T-BSH2l, T-BSH3r, T-BSH1l, T-SbSH3, and T-BSH1r emerged as the most significant contributors to the failures. It is advisable to maintain the operational levels of T-BSH3rm, T-BSH2l, T-BSH3r, T-BSH1l, T-SbSH3, and T-BSH1r around 527 °C, 432 °C, 482 °C, 338 °C, 313 °C, and 343 °C respectively. Moreover, it is crucial to prevent these values from reaching or exceeding 594 °C, 471 °C, 537 °C, 355 °C, 340 °C, and 359 °C for prolonged durations. The findings offer the opportunity to improve future operational conditions, thereby extending the overall service life of the boiler. Consequently, operators can address faulty tubes during scheduled annual maintenance without encountering failures and disrupting production.

  • 33.
    Wang, Dong
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Kjellander, Måns
    Umeå Energi AB, Umeå, Sweden.
    Weidemann, Eva
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen. Umeå Energi AB, Umeå, Sweden.
    Trygg, Johan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Tysklind, Mats
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Investigation into causes of boiler failures in waste-to-energy plants with a coupled engineering and data mining solution. Manuscript (preprint) (Other academic)
  • 34.
    Wang, Dong
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Thunéll, Sven
    Vakin, Umeå, Sweden.
    Lindberg, Ulrika
    Vakin, Umeå, Sweden.
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Trygg, Johan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Tysklind, Mats
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Towards better process management in wastewater treatment plants: Process analytics based on SHAP values for tree-based machine learning methods. 2022. In: Journal of Environmental Management, ISSN 0301-4797, E-ISSN 1095-8630, Vol. 301, article id 113941. Journal article (Refereed)
    Abstract [en]

    Understanding the mechanisms of pollutant removal in Wastewater Treatment Plants (WWTPs) is crucial for controlling effluent quality efficiently. However, the numerous treatment units, operational factors, and the underlying interactions between these units and factors usually obfuscate the comprehensive and precise understanding of the processes. We have previously proposed a machine learning (ML) framework to uncover complex cause-and-effect relationships in WWTPs. However, only one interpretable ML model, Random Forest (RF), was studied, and the interpretation method was not granular enough to reveal very detailed relationships between operational factors and effluent parameters. Thus, in this paper, we present an upgraded framework involving three interpretable tree-based models (RF, XGBoost, and LightGBM), three metrics (R2, Root mean squared error (RMSE), and Mean absolute error (MAE)), and a more advanced interpretation system, SHapley Additive exPlanations (SHAP). Details of the framework are provided along with a demonstration of its practical applicability based on a case study of the Umeå WWTP in Sweden. Results show that, for both labels TSSe (Total suspended solids in effluent) and PO4e (Phosphate in effluent), the XGBoost models are optimal whereas the RF models are the least optimal, due to overfitting and polarized fitting. This study has yielded multiple new and significant findings with respect to the control of TSSe and PO4e in the Umeå WWTP and other similarly configured WWTPs. Additionally, this study has produced two important generic findings relating to ML applications for WWTPs (or even other process industries) in terms of cause-and-effect investigations. First, the model comparison should be carried out from multiple perspectives to ensure that underlying details are fully revealed and examined. Second, using a precise, robust, and granular (feature attribution available for individual instances) explanation method can bring extra insight into both model comparison and model interpretation. SHAP is recommended as we found it to be of great value in this study.

  • 35.
    Wang, Dong
    et al.
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Thunéll, Sven
    Lindberg, Ulrika
    Jiang, Lili
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap.
    Trygg, Johan
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Tysklind, Mats
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    Souihi, Nabil
    Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Kemiska institutionen.
    A machine learning framework to improve effluent quality control in wastewater treatment plants. 2021. In: Science of the Total Environment, ISSN 0048-9697, E-ISSN 1879-1026, Vol. 784, article id 147138. Journal article (Refereed)
    Abstract [en]

    Due to the intrinsic complexity of wastewater treatment plant (WWTP) processes, it is always challenging to respond promptly and appropriately to the dynamic process conditions in order to ensure the quality of the effluent, especially when operational cost is a major concern. Machine Learning (ML) methods have therefore been used to model WWTP processes in order to avoid various shortcomings of conventional mechanistic models. However, to the best of the authors' knowledge, no ML applications have focused on investigating how operational factors can affect effluent quality. Additionally, the time lags between process steps have always been neglected, making it difficult to explain the relationships between operational factors and effluent quality. Therefore, this paper presents a novel ML-based framework designed to improve effluent quality control in WWTPs by clarifying the relationships between operational variables and effluent parameters. The framework consists of Random Forest (RF) models, Deep Neural Network (DNN) models, Variable Importance Measure (VIM) analyses, and Partial Dependence Plot (PDP) analyses, and uses a novel approach to account for the impact of time lags between processes. Details of the framework are provided along with a demonstration of its practical applicability based on a case study of the Umeå WWTP in Sweden involving a large number of samples (105763) representing the full scale of the plant's operations. Two effluent parameters, Total Suspended Solids in effluent (TSSe) and Phosphate in effluent (PO4e), and thirty-two operational variables are studied. RF models are developed, validated using DNN models as references, and shown to be suitable for VIM and PDP analyses. VIM identifies the variables that most strongly influence TSSe and PO4e, while PDP elucidates their specific effects on TSSe and PO4e. The major findings are: (1) Influent temperature is the most influential variable for both TSSe and PO4e, but it affects them in different ways; (2) PO4e depends strongly on the TSS in aeration basins – higher TSS concentrations in aeration basins generally promote PO4 removal, but excess TSS can have negative effects; (3) In general, the impact of TSS in aeration basins on TSSe and PO4e increases with the distance of the basin from the merging outlet, so more attention should be paid to the TSS concentration in the third or fourth aeration basins than the first and second ones; (4) Returning excessive amounts of sludge through the second return sludge pipe should be avoided because of its adverse impact on TSSe removal. These results could support the development of more advanced control strategies to increase control precision and reduce running costs in the Umeå WWTP and other similarly configured WWTPs. The framework could also be applied to other parameters in WWTPs and industrial processes in general if sufficient high-resolution data are available.
