Publications (10 of 35)
Volodina, E., Alfter, D., Dobnik, S., Lindström Tiedemann, T., Muñoz Sánchez, R., Szawerna, M. I. & Vu, X.-S. (2024). Introduction. In: Elena Volodina; David Alfter; Simon Dobnik; Therese Lindström Tiedemann; Ricardo Muñoz Sánchez; Maria Irena Szawerna; Xuan-Son Vu (Eds.), Proceedings of the workshop on computational approaches to language data pseudonymization (CALD-pseudo 2024). Paper presented at 1st Workshop on Computational Approaches to Language Data Pseudonymization, CALD-pseudo 2024 (pp. ii-iii). Association for Computational Linguistics
2024 (English). In: Proceedings of the workshop on computational approaches to language data pseudonymization (CALD-pseudo 2024) / [ed] Elena Volodina; David Alfter; Simon Dobnik; Therese Lindström Tiedemann; Ricardo Muñoz Sánchez; Maria Irena Szawerna; Xuan-Son Vu, Association for Computational Linguistics, 2024, p. ii-iii. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-223761 (URN); 2-s2.0-85190584439 (Scopus ID); 9798891760851 (ISBN)
Conference
1st Workshop on Computational Approaches to Language Data Pseudonymization, CALD-pseudo 2024
Note

Conference: CALD-pseudo 2024, 1st Workshop on Computational Approaches to Language Data Pseudonymization, St. Julian's, 21 March 2024.

Available from: 2024-05-06 Created: 2024-05-06 Last updated: 2024-05-06. Bibliographically approved
Tran, K.-T., Vu, X.-S., Nguyen, K. & Nguyen, H. D. (2024). NeuProNet: neural profiling networks for sound classification. Neural Computing & Applications, 36(11), 5873-5887
2024 (English). In: Neural Computing & Applications, ISSN 0941-0643, E-ISSN 1433-3058, Vol. 36, no 11, p. 5873-5887. Article in journal (Refereed). Published
Abstract [en]

Real-world sound signals exhibit various aspects of grouping and profiling behaviors, such as being recorded from identical sources, having similar environmental settings, or encountering related background noises. In this work, we propose novel neural profiling networks (NeuProNet) capable of learning and extracting high-level unique profile representations from sounds. An end-to-end framework is developed so that any backbone architecture can be plugged in and trained, achieving better performance in any downstream sound classification task. We introduce an in-batch profile grouping mechanism based on profile awareness and attention pooling to produce reliable and robust features with contrastive learning. Furthermore, extensive experiments are conducted on multiple benchmark datasets and tasks to show that neural computing models under the guidance of our framework achieve significant performance gains across all evaluation tasks. In particular, the integration of NeuProNet surpasses recent state-of-the-art (SoTA) approaches on the UrbanSound8K and VocalSound datasets with statistically significant improvements in benchmarking metrics, up to 5.92% in accuracy compared to the previous SoTA method and up to 20.19% compared to baselines. Our work provides a strong foundation for utilizing neural profiling for machine learning tasks.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Audio classification, Deep learning, Neural profiling network, Signal processing
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-220001 (URN); 10.1007/s00521-023-09361-8 (DOI); 001152242400004 (); 2-s2.0-85182479547 (Scopus ID)
Available from: 2024-01-31 Created: 2024-01-31 Last updated: 2024-05-07. Bibliographically approved
Szawerna, M. I., Dobnik, S., Lindström Tiedemann, T., Muñoz Sánchez, R., Vu, X.-S. & Volodina, E. (2024). Pseudonymization categories across domain boundaries. In: Nicoletta Calzolari; Min-Yen Kan; Veronique Hoste; Alessandro Lenci; Sakriani Sakti; Nianwen Xue (Eds.), Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Paper presented at The 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024), Torino, Italy, May 20-25, 2024 (pp. 13303-13314). ELRA Language Resource Association
2024 (English). In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) / [ed] Nicoletta Calzolari; Min-Yen Kan; Veronique Hoste; Alessandro Lenci; Sakriani Sakti; Nianwen Xue, ELRA Language Resource Association, 2024, p. 13303-13314. Conference paper, Published paper (Refereed)
Abstract [en]

Linguistic data, a component critical not only for research in a variety of fields but also for the development of various Natural Language Processing (NLP) applications, can contain personal information. As a result, its accessibility is limited, both from a legal and an ethical standpoint. One of the solutions is the pseudonymization of the data. Key stages of this process include the identification of sensitive elements and the generation of suitable surrogates in a way that the data is still useful for the intended task. Within this paper, we conduct an analysis of tagsets that have previously been utilized in anonymization and pseudonymization. We also investigate what kinds of Personally Identifiable Information (PII) appear in various domains. These analyses reveal that none of the analyzed tagsets accounts for all of the PII types present across domains at the level of detail seemingly required for pseudonymization. We advocate for a universal system of tags for categorizing PII leading up to its replacement. Such categorization could facilitate the generation of grammatically, semantically, and sociolinguistically appropriate surrogates for the kinds of information that are considered sensitive in a given domain, resulting in a system that would enable dynamic pseudonymization while keeping the texts readable and useful for future research in various fields.

Place, publisher, year, edition, pages
ELRA Language Resource Association, 2024
Series
International conference on computational linguistics, ISSN 2951-2093
Keywords
anonymization, deidentification, privacy, pseudonymization, universal tagset
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-226956 (URN); 2-s2.0-85195988143 (Scopus ID); 978-2-493814-10-4 (ISBN)
Conference
The 2024 joint international conference on computational linguistics, language resources and evaluation (LREC-COLING 2024), Torino, Italy, May 20-25, 2024
Note

Also part of series: LREC proceedings, ISSN: 2522-2686

Available from: 2024-06-25 Created: 2024-06-25 Last updated: 2024-06-25. Bibliographically approved
Hatefi, A., Vu, X.-S., Bhuyan, M. H. & Drewes, F. (2023). ADCluster: Adaptive Deep Clustering for unsupervised learning from unlabeled documents. In: Mourad Abbas; Abed Alhakim Freihat (Eds.), Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023). Paper presented at 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), Online, December 16-17, 2023 (pp. 68-77). Association for Computational Linguistics
2023 (English). In: Proceedings of the 6th International Conference on Natural Language and Speech Processing (ICNLSP 2023) / [ed] Mourad Abbas; Abed Alhakim Freihat, Association for Computational Linguistics, 2023, p. 68-77. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce ADCluster, a deep document clustering approach based on language models that is trained to adapt to the clustering task. This adaptability is achieved through an iterative process where K-Means clustering is applied to the dataset, followed by iteratively training a deep classifier with generated pseudo-labels – an approach referred to as inner adaptation. The model is also able to adapt to changes in the data as new documents are added to the document collection. The latter type of adaptation, outer adaptation, is obtained by resuming the inner adaptation when a new chunk of documents has arrived. We explore two outer adaptation strategies, namely accumulative adaptation (training is resumed on the accumulated set of all documents) and non-accumulative adaptation (training is resumed using only the new chunk of data). We show that ADCluster outperforms established document clustering techniques on medium and long-text documents by a large margin. Additionally, our approach outperforms well-established baseline methods under both the accumulative and non-accumulative outer adaptation scenarios.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2023
Keywords
deep clustering, adaptive, deep learning, unsupervised, data stream
National Category
Computer Sciences
Research subject
Computer Science; computational linguistics
Identifiers
urn:nbn:se:umu:diva-220260 (URN)
Conference
6th International Conference on Natural Language and Speech Processing (ICNLSP 2023), Online, December 16-17, 2023.
Available from: 2024-01-31 Created: 2024-01-31 Last updated: 2024-07-02. Bibliographically approved
Vu, X.-S., Tran, S. N. & Jiang, L. (2023). dpUGC: learn differentially private representation for user generated contents. In: Alexander Gelbukh (Ed.), Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I. Paper presented at 20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019 (pp. 316-331). Springer, 13451
2023 (English). In: Computational linguistics and intelligent text processing: 20th international conference, CICLing 2019, La Rochelle, France, April 7–13, 2019, revised selected papers, part I / [ed] Alexander Gelbukh, Springer, 2023, Vol. 13451, p. 316-331. Conference paper, Published paper (Refereed)
Abstract [en]

This paper first proposes a simple yet efficient generalized approach to applying differential privacy to text representation (i.e., word embedding). Based on it, we propose a user-level approach to learning a personalized differentially private word embedding model on user generated contents (UGC). To the best of our knowledge, this is the first work on learning a user-level differentially private word embedding model from text for sharing. The proposed approaches protect the privacy of the individual from re-identification and, in particular, provide a better trade-off between privacy and data utility on UGC data for sharing. The experimental results show that the trained embedding models are applicable to classic text analysis tasks (e.g., regression). Moreover, the proposed approaches to learning differentially private embedding models are both framework- and data-independent, which facilitates deployment and sharing. The source code is available at https://github.com/sonvx/dpText.

Place, publisher, year, edition, pages
Springer, 2023
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 13451
Keywords
Private word embedding, Differential privacy, UGC
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:umu:diva-160887 (URN); 10.1007/978-3-031-24337-0_23 (DOI); 2-s2.0-85149907226 (Scopus ID); 978-3-031-24336-3 (ISBN); 978-3-031-24337-0 (ISBN)
Conference
20th International Conference on Computational Linguistics and Intelligent Text Processing, La Rochelle, France, April 7-13, 2019.
Note

Originally included in thesis in manuscript form. 

Available from: 2019-06-25 Created: 2019-06-25 Last updated: 2023-03-28. Bibliographically approved
Volodina, E., Dobnik, S., Lindström Tiedemann, T. & Vu, X.-S. (2023). Grandma Karl is 27 years old - research agenda for pseudonymization of research data. In: Proceedings - IEEE 9th International Conference on Big Data Computing Service and Applications, BigDataService 2023. Paper presented at 9th IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2023, 17-20 July 2023, Athens, Greece (pp. 229-233). IEEE
2023 (English). In: Proceedings - IEEE 9th International Conference on Big Data Computing Service and Applications, BigDataService 2023, IEEE, 2023, p. 229-233. Conference paper, Published paper (Refereed)
Abstract [en]

Accessibility of research data is critical for advances in many research fields, but textual data often cannot be shared due to the personal and sensitive information it contains, e.g., names or political opinions. The General Data Protection Regulation (GDPR) suggests pseudonymization as a solution to secure open access to research data, but we need to learn more about pseudonymization as an approach before adopting it for manipulation of research data. This paper outlines a research agenda within pseudonymization, namely the need for studies into the effects of pseudonymization on unstructured data in relation to, e.g., readability and language assessment, as well as the effectiveness of pseudonymization as a way of protecting writer identity, while also exploring different ways of developing context-sensitive algorithms for detection, labelling and replacement of personal information in unstructured data. The recently granted project on pseudonymization 'Grandma Karl is 27 years old' (https://spraakbanken.gu.se/en/projects/mormor-karl) addresses exactly those challenges.

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
natural language processing, pseudonymization
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-215235 (URN); 10.1109/BigDataService58306.2023.00047 (DOI); 2-s2.0-85173010164 (Scopus ID); 9798350333794 (ISBN); 9798350335347 (ISBN)
Conference
9th IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2023, 17-20 July 2023, Athens, Greece
Funder
Swedish Research Council, 2022-02311; Swedish Research Council, 2023-2029
Available from: 2023-10-17 Created: 2023-10-17 Last updated: 2023-10-17. Bibliographically approved
Vu, X.-S., Ma, M. & Bhuyan, M. H. (2023). MetaVSID: a robust meta-reinforced learning approach for VSI-DDoS detection on the edge. IEEE Transactions on Network and Service Management, 20(2), 1625-1643
2023 (English). In: IEEE Transactions on Network and Service Management, E-ISSN 1932-4537, Vol. 20, no 2, p. 1625-1643. Article in journal (Refereed). Published
Abstract [en]

The explosive growth of end devices that generate massive amounts of data requires close-proximity computing resources for processing at the network’s edge. The geographic distribution and limited resources of edge nodes or servers open several doors for attackers to exploit them, primarily to the detriment of deployed services; one of the recent attacks is Very Short Intermittent Distributed Denial of Service (VSI-DDoS). Deep learning-based models have been developed to detect and mitigate such attacks, but their quality degrades due to covariate shifts when they are deployed in real-world environments. Therefore, we propose a new approach, called MetaVSID, to detect VSI-DDoS attacks in edge clouds using meta-reinforcement learning followed by ensemble learning to increase the robustness of the model in detecting VSI-DDoS attacks early. The proposed model can capture dynamic patterns of VSI-DDoS attacks, from which it identifies manipulated services and increases service availability under covariate shifts at deployment time. We carry out extensive experiments to validate MetaVSID using both testbed and benchmark datasets. Via the meta-reinforced downsampling process, the proposed method improves sample efficiency, leading to cost-effective policies. Moreover, the optimized policies are generalized to adapt to dynamic changes in the training distribution. Our experimental results demonstrate that MetaVSID stably achieves better performance in multiple evaluation settings, with a difference from baseline models of 1.5% to 7.5% in terms of AUC for both VSI-DDoS and DDoS detection, especially under covariate shift settings.

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
meta-learning, vsid-ddos, edge clouds
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-198843 (URN); 10.1109/TNSM.2022.3200924 (DOI); 2-s2.0-85137570078 (Scopus ID)
Funder
The Swedish Foundation for International Cooperation in Research and Higher Education (STINT)
Available from: 2022-08-25 Created: 2022-08-25 Last updated: 2024-07-04. Bibliographically approved
Nguyen, T. T., Pham, V.-Q. H., Le, D.-T., Vu, X.-S., Deligianni, F. & Nguyen, H. D. (2023). Multimodal machine learning for mental disorder detection: a scoping review. In: 27th international conference on knowledge based and intelligent information and engineering systems (KES 2023). Paper presented at 27th International Conference on Knowledge Based and Intelligent Information and Engineering Systems (KES 2023), Athens, Greece, 6-8 September, 2023 (pp. 1458-1467). Elsevier, 225
2023 (English). In: 27th international conference on knowledge based and intelligent information and engineering systems (KES 2023), Elsevier, 2023, Vol. 225, p. 1458-1467. Conference paper, Published paper (Refereed)
Abstract [en]

Recent advancements in machine learning and multimedia technologies have paved new ways for automatic medical diagnosis. In mental health, multimodal inputs such as visual and audible sensing data are promising to investigate the underlying mechanisms of many conditions, such as depression and bipolar disorders. With the increasing burden on healthcare systems, timely diagnosis of mental diseases using multiple modalities might benefit millions of people worldwide. This scoping review provides an exploratory overview of recent multimodal machine learning approaches for mental disorder screening. We also discuss a generalised end-to-end multimodal machine learning pipeline for future research and development of multimodal disease detection.

Place, publisher, year, edition, pages
Elsevier, 2023
Series
Procedia Computer Science, E-ISSN 1877-0509
Keywords
Bipolar disorders, depression, mental disorder diagnosis, multimodal machine learning, stress disorders
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-220482 (URN); 10.1016/j.procs.2023.10.134 (DOI); 2-s2.0-85183548944 (Scopus ID)
Conference
27th International Conference on Knowledge Based and Intelligent Information and Engineering Systems (KES 2023), Athens, Greece, 6-8 September, 2023
Available from: 2024-02-19 Created: 2024-02-19 Last updated: 2024-02-19. Bibliographically approved
Tran, K.-T., Hoang, T., Nguyen, D. K., Nguyen, H. D. & Vu, X.-S. (2023). Personalization for robust voice pathology detection in sound waves. In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH. Paper presented at 24th International Speech Communication Association, Interspeech 2023, Dublin, August 20-24, 2023 (pp. 1708-1712). International Speech Communication Association
2023 (English). In: Proceedings of the annual conference of the international speech communication association, INTERSPEECH, International Speech Communication Association, 2023, p. 1708-1712. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic voice pathology detection is promising for non-invasive screening and early intervention using sound signals. Nevertheless, existing methods are susceptible to covariate shifts due to background noises, human voice variations, and data selection biases, leading to severe performance degradation in real-world scenarios. Hence, we propose a non-invasive framework that contrastively learns personalization from sound waves as a pre-training step and predicts latent-spaced profile features through semi-supervised learning. It allows all subjects from various distributions (e.g., regionality, gender, age) to benefit from personalized predictions for robust voice pathology detection in a privacy-fulfilled manner. We extensively evaluate the framework on four real-world respiratory illness datasets, including Coswara, COUGHVID, ICBHI, and our private dataset, ASound, under multiple covariate shift settings (i.e., cross-dataset), improving overall performance by up to 4.12%.

Place, publisher, year, edition, pages
International Speech Communication Association, 2023
Series
Interspeech, ISSN 2308-457X, E-ISSN 1990-9772
Keywords
covariate shift, robust voice pathology detection
National Category
Language Technology (Computational Linguistics); Computer Sciences
Identifiers
urn:nbn:se:umu:diva-214779 (URN); 10.21437/Interspeech.2023-1332 (DOI); 2-s2.0-85171525230 (Scopus ID)
Conference
24th International Speech Communication Association, Interspeech 2023, Dublin, August 20-24, 2023
Available from: 2023-10-23 Created: 2023-10-23 Last updated: 2023-10-23. Bibliographically approved
Nguyen, T. M. & Vu, X.-S. (2023). Privacy and trust in IoT ecosystems with big data: a survey of perspectives and challenges. In: Proceedings - IEEE 9th International Conference on Big Data Computing Service and Applications, BigDataService 2023. Paper presented at 9th IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2023, 17-20 July 2023, Athens, Greece (pp. 215-222). IEEE
2023 (English). In: Proceedings - IEEE 9th International Conference on Big Data Computing Service and Applications, BigDataService 2023, IEEE, 2023, p. 215-222. Conference paper, Published paper (Refereed)
Abstract [en]

The Internet of Things (IoT) has become a vital part of our daily lives, enabling interconnectedness between various devices and systems. As the amount of data generated by IoT devices and systems continues to increase immensely, privacy and security concerns have emerged as a significant challenge for researchers and enterprises. Although we are aware of how much data IoT devices will generate per day, there is a lack of knowledge of how the collected data will be used. The privacy risks associated with data collection raise individual concerns in the IoT ecosystem. For instance, when sensitive personal information is exposed due to weak security practices, it can result in identity theft, financial fraud, or other types of cybercrime. The misuse of IoT devices can also expose people to physical risks, such as a compromised medical device leading to health complications. In this paper, we introduce the definition of the next-gen IoT Ecosystem and its relation to Big Data, investigate privacy and security risks associated with IoT ecosystems, identify the gaps in current privacy and security practices, and present technical solutions to tackle these problems. We aim to identify challenges and raise awareness about developing secure and privacy-preserving IoT systems in the era of Big Data.

Place, publisher, year, edition, pages
IEEE, 2023
Keywords
big data, IoT, privacy-enhanced technologies
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-215241 (URN); 10.1109/BigDataService58306.2023.00045 (DOI); 2-s2.0-85173020243 (Scopus ID); 9798350333794 (ISBN); 9798350335347 (ISBN)
Conference
9th IEEE International Conference on Big Data Computing Service and Applications, BigDataService 2023, 17-20 July 2023, Athens, Greece
Available from: 2023-10-16 Created: 2023-10-16 Last updated: 2023-10-16. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-8820-2405
