Publications (10 of 262)
Meyers, C., Saleh Sedghpour, M. R., Löfstedt, T. & Elmroth, E. (2025). A training rate and survival heuristic for inference and robustness evaluation (Trashfire). In: Proceedings of 2024 International Conference on Machine Learning and Cybernetics. Paper presented at 2024 International Conference on Machine Learning and Cybernetics (ICMLC), Miyazaki, Japan, September 20-23, 2024 (pp. 613-623). IEEE
A training rate and survival heuristic for inference and robustness evaluation (Trashfire)
2025 (English) In: Proceedings of 2024 International Conference on Machine Learning and Cybernetics, IEEE, 2025, p. 613-623. Conference paper, Published paper (Refereed)
Abstract [en]

Machine learning models, deep neural networks in particular, have performed remarkably well on benchmark datasets across a wide variety of domains. However, the ease of finding adversarial counter-examples remains a persistent problem when training times are measured in hours or days and the time needed to find a successful adversarial counter-example is measured in seconds. Much work has gone into generating and defending against these adversarial counter-examples; however, the relative costs of attacks and defences are rarely discussed. Additionally, machine learning research is almost entirely guided by test/train metrics, which would require billions of samples to meet industry standards. The present work addresses the problem of understanding and predicting how particular model hyper-parameters influence the performance of a model in the presence of an adversary. The proposed approach uses survival models, worst-case examples, and a cost-aware analysis to precisely and accurately reject a particular model change during routine model training procedures, rather than relying on real-world deployment, expensive formal verification methods, or accurate simulations of very complicated systems (e.g., digitally recreating every part of a car or a plane). Through an evaluation of many pre-processing techniques, adversarial counter-examples, and neural network configurations, the conclusion is that deeper models do offer marginal gains in survival times compared to their shallower counterparts. However, we show that those gains are driven more by the model inference time than by inherent robustness properties. Using the proposed methodology, we show that ResNet is hopelessly insecure against even the simplest of white-box attacks.
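
The survival-analysis step can be illustrated with an accelerated-failure-time model: each row records how long a model configuration withstood an attack before failing, with hyper-parameters as covariates. The sketch below uses the lifelines library; the column names, covariates, and data are illustrative assumptions, not the paper's actual pipeline.

# Hedged sketch: a Weibull accelerated-failure-time (AFT) survival model over
# attack outcomes, in the spirit of the paper's approach. Column names and
# data are hypothetical; the paper's real covariates are not reproduced here.
import pandas as pd
from lifelines import WeibullAFTFitter

records = pd.DataFrame({
    "depth":           [18, 34, 50, 101, 152, 18, 50],           # network depth
    "attack_strength": [0.01, 0.01, 0.03, 0.03, 0.1, 0.1, 0.3],  # perturbation budget
    "survival_time":   [12.0, 15.5, 9.2, 11.8, 3.1, 2.4, 0.9],   # seconds until the attack succeeds
    "failed":          [1, 1, 1, 1, 1, 1, 1],                    # 1 = adversary eventually won
})

aft = WeibullAFTFitter()
aft.fit(records, duration_col="survival_time", event_col="failed")

# Predicted median survival time per configuration: if a deeper model's gain
# is smaller than its added inference cost, the change can be rejected.
print(aft.predict_median(records))

Under such a model, a candidate hyper-parameter change can be accepted or rejected during routine training by comparing its predicted survival gain against its measured cost, with no deployment or formal verification required.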

Place, publisher, year, edition, pages
IEEE, 2025
Series
Proceedings (International Conference on Machine Learning and Cybernetics), ISSN 2160-133X, E-ISSN 2160-1348
Keywords
Machine Learning, Computer Vision, Neural Networks, Adversarial AI, Trustworthy AI
National Category
Artificial Intelligence; Security, Privacy and Cryptography; Computer Sciences
Identifiers
urn:nbn:se:umu:diva-237109 (URN); 10.1109/ICMLC63072.2024.10935101 (DOI); 2-s2.0-105002274020 (Scopus ID); 9798331528041 (ISBN); 9798331528058 (ISBN)
Conference
2024 International Conference on Machine Learning and Cybernetics (ICMLC), Miyazaki, Japan, September 20-23, 2024
Funder
Knut and Alice Wallenberg Foundation, 2019.0352; eSSENCE - An eScience Collaboration
Available from: 2025-04-02 Created: 2025-04-02 Last updated: 2025-05-19. Bibliographically approved
Seo, E. & Elmroth, E. (2025). Pioneering eco-efficiency in cloud computing: a carbon-conscious reinforcement learning approach to federated learning [Letter to the editor]. IEEE Internet of Things Journal, 12(7), 8958-8979
Pioneering eco-efficiency in cloud computing: a carbon-conscious reinforcement learning approach to federated learning
2025 (English) In: IEEE Internet of Things Journal, ISSN 2327-4662, Vol. 12, no. 7, p. 8958-8979. Article in journal, Letter (Refereed). Published
Abstract [en]

In response to the growing emphasis on sustainability in federated learning (FL), this research introduces a dynamic, dual-objective optimization framework called Carbon-Conscious Federated Reinforcement Learning (CCFRL). By leveraging Reinforcement Learning (RL), CCFRL continuously adapts client allocation and resource usage in real-time, optimizing both carbon efficiency and model performance. Existing static or greedy methods that prioritize short-term carbon constraints often suffer from either degrading model performance by excluding high-quality, energy-intensive clients or failing to adequately balance carbon emissions with long-term efficiency. CCFRL addresses these limitations by taking a more sustainable approach, balancing immediate resource needs with long-term sustainability, and ensuring that energy consumption and carbon emissions are minimized without compromising model quality, even with non-IID (non-independent and identically distributed) and large-scale datasets. We overcome the shortcomings of existing methods by integrating advanced state representations, adaptive exploration-exploitation transitions, and stagnation detection using t-tests to better manage real-world data heterogeneity and complex, non-linear datasets. Extensive experiments demonstrate that CCFRL significantly reduces both energy consumption and carbon emissions while maintaining or enhancing performance. With up to a 61.78% improvement in energy conservation and a 64.23% reduction in carbon emissions, CCFRL proves the viability of aligning resource management with sustainability goals, paving the way for a more environmentally responsible future in cloud computing.
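
The control loop at the heart of such a framework can be pictured as a reinforcement-learning agent whose reward trades validation improvement off against the carbon cost of each training round. The toy bandit below is only a sketch under assumed reward weights and client statistics; it is not the paper's CCFRL agent.

# Toy sketch of carbon-aware client selection for federated learning.
# This is NOT the paper's CCFRL agent: the reward shape, weights, and
# epsilon-greedy policy are illustrative assumptions only.
import random

class CarbonAwareSelector:
    def __init__(self, n_clients: int, carbon_weight: float = 0.5, epsilon: float = 0.1):
        self.q = [0.0] * n_clients          # running reward estimate per client
        self.counts = [0] * n_clients
        self.carbon_weight = carbon_weight  # accuracy-vs-carbon trade-off knob
        self.epsilon = epsilon

    def select(self) -> int:
        # Explore occasionally; otherwise pick the best-looking client.
        if random.random() < self.epsilon:
            return random.randrange(len(self.q))
        return max(range(len(self.q)), key=lambda i: self.q[i])

    def update(self, client: int, accuracy_gain: float, carbon_kg: float) -> None:
        # Reward: model improvement penalized by the round's carbon footprint.
        reward = accuracy_gain - self.carbon_weight * carbon_kg
        self.counts[client] += 1
        self.q[client] += (reward - self.q[client]) / self.counts[client]

selector = CarbonAwareSelector(n_clients=20)
client = selector.select()
# ... run one federated round on `client`, measure its effect ...
selector.update(client, accuracy_gain=0.8, carbon_kg=0.3)

CCFRL's full state representation, adaptive exploration schedule, and t-test stagnation check replace the fixed epsilon and scalar reward used here.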

Place, publisher, year, edition, pages
IEEE, 2025
Keywords
Carbon, Cloud Computing, Reinforcement Learning, Federated Learning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Systems Analysis; Computer Science
Identifiers
urn:nbn:se:umu:diva-236166 (URN); 10.1109/JIOT.2024.3504260 (DOI); 001453105600004 (ISI); 2-s2.0-105001346663 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-03-06 Created: 2025-03-06 Last updated: 2025-04-29. Bibliographically approved
Pour-Hosseini, M. R., Abbasi, M., Salimi, A., Elmroth, E., Haghighi, H., Moradi, P. & Javadi, B. (2025). Tiny machine learning models for autonomous workload distribution across cloud-edge computing continuum. Cluster Computing, 28(6), Article ID 381.
Tiny machine learning models for autonomous workload distribution across cloud-edge computing continuum
2025 (English) In: Cluster Computing, ISSN 1386-7857, E-ISSN 1573-7543, Vol. 28, no. 6, article id 381. Article in journal (Refereed). Published
Abstract [en]

Resource management and task distribution in real-time have become increasingly challenging due to the growing use of latency-critical applications across dispersed edge-cloud infrastructures. These situations require intelligent, adaptable mechanisms capable of functioning effectively on resource-constrained edge devices and responding quickly to dynamic workload changes. In this work, we offer a learning-based system for autonomous resource allocation across the edge-cloud continuum that is both lightweight and scalable. Two models are presented: TinyDT, a small offline decision tree trained on state-action information retrieved from an adaptive baseline, and TinyXCS, an online rule-based classifier system that can adjust to runtime conditions. Both models are designed to operate on resource-constrained edge devices while minimizing memory overhead and inference latency. Our analysis demonstrates that TinyXCS and TinyDT outperform existing online and offline baselines in terms of throughput and latency, providing a reliable, power-efficient solution for next-generation edge intelligence.
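
The offline model can be approximated by a depth-limited decision tree trained on logged state-action pairs, which keeps both the memory footprint and the inference cost tiny. A minimal scikit-learn sketch; the state features, actions, and depth limit are illustrative assumptions rather than the paper's TinyDT.

# Hedged sketch of a "tiny" offline decision tree for workload placement,
# loosely in the spirit of TinyDT. Features, labels, and the depth limit
# are assumptions; the paper's state-action data is not reproduced here.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical state features: [cpu_load, queue_length, link_latency_ms]
X = np.array([[0.2, 3, 10], [0.9, 12, 10], [0.4, 5, 80], [0.95, 20, 5]])
# Hypothetical actions logged from an adaptive baseline policy:
# 0 = run locally on the edge node, 1 = offload to the cloud
y = np.array([0, 1, 0, 1])

# A shallow tree fits in a few hundred bytes and makes each placement
# decision with a handful of comparisons, suiting constrained devices.
tiny_dt = DecisionTreeClassifier(max_depth=3)
tiny_dt.fit(X, y)

print(tiny_dt.predict([[0.7, 8, 30]]))  # placement decision for a new state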

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Cloud computing, Edge computing, Internet of things (IoT), Tiny models, Workload distribution
National Category
Computer Systems; Computer Sciences
Identifiers
urn:nbn:se:umu:diva-242012 (URN); 10.1007/s10586-025-05289-x (DOI); 001509955700002 (ISI); 2-s2.0-105008064820 (Scopus ID)
Available from: 2025-07-09 Created: 2025-07-09 Last updated: 2025-07-09. Bibliographically approved
Nguyen, C. L., Seo, E., Zahid, M., Larsson, O., Pokorny, F. T. & Elmroth, E. (2025). tinyKube: a middleware for dynamic resource management in cloud-edge platforms for large-scale cloud robotics. In: The 38th IEEE/IFIP Network Operations and Management Symposium (NOMS). Paper presented at IEEE/IFIP 2025, The 38th IEEE/IFIP Network Operations and Management Symposium (NOMS), Honolulu, HI, USA, May 12-16, 2025.
tinyKube: a middleware for dynamic resource management in cloud-edge platforms for large-scale cloud robotics
2025 (English) In: The 38th IEEE/IFIP Network Operations and Management Symposium (NOMS), 2025. Conference paper, Oral presentation only (Refereed)
Abstract [en]

With the rise of ubiquitous networking and distributed computing, integrating robots with cloud-edge infrastructures offers significant potential. However, challenges remain in resource allocation and scheduling across distributed environments to meet robotics applications' performance demands. 

This paper introduces tinyKube, a middleware tailored for dynamic resource management across the cloud-edge platform for large-scale cloud robotics deployments. Leveraging Kubernetes for orchestration and Prometheus for monitoring, tinyKube enables unified monitoring, task dispatching, and resource provisioning across cloud-edge infrastructures.

We evaluate tinyKube using a robotic gripper application on the CloudGripper testbed in a real-world cloud-edge setup. Results demonstrate its ability to automate task dispatching and resource allocation, dynamically adapting to QoS requirements and workload variations. By simplifying resource management, tinyKube accelerates the development, testing, and deployment of large-scale cloud robotics applications, facilitating more efficient real-world implementation.
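
The monitor-dispatch-provision loop that such middleware automates can be reduced to a few lines: read a QoS metric from Prometheus, decide, and ask Kubernetes to rescale. A rough sketch only; the metric name, threshold, and deployment are placeholder assumptions, and tinyKube's actual scheduling logic is more involved.

# Rough sketch of a monitor-and-rescale loop in the spirit of tinyKube.
# The Prometheus query, threshold, and deployment name are hypothetical
# placeholders; this is not tinyKube's control logic.
import requests
from kubernetes import client, config

PROM_URL = "http://prometheus:9090/api/v1/query"  # assumed endpoint

def latency_p95() -> float:
    # Assumed metric name; replace with whatever the application exports.
    q = 'histogram_quantile(0.95, rate(task_latency_seconds_bucket[1m]))'
    reply = requests.get(PROM_URL, params={"query": q}).json()
    return float(reply["data"]["result"][0]["value"][1])

def rescale(replicas: int) -> None:
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name="gripper-planner", namespace="robotics",  # hypothetical names
        body={"spec": {"replicas": replicas}},
    )

# Naive policy: add capacity when the assumed 200 ms QoS target is missed.
if latency_p95() > 0.2:
    rescale(replicas=3)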

Keywords
Cloud Robotics, Cloud-Edge Infrastructure, Resource Orchestration, Performance Monitoring, Middleware
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Sciences; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-234935 (URN)
Conference
IEEE/IFIP 2025, The 38th IEEE/IFIP Network Operations and Management Symposium (NOMS), Honolulu, HI, USA, May 12-16, 2025
Projects
NEST-project: Cloud-Robotics
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-02-16 Created: 2025-02-16 Last updated: 2025-02-17
Nguyen, C. L., Kidane, L., Vo Nguyen Le, D. & Elmroth, E. (2024). CloudResilienceML: ensuring robustness of machine learning models in dynamic cloud systems. In: Proceedings - 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC 2024). Paper presented at The 17th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2024), University of Sharjah, Sharjah, United Arab Emirates, 16-19 December, 2024 (pp. 73-81). Institute of Electrical and Electronics Engineers (IEEE)
CloudResilienceML: ensuring robustness of machine learning models in dynamic cloud systems
2024 (English) In: Proceedings - 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC 2024), IEEE, 2024, p. 73-81. Conference paper, Published paper (Refereed)
Abstract [en]

Machine Learning (ML) models play a crucial role in enabling intelligent decision-making across diverse cloud system management tasks. However, as cloud operational data evolves, shifts in data distributions can occur, leading to a gradual degradation of deployed ML models and, consequently, a reduction in the overall efficiency of cloud systems.

We introduce CloudResilienceML, a framework designed to maintain the resilience of ML models in dynamic cloud environments. CloudResilienceML includes: (1) a performance degradation detection mechanism, using dynamic programming change point detection to identify when a model needs retraining, and (2) a data valuation method to select a minimal, effective training set for retraining, reducing unnecessary overhead.

Evaluated with two ML models on real cloud operational data, CloudResilienceML significantly boosts model resilience and reduces retraining costs compared to incremental learning and data drift-based retraining. In high-drift scenarios (e.g., Wikipedia trace), it reduces overhead by 50% compared to concept drift retraining and by 91% compared to incremental retraining. In stable environments (e.g., Microsoft Azure trace), CloudResilienceML maintains high accuracy with retraining costs 96% lower than concept drift methods and 86% lower than incremental retraining.
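
Step (1) can be approximated with an off-the-shelf dynamic-programming change point detector applied to the model's rolling accuracy. The sketch below uses the ruptures library on a synthetic trace; the signal, the single breakpoint, and the degradation threshold are illustrative assumptions.

# Sketch of dynamic-programming change point detection on a model's rolling
# accuracy, as a stand-in for CloudResilienceML's degradation detector.
# The synthetic trace and thresholds are assumptions for illustration.
import numpy as np
import ruptures as rpt

rng = np.random.default_rng(0)
# Hypothetical accuracy trace: stable near 0.92, then drifting to 0.78.
accuracy = np.concatenate([
    0.92 + 0.01 * rng.standard_normal(60),
    0.78 + 0.01 * rng.standard_normal(40),
])

# Dynamic-programming search for the best single breakpoint under an L2 cost.
algo = rpt.Dynp(model="l2", min_size=10).fit(accuracy)
bkp = algo.predict(n_bkps=1)[0]  # predict() returns [breakpoint, series_end]

drop = accuracy[:bkp].mean() - accuracy[bkp:].mean()
if drop > 0.05:  # assumed degradation threshold
    print(f"performance shift around batch {bkp} (drop {drop:.2f}); schedule retraining")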

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Autonomous System, Change Point Detection, Cloud Operational Data, Data Drift, Machine Learning, Resource Management, Time series
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-231946 (URN); 10.1109/UCC63386.2024.00019 (DOI); 2-s2.0-105004740726 (Scopus ID); 9798350367201 (ISBN); 9798350367218 (ISBN)
Conference
The 17th IEEE/ACM International Conference on Utility and Cloud Computing (UCC 2024), University of Sharjah, Sharjah, United Arab Emirates, 16-19 December, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); eSSENCE - An eScience Collaboration
Available from: 2024-12-21 Created: 2024-12-21 Last updated: 2025-06-17. Bibliographically approved
Banerjee, S., Bhuyan, D., Elmroth, E. & Bhuyan, M. H. (2024). Cost-efficient feature selection for horizontal federated learning. IEEE Transactions on Artificial Intelligence, 5(12), 6551-6565
Cost-efficient feature selection for horizontal federated learning
2024 (English) In: IEEE Transactions on Artificial Intelligence, E-ISSN 2691-4581, Vol. 5, no. 12, p. 6551-6565. Article in journal (Refereed). Published
Abstract [en]

Horizontal Federated Learning exhibits substantial similarities in feature space across distinct clients. However, not all features contribute significantly to the training of the global model. Moreover, the curse of dimensionality delays the training. Therefore, reducing irrelevant and redundant features from the feature space makes training faster and less expensive. This work aims to identify the common feature subset from the clients in federated settings. We introduce a hybrid approach called Fed-MOFS, utilizing Mutual Information and Clustering for local feature selection at each client. Unlike Fed-FiS, which uses a scoring function for global feature ranking, Fed-MOFS employs multi-objective optimization to prioritize features based on their higher relevance and lower redundancy. This paper compares the performance of Fed-MOFS with conventional and federated feature selection methods. Moreover, we tested the scalability, stability, and efficacy of both Fed-FiS and Fed-MOFS across diverse datasets. We also assessed how feature selection influenced model convergence and explored its impact in scenarios with data heterogeneity. Our results show that Fed-MOFS enhances global model performance with a 50% reduction in feature space and is at least twice as fast as the FSHFL method. The computational complexity of both approaches is O(d²), which is lower than the state of the art.
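
A client-local ranking step in this spirit can be sketched with scikit-learn: score each feature's relevance as mutual information with the label and its redundancy as correlation with the other features. This is a simplified stand-in rather than Fed-MOFS itself; the scalarized relevance-minus-redundancy score below replaces the paper's clustering and true multi-objective optimization for brevity.

# Simplified local feature scoring in the spirit of Fed-MOFS: prefer high
# relevance (mutual information with the label) and low redundancy
# (correlation with other features). The scalarized score is an assumption;
# the paper uses clustering and multi-objective optimization instead.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

relevance = mutual_info_classif(X, y, random_state=0)

# Redundancy: mean absolute Pearson correlation with all other features.
corr = np.abs(np.corrcoef(X, rowvar=False))
np.fill_diagonal(corr, 0.0)
redundancy = corr.mean(axis=1)

score = relevance - redundancy       # relevant, non-redundant features rank high
top_k = np.argsort(score)[::-1][:5]  # local candidate subset sent to the server
print("locally selected features:", sorted(top_k.tolist()))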

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Feature extraction, Computational modeling, Data models, Training, Federated learning, Artificial intelligence, Servers, Clustering, Horizontal Federated Learning, Feature Selection, Mutual Information, Multi-objective Optimization
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-228215 (URN); 10.1109/TAI.2024.3436664 (DOI); 2-s2.0-85200235298 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-08-05 Created: 2024-08-05 Last updated: 2025-01-13. Bibliographically approved
Forough, J., Haddadi, H., Bhuyan, M. H. & Elmroth, E. (2024). Efficient anomaly detection for edge clouds: mitigating data and resource constraints.
Efficient anomaly detection for edge clouds: mitigating data and resource constraints
2024 (English) Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-220244 (URN)
Funder
Umeå University; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-01-30 Created: 2024-01-30 Last updated: 2024-07-02
Forough, J., Haddadi, H., Bhuyan, M. & Elmroth, E. (2024). Efficient anomaly detection for edge clouds: mitigating data and resource constraints. IEEE Access, 12, 171897-171910
Efficient anomaly detection for edge clouds: mitigating data and resource constraints
2024 (English) In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 171897-171910. Article in journal (Refereed). Published
Abstract [en]

Anomaly detection plays a vital role in ensuring the security and reliability of edge clouds, which are decentralized computing environments with limited resources. However, the unique challenges of limited computing power and a lack of edge-related labeled training data pose significant obstacles to effective supervised anomaly detection. In this paper, we propose an innovative approach that leverages transfer learning to address the lack of relevant labeled data and knowledge distillation to increase computational efficiency and achieve accurate anomaly detection on edge clouds. Our approach exploits transfer learning by utilizing knowledge from a pre-trained model and adapting it for anomaly detection on edge clouds. This enables the model to benefit from features and patterns learned on related tasks such as network intrusion detection, resulting in improved detection accuracy. Additionally, we utilize knowledge distillation to distill the knowledge from the previously mentioned high-capacity model, known as the teacher model, into a more compact student model. This distillation process enhances the student model's computational efficiency while retaining its detection power. Evaluations conducted on our real-world edge cloud testbed show that, with the same amount of labeled edge cloud data, our approach maintains high accuracy while significantly reducing the model's detection time: to almost half for non-sequential models, from 81.11μs to 44.34μs on average, and to nearly a third of the baseline for sequential models, from 331.54μs to 113.86μs on average. These improvements make our approach exceptionally practical for real-time anomaly detection on edge clouds.
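
The distillation step corresponds to the standard softened-KL objective: the student matches the transferred teacher's softened outputs while also fitting the scarce labels. A PyTorch sketch under assumed temperature, weighting, and model sizes; the paper's actual architectures are not reproduced.

# Hedged sketch of knowledge distillation for a compact anomaly detector:
# classic softened-KL distillation plus cross-entropy on labeled data.
# Temperature, alpha, feature width, and model sizes are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    # Soft targets: match the teacher's temperature-softened distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: ordinary cross-entropy on the scarce labeled edge data.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Illustrative large (pre-trained, transferred) teacher vs. compact student.
teacher = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 2))
student = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

x = torch.randn(8, 64)              # a batch of telemetry/flow features
labels = torch.randint(0, 2, (8,))  # 0 = normal, 1 = anomalous
with torch.no_grad():
    teacher_logits = teacher(x)
loss = distillation_loss(student(x), teacher_logits, labels)
loss.backward()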

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Anomaly detection, data constraints, edge clouds, knowledge distillation, resource constraints, transfer learning
National Category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:umu:diva-231909 (URN); 10.1109/ACCESS.2024.3492815 (DOI); 001362127900039 (ISI); 2-s2.0-85208701570 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-11-20 Created: 2024-11-20 Last updated: 2024-12-13. Bibliographically approved
Nguyen, C., Bhuyan, M. & Elmroth, E. (2024). Enhancing machine learning performance in dynamic cloud environments with auto-adaptive models. In: The 15th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2024): Proceedings. Paper presented at The 15th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2024), Khalifa University, Abu Dhabi, United Arab Emirates, 9-11 Dec, 2024 (pp. 184-191). IEEE
Enhancing machine learning performance in dynamic cloud environments with auto-adaptive models
2024 (English) In: The 15th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2024): Proceedings, IEEE, 2024, p. 184-191. Conference paper, Published paper (Refereed)
Abstract [en]

Autonomous resource management is essential for large-scale cloud data centers, where Machine Learning (ML) enables intelligent decision-making. However, shifts in data patterns within operational streams pose significant challenges to sustaining model accuracy and system efficiency.

This paper proposes an auto-adaptive ML approach to mitigate the impact of data drift in cloud systems. A knowledge base of distinct time-series batches and corresponding ML models is constructed and clustered using the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm. When model performance degrades, the system uses Dynamic Time Warping (DTW) to retrieve matching hyperparameters from the knowledge base and apply them to the deployed model, optimizing inference accuracy on new data streams.

Experiments with two real-world cloud data traces, representing both stable and highly fluctuating environments, demonstrate that the proposed approach maintains high model accuracy (over 89%) while minimizing retraining costs. Specifically, for the Wikipedia trace with frequent data drift, retraining overhead is reduced by 22.9% compared to drift detection-based retraining and by 97% compared to incremental retraining. In stable environments, like the Google cluster trace, retraining costs decrease by 96.3% and 88.9%, respectively.
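
The knowledge-base mechanism can be pictured in two steps: cluster historical workload batches offline with HDBSCAN, then, when performance degrades, retrieve the stored batch closest to the incoming stream under DTW and reuse its hyperparameters. The sketch below runs on synthetic data; the features, stored configurations, and matching policy are illustrative assumptions.

# Compressed sketch of the knowledge-base lookup: HDBSCAN groups historical
# workload batches; at degradation time, DTW retrieves the nearest stored
# batch and its hyperparameters are reused. Data and configs are invented.
import numpy as np
import hdbscan

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    # Classic O(len(a) * len(b)) dynamic-programming DTW.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

rng = np.random.default_rng(1)
batches = rng.random((40, 24))  # 40 historical day-long workload batches
configs = [{"lr": float(rng.choice([1e-2, 1e-3, 1e-4]))} for _ in range(40)]

# Offline: cluster batches so similar workload regimes sit together
# (random data may land mostly in HDBSCAN's noise cluster, label -1).
labels = hdbscan.HDBSCAN(min_cluster_size=5).fit_predict(batches)

# Online: on degradation, reuse the nearest batch's hyperparameters.
incoming = rng.random(24)
nearest = min(range(len(batches)), key=lambda i: dtw_distance(incoming, batches[i]))
print("reuse config:", configs[nearest], "from cluster", labels[nearest])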

Place, publisher, year, edition, pages
IEEE, 2024
Series
Proceedings (IEEE International Conference on Cloud Computing Technology and Science. Online), ISSN 2380-8004, E-ISSN 2330-2186
Keywords
Autonomous Cloud, Self-Adaptation, Cloud Operational Data, Data Drift, Machine Learning, Retraining
National Category
Computer Systems; Computer Sciences
Research subject
Computer Science; Computer Systems
Identifiers
urn:nbn:se:umu:diva-231945 (URN); 10.1109/CloudCom62794.2024.00024 (DOI); 2-s2.0-85217015975 (Scopus ID); 979-8-3315-0758-9 (ISBN)
Conference
The 15th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2024), Khalifa University, Abu Dhabi, United Arab Emirates, 9-11 Dec, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-12-21 Created: 2024-12-21 Last updated: 2025-03-26. Bibliographically approved
Rasouli, N., Klein, C. & Elmroth, E. (2024). Fault tolerance infrastructure for mission-critical mobile edge cloud applications. In: 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC). Paper presented at UCC 2024, 17th IEEE/ACM International Conference on Utility and Cloud Computing, Sharjah, United Arab Emirates, December 16-19, 2024 (pp. 382-388). IEEE
Fault tolerance infrastructure for mission-critical mobile edge cloud applications
2024 (English) In: 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC), IEEE, 2024, p. 382-388. Conference paper, Published paper (Refereed)
Abstract [en]

Disaster management, such as early warnings for earthquakes, hurricanes, and fires, requires IoT sensors and cameras, which produce tremendous amounts of data. To avoid network bandwidth congestion, much of this data needs to be processed close to where it is produced, as enabled by Mobile Edge Clouds (MEC). However, for such use cases, the disaster itself may take out the MEC, hence hindering disaster management efforts. We present a fault tolerance infrastructure tailored specifically for MEC systems to address various types of failures as part of a holistic disaster recovery solution. Our research investigates using current technologies, such as Kubernetes, to effectively handle the failure of one or several edge nodes, and RabbitMQ as a resilient message broker in our proposed infrastructure to ensure dependable message transmission, even during network outages. To evaluate our framework, we conduct a case study using weather stations as mission-critical assets in an urban setting next to forests, where edge nodes are placed as safely as possible. The experiments demonstrate that the infrastructure can handle two node failures simultaneously. The proposed infrastructure ensures 99.966% availability for both the system and mission-critical applications.
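
The message-broker side of such an infrastructure typically relies on durable queues, persistent messages, and publisher confirms so that sensor readings survive broker and node failures. A hedged pika sketch under an assumed broker host, queue name, and retry policy; this is not the paper's implementation.

# Hedged sketch of a durable, confirm-based RabbitMQ publisher of the kind
# a resilient MEC message layer might use. Host, queue name, and backoff
# policy are assumptions; this is not the paper's implementation.
import time
import pika

def publish_with_retry(message: bytes, retries: int = 5) -> None:
    for attempt in range(retries):
        try:
            conn = pika.BlockingConnection(
                pika.ConnectionParameters(host="edge-broker"))  # assumed host
            channel = conn.channel()
            # Durable queue + persistent messages survive broker restarts.
            channel.queue_declare(queue="sensor-readings", durable=True)
            channel.confirm_delivery()  # broker must ack each publish
            channel.basic_publish(
                exchange="",
                routing_key="sensor-readings",
                body=message,
                properties=pika.BasicProperties(delivery_mode=2),  # persistent
                mandatory=True,
            )
            conn.close()
            return
        except pika.exceptions.AMQPError:
            time.sleep(2 ** attempt)  # broker/network failure: back off, retry
    raise RuntimeError("message could not be delivered after retries")

publish_with_retry(b'{"station": 7, "wind_speed": 21.4}')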

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Fault-tolerance, Mission-critical applications, Kubernetes, RabbitMQ, Disaster recovery, Edge
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-236638 (URN); 10.1109/UCC63386.2024.00059 (DOI); 2-s2.0-105004734202 (Scopus ID); 979-8-3503-6720-1 (ISBN); 979-8-3503-6721-8 (ISBN)
Conference
UCC 2024, 17th IEEE/ACM International Conference on Utility and Cloud Computing, Sharjah, United Arab Emirates, December 16-19, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-06-04. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-2633-6798
