Publications (3 of 3)
Rasouli, N., Elmroth, E. & Klein, C. (2025). FaLSE: a failure and latency-aware scheduling for mission-critical applications at the edge. Paper presented at The 16th IEEE International Conference on Cloud Computing Technology and Science, CloudCom2025, Shenzhen, China, November 14-16, 2025.
2025 (English) Conference paper, Oral presentation only (Refereed)
Abstract [en]

Mission-critical applications, such as real-time emergency response, healthcare, and transport systems, depend heavily on the low latency and reliability provided by Mobile Edge Computing (MEC). The failure of such applications can lead to high latency and severe consequences, including loss of life, financial catastrophe, or operational disruption. However, the dependability of edge clusters is often overlooked, particularly in terms of fault awareness and recovery strategies, which are crucial to these applications. In this work, we focus on loosely coupled IoT applications and propose a Failure and Latency-aware Scheduling approach for Edge (FaLSE) that balances the trade-off between the availability of edge clusters and the latency of containerized mission-critical applications. We use a decentralized network coordinate system to estimate latency between IoT devices/users and nodes. To validate the proposed approach, we compare it with the standard scheduler of Kubernetes, which is currently among the most widely used workload orchestration platforms. The results indicate that FaLSE reduces the rate of failed requests by 87.9% while achieving a 71.97% lower 95th percentile latency for mission-critical applications and a 10.63% lower latency for normal applications compared to the standard Kubernetes scheduler.

Keywords
Edge computing, Fault-tolerance, Scheduling, Kubernetes, Mission-critical applications
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-246847 (URN)
Conference
The 16th IEEE International Conference on Cloud Computing Technology and Science, CloudCom2025, Shenzhen, China, November 14-16, 2025
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-11-25 Created: 2025-11-25 Last updated: 2025-11-26 Bibliographically approved
Rasouli, N., Klein, C. & Elmroth, E. (2025). Resource management for mission-critical applications in edge computing: systematic review on recent research and open issues. ACM Computing Surveys, 58(3), Article ID 71.
2025 (English) In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 58, no 3, article id 71. Article in journal (Refereed) Published
Abstract [en]

In the realm of edge computing, the optimization of latency, energy, bandwidth, and local computation is critical, especially for mission-critical applications in sectors like disaster management and healthcare. Such applications, exemplified by deploying UAVs and autonomous robots, demand instantaneous data processing. Given the inherent constraints of edge servers, characterized by their limited capacity, meticulous resource management becomes paramount. This entails judicious resource allocation, astute provisioning, strategic task offloading, and careful application placement, all pivotal for both fixed and mobile resource service delivery. This survey delves deep into the nuances of deploying mission-critical applications in an edge environment, dissecting their technological prerequisites. Our exploration employs a systematic literature review grounded in a conventional review methodology. We analyze the cornerstone quality of service metrics pivotal for such critical applications in edge contexts, aiming for efficient service delivery. Moreover, we identify some major gaps in current resource management strategies. Our overarching ambition is to pave the way for robust edge computing paradigms tailored for mission-critical applications.

Place, publisher, year, edition, pages
ACM Digital Library, 2025
Keywords
Mobile edge computing, resource management, resource allocation, scheduling, Internet of Things (IoT), mission-critical applications, latency-aware applications
National Category
Computer Systems
Identifiers
urn:nbn:se:umu:diva-244028 (URN)
10.1145/3762181 (DOI)
2-s2.0-105022022896 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-09-09 Created: 2025-09-09 Last updated: 2025-12-15 Bibliographically approved
Rasouli, N., Klein, C. & Elmroth, E. (2024). Fault tolerance infrastructure for mission-critical mobile edge cloud applications. In: 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC). Paper presented at UCC 2024, 17th IEEE/ACM International Conference on Utility and Cloud Computing, Sharjah, United Arab Emirates, December 16-19, 2024 (pp. 382-388). IEEE
2024 (English) In: 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC), IEEE, 2024, p. 382-388. Conference paper, Published paper (Refereed)
Abstract [en]

Disaster management, such as early warnings for earthquakes, hurricanes, and fires, requires IoT sensors and cameras, which produce tremendous amounts of data. To avoid network bandwidth congestion, much of this data needs to be processed close to where it is produced, as enabled by Mobile Edge Clouds (MEC). However, for such use cases, the disaster itself may take out the MEC, hence hindering disaster management efforts. We present a fault tolerance infrastructure tailored specifically for MEC systems to address various types of failures as part of a holistic disaster recovery solution. Our research investigates using current technologies in the proposed infrastructure: Kubernetes to handle fault tolerance when one or several edge nodes fail, and RabbitMQ as a resilient message broker to ensure dependable message transmission, even during network outages. To evaluate our framework, we conduct a case study using weather stations as mission-critical assets within an urban setting next to forests where edge nodes are placed as safely as possible. The experiments demonstrate that the infrastructure can handle two node failures simultaneously. The proposed infrastructure ensures 99.966% availability for both the system and mission-critical applications.

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Fault-tolerance, Mission-critical applications, Kubernetes, RabbitMQ, Disaster recovery, Edge
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-236638 (URN)
10.1109/UCC63386.2024.00059 (DOI)
2-s2.0-105004734202 (Scopus ID)
979-8-3503-6720-1 (ISBN)
979-8-3503-6721-8 (ISBN)
Conference
UCC 2024, 17th IEEE/ACM International Conference on Utility and Cloud Computing, Sharjah, United Arab Emirates, December 16-19, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-03-19 Created: 2025-03-19 Last updated: 2025-06-04 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-8585-3584