Publications (10 of 43)
Rasouli, N., Klein, C. & Elmroth, E. (2024). Fault tolerance infrastructure for mission-critical mobile edge cloud applications. In: 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC). Paper presented at UCC 2024, 17th IEEE/ACM International Conference on Utility and Cloud Computing, Sharjah, United Arab Emirates, December 16-19, 2024 (pp. 382-388). IEEE
2024 (English). In: 2024 IEEE/ACM 17th International Conference on Utility and Cloud Computing (UCC), IEEE, 2024, p. 382-388. Conference paper, Published paper (Refereed)
Abstract [en]

Disaster management, such as early warnings for earthquakes, hurricanes, and fires, requires IoT sensors and cameras, which produce tremendous amounts of data. To avoid network bandwidth congestion, much of this data needs to be processed close to where it is produced, as enabled by Mobile Edge Clouds (MEC). However, for such use cases, the disaster itself may take out the MEC, hence hindering disaster management efforts. We present a fault tolerance infrastructure tailored specifically for MEC systems to address various types of failures as part of a holistic disaster recovery solution. Our research investigates using current technologies, such as Kubernetes, to effectively handle fault tolerance in situations involving the failure of one or several edge nodes and RabbitMQ as a resilient message broker in our proposed infrastructure to ensure dependable message transmission, even during network outages. To evaluate our framework, we conduct a case study using weather stations as mission-critical assets within an urban setting next to forests where edge nodes are placed as safely as possible. The experiments demonstrate that the infrastructure can handle two node failures simultaneously. The proposed infrastructure ensures 99.966% availability for both the system and mission-critical applications.

Place, publisher, year, edition, pages
IEEE, 2024
Keywords
Fault-tolerance, Mission-critical applications, Kubernetes, RabbitMQ, Disaster recovery, Edge
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-236638 (URN); 10.1109/UCC63386.2024.00059 (DOI); 2-s2.0-105004734202 (Scopus ID); 979-8-3503-6720-1 (ISBN); 979-8-3503-6721-8 (ISBN)
Conference
UCC 2024, 17th IEEE/ACM International Conference on Utility and Cloud Computing, Sharjah, United Arab Emirates, December 16-19, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2025-03-19. Created: 2025-03-19. Last updated: 2025-06-04. Bibliographically approved
Nguyen, C. L., Klein, C. & Elmroth, E. (2024). State-aware application placement in mobile edge clouds. In: Maarten van Steen; Claus Pahl (Ed.), Proceedings of the 14th international conference on cloud computing and services science. Paper presented at 14th International Conference on Cloud Computing and Services Science, CLOSER 2024, Angers, France, May 2-4, 2024 (pp. 117-128). Portugal: Science and Technology Publications, 1
2024 (English). In: Proceedings of the 14th international conference on cloud computing and services science / [ed] Maarten van Steen; Claus Pahl, Portugal: Science and Technology Publications, 2024, Vol. 1, p. 117-128. Conference paper, Published paper (Refereed)
Abstract [en]

Placing applications within Mobile Edge Clouds (MEC) poses challenges due to dynamic user mobility. Maintaining optimal Quality of Service may require frequent application migration in response to changing user locations, potentially leading to bandwidth wastage. This paper addresses application placement challenges in MEC environments by developing a comprehensive model covering workloads, applications, and MEC infrastructures. Following this, various costs associated with application operation, including resource utilization, migration overhead, and potential service quality degradation, are systematically formulated. An online application placement algorithm, App EDC Match, inspired by the Gale-Shapley matching algorithm, is introduced to optimize application placement considering these cost factors. Through experiments that employ real mobility traces to simulate workload dynamics, the results demonstrate that the proposed algorithm efficiently determines near-optimal application placements within Edge Data Centers. It achieves total operating costs within a narrow margin of 8% higher than the approximate global optimum attained by the offline precognition algorithm, which assumes access to future user locations. Additionally, the proposed placement algorithm effectively mitigates resource scarcity in MEC.

Place, publisher, year, edition, pages
Portugal: Science and Technology Publications, 2024
Series
International Conference on Cloud Computing and Services Science, E-ISSN 2184-5042
Keywords
Mobile Edge Clouds, Application Placement, Service Orchestration, Optimization
National Category
Computer Sciences; Computer Systems
Research subject
Computer Science; Computer Systems
Identifiers
urn:nbn:se:umu:diva-178830 (URN); 10.5220/0012326300003711 (DOI); 2-s2.0-85194153990 (Scopus ID); 9789897587016 (ISBN)
Conference
14th International Conference on Cloud Computing and Services Science, CLOSER 2024, Angers, France, May 2-4, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Originally included in thesis in manuscript form.

Available from: 2021-01-19. Created: 2021-01-19. Last updated: 2024-07-17. Bibliographically approved
Saleh Sedghpour, M. R., Garlan, D., Schmerl, B., Klein, C. & Tordsson, J. (2023). Breaking the vicious circle: self-adaptive microservice circuit breaking and retry. In: Lisa O’Conner (Ed.), 2023 IEEE international conference on cloud engineering: proceedings. Paper presented at 2023 IEEE International Conference on Cloud Engineering (IC2E), Boston, Massachusetts, 25–28 September 2023. (pp. 32-42). IEEE Computer Society, Article ID 24126172.
2023 (English). In: 2023 IEEE international conference on cloud engineering: proceedings / [ed] Lisa O’Conner, IEEE Computer Society, 2023, p. 32-42, article id 24126172. Conference paper, Published paper (Refereed)
Abstract [en]

Microservice-based architectures consist of numerous, loosely coupled services with multiple instances. Service meshes aim to simplify traffic management and prevent microservice overload through circuit breaking and request retry mechanisms. Previous studies have demonstrated that the static configuration of these mechanisms is unfit for the dynamic environment of microservices. We conduct a sensitivity analysis to understand the impact of retrying across a wide range of scenarios. Based on the findings, we propose a retry controller that can also work with dynamically configured circuit breakers. We have empirically assessed our proposed controller in various scenarios, including transient overload and noisy neighbors while enforcing adaptive circuit breaking. The results show that our proposed controller does not deviate from a well-tuned configuration while maintaining carried response time and adapting to the changes. In comparison to the default static retry configuration that is mostly used in practice, our approach improves the carried throughput up to 12x and 32x respectively in the cases of transient overload and noisy neighbors.

Place, publisher, year, edition, pages
IEEE Computer Society, 2023
Series
Proceedings of the ... IEEE International Symposium on Requirements Engineering, E-ISSN 2332-6441
Keywords
reliability, retry mechanism, circuit breaker pattern, service mesh, microservices
National Category
Computer Engineering; Software Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-206989 (URN); 10.1109/IC2E59103.2023.00012 (DOI); 001103216300004 (); 2-s2.0-85179510941 (Scopus ID); 979-8-3503-4394-6 (ISBN)
Conference
2023 IEEE International Conference on Cloud Engineering (IC2E), Boston, Massachusetts, 25–28 September 2023.
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Originally included in thesis in manuscript form. 

Available from: 2023-04-24. Created: 2023-04-24. Last updated: 2025-04-24. Bibliographically approved
Saleh Sedghpour, M. R., Obeso Duque, A., Cai, X., Skubic, B., Elmroth, E., Klein, C. & Tordsson, J. (2023). Hydragen: a microservice benchmark generator. In: C. Ardagna; N. Atukorala; P. Beckman; C.K. Chang; R.N. Chang; C. Evangelinos; J. Fan; G.C. Fox; J. Fox; C. Hagleitner; Z. Jin; T. Kosar; M. Parashar (Ed.), 2023 IEEE 16th international conference on cloud computing (CLOUD). Paper presented at 16th IEEE International Conference on Cloud Computing, CLOUD 2023, Hybrid/Chicago, July 2-8, 2023 (pp. 189-200). IEEE, 2023-July
2023 (English). In: 2023 IEEE 16th international conference on cloud computing (CLOUD) / [ed] C. Ardagna; N. Atukorala; P. Beckman; C.K. Chang; R.N. Chang; C. Evangelinos; J. Fan; G.C. Fox; J. Fox; C. Hagleitner; Z. Jin; T. Kosar; M. Parashar, IEEE, 2023, Vol. 2023-July, p. 189-200. Conference paper, Published paper (Refereed)
Abstract [en]

Microservice-based architectures have become ubiquitous in large-scale software systems. Experimental cloud researchers constantly propose enhanced resource management mechanisms for such systems. These mechanisms need to be evaluated using both realistic and flexible microservice benchmarks to study in which ways diverse application characteristics can affect their performance and scalability. However, current microservice benchmarks have limitations including static computational complexity, limited architectural scale, and fixed topology (i.e., number of tiers, fan-in, and fan-out characteristics).

We therefore propose HydraGen, a tool that enables researchers to systematically generate benchmarks with different computational complexities and topologies, to tackle experimental evaluation of performance at scale for web-serving applications, with a focus on inter-service communication. To illustrate the potential of our open-source tool, we demonstrate how it can reproduce an existing microservice benchmark with preserved architectural properties. We also demonstrate how HydraGen can enrich the evaluation of cloud management systems based on a case study related to traffic engineering.

Place, publisher, year, edition, pages
IEEE, 2023
Series
IEEE International Conference on Cloud Computing, CLOUD, ISSN 2159-6182, E-ISSN 2159-6190
Keywords
microservices, benchmark generator, performance analysis, emulation, validation, cloud systems
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-206987 (URN); 10.1109/CLOUD60044.2023.00030 (DOI); 001085065100020 (); 2-s2.0-85174317366 (Scopus ID); 9798350304817 (ISBN); 9798350304824 (ISBN)
Conference
16th IEEE International Conference on Cloud Computing, CLOUD 2023, Hybrid/Chicago, July 2-8, 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Google
Note

Originally included in thesis in manuscript form. 

Available from: 2023-04-24. Created: 2023-04-24. Last updated: 2025-04-24. Bibliographically approved
Larsson, O., Klein, C. & Elmroth, E. (2023). The impact of directed pod eviction on Kubernetes resource utilization. In: 2023 IEEE international conference on service-oriented system engineering (SOSE): proceedings. Paper presented at 2023 IEEE International Conference on Service-Oriented System Engineering (SOSE), Athens, Greece, July 17-20, 2023 (pp. 81-90). IEEE
2023 (English). In: 2023 IEEE international conference on service-oriented system engineering (SOSE): proceedings, IEEE, 2023, p. 81-90. Conference paper, Published paper (Refereed)
Abstract [en]

One of the promises of container orchestration technologies is that they enable effective utilization of hardware resources and thus reduce infrastructure costs. However, as applications scale up and down over time in a Kubernetes-managed environment, the cluster may enter a state of resource fragmentation. In such a state, the cluster's hardware cannot be used to its full potential because resources become locked by poor Pod to Node mappings. This paper shows that such problems are common as workload requirements approach a cluster's resource capacity. Additionally, we present an experimental analysis of directed Pod eviction as a technique to combat the issue of resource fragmentation. Our findings show that directed Pod eviction can reduce resource fragmentation and allow clusters to run a given workload using 16.7% fewer Nodes than is consistently possible using standard Kubernetes scheduling. These findings are unaffected by applying input shaking to the experimental analysis, strengthening confidence in the generality of these resource utilization improvements.

Place, publisher, year, edition, pages
IEEE, 2023
Series
IEEE International Symposium on Service-Oriented System Engineering, ISSN 2640-8228, E-ISSN 2642-6587
Keywords
Cloud computing, Kubernetes, scheduling, directed Pod eviction, resource fragmentation, resource utilization, experimental evaluation, Descheduler for Kubernetes
National Category
Computer Systems; Computer Sciences
Research subject
Computer Science; Computer Systems
Identifiers
urn:nbn:se:umu:diva-214710 (URN); 10.1109/SOSE58276.2023.00016 (DOI); 001084635000010 (); 2-s2.0-85174951364 (Scopus ID); 979-8-3503-2239-2 (ISBN); 979-8-3503-2240-8 (ISBN)
Conference
2023 IEEE International Conference on Service-Oriented System Engineering (SOSE), Athens, Greece, July 17-20, 2023
Funder
Knut and Alice Wallenberg Foundation, 2019.0352
Available from: 2023-09-26. Created: 2023-09-26. Last updated: 2025-05-22. Bibliographically approved
Obeso Duque, A., Klein, C., Feng, J., Cai, X., Skubic, B. & Elmroth, E. (2022). A Qualitative Evaluation of Service Mesh-based Traffic Management for Mobile Edge Cloud. In: 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022: Proceedings. Paper presented at 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, Taormina, Italy, 16-19 May 2022. (pp. 210-219). IEEE
2022 (English). In: 22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022: Proceedings, IEEE, 2022, p. 210-219. Conference paper, Published paper (Refereed)
Abstract [en]

Service mesh is getting widely adopted as the cloud-native mechanism for traffic management in microservice-based applications, in particular for generic IT workloads hosted in more centralized cloud environments. Performance-demanding applications continue to drive the decentralization of modern application execution environments, as in the case of mobile edge cloud. This paper presents a systematic and qualitative analysis of state-of-the-art service mesh to evaluate how suitable its design is for addressing the traffic management needs of performance-demanding application workloads hosted in a mobile edge cloud environment. With this analysis, we argue that today's dependability-centric service mesh design fails at addressing the needs of the different types of emerging mobile edge cloud workloads and motivate further research in the directions of performance-efficient architectures, stronger QoS guarantees and higher complexity abstractions of cloud-native traffic management frameworks.

Place, publisher, year, edition, pages
IEEE, 2022
Keywords
efficient traffic management, mobile edge cloud, multi-access edge computing, performance-demanding applications, service mesh, software engineering
National Category
Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:umu:diva-198733 (URN); 10.1109/CCGrid54584.2022.00030 (DOI); 000855065800022 (); 2-s2.0-85135759502 (Scopus ID); 9781665499576 (ISBN); 9781665499569 (ISBN)
Conference
22nd IEEE/ACM International Symposium on Cluster, Cloud and Internet Computing, CCGrid 2022, Taormina, Italy, 16-19 May 2022.
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Knut and Alice Wallenberg Foundation
Available from: 2022-08-22. Created: 2022-08-22. Last updated: 2023-09-05. Bibliographically approved
Saleh Sedghpour, M. R., Klein, C. & Tordsson, J. (2022). An Empirical Study of Service Mesh Traffic Management Policies for Microservices. In: ICPE '22: Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering. Paper presented at ICPE '22: ACM/SPEC International Conference on Performance Engineering, Beijing, China, April 9 - 13, 2022 (pp. 17-27). New York: ACM Digital Library
2022 (English). In: ICPE '22: Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering, New York: ACM Digital Library, 2022, p. 17-27. Conference paper, Published paper (Refereed)
Abstract [en]

A microservice architecture features hundreds or even thousands of small loosely coupled services with multiple instances. Because microservice performance depends on many factors including the workload, inter-service traffic management is complex in such dynamic environments. Service meshes aim to handle this complexity and to facilitate management, observability, and communication between microservices. Service meshes provide various traffic management policies such as circuit breaking and retry mechanisms, which are claimed to protect microservices against overload and increase the robustness of communication between microservices. However, there have been no systematic studies on the effects of these mechanisms on microservice performance and robustness. Furthermore, the exact impact of various tuning parameters for circuit breaking and retries is poorly understood. This work presents a large set of experiments conducted to investigate these issues using a representative microservice benchmark in a Kubernetes testbed with the widely used Istio service mesh. Our experiments reveal effective configurations of circuit breakers and retries. The findings presented will be useful to engineers seeking to configure service meshes more systematically and also open up new areas of research for academics in the area of service meshes for (autonomic) microservice resource management.

Place, publisher, year, edition, pages
New York: ACM Digital Library, 2022
Keywords
microservices, service mesh, traffic management, circuit breaking, retry, microservice resiliency
National Category
Computer Systems
Research subject
Computer Systems
Identifiers
urn:nbn:se:umu:diva-193341 (URN); 10.1145/3489525.3511686 (DOI); 000883411400004 (); 2-s2.0-85128651087 (Scopus ID)
Conference
ICPE '22: ACM/SPEC International Conference on Performance Engineering, Beijing, China, April 9 - 13, 2022
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2022-03-28. Created: 2022-03-28. Last updated: 2023-09-05. Bibliographically approved
Bermbach, D., Krintz, C., Guo, T., Thamsen, L., Venugopal, S., Sharma, P., . . . Trihinas, D. (2022). Message from the Technical Program Chairs: IC2E 2022. Paper presented at 10th IEEE International Conference on Cloud Engineering, IC2E 2022, September 26-30, 2022. Proceedings of the IEEE International Conference on Cloud Engineering, x-x
2022 (English). In: Proceedings of the IEEE International Conference on Cloud Engineering, ISSN 2373-3845, p. x-x. Article in journal, Editorial material (Refereed). Published
Abstract [en]

Welcome to the 10th International Conference on Cloud Engineering (IC2E 2022), sponsored by IEEE and held in person in beautiful Pacific Grove, CA (near Monterey, CA) - returning to the Bay Area of California for this 10th anniversary - the location where IC2E started over a decade ago! We are thrilled to be returning to a safe, yet in-person, event this year after being virtual last year due to the COVID-19 pandemic. We look forward to catching up with you, hearing about your latest cloud computing research, socializing in a beautiful setting, and meeting those of you new to IC2E.

Place, publisher, year, edition, pages
IEEE, 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-201637 (URN); 10.1109/IC2E55432.2022.00005 (DOI); 2-s2.0-85143164090 (Scopus ID); 9781665491150 (ISBN)
Conference
10th IEEE International Conference on Cloud Engineering, IC2E 2022, September 26-30, 2022
Available from: 2022-12-13. Created: 2022-12-13. Last updated: 2022-12-13. Bibliographically approved
Larsson, L., Tärneberg, W., Klein, C., Kihl, M. & Elmroth, E. (2021). Adaptive and Application-agnostic Caching in Service Meshes for Resilient Cloud Applications. In: Proceedings of the 2021 IEEE Conference on Network Softwarization: Accelerating Network Softwarization in the Cognitive Age, NetSoft 2021. Paper presented at The 7th IEEE International Conference on Network Softwarization (NetSoft 2021), June 28-July 2, 2021 (pp. 176-180). IEEE, Article ID 9492576.
2021 (English). In: Proceedings of the 2021 IEEE Conference on Network Softwarization: Accelerating Network Softwarization in the Cognitive Age, NetSoft 2021, IEEE, 2021, p. 176-180, article id 9492576. Conference paper, Published paper (Refereed)
Abstract [en]

Service meshes factor out code dealing with inter-micro-service communication. The overall resilience of a cloud application is improved if constituent micro-services return stale data, instead of no data at all. This paper proposes and implements application-agnostic caching for micro-services. While caching is widely employed for serving web service traffic, its usage in inter-micro-service communication is lacking. Micro-service responses are highly dynamic, which requires carefully choosing adaptive time-to-live caching algorithms. Our approach is application-agnostic, is cloud-native, and supports gRPC. We evaluate our approach and implementation using the micro-service benchmark by Google Cloud called Hipster Shop. Our approach results in caching of about 80% of requests. Results show the feasibility and efficiency of our approach, which encourages implementing caching in service meshes. Additionally, we make the code, experiments, and data publicly available.

Place, publisher, year, edition, pages
IEEE, 2021
Keywords
Containerized network functions, Microservices, Service-mesh
National Category
Computer Systems
Identifiers
urn:nbn:se:umu:diva-183327 (URN); 10.1109/NetSoft51509.2021.9492576 (DOI); 000718599000025 (); 2-s2.0-85112087682 (Scopus ID); 978-1-6654-0522-5 (ISBN)
Conference
The 7th IEEE International Conference on Network Softwarization (NetSoft 2021), June 28-July 2, 2021
Available from: 2021-05-23. Created: 2021-05-23. Last updated: 2023-09-05. Bibliographically approved
Saleh Sedghpour, M. R., Klein, C. & Tordsson, J. (2021). Service mesh circuit breaker: From panic button to performance management tool. In: HAOC '21: Proceedings of the 1st Workshop on High Availability and Observability of Cloud Systems. Paper presented at EuroSys '21: Sixteenth European Conference on Computer Systems, Online, UK, April, 2021 (pp. 4-10). Association for Computing Machinery (ACM)
2021 (English). In: HAOC '21: Proceedings of the 1st Workshop on High Availability and Observability of Cloud Systems, Association for Computing Machinery (ACM), 2021, p. 4-10. Conference paper, Published paper (Refereed)
Abstract [en]

Site Reliability Engineers are at the center of two tensions: On one hand, they need to respond to alerts within a short time, to restore a non-functional system. On the other hand, short response times are disruptive to everyday life and lead to alert fatigue. To alleviate this tension, many resource management mechanisms have been proposed to handle overload and mitigate faults. One recent such mechanism is circuit breaking in service meshes. Circuit breaking rejects incoming requests to protect latency at the expense of availability (successfully answered requests), but in many scenarios achieves neither due to the difficulty of knowing when to trigger circuit breaking in highly dynamic microservice environments.

We propose an adaptive circuit breaking mechanism, implemented through an adaptive controller, that not only avoids overload and mitigates failure, but also keeps the tail response time below a given threshold while maximizing service throughput. Our proposed controller is experimentally compared with a static circuit breaker across a wide set of overload scenarios in a testbed based on Istio and Kubernetes. The results show that our controller maintains tail response time below the given threshold 98% of the time (including cold starts) on average with an availability of 70% with 29% of requests circuit broken. This compares favorably to a static circuit breaker configuration, which features a 63% availability, 30% circuit broken requests, and more than 5% of requests timing out.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
micro-services, circuit breaker, performance management, control theory
National Category
Computer Systems
Identifiers
urn:nbn:se:umu:diva-182614 (URN); 10.1145/3447851.3458740 (DOI); 2-s2.0-85106002930 (Scopus ID); 978-1-4503-8336-3 (ISBN)
Conference
EuroSys '21: Sixteenth European Conference on Computer Systems, Online, UK, April, 2021
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2021-04-27. Created: 2021-04-27. Last updated: 2023-04-24. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0106-3049
