Umeå University's logo

umu.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Breaking the vicious circle: self-adaptive microservice circuit breaking and retry
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems Lab)ORCID iD: 0000-0002-0751-9695
Carnegie Mellon University, Pittsburgh, USA.ORCID iD: 0000-0002-6735-8301
Carnegie Mellon University, Pittsburgh, USA.ORCID iD: 0000-0001-7828-622X
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems Lab)ORCID iD: 0000-0003-0106-3049
Show others and affiliations
2023 (English)In: 2023 IEEE international conference on cloud engineering: proceedings / [ed] Lisa O’Conner, IEEE Computer Society, 2023, p. 32-42, article id 24126172Conference paper, Published paper (Refereed)
Abstract [en]

Microservice-based architectures consist of numerous, loosely coupled services with multiple instances. Service meshes aim to simplify traffic management and prevent microservice overload through circuit breaking and request retry mechanisms. Previous studies have demonstrated that the static configuration of these mechanisms is unfit for the dynamic environment of microservices. We conduct a sensitivity analysis to understand the impact of retrying across a wide range of scenarios. Based on the findings, we propose a retry controller that can also work with dynamically configured circuit breakers. We have empirically assessed our proposed controller in various scenarios, including transient overload and noisy neighbors while enforcing adaptive circuit breaking. The results show that our proposed controller does not deviate from a well-tuned configuration while maintaining carried response time and adapting to the changes. In comparison to the default static retry configuration that is mostly used in practice, our approach improves the carried throughput up to 12x and 32x respectively in the cases of transient overload and noisy neighbors.

Place, publisher, year, edition, pages
IEEE Computer Society, 2023. p. 32-42, article id 24126172
Series
Proceedings of the ... IEEE International Symposium on Requirements Engineering, E-ISSN 2332-6441
Keywords [en]
reliability, retry mechanism, circuit breaker pattern, service mesh, microservices
National Category
Computer Engineering Software Engineering
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-206989DOI: 10.1109/IC2E59103.2023.00012Scopus ID: 2-s2.0-85179510941ISBN: 979-8-3503-4394-6 (electronic)OAI: oai:DiVA.org:umu-206989DiVA, id: diva2:1752664
Conference
2023 IEEE International Conference on Cloud Engineering (IC2E), Boston, Massachusetts, 25–28 September 2023.
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Originally included in thesis in manuscript form. 

Available from: 2023-04-24 Created: 2023-04-24 Last updated: 2023-12-22Bibliographically approved
In thesis
1. Towards self-driving microservices
Open this publication in new window or tab >>Towards self-driving microservices
2023 (English)Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Mot självkörande mikrotjänster
Abstract [en]

In recent years, microservice architecture has become a popular method for software system design and development. This involves creating applications with multiple small services, each with multiple instances, operating as independent processes. Due to the distributed nature of microservices, communication between services presents a challenging task that becomes increasingly complex as the number of services grows. This complexity can even lead to short-term failures that can degrade application performance. Therefore, the auto-tuning of inter-service communication is necessary to prevent such failures. Service meshes were introduced to offer the necessary technical capabilities that can be employed in such scenarios. In essence, a service mesh is an infrastructure layer that includes a set of configurable proxies integrated into microservices. This enables the provision of traffic management policies such as circuit breaking and retry mechanisms to enhance microservice resilience against transient failures. However, static configuration or misconfiguration of these mechanisms is unsuitable for the dynamic environment of microservices and can lead to serious issues and performance problems, such as retry storms.

The goal of this thesis is three-fold. First, it aims to investigate the impact and effectiveness of service traffic management on application reliability and availability in the presence of transient failures. Second, it focuses on auto-tuning of service traffic management to increase carried throughput and maintain carried response time. Third, this research aims to propose measures that can improve research reproducibility in the area of distributed systems ensuring that the findings can be independently verified by others. In this thesis, we aim to offer detailed guidelines on best practices for implementing research software.

To achieve these goals, this thesis delves into the current state-of-the-art in service meshes and eBPF-powered microservices, identifying current challenges and potential future directions. It analyzes the effects of circuit breaker and retry mechanisms on microservice performance and proposes adaptive controllers for both. The results show the need for such controllers that increase throughput while maintaining the tail response time of the application. Additionally, it proposes a microservice benchmark generator to enable systematic microservice benchmark generation and improve reproducibility. It also provides recommendations for improving artifact evaluation in distributed systems research by compiling all existing recommendations.

Abstract [sv]

Mikrotjänster har de senaste åren blivit en populär arkitekturmodell för programvara. Modellen innebär att man skapar applikationer med flera små tjänster, var och en med flera instanser som fungerar som oberoende processer. Mikrotjänsters distribuerade natur gör kommunikationen mellan tjänster mer utmanande. Denna komplexitet kan även ge upphov till tillfälliga fel på grund av lastobalans eller överbelastning och även försämra applikationers prestanda. Av denna anledning är dynamisk konfigurationav kommunikationen mellan mikrotjänster nödvändig. En service mesh är en teknikplattform för att hantera hur mikrotjänster kommunicerar, med funktioner för att enkelt kryptera kommunikation mellan tjänster, mäta prestanda och finkorningt styra kommunikationsflöden.En service mesh implementeras ofta som en uppsättning konfigurerbara proxies. Detta möjliggör trafikhanteringspolicies baserade på mekanismer som kretsbrytning och omsändningar. Statisk konfiguration av dessa mekanismer kan dock ge allvarliga prestandaproblem såsom  låg genomströmning och/eller omfattande omsändningar.

Denna avhandling har tre mål. För det första undersöker den hur trafikhantering för mikrotjänster påverkar tillförlitligheten och tillgängligheten för applikationer vid tillfälliga störningar. För det andra fokuserar den på adaptiv reglering av trafikhantering av mikrotjänster för att öka genomströmningen och samtidigt bibehålla acceptabla svarstider. För det tredje syftar den till att förbättra reproducerbarheten i forskning inom distribuerade system och se till att forskningsresultat enklare kan verifieras av oberoende. 

För att uppnå dessa mål undersöker avhandlingen den tekniska frontlinjen inom service mesh och mikrotjänster driva av eBPF-tekniken. Avhandlingen analyserar vidare hur användandet av kretsbrytare och omsändningsmekanismer påverkar mikrotjänsters prestanda. Adaptiva reglersystem för att hantera konfiguration av båda dessa mekanismer föreslås och utvärderas i omfattande experimentent. Resultaten visar att sådana regulatorer är nödvändiga för att öka genomströmningen och samtidigt bibehålla (högre percentiler av) applikationers svarstider. Adaption är särskilt viktigt då faktorer som totala trafikmängden, applikationers prestanda, tillfälliga fel, etc. kan förändras snabbt. 

Avhandlingen introducerar även ett verktyg för att generera godtyckliga testapplikationer för att kunna genomföra mer heltäckande utvärderingar av olika typer av forskningsprogramvara som hanterar mikrotjänster. Avhandlingen bidrar även till reproducerbarhet genom att studera hur programvaru-artefakter bäst bör utvärderas inom forskningsområdet distribuerade system. Detta sker genom att sammanställa, och utöka, befintliga rekommendationer inom området.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2023. p. 61
Series
Report / UMINF, ISSN 0348-0542 ; 23:04
Keywords
Microservices, Autonomic Computing, Service Mesh, Reliability, Circuit Breaker, Retry, Microservice Resiliency, Microservice Benchmarking, Reproducibility, Repeatability.
National Category
Computer Sciences Software Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-206990 (URN)978-91-8070-022-1 (ISBN)978-91-8070-023-8 (ISBN)
Public defence
2023-05-26, Aula Anatomica (BIO.A.206), Biologihuset, Umeå, 13:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-05-03 Created: 2023-04-24 Last updated: 2023-04-24Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Saleh Sedghpour, Mohammad RezaKlein, CristianTordsson, Johan

Search in DiVA

By author/editor
Saleh Sedghpour, Mohammad RezaGarlan, DavidSchmerl, BradleyKlein, CristianTordsson, Johan
By organisation
Department of Computing Science
Computer EngineeringSoftware Engineering

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 639 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf