Umeå University's logo

umu.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
An Empirical Study of Service Mesh Traffic Management Policies for Microservices
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems Lab)ORCID iD: 0000-0002-0751-9695
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems Lab)ORCID iD: 0000-0003-0106-3049
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems Lab)ORCID iD: 0000-0002-5970-7276
2022 (English)In: ICPE '22: Proceedings of the 2022 ACM/SPEC on International Conference on Performance Engineering, New York: ACM Digital Library, 2022, p. 17-27Conference paper, Published paper (Refereed)
Abstract [en]

A microservice architecture features hundreds or even thousands of small loosely coupled services with multiple instances. Because microservice performance depends on many factors including the workload, inter-service traffic management is complex in such dynamic environments. Service meshes aim to handle this complexity and to facilitate management, observability, and communication between microservices. Service meshes provide various traffic management policies such as circuit breaking and retry mechanisms, which are claimed to protect microservices against overload and increase the robustness of communication between microservices. However, there have been no systematic studies on the effects of these mechanisms on microservice performance and robustness. Furthermore, the exact impact of various tuning parameters for circuit breaking and retries are poorly understood. This work presents a large set of experiments conducted to investigate these issues using a representative microservice benchmark in a Kubernetes testbed with the widely used Istio service mesh. Our experiments reveal effective configurations of circuit breakers and retries. The findings presented will be useful to engineers seeking to configure service meshes more systematically and also open up new areas of research for academics in the area of service meshes for (autonomic) microservice resource management.

Place, publisher, year, edition, pages
New York: ACM Digital Library, 2022. p. 17-27
Keywords [en]
microservices, service mesh, traffic management, circuit breaking, retry, microservice resiliency
National Category
Computer Systems
Research subject
Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-193341DOI: 10.1145/3489525.3511686ISI: 000883411400004Scopus ID: 2-s2.0-85128651087OAI: oai:DiVA.org:umu-193341DiVA, id: diva2:1647486
Conference
CPE '22: ACM/SPEC International Conference on Performance Engineering, Bejing, China, April 9 - 13, 2022
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Available from: 2022-03-28 Created: 2022-03-28 Last updated: 2023-09-05Bibliographically approved
In thesis
1. Towards self-driving microservices
Open this publication in new window or tab >>Towards self-driving microservices
2023 (English)Doctoral thesis, comprehensive summary (Other academic)
Alternative title[sv]
Mot självkörande mikrotjänster
Abstract [en]

In recent years, microservice architecture has become a popular method for software system design and development. This involves creating applications with multiple small services, each with multiple instances, operating as independent processes. Due to the distributed nature of microservices, communication between services presents a challenging task that becomes increasingly complex as the number of services grows. This complexity can even lead to short-term failures that can degrade application performance. Therefore, the auto-tuning of inter-service communication is necessary to prevent such failures. Service meshes were introduced to offer the necessary technical capabilities that can be employed in such scenarios. In essence, a service mesh is an infrastructure layer that includes a set of configurable proxies integrated into microservices. This enables the provision of traffic management policies such as circuit breaking and retry mechanisms to enhance microservice resilience against transient failures. However, static configuration or misconfiguration of these mechanisms is unsuitable for the dynamic environment of microservices and can lead to serious issues and performance problems, such as retry storms.

The goal of this thesis is three-fold. First, it aims to investigate the impact and effectiveness of service traffic management on application reliability and availability in the presence of transient failures. Second, it focuses on auto-tuning of service traffic management to increase carried throughput and maintain carried response time. Third, this research aims to propose measures that can improve research reproducibility in the area of distributed systems ensuring that the findings can be independently verified by others. In this thesis, we aim to offer detailed guidelines on best practices for implementing research software.

To achieve these goals, this thesis delves into the current state-of-the-art in service meshes and eBPF-powered microservices, identifying current challenges and potential future directions. It analyzes the effects of circuit breaker and retry mechanisms on microservice performance and proposes adaptive controllers for both. The results show the need for such controllers that increase throughput while maintaining the tail response time of the application. Additionally, it proposes a microservice benchmark generator to enable systematic microservice benchmark generation and improve reproducibility. It also provides recommendations for improving artifact evaluation in distributed systems research by compiling all existing recommendations.

Abstract [sv]

Mikrotjänster har de senaste åren blivit en populär arkitekturmodell för programvara. Modellen innebär att man skapar applikationer med flera små tjänster, var och en med flera instanser som fungerar som oberoende processer. Mikrotjänsters distribuerade natur gör kommunikationen mellan tjänster mer utmanande. Denna komplexitet kan även ge upphov till tillfälliga fel på grund av lastobalans eller överbelastning och även försämra applikationers prestanda. Av denna anledning är dynamisk konfigurationav kommunikationen mellan mikrotjänster nödvändig. En service mesh är en teknikplattform för att hantera hur mikrotjänster kommunicerar, med funktioner för att enkelt kryptera kommunikation mellan tjänster, mäta prestanda och finkorningt styra kommunikationsflöden.En service mesh implementeras ofta som en uppsättning konfigurerbara proxies. Detta möjliggör trafikhanteringspolicies baserade på mekanismer som kretsbrytning och omsändningar. Statisk konfiguration av dessa mekanismer kan dock ge allvarliga prestandaproblem såsom  låg genomströmning och/eller omfattande omsändningar.

Denna avhandling har tre mål. För det första undersöker den hur trafikhantering för mikrotjänster påverkar tillförlitligheten och tillgängligheten för applikationer vid tillfälliga störningar. För det andra fokuserar den på adaptiv reglering av trafikhantering av mikrotjänster för att öka genomströmningen och samtidigt bibehålla acceptabla svarstider. För det tredje syftar den till att förbättra reproducerbarheten i forskning inom distribuerade system och se till att forskningsresultat enklare kan verifieras av oberoende. 

För att uppnå dessa mål undersöker avhandlingen den tekniska frontlinjen inom service mesh och mikrotjänster driva av eBPF-tekniken. Avhandlingen analyserar vidare hur användandet av kretsbrytare och omsändningsmekanismer påverkar mikrotjänsters prestanda. Adaptiva reglersystem för att hantera konfiguration av båda dessa mekanismer föreslås och utvärderas i omfattande experimentent. Resultaten visar att sådana regulatorer är nödvändiga för att öka genomströmningen och samtidigt bibehålla (högre percentiler av) applikationers svarstider. Adaption är särskilt viktigt då faktorer som totala trafikmängden, applikationers prestanda, tillfälliga fel, etc. kan förändras snabbt. 

Avhandlingen introducerar även ett verktyg för att generera godtyckliga testapplikationer för att kunna genomföra mer heltäckande utvärderingar av olika typer av forskningsprogramvara som hanterar mikrotjänster. Avhandlingen bidrar även till reproducerbarhet genom att studera hur programvaru-artefakter bäst bör utvärderas inom forskningsområdet distribuerade system. Detta sker genom att sammanställa, och utöka, befintliga rekommendationer inom området.

Place, publisher, year, edition, pages
Umeå: Umeå University, 2023. p. 61
Series
Report / UMINF, ISSN 0348-0542 ; 23:04
Keywords
Microservices, Autonomic Computing, Service Mesh, Reliability, Circuit Breaker, Retry, Microservice Resiliency, Microservice Benchmarking, Reproducibility, Repeatability.
National Category
Computer Sciences Software Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-206990 (URN)978-91-8070-022-1 (ISBN)978-91-8070-023-8 (ISBN)
Public defence
2023-05-26, Aula Anatomica (BIO.A.206), Biologihuset, Umeå, 13:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2023-05-03 Created: 2023-04-24 Last updated: 2023-04-24Bibliographically approved

Open Access in DiVA

fulltext(1043 kB)800 downloads
File information
File name FULLTEXT01.pdfFile size 1043 kBChecksum SHA-512
ce11fb3fa0b8e99b536780993bc8cd5cc3f123f45d4bdc30b0fb79bd886e0346f3b30e12b4e0c37d17f1ef81b3e666a68a5481e8a5a406533bb8e67a85061e8e
Type fulltextMimetype application/pdf

Other links

Publisher's full textScopus

Authority records

Saleh Sedghpour, Mohammad RezaKlein, CristianTordsson, Johan

Search in DiVA

By author/editor
Saleh Sedghpour, Mohammad RezaKlein, CristianTordsson, Johan
By organisation
Department of Computing Science
Computer Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 800 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 780 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf