Optimizing Distributed Tracing Overhead in a Cloud Environment with OpenTelemetry
2024 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Gaining observability in a distributed system requires generating and gathering telemetry. This is especially important when a system depends on layers of other microservices. One method for observability is distributed tracing: building causal chains of events, called traces, across microservices. Traces make it possible to find bottlenecks and dependencies within each call chain. One framework for implementing distributed tracing is OpenTelemetry. Deploying OpenTelemetry in a Kubernetes cluster involves several design choices. For example, OpenTelemetry provides a collector that gathers spans, the parts that make up a trace, from microservices. Collectors can be deployed one per node, as a daemonset, or one per service, as sidecars. This study compared the performance impact of the sidecar and daemonset setups with that of running without OpenTelemetry. The resources analyzed were CPU, network, and RAM usage. Tests covered four scenarios: every combination of 2 or 4 nodes with balanced or unbalanced service placement. The experiments were run in a cloud environment using Kubernetes. The tested system was an emulation of one of Nasdaq's systems based on real data from the company. The study concluded that OpenTelemetry increased resource usage in all cases. Compared with no OpenTelemetry, the daemonset setup increased CPU usage by 46.5 %, network usage by 18.25 %, and memory usage by 47.5 % on average. The sidecar setup performed worse than the daemonset setup in most cases and for most resources, especially RAM and CPU usage.
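The experiment design described in the abstract can be illustrated with a minimal sketch. It enumerates the four scenarios (node count crossed with service placement) against the three collector setups, and shows how a relative overhead percentage of the kind reported could be computed. The names `NODE_COUNTS`, `PLACEMENTS`, `SETUPS`, `scenarios`, and `relative_overhead` are hypothetical, and the sample measurements are made-up illustrative values, not data from the thesis.

```python
from itertools import product

# Factors named in the abstract; identifiers are illustrative assumptions.
NODE_COUNTS = (2, 4)
PLACEMENTS = ("balanced", "unbalanced")
SETUPS = ("none", "daemonset", "sidecar")


def scenarios():
    """Return every (node count, placement) combination: 2 x 2 = 4 scenarios."""
    return list(product(NODE_COUNTS, PLACEMENTS))


def relative_overhead(baseline, measured):
    """Percentage increase of `measured` over `baseline` (same unit for both)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (measured - baseline) / baseline * 100.0


# Each scenario is run once per collector setup: 4 x 3 = 12 runs.
runs = list(product(scenarios(), SETUPS))

if __name__ == "__main__":
    for (nodes, placement), setup in runs:
        print(f"{nodes} nodes, {placement} placement, setup={setup}")
    # Hypothetical CPU readings (not thesis data), chosen only to show the formula.
    print(f"{relative_overhead(200.0, 293.0):.1f} % CPU overhead")
```

The "none" setup serves as the baseline for each scenario, so every overhead figure compares a daemonset or sidecar run against the run without OpenTelemetry in the same scenario.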
Place, publisher, year, edition, pages
2024. p. 43
Series
UMNAD ; 1467
Keywords [en]
OpenTelemetry, Cloud, Distributed tracing, Collector, Optimization, Kubernetes, tracing, Distributed systems
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-225868
OAI: oai:DiVA.org:umu-225868
DiVA, id: diva2:1867119
External cooperation
Nasdaq
Educational program
Master's Programme in Computing Science
Presentation
2024-05-29, MIT.A.316, Umeå, 10:45 (English)
Supervisors
Examiners
2024-06-24 2024-06-10 2024-06-24 Bibliographically approved