Evaluating the Performance of Serialization Protocols in Apache Kafka
Umeå University, Faculty of Science and Technology, Department of Computing Science.
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

In the context of data-intensive applications, efficient data serialization is essential for maintaining high performance and scalability. This thesis investigates the impact of different serialization protocols on the latency and throughput in Apache Kafka, a widely used distributed streaming platform. Given the diverse array of serialization protocols, this study focuses on four prevalent ones: Apache Avro, Protocol Buffers (Protobuf), JSON, and MessagePack. These protocols were selected based on their widespread use in academic research and industry and their varying approaches to balancing human readability, efficiency, and performance. 

JSON, the most commonly used serialization protocol in many systems, serves as the baseline for comparison in this study. While JSON offers ease of use and broad compatibility, it may not be optimal in terms of speed and encoded data size. This research aims to determine whether alternative serialization protocols can improve performance.

This research utilized a testing framework involving two distinct types of tests: batch processing and single message processing. Each test type consisted of 1,048,575 records and was applied across three different data sizes, 1,176 bytes, 4,696 bytes, and 9,312 bytes, to evaluate how the data size impacts serialization and deserialization times, total execution times, throughput, and latency. The throughput is measured in records per second (rps).
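
As an illustration of the throughput metric described above, the following minimal sketch times in-process serialization and reports records per second. It is not the thesis's test framework: it does not involve Kafka, it uses only the JSON baseline from the Python standard library, and the names `make_record` and `measure_throughput` as well as the record layout are assumptions made for this example.

```python
import json
import time
from typing import Callable, Iterable


def make_record(target_bytes: int) -> dict:
    """Build a hypothetical record whose UTF-8 JSON encoding is target_bytes long."""
    # Size of the encoding with an empty payload; the rest is filled with padding.
    overhead = len(json.dumps({"id": 0, "payload": ""}).encode("utf-8"))
    if target_bytes < overhead:
        raise ValueError("target_bytes is smaller than the fixed record overhead")
    return {"id": 0, "payload": "x" * (target_bytes - overhead)}


def measure_throughput(serialize: Callable[[dict], bytes],
                       records: Iterable[dict]) -> float:
    """Return serialized records per second (rps) for one pass over records."""
    count = 0
    start = time.perf_counter()
    for record in records:
        serialize(record)
        count += 1
    elapsed = time.perf_counter() - start
    # Guard against a zero reading from the clock on very short runs.
    return count / elapsed if elapsed > 0 else float("inf")


def json_serialize(record: dict) -> bytes:
    """JSON baseline: encode the record as UTF-8 JSON bytes."""
    return json.dumps(record).encode("utf-8")


if __name__ == "__main__":
    # The three record sizes named in the abstract; the record count is kept
    # small here so the sketch runs quickly.
    for size in (1176, 4696, 9312):
        records = [make_record(size)] * 10_000
        rps = measure_throughput(json_serialize, records)
        print(f"{size} bytes: {rps:,.0f} rps (serialization only)")
```

Another serializer can be compared by passing a different function as `serialize`. The numbers this produces cover serialization alone and are not comparable to the end-to-end Kafka figures reported in the abstract.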

The throughput results indicate that MessagePack achieves roughly twice the throughput of JSON. In batch processing, from the smallest to the largest record size, MessagePack reached 34,254 rps vs. 14,243 rps for JSON, 7,377 rps vs. 3,411 rps, and 3,802 rps vs. 1,784 rps. The single-message results show 29,212 rps vs. 14,126 rps, 8,350 rps vs. 3,344 rps, and 3,781 rps vs. 1,803 rps. Protobuf showed the highest throughput for the smallest tested data size, at 36,945 rps for batch processing and 36,364 rps for single-message processing. Avro showed a slight throughput edge over JSON, but a smaller one than MessagePack. All the protocols serialized faster than JSON, with Protobuf the quickest.
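
The "roughly twofold" claim can be checked directly from the figures quoted above. The short sketch below divides each MessagePack throughput by the corresponding JSON throughput; the variable and function names are chosen for this example only, and the inputs are the rps values reported in the abstract.

```python
# Throughput figures (records per second) as reported in the abstract,
# ordered by record size: 1,176 bytes, 4,696 bytes, 9,312 bytes.
REPORTED_RPS = {
    "batch": {
        "MessagePack": [34254, 7377, 3802],
        "JSON": [14243, 3411, 1784],
    },
    "single": {
        "MessagePack": [29212, 8350, 3781],
        "JSON": [14126, 3344, 1803],
    },
}


def speedups(candidate, baseline):
    """Element-wise throughput ratio of candidate over baseline."""
    return [c / b for c, b in zip(candidate, baseline)]


# MessagePack-over-JSON ratio for each test type and record size.
RATIOS = {
    mode: speedups(values["MessagePack"], values["JSON"])
    for mode, values in REPORTED_RPS.items()
}

for mode, ratios in RATIOS.items():
    print(mode, [round(r, 2) for r in ratios])
```

All six ratios fall between 2.0 and 2.5, which is consistent with the stated twofold improvement.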

Regarding latency, Protobuf consistently achieved the lowest median latencies across all test sizes in batch processing, recording 38.97 ms, 57.41 ms, and 63.14 ms for increasing record sizes, whereas JSON showed higher latencies of 77.59 ms, 72.60 ms, and 78.09 ms. In single-message tests, Protobuf also displayed the lowest median latency at 1.68 ms for the smallest size, significantly outperforming JSON’s 7.94 ms. Interestingly, for the record size of 4,696 bytes, JSON exhibited the lowest median latency at 3.76 ms. Avro presented the lowest median latency for the largest size at 2.71 ms, compared to JSON's 4.18 ms.
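
To put the batch-processing latencies in relative terms, the sketch below computes how much lower Protobuf's median latency is than JSON's at each record size. The helper name `percent_reduction` is an assumption for this example, and the inputs are the median values quoted above.

```python
# Median batch-processing latencies in milliseconds, as reported in the
# abstract, ordered by record size: 1,176 bytes, 4,696 bytes, 9,312 bytes.
BATCH_MEDIAN_MS = {
    "Protobuf": [38.97, 57.41, 63.14],
    "JSON": [77.59, 72.60, 78.09],
}


def percent_reduction(candidate_ms: float, baseline_ms: float) -> float:
    """Percentage by which candidate_ms is lower than baseline_ms."""
    return 100.0 * (baseline_ms - candidate_ms) / baseline_ms


# Protobuf's latency reduction relative to JSON for each record size.
REDUCTIONS = [
    percent_reduction(p, j)
    for p, j in zip(BATCH_MEDIAN_MS["Protobuf"], BATCH_MEDIAN_MS["JSON"])
]

for size, reduction in zip((1176, 4696, 9312), REDUCTIONS):
    print(f"{size} bytes: {reduction:.1f}% lower median latency than JSON")
```

The reductions come to roughly 50%, 21%, and 19% for increasing record sizes, so Protobuf's batch latency advantage narrows as records grow.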

The results indicate that migrating from JSON to MessagePack, or to Protobuf for the smallest record size, roughly doubles throughput.

Protobuf enhances latency metrics across all tested sizes in batch-processing scenarios, making it a convincing choice for systems prioritizing rapid data handling. For single-message tests, Protobuf is recommended for the smallest data size, while Avro offers advantages for the largest data size.

Place, publisher, year, edition, pages
2024. , p. 48
Series
UMNAD ; 1498
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:umu:diva-227363
OAI: oai:DiVA.org:umu-227363
DiVA, id: diva2:1878772
External cooperation
Sartorius Stedim Data Analytics
Educational program
Master of Science Programme in Computing Science and Engineering
Presentation
2024-05-29, Ma121, Umeå, 14:26 (English)
Supervisors
Examiners
Available from: 2024-06-28. Created: 2024-06-27. Last updated: 2024-06-28. Bibliographically approved.

Open Access in DiVA

fulltext (1454 kB), 398 downloads
File information
File name: FULLTEXT01.pdf
File size: 1454 kB
Checksum (SHA-512): 1034e94a677d99d0002be45d99a485895e9a45a2b71a6a8aa7d69551a62d28df7cd7af3fce7bb91b3f1369fa127164b3feb38fb5427f93fe25dea1292662d514
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Myastovskiy, Tobias
By organisation
Department of Computing Science
Computer Sciences
