A scalable and low latency probe-based scheduler for data analytics frameworks
Umeå University, Faculty of Science and Technology, Department of Computing Science. The University of Sydney.
The University of Sydney.
2021 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 103, article ID 102752. Journal article (Refereed). Published
Abstract [en]

Today's data analytics frameworks divide jobs into many parallel tasks such that each task operates on a small partition of data in order to execute jobs with low latency. Such frameworks often rely on probe-based distributed schedulers to tackle the challenge of reducing the associated overhead. Unfortunately, the existing solutions do not perform efficiently under workload fluctuations and heterogeneous job durations. This is due to a problem called Head-of-Line blocking, i.e., short tasks are enqueued at workers behind longer tasks. To overcome this problem, we propose Peacock (Khelghatdoust and Gramoli, 0000) [25], a new fully distributed probe-based scheduling method. Unlike the existing methods, Peacock introduces a novel probe rotation technique. Workers form a ring overlay network and rotate probes using elastic queues of workers. It is augmented by a novel starvation-free probe reordering algorithm executed by workers. We evaluate Peacock against two existing state-of-the-art probe-based solutions through a trace-driven simulation of up to 20,000 workers and a distributed experiment of 100 workers in Apache Spark under Google, Cloudera, and Yahoo! traces. The performance results indicate that Peacock outperforms the state of the art across all cluster sizes and loads. Our distributed experiments confirm our simulation results.
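
The two ideas named in the abstract, Head-of-Line blocking and rotating probes around a ring of workers, can be illustrated with a toy sketch. This is not Peacock's algorithm: the names `completion_times`, `mean`, `ToyRing`, and `queue_bound` are hypothetical, the queue bound is fixed here whereas the abstract describes elastic queues, and simply running short tasks first is not starvation-free, unlike the reordering algorithm the abstract mentions.

```python
from collections import deque
from typing import Deque, List


def completion_times(durations: List[float]) -> List[float]:
    """Completion time of each task when a single worker runs them in order."""
    clock = 0.0
    finished = []
    for duration in durations:
        clock += duration
        finished.append(clock)
    return finished


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


class ToyRing:
    """Hypothetical ring of workers with bounded queues (illustration only)."""

    def __init__(self, num_workers: int, queue_bound: int) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_workers)]
        self.queue_bound = queue_bound  # assumption: a fixed bound, not elastic

    def submit(self, start: int, probe: str) -> int:
        """Enqueue a probe at `start`, or pass it to successors on the ring
        until a worker has room. Returns the index of the accepting worker."""
        n = len(self.queues)
        for hop in range(n):
            idx = (start + hop) % n
            if len(self.queues[idx]) < self.queue_bound:
                self.queues[idx].append(probe)
                return idx
        # Every queue is at its bound: fall back to the starting worker.
        self.queues[start].append(probe)
        return start


# Head-of-Line blocking: two 1-second tasks wait behind a 100-second task.
blocked = completion_times([100.0, 1.0, 1.0])
reordered = completion_times([1.0, 1.0, 100.0])
print(mean(blocked))    # 101.0
print(mean(reordered))  # 35.0

# Probe rotation: with a bound of 1, probes sent to worker 0 spread over the ring.
ring = ToyRing(num_workers=3, queue_bound=1)
print([ring.submit(0, p) for p in ["a", "b", "c", "d"]])  # [0, 1, 2, 0]
```

In the first part, the same three tasks finish on average after 101 seconds in arrival order but after 35 seconds when the short tasks run first, which is the cost of Head-of-Line blocking. In the second part, a full worker forwards the probe to its ring successor instead of queueing it behind existing work.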

Place, publisher, year, edition, pages
Elsevier, 2021. Vol. 103, article ID 102752
Keywords [en]
Big Data, Distributed System, Load balancing, Peer-to-Peer overlays network, Scheduling
National subject category
Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-180988; DOI: 10.1016/j.parco.2021.102752; ISI: 000636398900003; Scopus ID: 2-s2.0-85101109481; OAI: oai:DiVA.org:umu-180988; DiVA, id: diva2:1534204
Available from: 2021-03-05. Created: 2021-03-05. Last updated: 2023-09-05. Bibliographically reviewed.


Person

Khelghatdoust, Mansour
