ScSF: a scheduling simulation framework
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Distributed Systems)
ORCID iD: 0000-0003-3315-8253
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Distributed Systems)
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Distributed Systems)
Lawrence Berkeley National Lab, USA.
2017 (English). In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, 2017. Conference paper (Refereed)
Abstract [en]

High-throughput and data-intensive applications, often composed as workflows, are increasingly present in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large, tightly coupled parallel jobs over homogeneous systems. There is therefore an urgent need to investigate new scheduling algorithms that can manage future workloads on HPC systems, yet there is a lack of appropriate models and frameworks to enable the development, testing, and validation of new scheduling ideas. In this paper, we present an open-source scheduling simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case of ScSF to develop new techniques for managing scientific workflows in a batch scheduler. In the use case, this technique was implemented in the framework's scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes during two months. The experimental results were analyzed in the framework, showing that the technique minimizes workflows’ turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.
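To make the experiment-orchestration idea in the abstract concrete, the Python sketch below shows the kind of parameter sweep a campaign like the one described implies (workload model x workflow share x scheduling policy, each combination simulated and analyzed). It is a hypothetical illustration only and does not reflect ScSF's actual API; the Experiment fields, the run() placeholder, and the model/policy names are assumptions introduced for this sketch.

# Hypothetical sketch (not ScSF's actual API): an experiment definition and
# a parameter sweep of the kind the abstract describes, where each experiment
# would generate a synthetic workload from a model, replay it through a
# simulated batch scheduler, and produce metrics for later analysis.
from dataclasses import dataclass
import itertools

@dataclass
class Experiment:
    workload_model: str      # e.g. a model derived from a real system's trace
    workflow_share: float    # fraction of submitted work arriving as workflows
    scheduling_policy: str   # policy variant under evaluation
    simulated_days: int      # length of the simulated period

def run(experiment: Experiment) -> dict:
    """Placeholder for: generate the workload, simulate the scheduler on it,
    and return metrics such as workflow turnaround time and utilization."""
    raise NotImplementedError

# A large campaign (the paper reports 1728 experiments) would enumerate the
# combinations and dispatch each one to a worker node in a distributed
# deployment of the framework.
experiments = [
    Experiment(model, share, policy, simulated_days=7)
    for model, share, policy in itertools.product(
        ["system_A_model"], [0.0, 0.1, 0.5], ["baseline", "workflow_aware"])
]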

Place, publisher, year, edition, pages
2017.
Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keyword [en]
slurm, simulation, scheduling, HPC, High Performance Computing, workload, generation, analysis
National Category
Computer Science
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-132981
OAI: oai:DiVA.org:umu-132981
DiVA: diva2:1084845
Conference
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2017), Orlando FL, USA, June 2nd, 2017.
Funder
eSSENCE - An eScience Collaboration
Swedish Research Council, C0590801
Available from: 2017-03-27 Created: 2017-03-27 Last updated: 2017-03-28
In thesis
1. HPC scheduling in a brave new world
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time. In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems’ memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems’ processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important. This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Supercomputing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies unaddressed scheduling challenges that they will introduce. These challenges include application diversity and issues with workflow scheduling or the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm to minimize workflows’ turnaround time without over-allocating resources.
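As a small illustration of the metric the concluding workflow scheduling algorithm targets, the sketch below computes a workflow's turnaround time as the span from its first task submission to its last task completion. This is a minimal, assumed formulation; the TaskRecord type and its field names are hypothetical and not taken from any particular scheduler's log format.

# Minimal sketch of the workflow turnaround-time metric: time from the
# submission of the workflow's first task to the completion of its last task.
from dataclasses import dataclass

@dataclass
class TaskRecord:
    submit_time: float  # seconds since simulation start (illustrative field)
    end_time: float     # completion time of the task (illustrative field)

def workflow_turnaround(tasks: list[TaskRecord]) -> float:
    """Turnaround = last task completion minus first task submission."""
    first_submit = min(t.submit_time for t in tasks)
    last_end = max(t.end_time for t in tasks)
    return last_end - first_submit

# Example: a two-task workflow submitted at t=0 whose last task finishes at
# t=3600 has a turnaround time of one hour.
print(workflow_turnaround([TaskRecord(0, 1800), TaskRecord(100, 3600)]))  # 3600.0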

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2017. 122 p.
Series
Report / UMINF, ISSN 0348-0542 ; 17.05
Keyword
High Performance Computing, HPC, supercomputing, scheduling, workflows, workloads, exascale
National Category
Computer Science
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-132983 (URN)
978-91-7601-693-0 (ISBN)
Public defence
2017-04-21, MA121, MIT-Huset, Umeå Universitet, Umeå, 10:15 (English)
Opponent
Supervisors
Funder
eSSENCE - An eScience Collaboration
Swedish Research Council, C0590801
EU, Horizon 2020, 610711
EU, FP7, Seventh Framework Programme, 732667
Available from: 2017-03-29 Created: 2017-03-27 Last updated: 2017-03-28
Bibliographically approved

Open Access in DiVA

No full text

Search in DiVA

By author/editor
Gonzalo P., Rodrigo; Elmroth, Erik; Östberg, P-O
By organisation
Department of Computing Science
Computer Science
