ScSF: a scheduling simulation framework
2017 (English). In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, 2017. Conference paper (Refereed)
High-throughput and data-intensive applications, often composed as workflows, are increasingly present in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical, large, tightly coupled parallel jobs over homogeneous systems. There is therefore an urgent need to investigate new scheduling algorithms that can manage future workloads on HPC systems, yet there is a lack of appropriate models and frameworks to enable the development, testing, and validation of new scheduling ideas. In this paper, we present an open-source scheduler simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case in which ScSF is used to develop new techniques to manage scientific workflows in a batch scheduler. In the use case, such a technique was implemented in the framework scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run over two months in a deployment of ScSF on a distributed infrastructure of 17 compute nodes. The experimental results were then analyzed in the framework, showing that the technique minimizes workflows’ turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.
Place, publisher, year, edition, pages
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
slurm, simulation, scheduling, HPC, High Performance Computing, workload, generation, analysis
Research subject Computer Science
Identifiers: URN: urn:nbn:se:umu:diva-132981; OAI: oai:DiVA.org:umu-132981; DiVA: diva2:1084845
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2017), Orlando, FL, USA, June 2, 2017.
Funder: eSSENCE - An eScience Collaboration; Swedish Research Council, C0590801