Publications (10 of 10)
Gonzalo P., R., Elmroth, E., Östberg, P.-O. & Ramakrishnan, L. (2018). ScSF: a scheduling simulation framework. In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing. Paper presented at the 21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2017), Orlando, FL, USA, June 2, 2017 (pp. 152-173). Springer, 10773
2018 (English). In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, Springer, 2018, Vol. 10773, pp. 152-173. Conference paper, published (peer-reviewed)
Abstract [en]

High-throughput and data-intensive applications, often composed as workflows, are increasingly present in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large, tightly coupled parallel jobs over homogeneous systems. There is therefore an urgent need to investigate new scheduling algorithms that can manage future workloads on HPC systems, yet appropriate models and frameworks to enable the development, testing, and validation of new scheduling ideas are lacking.

In this paper, we present ScSF, an open-source scheduler simulation framework that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case of ScSF to develop new techniques for managing scientific workflows in a batch scheduler; the technique was implemented in the framework scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run over two months in a deployment of ScSF on a distributed infrastructure of 17 compute nodes. The experimental results were analyzed in the framework, showing that the technique minimizes workflows' turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
slurm, simulation, scheduling, HPC, High Performance Computing, workload, generation, analysis
National subject category
Computer Science
Research subject
computing science
Identifiers
urn:nbn:se:umu:diva-132981 (URN), 10.1007/978-3-319-77398-8_9 (DOI), 000444863700009 (ISI), 978-3-319-77397-1 (ISBN), 978-3-319-77398-8 (ISBN)
Conference
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2017), Orlando, FL, USA, June 2, 2017
Research funder
eSSENCE - An eScience Collaboration; Swedish Research Council, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2017-03-27 Created: 2017-03-27 Last updated: 2018-10-05 Bibliographically reviewed
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2018). Towards understanding HPC users and systems: a NERSC case study. Journal of Parallel and Distributed Computing, 111, 206-221
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 111, pp. 206-221. Article in journal (peer-reviewed), published
Abstract [en]

The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs; HPC workloads now increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both at a particular point in time and in their evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on a year of workload (2014) and its evolution through the systems' lifetimes (2010-2014).
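The entry's keywords mention k-means. As a generic sketch only (this is not the paper's actual methodology; the job features, the value of k, and all numbers below are illustrative assumptions), workload heterogeneity can be explored by clustering jobs on features such as runtime and width:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means over tuples of equal dimension."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # assign each point to its nearest center (squared Euclidean distance)
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # recompute centers as cluster means; keep old center if cluster is empty
        new = [tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else c
               for c, cl in zip(centers, clusters)]
        if new == centers:  # converged
            break
        centers = new
    return centers, clusters

# Jobs as (runtime_hours, nodes): two clearly distinct job classes.
jobs = [(1, 1), (1, 2), (2, 1), (20, 20), (20, 21), (21, 20)]
centers, clusters = kmeans(jobs, k=2)
```

More clusters of comparable size would indicate a more heterogeneous workload; a single dominant cluster, a homogeneous one.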

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Workload analysis, Supercomputer, HPC, Scheduling, NERSC, Heterogeneity, k-means
National subject category
Computer Science
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-132980 (URN), 10.1016/j.jpdc.2017.09.002 (DOI), 000415028900017 (ISI)
Research funder
eSSENCE - An eScience Collaboration; EU, Horizon 2020, 610711; EU, Horizon 2020, 732667; Swedish Research Council, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Originally included in thesis in manuscript form in 2017.

Available from: 2017-03-27 Created: 2017-03-27 Last updated: 2018-06-25 Bibliographically reviewed
Gonzalo P., R. (2017). HPC scheduling in a brave new world. (Doctoral dissertation). Umeå: Umeå universitet
2017 (English). Doctoral thesis, compilation (Other academic)
Abstract [en]

Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time. In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems' memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems' processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important. This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Scientific Computing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies the unaddressed scheduling challenges that they will introduce. These challenges include application diversity, workflow scheduling, and the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm that minimizes workflows' turnaround time without over-allocating resources.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2017. 122 pp.
Serie
Report / UMINF, ISSN 0348-0542 ; 17.05
Keywords
High Performance Computing, HPC, supercomputing, scheduling, workflows, workloads, exascale
National subject category
Computer Science
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-132983 (URN), 978-91-7601-693-0 (ISBN)
Public defence
2017-04-21, MA121, MIT-Huset, Umeå Universitet, Umeå, 10:15 (English)
Research funder
eSSENCE - An eScience Collaboration; Swedish Research Council, C0590801; EU, Horizon 2020, 610711; EU, FP7, Seventh Framework Programme, 732667
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2017-03-29 Created: 2017-03-27 Last updated: 2018-06-09 Bibliographically reviewed
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2016). Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Paper presented at the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia (pp. 521-526).
2016 (English). In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, pp. 521-526. Conference paper, published (peer-reviewed)
Abstract [en]

The high performance computing (HPC) scheduling landscape is changing. Increasingly, large scientific computations include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical HPC schedulers oriented towards tightly coupled MPI jobs. It is thus important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the 2014 workloads of three current National Energy Research Scientific Computing Center (NERSC) systems. Finally, we present the results of this analysis, observing that heterogeneity might reduce predictability in jobs' wait times.

Series
IEEE-ACM International Symposium on Cluster Cloud and Grid Computing, ISSN 2376-4414
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-126538 (URN), 10.1109/CCGrid.2016.32 (DOI), 000382529800067 (ISI), 978-1-5090-2453-7 (ISBN)
Conference
16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia
Available from: 2016-10-28 Created: 2016-10-10 Last updated: 2018-06-09 Bibliographically reviewed
Rodrigo, G. P., Östberg, P.-O., Elmroth, E. & Ramakrishnan, L. (2015). A2L2: an application aware flexible HPC scheduling model for low-latency allocation. In: VTDC '15: Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing. Paper presented at the 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC), Portland, Oregon, June 15-16, 2015 (pp. 11-19). ACM Digital Library
2015 (English). In: VTDC '15: Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing, ACM Digital Library, 2015, pp. 11-19. Conference paper, published (peer-reviewed)
Abstract [en]

High-performance computing (HPC) is focused on providing large-scale compute capacity to scientific applications. HPC schedulers tend to be optimized for large parallel batch jobs and, as such, often overlook the requirements of other scientific applications. In this work, we propose a cloud-inspired HPC scheduling model that aims to capture application performance and requirement models (Application Aware, A2) and to dynamically resize malleable application resource allocations in order to support applications with critical performance or deadline requirements (Low Latency allocation, L2). The proposed model incorporates measures to improve the performance of data-intensive applications on HPC systems and is derived from a set of cloud scheduling techniques identified as applicable in HPC environments. The model places special focus on dynamically malleable applications: data-intensive applications that support dynamic resource allocation without incurring severe performance penalties, which are proposed for fine-grained backfilling and dynamic resource allocation control without job preemption.

Place, publisher, year, edition, pages
ACM Digital Library, 2015. pp. 11-19
Keywords
Scheduling, job, HPC, malleable, applications, low-latency
National subject category
Computer Systems
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-110526 (URN), 10.1145/2755979.2755983 (DOI), 978-1-4503-3573-7 (ISBN)
Conference
8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC), Portland, Oregon, June 15-16, 2015.
Research funder
eSSENCE - An eScience Collaboration; EU, FP7, Seventh Framework Programme, 610711; Swedish Research Council, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2015-10-22 Created: 2015-10-22 Last updated: 2018-06-07 Bibliographically reviewed
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2015). HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. In: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC). Paper presented at the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Portland, Oregon, June 15-19, 2015 (pp. 57-60). ACM Digital Library
2015 (English). In: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC), ACM Digital Library, 2015, pp. 57-60. Conference paper, published (peer-reviewed)
Abstract [en]

High performance computing centers have traditionally served monolithic MPI applications. In recent years, however, many large scientific computations have included high-throughput and data-intensive jobs. HPC systems have mostly used batch queue schedulers to schedule these workloads on appropriate resources. There is a need to understand future scheduling scenarios that can support the diverse scientific workloads in HPC centers. In this paper, we analyze the workloads of two systems (Hopper and Carver) at the National Energy Research Scientific Computing Center (NERSC). Specifically, we present a trend analysis towards understanding the evolution of the workload over the lifetimes of the two systems.

Place, publisher, year, edition, pages
ACM Digital Library, 2015. pp. 57-60
Keywords
Scheduling, workload, trend analysis, HPC
National subject category
Computer Systems
Research subject
computing science
Identifiers
urn:nbn:se:umu:diva-110525 (URN), 10.1145/2749246.2749270 (DOI), 978-1-4503-3550-8 (ISBN)
Conference
24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Portland, Oregon, June 15-19, 2015
Research funder
eSSENCE - An eScience Collaboration; EU, FP7, Seventh Framework Programme, 610711; Swedish Research Council, C0590801
Available from: 2015-10-22 Created: 2015-10-22 Last updated: 2018-06-07 Bibliographically reviewed
Rodrigo, G. P., Östberg, P.-O. & Elmroth, E. (2015). Priority operators for fairshare scheduling. In: Job Scheduling Strategies for Parallel Processing (JSSPP 2014). Paper presented at the 18th International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), May 23, 2014, Phoenix, AZ (pp. 70-89).
2015 (English). In: Job Scheduling Strategies for Parallel Processing (JSSPP 2014), 2015, pp. 70-89. Conference paper, published (peer-reviewed)
Abstract [en]

Collaborative resource sharing in distributed computing requires scalable mechanisms for the allocation and control of user quotas. Decentralized fairshare prioritization is a technique for the enforcement of user quotas that can be realized without centralized control. The technique influences the job scheduling order of local resource management systems using an algorithm that establishes a semantic for job prioritization based on the individual distances between users' quota allocations and their historical resource usage (i.e., the intended and current system state). This work addresses the design and evaluation of priority operators, mathematical functions that quantify fairshare distances, and identifies a set of desirable characteristics for fairshare priority operators. In addition, it proposes a set of operators for fairshare prioritization, establishes a methodology for the verification and evaluation of operator characteristics, and evaluates the proposed operator set within this mathematical framework. Limitations in the numerical representation of scheduling factor values are identified as a key challenge in priority operator formulation, and it is demonstrated that the contributed priority operators (the Sigmoid operator family) behave robustly even in the presence of severe resolution limitations.
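The mechanism described in the abstract orders jobs by the gap between each user's quota and realized usage. As a hedged illustration only (the variable names, the normalization, and the steepness constant below are assumptions for exposition, not the paper's actual Sigmoid operator definition), a priority operator of this flavor can be sketched as:

```python
import math

def fairshare_distance(target_share, usage, total_usage):
    """Gap between a user's quota share and its realized share of usage.
    Positive -> the user has consumed less than its quota."""
    realized = usage / total_usage if total_usage > 0 else 0.0
    return target_share - realized

def sigmoid_priority(distance, steepness=8.0):
    """Map a distance in [-1, 1] to a priority in (0, 1). A sigmoid keeps
    the mapping monotonic while compressing extremes, so limited numeric
    resolution in a scheduler's priority field still separates typical users."""
    return 1.0 / (1.0 + math.exp(-steepness * distance))

# (target share, accumulated usage) per user -- made-up numbers
users = {"alice": (0.5, 120.0), "bob": (0.3, 30.0), "carol": (0.2, 50.0)}
total = sum(usage for _, usage in users.values())
order = sorted(users,
               key=lambda u: sigmoid_priority(fairshare_distance(*users[u], total)),
               reverse=True)
# bob is furthest below his quota, so his jobs come first
```

Because the sigmoid is monotonic, it preserves the ordering induced by the raw distances while bounding the output range, which is the property that matters when the scheduler's priority field has few usable bits.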

Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8828
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-106517 (URN), 10.1007/978-3-319-15789-4_5 (DOI), 000355729800005 (ISI), 978-3-319-15788-7 (ISBN), 978-3-319-15789-4 (ISBN)
Conference
18th International Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP), May 23, 2014, Phoenix, AZ
Available from: 2015-07-15 Created: 2015-07-14 Last updated: 2018-06-07 Bibliographically reviewed
Rodrigo, G. P. (2014). Establishing the equivalence between operators: theorem to establish a sufficient condition for two operators to produce the same ordering in a fairshare prioritization system. Umeå: Umeå universitet
2014 (English). Report (Other academic)
Abstract [en]

This report presents a theorem establishing a sufficient condition for two operators to produce the same ordering in a fairshare prioritization system.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2014. 4 pp.
Series
Report / UMINF, ISSN 0348-0542 ; 14.15
Keywords
fairshare, proof, operator, equivalence, ordering
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-89297 (URN)
Research funder
eSSENCE - An eScience Collaboration
Available from: 2014-05-27 Created: 2014-05-27 Last updated: 2018-06-07 Bibliographically reviewed
Rodrigo, G. P. (2014). Proof of compliance for the relative operator on the proportional distribution of unused share in an ordering fairshare system. Umeå: Umeå universitet
2014 (English). Report (Other academic)
Abstract [en]

Decentralized prioritization is a technique for grid fairshare scheduling without centralized control. The technique uses an algorithm that measures each user's distance between quota allocation and historical resource usage (the intended and current system state) to establish a semantic for prioritization.

When a user in a subgroup does not use its share, it is desirable that the corresponding usage time be divided among the active users proportionally to their target usages.

This report presents the mathematical proof that the relative operator distributes unused share proportionally among the users of the same subgroup.
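The redistribution property described above can be made concrete with a small sketch (a hypothetical illustration of the desired behavior, not the report's relative operator itself): the quota of inactive users is split among the active users in proportion to the active users' own target shares.

```python
def redistribute_unused_share(targets, active):
    """Redistribute the quota of inactive users among active users,
    proportionally to the active users' own target shares.
    targets: {user: target_share} summing to 1.0; active: set of users."""
    unused = sum(share for user, share in targets.items() if user not in active)
    active_total = sum(targets[user] for user in active)
    return {user: targets[user] + unused * targets[user] / active_total
            for user in active}

# "c" is idle, so its 0.2 share is split 0.5:0.3 between "a" and "b"
shares = redistribute_unused_share({"a": 0.5, "b": 0.3, "c": 0.2}, {"a", "b"})
```

Note that the effective shares of the active users still sum to the full capacity, which is exactly the property the proof establishes for the relative operator.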

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2014. 12 pp.
Series
Report / UMINF, ISSN 0348-0542 ; 14.14
Keywords
fairshare, fair-share, relative operator, distribution property, proof
National subject category
Computer Science
Research subject
computing science
Identifiers
urn:nbn:se:umu:diva-89298 (URN)
Available from: 2014-05-27 Created: 2014-05-27 Last updated: 2018-06-07 Bibliographically reviewed
Gonzalo P., R., Elmroth, E., Östberg, P.-O. & Ramakrishnan, L. Enabling workflow aware scheduling on HPC systems.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Workflows from diverse scientific domains are increasingly present in the workloads of current HPC systems. However, HPC scheduling systems do not incorporate workflow specific mechanisms beyond the capacity to declare dependencies between jobs. Thus, when users run workflows as sets of batch jobs with completion dependencies, the workflows experience long turnaround times. Alternatively, when they are submitted as single jobs, allocating the maximum resource requirement for the whole runtime, they over-allocate resources, reducing the HPC system utilization. In this paper, we present a workflow aware scheduling (WoAS) system that enables pre-existing scheduling algorithms to take advantage of the fine grained workflow resource requirements and structure, without any modification to the original algorithms. The current implementation of WoAS is integrated in Slurm, a widely used HPC batch scheduler. We evaluate the system in simulation using real and synthetic workflows and a synthetic baseline workload that captures the job patterns observed over three years of the real workload data of Edison, a large supercomputer hosted at the National Energy Research Scientific Computing Center. Finally, our results show that WoAS effectively reduces workflow turnaround time and improves system utilization without a significant impact on the slowdown of traditional jobs.
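The two naive submission modes the abstract contrasts can be quantified with a toy model (all names and numbers below are illustrative assumptions, not WoAS itself): chained dependent jobs pay one queue wait per stage, while a single pilot job waits only once but holds the widest stage's allocation for the entire runtime.

```python
def chained_turnaround(stage_runtimes, queue_wait):
    """Each workflow stage submitted as its own batch job with a
    dependency on the previous one: every stage pays a queue wait."""
    return sum(runtime + queue_wait for runtime in stage_runtimes)

def pilot_job(stage_runtimes, stage_widths, queue_wait):
    """Whole workflow submitted as one job sized for its widest stage:
    a single wait, but node-hours are over-allocated whenever a
    narrower stage runs under the maximum-width reservation."""
    total_runtime = sum(stage_runtimes)
    turnaround = queue_wait + total_runtime
    over_allocation = max(stage_widths) * total_runtime - sum(
        width * runtime for width, runtime in zip(stage_widths, stage_runtimes))
    return turnaround, over_allocation
```

For a three-stage workflow with runtimes (2, 3, 4) hours, widths (1, 4, 2) nodes, and a 5-hour queue wait, chaining gives a 24-hour turnaround, while the pilot job gives 14 hours but wastes 14 node-hours; a workflow-aware scheduler aims to avoid both penalties.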

Keywords
scheduling, workflows, HPC, supercomputing, High Performance Computing
National subject category
Computer Science
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-132982 (URN)
Research funder
eSSENCE - An eScience Collaboration; Swedish Research Council, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2017-03-27 Created: 2017-03-27 Last updated: 2018-06-09
Identifiers
ORCID iD: orcid.org/0000-0003-3315-8253
