Towards understanding HPC users and systems: a NERSC case study
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Distributed Systems) ORCID iD: 0000-0003-3315-8253
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Distributed Systems)
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Distributed Systems)
Lawrence Berkeley National Lab, USA.
(English) Manuscript (preprint) (Other academic)
Abstract [en]

The high performance computing (HPC) scheduling landscape is changing. Previously dominated by tightly coupled MPI jobs, HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job level, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and how they are evolving in order to perform informed scheduling research and enable efficient scheduling in future HPC systems. In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both for a particular time period and as they evolve over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on one year of workload (2014) and its evolution over the systems’ lifetimes. Among the results, we highlight the observation of discontinuities in the jobs’ wait time for priority groups with high job diversity. Finally, we conclude by summarizing our analysis to establish a reference and inform future scheduling research.

Keyword [en]
workload analysis, supercomputer, HPC, scheduling, NERSC, heterogeneity, k-means
National Category
Computer Science
Research subject
Computing Science
Identifiers
URN: urn:nbn:se:umu:diva-132980
OAI: oai:DiVA.org:umu-132980
DiVA: diva2:1084838
Funder
eSSENCE - An eScience Collaboration
EU, Horizon 2020, 610711
EU, Horizon 2020, 732667
Swedish Research Council, C0590801
Note

This work was also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), and used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2017-03-27 Created: 2017-03-27 Last updated: 2017-05-29
In thesis
1. HPC scheduling in a brave new world
2017 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time. In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems’ memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems’ processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important. This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Scientific Computing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers. The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies unaddressed scheduling challenges that they will introduce.
These challenges include application diversity and issues with workflow scheduling or the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm to minimize workflows’ turnaround time without over-allocating resources.

Place, publisher, year, edition, pages
Umeå: Umeå universitet, 2017. 122 p.
Series
Report / UMINF, ISSN 0348-0542 ; 17.05
Keyword
High Performance Computing, HPC, supercomputing, scheduling, workflows, workloads, exascale
National Category
Computer Science
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-132983 (URN)
978-91-7601-693-0 (ISBN)
Public defence
2017-04-21, MA121, MIT-Huset, Umeå Universitet, Umeå, 10:15 (English)
Funder
eSSENCE - An eScience Collaboration
Swedish Research Council, C0590801
EU, Horizon 2020, 610711
EU, FP7, Seventh Framework Programme, 732667
Note

This work was also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR), and used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2017-03-29 Created: 2017-03-27 Last updated: 2017-05-29
Bibliographically approved

By author/editor
Rodrigo, Gonzalo P.; Östberg, P-O; Elmroth, Erik