umu.se Publications
  • 1.
    Gonzalo P., Rodrigo
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    HPC scheduling in a brave new world. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Many breakthroughs in scientific and industrial research are supported by simulations and calculations performed on high performance computing (HPC) systems. These systems typically consist of uniform, largely parallel compute resources and high bandwidth concurrent file systems interconnected by low latency synchronous networks. HPC systems are managed by batch schedulers that order the execution of application jobs to maximize utilization while steering turnaround time. In the past, demands for greater capacity were met by building more powerful systems with more compute nodes, greater transistor densities, and higher processor operating frequencies. Unfortunately, the scope for further increases in processor frequency is restricted by the limitations of semiconductor technology. Instead, parallelism within processors and in numbers of compute nodes is increasing, while the capacity of single processing units remains unchanged. In addition, HPC systems’ memory and I/O hierarchies are becoming deeper and more complex to keep up with the systems’ processing power. HPC applications are also changing: the need to analyze large data sets and simulation results is increasing the importance of data processing and data-intensive applications. Moreover, composition of applications through workflows within HPC centers is becoming increasingly important. This thesis addresses the HPC scheduling challenges created by such new systems and applications. It begins with a detailed analysis of the evolution of the workloads of three reference HPC systems at the National Energy Research Scientific Computing Center (NERSC), with a focus on job heterogeneity and scheduler performance. This is followed by an analysis and improvement of a fairshare prioritization mechanism for HPC schedulers.

    The thesis then surveys the current state of the art and expected near-future developments in HPC hardware and applications, and identifies unaddressed scheduling challenges that they will introduce. These challenges include application diversity and issues with workflow scheduling or the scheduling of I/O resources to support applications. Next, a cloud-inspired HPC scheduling model is presented that can accommodate application diversity, takes advantage of malleable applications, and enables short wait times for applications. Finally, to support ongoing scheduling research, an open source scheduling simulation framework is proposed that allows new scheduling algorithms to be implemented and evaluated in a production scheduler using workloads modeled on those of a real system. The thesis concludes with the presentation of a workflow scheduling algorithm to minimize workflows’ turnaround time without over-allocating resources.

  • 2.
    Gonzalo P., Rodrigo
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, Berkeley, California, USA.
    ScSF: a scheduling simulation framework. 2018. In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, Springer, 2018, Vol. 10773, p. 152-173. Conference paper (Refereed)
    Abstract [en]

    High-throughput and data-intensive applications are increasingly present, often composed as workflows, in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large tightly coupled parallel jobs over homogeneous systems. Therefore, there is an urgent need to investigate new scheduling algorithms that can manage the future workloads on HPC systems. Yet there is a lack of appropriate models and frameworks to enable development, testing, and validation of new scheduling ideas.

    In this paper, we present an open-source scheduler simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to be run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case of ScSF to develop new techniques to manage scientific workflows in a batch scheduler. In the use case, such a technique was implemented in the framework scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes over two months. The experimental results were then analyzed in the framework, showing that the technique minimizes workflows’ turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.

  • 3.
    Gonzalo P., Rodrigo
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, P-O
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, USA.
    Enabling workflow aware scheduling on HPC systems. Manuscript (preprint) (Other academic)
    Abstract [en]

    Workflows from diverse scientific domains are increasingly present in the workloads of current HPC systems. However, HPC scheduling systems do not incorporate workflow specific mechanisms beyond the capacity to declare dependencies between jobs. Thus, when users run workflows as sets of batch jobs with completion dependencies, the workflows experience long turnaround times. Alternatively, when they are submitted as single jobs, allocating the maximum requirement of resources for the whole runtime, they waste resources, reducing the HPC system utilization. In this paper, we present a workflow aware scheduling (WoAS) system that enables pre-existing scheduling algorithms to take advantage of the fine grained workflow resource requirements and structure, without any modification to the original algorithms. The current implementation of WoAS is integrated in Slurm, a widely used HPC batch scheduler. We evaluate the system in simulation using real and synthetic workflows and a synthetic baseline workload that captures the job patterns observed over three years of the real workload data of Edison, a large supercomputer hosted at the National Energy Research Scientific Computing Center. Finally, our results show that WoAS effectively reduces workflow turnaround time and improves system utilization without a significant impact on the slowdown of traditional jobs.

  • 4.
    Rodrigo, Gonzalo P
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Establishing the equivalence between operators: theorem to establish a sufficient condition for two operators to produce the same ordering in a Fairshare prioritization system. 2014. Report (Other academic)
    Abstract [en]

    Theorem to establish a sufficient condition for two operators to produce the same ordering in a Fairshare prioritization system.

  • 5.
    Rodrigo, Gonzalo P
    Proof of compliance for the relative operator on the proportional distribution of unused share in an ordering fairshare system. 2014. Report (Other academic)
    Abstract [en]

    Decentralized prioritization is a technique for grid fairshare scheduling without centralized control. The technique uses an algorithm that measures individual user distances between quota allocations and historical resource usage (intended and current system state) to establish a semantic for prioritization.

    When a user in a subgroup does not use its share, it is desirable that its corresponding usage time is divided among the active users in proportion to their corresponding target usage.

    This report is the mathematical proof for the relative operator on the proportional distribution of unused share among the users of the same subgroup.

  • 6.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Priority operators for fairshare scheduling. 2015. In: Job scheduling strategies for parallel processing (JSSPP 2014), 2015, p. 70-89. Conference paper (Refereed)
    Abstract [en]

    Collaborative resource sharing in distributed computing requires scalable mechanisms for allocation and control of user quotas. Decentralized fairshare prioritization is a technique for enforcement of user quotas that can be realized without centralized control. The technique is based on influencing the job scheduling order of local resource management systems using an algorithm that establishes a semantic for prioritization of jobs based on the individual distances between users' quota allocations and users' historical resource usage (i.e. intended and current system state). This work addresses the design and evaluation of priority operators, mathematical functions to quantify fairshare distances, and identifies a set of desirable characteristics for fairshare priority operators. In addition, this work proposes a set of operators for fairshare prioritization, establishes a methodology for verification and evaluation of operator characteristics, and evaluates the proposed operator set based on this mathematical framework. Limitations in the numerical representation of scheduling factor values are identified as a key challenge in priority operator formulation, and it is demonstrated that the contributed priority operators (the Sigmoid operator family) behave robustly even in the presence of severe resolution limitations.

  • 7.
    Rodrigo, Gonzalo P
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Lawrence Berkeley National Lab.
    Gerber, Richard
    Lawrence Berkeley National Lab.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab.
    HPC System Lifetime Story: Workload Characterization and Evolutionary Analyses on NERSC Systems. 2015. In: Proceedings of the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC), ACM Digital Library, 2015, p. 57-60. Conference paper (Refereed)
    Abstract [en]

    High performance computing centers have traditionally served monolithic MPI applications. However, in recent years, many of the large scientific computations have included high throughput and data-intensive jobs. HPC systems have mostly used batch queue schedulers to schedule these workloads on appropriate resources. There is a need to understand future scheduling scenarios that can support the diverse scientific workloads in HPC centers. In this paper, we analyze the workloads on two systems (Hopper and Carver) at the National Energy Research Scientific Computing (NERSC) Center. Specifically, we present a trend analysis towards understanding the evolution of the workload over the lifetime of the two systems.

  • 8.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Lawrence Berkeley National Lab, USA.
    Gerber, Richard
    Lawrence Berkeley National Lab, USA.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, USA.
    Towards understanding HPC users and systems: a NERSC case study. 2018. In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 111, p. 206-221. Article in journal (Refereed)
    Abstract [en]

    The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs. Now, HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand the current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

    In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both in a particular time period and in its evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on a year of workload (2014) and its evolution through the systems' lifetime (2010–2014).

  • 9.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, Per-Olov
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Antypas, Katie
    Gerber, Richard
    Ramakrishnan, Lavanya
    Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. 2016. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, p. 521-526. Conference paper (Refereed)
    Abstract [en]

    The high performance computing (HPC) scheduling landscape is changing. Increasingly, there are large scientific computations that include high-throughput, data-intensive, and stream-processing compute models. These jobs increase the workload heterogeneity, which presents challenges for classical HPC schedulers oriented towards tightly coupled MPI jobs. Thus, it is important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess the job heterogeneity in workloads and scheduling queues. We apply the method to the workloads of three current National Energy Research Scientific Computing Center (NERSC) systems in 2014. Finally, we present the results of this analysis, with the observation that heterogeneity might reduce predictability in the jobs' wait time.

  • 10.
    Rodrigo, Gonzalo P.
    et al.
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Östberg, P-O
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Elmroth, Erik
    Umeå University, Faculty of Science and Technology, Department of Computing Science.
    Ramakrishnan, Lavanya
    Lawrence Berkeley National Lab, USA.
    A2L2: an application aware flexible HPC scheduling model for low-latency allocation. 2015. In: VTDC '15: proceedings of the 8th International workshop on virtualization technologies in distributed computing, ACM Digital Library, 2015, p. 11-19. Conference paper (Refereed)
    Abstract [en]

    High-performance computing (HPC) is focused on providing large-scale compute capacity to scientific applications. HPC schedulers tend to be optimized for large parallel batch jobs and, as such, often overlook the requirements of other scientific applications. In this work, we propose a cloud-inspired HPC scheduling model that aims to capture application performance and requirement models (Application Aware - A2) and dynamically resize malleable application resource allocations to be able to support applications with critical performance or deadline requirements (Low Latency allocation - L2). The proposed model incorporates measures to improve the performance of data-intensive applications on HPC systems and is derived from a set of cloud scheduling techniques that are identified as applicable in HPC environments. The model places special focus on dynamically malleable applications, that is, data-intensive applications that support dynamic resource allocation without incurring severe performance penalties. These are proposed for fine-grained back-filling and dynamic resource allocation control without job preemption.
