E-HPC: A Library for Elastic Resource Management in HPC Environments
Lawrence Berkeley National Laboratory. (School of Computer Science, Georgia Institute of Technology, Atlanta, Georgia)
(Lawrence Berkeley National Laboratory)
Umeå universitet, Teknisk-naturvetenskapliga fakulteten, Institutionen för datavetenskap. Lawrence Berkeley National Laboratory. (Distributed Systems)
(Lawrence Berkeley National Laboratory)
2017 (English). In: 12th Workshop on Workflows in Support of Large-Scale Science (WORKS), New York, NY, USA: Association for Computing Machinery (ACM), 2017, article id 1. Conference paper, Published paper (Refereed)
Abstract [en]

Next-generation data-intensive scientific workflows need to support streaming and real-time applications with dynamic resource needs on high performance computing (HPC) platforms. The static resource allocation model on current HPC systems, which was designed for monolithic MPI applications, is insufficient to support the elastic resource needs of current and future workflows. In this paper, we discuss the design, implementation and evaluation of Elastic-HPC (E-HPC), an elastic framework for managing resources for scientific workflows on current HPC systems. E-HPC considers a resource slot for a workflow as an elastic window that might map to different physical resources over the duration of a workflow. Our framework uses checkpoint-restart as the underlying mechanism to migrate workflow execution across the dynamic window of resources. E-HPC provides the foundation necessary to enable the dynamic allocation of HPC resources needed for streaming and real-time workflows. E-HPC has negligible overhead beyond the cost of checkpointing. Additionally, E-HPC results in decreased turnaround time of workflows compared to the traditional model of resource allocation for workflows, where resources are allocated per stage of the workflow. Our evaluation shows that E-HPC improves core hour utilization for common workflow resource use patterns and provides an effective framework for elastic expansion of resources for applications with dynamic resource needs.

Place, publisher, year, edition, pages
New York, NY, USA: Association for Computing Machinery (ACM), 2017. article id 1
Keywords [en]
high performance computing, scientific workflows, resource management
National Category
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-142624
DOI: 10.1145/3150994.3150996
ISBN: 978-1-4503-5129-4 (print)
OAI: oai:DiVA.org:umu-142624
DiVA, id: diva2:1163222
Conference
The International Conference for High Performance Computing, Networking, Storage and Analysis
Available from: 2017-12-06 Created: 2017-12-06 Last updated: 2019-01-24. Bibliographically approved.
In thesis
1. Application-aware resource management for datacenters
2018 (English). Licentiate thesis, comprising papers (Other academic)
Alternative title [sv]
Applikationsmedveten resurshantering för datacenter
Abstract [en]

High Performance Computing (HPC) and Cloud Computing datacenters are extensively used to steer and solve complex problems in science, engineering, and business, such as calculating correlations and making predictions. Even in a single datacenter server, there are thousands of hardware and software metrics – Key Performance Indicators (KPIs) – that individually and in aggregate can give insight into the performance, robustness, and efficiency of the datacenter and the provisioned applications. At the datacenter level, the number of KPIs is even higher. The fast-growing interest in datacenter management from both the public and industry, together with the rapid expansion in scale and complexity of datacenter resources and the services provided on them, has made monitoring, profiling, controlling, and provisioning compute resources dynamically at runtime a challenging and complex task. Commonly, correlations of application KPIs, like response time and throughput, with resource capacities show that runtime systems (e.g., containers or virtual machines) that are used to provision these applications do not utilize available resources efficiently. This reduces datacenter efficiency, which in turn results in higher operational costs and longer waiting times for results.

The goal of this thesis is to develop tools and autonomic techniques for improving datacenter operations, management and utilization, while improving, and/or minimizing impacts on, application performance. To this end, we make use of application resource descriptors to create a library that dynamically adjusts the amount of resources used, enabling elasticity for scientific workflows in HPC datacenters. For mission-critical applications, high availability is of great concern since these services must be kept running even in the event of system failures. By modeling and correlating specific resource counters, like CPU, memory and network utilization, with the number of runtime synchronizations, we present adaptive mechanisms to dynamically select which fault-tolerance mechanism to use. Likewise, for scientific applications we propose a hybrid extensible architecture for dual-level scheduling of data-intensive jobs in HPC infrastructures, allowing operational simplification, on-boarding of new types of applications and achieving greater job throughput with higher overall datacenter efficiency.

Place, publisher, year, edition, pages
Umeå: Department of computing science, Umeå university, 2018. p. 28
Series
Report / UMINF, ISSN 0348-0542 ; 18.14
Keywords
Resource Management, High Performance Computing, Cloud Computing
National Category
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-155620 (URN)
978-91-7601-971-9 (ISBN)
Presentation
2018-12-12, MA121, MIT-Huset, Umeå, 20:31 (English)
Opponent
Supervisors
Available from: 2019-01-25 Created: 2019-01-24 Last updated: 2019-02-04. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Authority records (beta)

Souza, Abel
