Östberg, Per-Olov
Publications (10 of 41)
Krzywda, J., Ali-Eldin, A., Carlson, T. E., Östberg, P.-O. & Elmroth, E. (2018). Power-performance tradeoffs in data center servers: DVFS, CPU pinning, horizontal, and vertical scaling. Future Generation Computer Systems, 81, 114-128
2018 (English). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 81, p. 114-128. Article in journal (Refereed). Published
Abstract [en]

Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption under different workloads. The results show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of an underloaded server (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load-balancing strategy has a large impact on the tail response time of horizontally scaled applications.
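
As a rough illustration of the DVFS tradeoff described above (a toy model, not code or data from the paper): dynamic CPU power is commonly approximated as P ≈ C·V²·f, and since voltage scales roughly linearly with frequency, dynamic power falls roughly cubically with clock frequency while CPU-bound throughput falls only linearly. The function below is a hypothetical sketch under those assumptions.

```python
def dvfs_tradeoff(f_ratio: float, c: float = 1.0) -> tuple[float, float]:
    """Toy DVFS model (assumed, for illustration only): voltage scales
    roughly linearly with frequency, so dynamic power P ~ C * V^2 * f
    ~ C * f^3, while CPU-bound throughput scales ~linearly with f.
    `f_ratio` is the frequency relative to the maximum (0..1)."""
    power = c * f_ratio ** 3      # relative dynamic power
    throughput = f_ratio          # relative throughput for CPU-bound work
    return power, throughput
```

Under this model, running at 80% frequency costs about half the peak dynamic power but still delivers 80% of the throughput, which is why DVFS is attractive as a power-capping actuator on saturated servers.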

Keywords
Power-performance tradeoffs, Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, Horizontal scaling, Vertical scaling
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-132427 (URN), 10.1016/j.future.2017.10.044 (DOI), 000423652200010 (), 2-s2.0-85033772481 (Scopus ID)
Note

Originally published in the thesis in manuscript form.

Available from: 2017-03-13. Created: 2017-03-13. Last updated: 2018-06-09. Bibliographically approved.
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2018). Towards understanding HPC users and systems: a NERSC case study. Journal of Parallel and Distributed Computing, 111, 206-221
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 111, p. 206-221. Article in journal (Refereed). Published
Abstract [en]

The high-performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs; HPC workloads now increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both at a particular time period and in their evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, including detailed information for a full year of workload (2014) and its evolution over the systems' lifetimes (2010–2014).
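
One way to characterize such a workload, sketched here purely for illustration (the paper's actual methodology is richer), is to cluster jobs on normalized feature vectors, for example with plain Lloyd's k-means; the keywords below list k-means as one of the techniques used:

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means over job feature vectors (one vector of
    normalized features per job, e.g. [nodes, runtime]). Returns one
    cluster label per point. Illustrative sketch, not the paper's code."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # k distinct points as seeds
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: each job goes to its nearest center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # update step: move each non-empty cluster's center to its mean
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels
```

Jobs that land in the same cluster can then be treated as one workload class when analyzing heterogeneity over time.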

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Workload analysis, Supercomputer, HPC, Scheduling, NERSC, Heterogeneity, k-means
National Category
Computer Sciences
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-132980 (URN), 10.1016/j.jpdc.2017.09.002 (DOI), 000415028900017 ()
Funder
eSSENCE - An eScience Collaboration; EU Horizon 2020, 610711; EU Horizon 2020, 732667; Swedish Research Council, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Originally included in the thesis in manuscript form in 2017.

Available from: 2017-03-27. Created: 2017-03-27. Last updated: 2018-06-25. Bibliographically approved.
Byrne, J., Svorobej, S., Giannoutakis, K. M., Tzovaras, D., Byrne, P. J., Östberg, P.-O., . . . Lynn, T. (2017). A review of cloud computing simulation platforms and related environments. In: Proceedings of the 7th International Conference on Cloud Computing and Services Science: Volume 1: CLOSER. Paper presented at CLOSER 2017, 7th International Conference on Cloud Computing and Services Science, Porto, Portugal, 24–26 April, 2017 (pp. 679-691), Vol. 1
2017 (English). In: Proceedings of the 7th International Conference on Cloud Computing and Services Science: Volume 1: CLOSER, 2017, Vol. 1, p. 679-691. Conference paper, Published paper (Refereed)
Abstract [en]

Recent years have seen an increasing trend towards the development of Discrete Event Simulation (DES) platforms to support cloud computing related decision making and research. The complexity of cloud environments is increasing with scale and heterogeneity, posing a challenge for the efficient management of cloud applications and data centre resources. The increasing ubiquity of social media, mobile and cloud computing, combined with the Internet of Things and emerging paradigms such as Edge and Fog Computing, is exacerbating this complexity. Given the scale, complexity, and commercial sensitivity of hyperscale computing environments, the opportunity for experimentation is limited and requires substantial investment of both time and effort. DES provides a low-risk technique for providing decision support for complex hyperscale computing scenarios. In recent years, there has been a significant increase in the development and extension of tools to support DES for cloud computing, resulting in a wide range of tools which vary in terms of their utility and features. Through a review and analysis of available literature, this paper provides an overview and multi-level feature analysis of 33 DES tools for cloud computing environments. This review updates and extends existing reviews to include not only autonomous simulation platforms, but also plugins and extensions for specific cloud computing use cases. The review identifies the emergence of CloudSim as a de facto base platform for simulation research and shows a lack of tool support for distributed execution (parallel execution on distributed memory systems).
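
To make the underlying technique concrete, the following is a minimal discrete-event simulation kernel, an illustrative toy rather than any of the 33 surveyed tools: events sit in a time-ordered priority queue and are processed in timestamp order against a single-server FIFO service model.

```python
import heapq

def simulate(arrival_times, service_time):
    """Minimal DES kernel (illustrative toy): a single FIFO server with
    deterministic service. Events are (time, kind) pairs popped from a
    heap in timestamp order; returns each request's finish time."""
    events = [(t, "arrival") for t in arrival_times]
    heapq.heapify(events)
    server_free_at = 0.0
    finish_times = []
    while events:
        t, kind = heapq.heappop(events)
        if kind == "arrival":
            start = max(t, server_free_at)   # queue if the server is busy
            server_free_at = start + service_time
            finish_times.append(server_free_at)
    return finish_times
```

Real cloud DES platforms extend exactly this loop with resource models, network delays, and scheduling policies; the event-heap skeleton stays the same.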

Keywords
Cloud computing, Cloud simulation tools, Data centre, Fog computing
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-135844 (URN), 10.5220/0006373006790691 (DOI), 978-989-758-243-1 (ISBN)
Conference
CLOSER 2017, 7th International Conference on Cloud Computing and Services Science, Porto, Portugal, 24–26 April, 2017
Available from: 2017-06-07. Created: 2017-06-07. Last updated: 2018-06-09. Bibliographically approved.
Östberg, P.-O., Byrne, J., Casari, P., Eardley, P., Fernandez Anta, A., Forsman, J., . . . Domaschka, J. (2017). Reliable Capacity Provisioning for Distributed Cloud/Edge/Fog Computing Applications. In: 2017 European Conference on Networks and Communications (EuCNC). Paper presented at European Conference on Networks and Communications (EuCNC), June 12-15, 2017, Oulu, Finland. IEEE
2017 (English). In: 2017 European Conference on Networks and Communications (EuCNC), IEEE, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

The REliable CApacity Provisioning and enhanced remediation for distributed cloud applications (RECAP) project aims to advance cloud and edge computing technology, to develop mechanisms for reliable capacity provisioning, and to make application placement, infrastructure management, and capacity provisioning autonomous, predictable, and optimized. This paper presents the RECAP vision for an integrated edge-cloud architecture, discusses the scientific foundation of the project, and outlines plans for toolsets for continuous data collection, application performance modeling, application and component auto-scaling and remediation, and deployment optimization. The paper also presents four use cases from complementary fields that will be used to showcase the advancements of RECAP.
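
As a purely illustrative sketch of the auto-scaling idea (a toy rule, not RECAP's models; the names and thresholds here are assumptions), a threshold-style policy sizes the replica count so that average per-replica utilization moves back toward a target:

```python
import math

def autoscale(replicas, utilization, target=0.5, min_r=1, max_r=10):
    """Toy threshold-style auto-scaler (illustrative only): choose a
    replica count that brings average per-replica utilization back
    toward `target`, clamped to the allowed range [min_r, max_r]."""
    desired = math.ceil(replicas * utilization / target)
    return max(min_r, min(max_r, desired))
```

For example, 4 replicas running at 90% utilization against a 50% target are scaled out to 8; an idle service is scaled in toward the minimum. Production autoscalers add smoothing and cooldowns on top of this basic rule.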

Place, publisher, year, edition, pages
IEEE, 2017
Series
European Conference on Networks and Communications, ISSN 2475-6490
Keywords
Cloud computing, capacity provisioning, application modeling, workload propagation, data collection, analytics, machine learning, simulation, optimization
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-145820 (URN), 10.1109/EuCNC.2017.7980667 (DOI), 000425922900028 (), 978-1-5386-3873-6 (ISBN)
Conference
European Conference on Networks and Communications (EuCNC), June 12-15, 2017, Oulu, Finland
Available from: 2018-08-14. Created: 2018-08-14. Last updated: 2018-08-14. Bibliographically approved.
Rodrigo, G. P., Elmroth, E., Östberg, P.-O. & Ramakrishnan, L. (2017). ScSF: a scheduling simulation framework. In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing. Paper presented at the 21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSP 2017), Orlando, FL, USA, June 2, 2017.
2017 (English). In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

High-throughput and data-intensive applications, often composed as workflows, are increasingly present in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large, tightly coupled parallel jobs over homogeneous systems, so there is an urgent need to investigate new scheduling algorithms that can manage the future workloads of HPC systems. There is also a lack of appropriate models and frameworks to enable the development, testing, and validation of new scheduling ideas. In this paper, we present an open-source scheduling simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case of ScSF to develop new techniques for managing scientific workflows in a batch scheduler; this technique was implemented in the framework scheduler. For evaluation, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes during two months. The experimental results, analyzed in the framework, show that the technique minimizes workflows' turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.
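
For a flavor of what such a simulation computes (an illustrative toy, not ScSF itself, which drives a real scheduler instance), a minimal first-come-first-served scheduler over a fixed node pool can be simulated with a heap of per-node free times:

```python
import heapq

def fcfs(jobs, nodes):
    """Toy FCFS scheduler simulation (illustrative only). `jobs` is a
    list of (submit_time, nodes_needed, runtime) tuples in submission
    order; returns each job's turnaround time (finish - submit)."""
    free = [0] * nodes              # time at which each node frees up
    heapq.heapify(free)
    turnaround = []
    for submit, need, runtime in jobs:
        # take the `need` earliest-available nodes; the job starts once
        # it has been submitted and all of those nodes are free
        held = [heapq.heappop(free) for _ in range(need)]
        start = max([submit] + held)
        for _ in range(need):
            heapq.heappush(free, start + runtime)
        turnaround.append(start + runtime - submit)
    return turnaround
```

Scheduling research frameworks run loops like this over generated workloads at far larger scale, then compare turnaround and wait-time distributions across policies.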

Series
Lecture Notes in Computer Science, ISSN 0302-9743
Keywords
slurm, simulation, scheduling, HPC, High Performance Computing, workload, generation, analysis
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-132981 (URN)
Conference
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSP 2017), Orlando, FL, USA, June 2, 2017
Funder
eSSENCE - An eScience Collaboration; Swedish Research Council, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2017-03-27. Created: 2017-03-27. Last updated: 2018-06-09.
Rodrigo, G., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2016). Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In: Proceedings of CCGrid 2016 - The 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. Paper presented at CCGrid 2016 - The 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (pp. 521-526).
2016 (English). In: Proceedings of CCGrid 2016 - The 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, 2016, p. 521-526. Conference paper, Published paper (Refereed)
Abstract [en]

The high-performance computing (HPC) scheduling landscape is changing. Increasingly, large scientific computations include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical HPC schedulers oriented toward tightly coupled MPI jobs. It is therefore important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the 2014 workloads of three current National Energy Research Scientific Computing Center (NERSC) systems. Finally, we present the results of this analysis, observing that heterogeneity might reduce the predictability of jobs' wait times.
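
One simple way to quantify workload heterogeneity, shown here only as an illustration (not the paper's exact metric), is the normalized Shannon entropy of the job-class mix: 0 for a workload of a single class, 1 when classes are evenly represented.

```python
import math
from collections import Counter

def heterogeneity(job_classes):
    """Toy heterogeneity score (illustrative only): normalized Shannon
    entropy of the job-class mix, e.g. classes like "mpi" vs "htc"."""
    counts = Counter(job_classes)
    if len(counts) < 2:
        return 0.0                       # a single class: no heterogeneity
    n = sum(counts.values())
    h = -sum(c / n * math.log(c / n) for c in counts.values())
    return h / math.log(len(counts))     # normalize to [0, 1]
```

Tracking such a score per queue and per year gives a first-order view of how a workload's diversity evolves.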

National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-135843 (URN)
Conference
CCGrid 2016 - The 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
Available from: 2017-06-07. Created: 2017-06-07. Last updated: 2018-06-09.
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2016). Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Paper presented at the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia (pp. 521-526).
2016 (English). In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, p. 521-526. Conference paper, Published paper (Refereed)
Abstract [en]

The high-performance computing (HPC) scheduling landscape is changing. Increasingly, large scientific computations include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical HPC schedulers oriented toward tightly coupled MPI jobs. It is therefore important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the 2014 workloads of three current National Energy Research Scientific Computing Center (NERSC) systems. Finally, we present the results of this analysis, observing that heterogeneity might reduce the predictability of jobs' wait times.

Series
IEEE-ACM International Symposium on Cluster Cloud and Grid Computing, ISSN 2376-4414
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-126538 (URN), 10.1109/CCGrid.2016.32 (DOI), 000382529800067 (), 978-1-5090-2453-7 (ISBN)
Conference
16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia
Available from: 2016-10-28. Created: 2016-10-10. Last updated: 2018-06-09. Bibliographically approved.
Krzywda, J., Östberg, P.-O. & Elmroth, E. (2015). A Sensor-Actuator Model for Data Center Optimization. In: 2015 International Conference on Cloud and Autonomic Computing (ICCAC). Paper presented at the International Conference on Cloud and Autonomic Computing (ICCAC 2015), Boston, Cambridge, MA, USA, 21-25 September 2015 (pp. 192-195). New York: IEEE Computer Society
2015 (English). In: 2015 International Conference on Cloud and Autonomic Computing (ICCAC), New York: IEEE Computer Society, 2015, p. 192-195. Conference paper, Published paper (Refereed)
Abstract [en]

Cloud data centers commonly use virtualization technologies to provision compute capacity with a level of indirection between virtual machines and physical resources. In this paper we explore the use of that level of indirection as a means for autonomic data center configuration optimization, and propose a sensor-actuator model to capture optimization-relevant relationships between data center events, monitored metrics (sensor data), and management actions (actuators). The model characterizes a wide spectrum of actions to help identify the suitability of different actions in specific situations, and outlines what data needs to be monitored, and how often, to capture, classify, and respond to events that affect the performance of data center operations.
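
A minimal sketch of how such a sensor-actuator mapping might be encoded (the structure, names, and thresholds here are assumptions for illustration, not the paper's formal model): each actuator declares the metrics it depends on and a trigger condition over them, and a selection step returns the actions whose triggers fire.

```python
# Illustrative sensor-actuator map: metric names, actions, and thresholds
# are hypothetical examples, not values from the paper.
ACTUATORS = {
    "vm_migrate":   {"sensors": ["host_cpu"],
                     "trigger": lambda m: m["host_cpu"] > 0.9},
    "vm_resize_up": {"sensors": ["vm_cpu"],
                     "trigger": lambda m: m["vm_cpu"] > 0.8},
    "dvfs_down":    {"sensors": ["host_cpu"],
                     "trigger": lambda m: m["host_cpu"] < 0.2},
}

def candidate_actions(metrics):
    """Return (sorted) the actuators whose trigger holds for `metrics`."""
    return sorted(name for name, a in ACTUATORS.items()
                  if a["trigger"](metrics))
```

The `sensors` lists also answer the monitoring question: the union of sensors across registered actuators is exactly the data the system must collect.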

Place, publisher, year, edition, pages
New York: IEEE Computer Society, 2015
National Category
Computer Systems
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-110297 (URN), 10.1109/ICCAC.2015.13 (DOI), 000380476500018 (), 0-7695-5636-1 (ISBN), 978-1-4673-9566-3 (ISBN)
Conference
International Conference on Cloud and Autonomic Computing (ICCAC 2015), Boston, Cambridge, MA, USA, 21-25 September 2015.
Available from: 2015-10-20. Created: 2015-10-20. Last updated: 2018-06-07. Bibliographically approved.
Rodrigo, G. P., Östberg, P.-O., Elmroth, E. & Ramakrishnan, L. (2015). A2L2: an application aware flexible HPC scheduling model for low-latency allocation. In: VTDC '15: Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing. Paper presented at the 8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC), Portland, Oregon, June 15-16, 2015 (pp. 11-19). ACM Digital Library
2015 (English). In: VTDC '15: Proceedings of the 8th International Workshop on Virtualization Technologies in Distributed Computing, ACM Digital Library, 2015, p. 11-19. Conference paper, Published paper (Refereed)
Abstract [en]

High-performance computing (HPC) is focused on providing large-scale compute capacity to scientific applications. HPC schedulers tend to be optimized for large parallel batch jobs and, as such, often overlook the requirements of other scientific applications. In this work, we propose a cloud-inspired HPC scheduling model that aims to capture application performance and requirement models (Application Aware - A2) and to dynamically resize malleable application resource allocations, in order to support applications with critical performance or deadline requirements (Low-Latency allocation - L2). The proposed model incorporates measures to improve the performance of data-intensive applications on HPC systems and is derived from a set of cloud scheduling techniques identified as applicable in HPC environments. The model places special focus on dynamically malleable applications: data-intensive applications that support dynamic resource allocation without incurring severe performance penalties. These are proposed for fine-grained backfilling and dynamic resource-allocation control without job preemption.
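
A toy resize rule for a dynamically malleable job might look as follows (an illustration of the concept, not the A2L2 algorithm; all names are assumptions): grow toward demand using only idle nodes, or shrink to release capacity, while staying inside the job's declared allocation bounds so no preemption is ever needed.

```python
def resize(current, demand, free, min_alloc, max_alloc):
    """Toy malleable-allocation resize rule (illustrative only).
    `current` = nodes held now, `demand` = nodes the job could use,
    `free` = idle nodes available, [min_alloc, max_alloc] = the job's
    declared malleability range. Returns the new allocation size."""
    target = max(min_alloc, min(max_alloc, demand))
    if target > current:
        target = min(target, current + free)   # grow only into idle nodes
    return target
```

Because growth is capped by idle capacity and shrinkage is voluntary, a scheduler can use such jobs as flexible filler for fine-grained backfilling.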

Place, publisher, year, edition, pages
ACM Digital Library, 2015. p. 11-19
Keywords
Scheduling, job, HPC, malleable, applications, low-latency
National Category
Computer Systems
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-110526 (URN), 10.1145/2755979.2755983 (DOI), 978-1-4503-3573-7 (ISBN)
Conference
8th International Workshop on Virtualization Technologies in Distributed Computing (VTDC), Portland, Oregon, June 15-16, 2015.
Funder
eSSENCE - An eScience Collaboration; EU FP7 Seventh Framework Programme, 610711; Swedish Research Council, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2015-10-22. Created: 2015-10-22. Last updated: 2018-06-07. Bibliographically approved.
Östberg, P.-O. & Barry, M. (2015). Heuristics and Algorithms for Data Center Optimization. In: Proceedings of the 7th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA 2015). Paper presented at MISTA 2015 (pp. 921-927).
2015 (English). In: Proceedings of the 7th Multidisciplinary International Conference on Scheduling: Theory and Applications (MISTA 2015), 2015, p. 921-927. Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Research subject
Computing Science
Identifiers
urn:nbn:se:umu:diva-135842 (URN)
Conference
MISTA
Available from: 2017-06-07. Created: 2017-06-07. Last updated: 2018-06-09.