Östberg, Per-Olov
Publications (10 of 46)
Le Duc, T., García Leiva, R., Casari, P. & Östberg, P.-O. (2019). Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing: A Survey. ACM Computing Surveys, 52(5), Article ID 94.
2019 (English). In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 52, no. 5, article id 94. Journal article (peer-reviewed). Published.
Abstract [en]

Large-scale software systems are currently designed as distributed entities and deployed in cloud data centers. To overcome the limitations inherent to this type of deployment, applications are increasingly being supplemented with components instantiated closer to the edges of networks—a paradigm known as edge computing. The problem of how to efficiently orchestrate combined edge-cloud applications is, however, incompletely understood, and a wide range of techniques for resource and application management are currently in use.

This article investigates the problem of reliable resource provisioning in joint edge-cloud environments, and surveys technologies, mechanisms, and methods that can be used to improve the reliability of distributed applications in diverse and heterogeneous network environments. Due to the complexity of the problem, special emphasis is placed on solutions to the characterization, management, and control of complex distributed applications using machine learning approaches. The survey is structured around a decomposition of the reliable resource provisioning problem into three categories of techniques: workload characterization and prediction, component placement and system consolidation, and application elasticity and remediation. Survey results are presented along with a problem-oriented discussion of the state-of-the-art. A summary of identified challenges and an outline of future research directions are presented to conclude the article.
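
The first of the three technique categories surveyed, workload characterization and prediction, can be illustrated with a minimal sketch (not taken from the survey; all names and data are illustrative): an order-1 autoregressive model fit by least squares over recent load samples, used to forecast the next sample.

```python
# Minimal workload-prediction sketch: fit load[t] ≈ a * load[t-1] + b
# by ordinary least squares over a window of past samples, then
# extrapolate one step ahead. Synthetic data, illustrative names.

def fit_ar1(history):
    """Least-squares fit of an order-1 autoregressive model."""
    xs = history[:-1]
    ys = history[1:]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var if var else 0.0
    b = mean_y - a * mean_x
    return a, b

def predict_next(history):
    a, b = fit_ar1(history)
    return a * history[-1] + b

load = [100, 110, 121, 133, 146]   # synthetic request-rate samples
print(round(predict_next(load)))   # → 160, extrapolating the ~10% growth
```

Real provisioning systems replace this with richer models (the survey covers neural, ensemble, and reinforcement-learning approaches), but the fit/predict loop is the same.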

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Reliability, cloud computing, edge computing, distributed systems, placement, consolidation, autoscaling, remediation, machine learning, optimization
National subject category
Computer Science
Research subject
computer science; computer and systems sciences
Identifiers
urn:nbn:se:umu:diva-163331 (URN); 10.1145/3341145 (DOI); 2-s2.0-85072380854 (Scopus ID)
Available from: 2019-09-16. Created: 2019-09-16. Last updated: 2019-10-09. Bibliographically reviewed.
Krzywda, J., Ali-Eldin, A., Wadbro, E., Östberg, P.-O. & Elmroth, E. (2019). Power Shepherd: Application Performance Aware Power Shifting. In: The 11th IEEE International Conference on Cloud Computing Technology and Science. Paper presented at The 11th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2019), Sydney, Australia, 11–13 December 2019.
2019 (English). In: The 11th IEEE International Conference on Cloud Computing Technology and Science, 2019. Conference paper, published paper (peer-reviewed).
Abstract [en]

The constantly growing power consumption of data centers is a major concern for environmental and economic reasons. Current approaches to reducing the negative consequences of high power consumption focus on limiting the peak power consumption. During high workload periods, the power consumption of highly utilized servers is throttled to stay within the power budget. However, the peak power reduction affects the performance of hosted applications and thus leads to Quality of Service violations. In this paper, we introduce Power Shepherd, a hierarchical system for application performance aware power shifting.

Power Shepherd reduces data center operational costs by redistributing the available power among applications hosted in the cluster. This is achieved by assigning server power budgets at the cluster controller, enforcing these power budgets using Running Average Power Limit (RAPL), and prioritizing applications within each server by adjusting the CPU scheduling configuration. We implement a prototype of the proposed solution and evaluate it in a real testbed equipped with power meters and running representative cloud applications. Our experiments show that Power Shepherd has the potential to manage a cluster consisting of thousands of servers and to limit the increase of operational costs by a significant amount when the cluster power budget is limited and the system is overutilized. Finally, we identify some outstanding challenges regarding model sensitivity and the fact that this approach in its current form is not beneficial in all situations, e.g., when the system is underutilized.
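
The budget-assignment and RAPL-enforcement steps can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation: the proportional-sharing policy and helper names are assumptions, while the sysfs path is the standard Linux powercap interface exposed by the intel_rapl driver.

```python
# Sketch: a cluster controller splits a total power budget across servers
# in proportion to demand; each per-server budget would then be enforced
# through the Linux powercap (RAPL) sysfs interface. Illustrative only.

RAPL_LIMIT = "/sys/class/powercap/intel-rapl/intel-rapl:{pkg}/constraint_0_power_limit_uw"

def split_budget(total_w, demands_w):
    """Proportionally share total_w (watts) among servers by demand."""
    total_demand = sum(demands_w)
    return [total_w * d / total_demand for d in demands_w]

def rapl_write_command(package, watts):
    """Shell command that would apply the cap (requires root; not executed)."""
    path = RAPL_LIMIT.format(pkg=package)
    return f"echo {int(watts * 1e6)} > {path}"  # limit is in microwatts

budgets = split_budget(1000, [400, 300, 300])  # 1 kW cluster budget
print(budgets)                                  # [400.0, 300.0, 300.0]
print(rapl_write_command(0, budgets[0]))
```

Power Shepherd's actual policy is performance-aware rather than purely proportional, and it adds a per-server CPU-scheduling layer below the RAPL cap.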

National subject category
Computer Systems
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-166125 (URN)
Conference
The 11th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2019), Sydney, Australia, 11–13 December 2019
Available from: 2019-12-12. Created: 2019-12-12. Last updated: 2020-01-22.
Krzywda, J., Ali-Eldin, A., Wadbro, E., Östberg, P.-O. & Elmroth, E. (2018). ALPACA: Application Performance Aware Server Power Capping. In: ICAC 2018: 2018 IEEE International Conference on Autonomic Computing (ICAC), Trento, Italy, September 3-7, 2018. Paper presented at 15th IEEE International Conference on Autonomic Computing (ICAC 2018) (pp. 41-50). IEEE Computer Society
2018 (English). In: ICAC 2018: 2018 IEEE International Conference on Autonomic Computing (ICAC), Trento, Italy, September 3-7, 2018, IEEE Computer Society, 2018, pp. 41-50. Conference paper, published paper (peer-reviewed).
Abstract [en]

Server power capping limits the power consumption of a server so that it does not exceed a specific power budget. This allows data center operators to reduce peak power consumption at the cost of performance degradation of hosted applications. Previous work on server power capping rarely considers the Quality-of-Service (QoS) requirements of consolidated services when enforcing the power budget. In this paper, we introduce ALPACA, a framework to reduce QoS violations and overall application performance degradation for consolidated services. ALPACA avoids unnecessarily high power consumption when there is no performance gain, and divides the power among the running services in a way that reduces the overall QoS degradation when power is scarce. We evaluate ALPACA using four applications: MediaWiki, SysBench, Sock Shop, and CloudSuite’s Web Search benchmark. Our experiments show that ALPACA reduces the operational costs of QoS penalties and electricity by up to 40% compared to a non-optimized system.
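
The idea of dividing scarce power so as to minimize overall QoS degradation can be sketched as a greedy marginal-utility allocator. This is an illustrative stand-in, not the paper's optimizer: the step size, the QoS curves, and the greedy policy are all assumptions.

```python
# Greedy power division sketch: hand out power in small slices, each slice
# to the service whose QoS improves most per extra watt, and stop early
# when more power yields no gain (avoiding wasted consumption).

def allocate(budget_w, gain_funcs, step_w=10):
    """gain_funcs[i](w) = QoS score of service i when given w watts."""
    alloc = [0] * len(gain_funcs)
    remaining = budget_w
    while remaining >= step_w:
        # marginal QoS gain of one more slice for each service
        gains = [f(a + step_w) - f(a) for f, a in zip(gain_funcs, alloc)]
        best = max(range(len(gains)), key=gains.__getitem__)
        if gains[best] <= 0:
            break  # extra power no longer helps any service
        alloc[best] += step_w
        remaining -= step_w
    return alloc

# Concave, saturating QoS curves: diminishing returns past the knee
svc = [lambda w: min(w, 100), lambda w: 0.5 * min(w, 200)]
print(allocate(150, svc))  # → [100, 50]: the steeper curve is served first
```

With concave QoS curves this greedy scheme is a reasonable heuristic; ALPACA itself models QoS penalties and electricity cost jointly.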

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Series
IEEE Conference Publication, ISSN 2474-0756
Keywords
power capping, performance degradation, power-performance tradeoffs
National subject category
Computer Systems
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-132428 (URN); 10.1109/ICAC.2018.00014 (DOI); 978-1-5386-5139-1 (ISBN)
Conference
15th IEEE International Conference on Autonomic Computing (ICAC 2018)
Available from: 2017-03-13. Created: 2017-03-13. Last updated: 2019-08-07. Bibliographically reviewed.
Le Duc, T. & Östberg, P.-O. (2018). Application, Workload, and Infrastructure Models for Virtualized Content Delivery Networks Deployed in Edge Computing Environments. In: 2018 27th International Conference on Computer Communication and Networks (ICCCN): . Paper presented at The 27th International Conference on Computer Communications and Networks (ICCCN 2018), July 30 – August 2, 2018, Hangzhou, China. IEEE
2018 (English). In: 2018 27th International Conference on Computer Communication and Networks (ICCCN), IEEE, 2018. Conference paper, published paper (peer-reviewed).
Abstract [en]

Content Delivery Networks (CDNs) handle a large part of Internet traffic and are of growing importance for the management and operation of coming generations of data-intensive applications. This paper addresses modeling and scaling of content-oriented applications, and presents workload, application, and infrastructure models developed in collaboration with a large-scale CDN-operating infrastructure provider, aimed at improving the performance of content delivery subsystems deployed in wide area networks. It has been shown that leveraging edge resources for the deployment of content caches greatly benefits CDNs. Therefore, the models are described from an edge computing perspective and are intended to be integrated into network topology aware application orchestration and resource management systems.
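
The benefit of edge-deployed content caches that motivates these models can be illustrated with a toy LRU edge cache (purely illustrative; this is not one of the paper's models, and the trace is synthetic):

```python
# Toy edge cache: requests served from a small LRU cache at the edge
# avoid traversing the wide-area network to the origin server.

from collections import OrderedDict

class EdgeCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = self.misses = 0

    def request(self, content_id):
        if content_id in self.store:
            self.store.move_to_end(content_id)  # mark as recently used
            self.hits += 1
            return "edge"
        self.misses += 1                        # fetched from origin over WAN
        self.store[content_id] = True
        if len(self.store) > self.capacity:
            self.store.popitem(last=False)      # evict least recently used
        return "origin"

cache = EdgeCache(capacity=2)
trace = ["a", "b", "a", "c", "a", "b"]          # skewed popularity: "a" is hot
sources = [cache.request(c) for c in trace]
print(sources, cache.hits, cache.misses)
```

Under the skewed (Zipf-like) popularity typical of CDN workloads, even a small edge cache absorbs a large share of requests, which is the effect the paper's workload and infrastructure models capture.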

Place, publisher, year, edition, pages
IEEE, 2018
Keywords
cloud computing, edge computing, edge-cloud, modeling, workload analysis
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-154282 (URN); 10.1109/ICCCN.2018.8487450 (DOI); 000450116600130 (); 978-1-5386-5156-8 (ISBN); 978-1-5386-5157-5 (ISBN)
Conference
The 27th International Conference on Computer Communications and Networks (ICCCN 2018), July 30 – August 2, 2018, Hangzhou, China
Project
RECAP
Research funder
EU, Horizon 2020, 732667
Available from: 2018-12-14. Created: 2018-12-14. Last updated: 2018-12-17. Bibliographically reviewed.
Krzywda, J., Ali-Eldin, A., Carlson, T. E., Östberg, P.-O. & Elmroth, E. (2018). Power-performance tradeoffs in data center servers: DVFS, CPU pinning, horizontal, and vertical scaling. Future Generation Computer Systems, 81, 114-128
2018 (English). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 81, pp. 114-128. Journal article (peer-reviewed). Published.
Abstract [en]

Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption under different workloads. The results show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of an underloaded server (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load balancing strategy has a large impact on the tail response time of horizontally scaled applications.
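
The finding that DVFS saves little on underloaded servers but gives real capping headroom on saturated ones is consistent with a simple toy power model. The constants below are assumptions chosen for illustration, not measurements from the paper.

```python
# Toy server power model: a large static idle term plus a dynamic term
# scaling with frequency cubed and utilization. On an underloaded server,
# frequency scaling only moves the small dynamic term.

def server_power(freq_ghz, utilization, p_idle=100.0, c=8.0):
    """Watts: static idle power plus f^3-scaled dynamic power."""
    return p_idle + c * freq_ghz ** 3 * utilization

def dvfs_saving(utilization, f_high=2.0, f_low=1.0):
    """Relative power reduction from halving the CPU frequency."""
    high = server_power(f_high, utilization)
    low = server_power(f_low, utilization)
    return (high - low) / high

print(f"underloaded (u=0.1): {dvfs_saving(0.1):.1%}")  # a few percent
print(f"saturated   (u=1.0): {dvfs_saving(1.0):.1%}")  # large headroom
```

The asymmetry mirrors the paper's measurements: idle power dominates at low utilization, so DVFS has little to scale down; at saturation the dynamic term is large, so a frequency cap removes a substantial fraction of total power.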

Keywords
Power-performance tradeoffs, Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, Horizontal scaling, Vertical scaling
National subject category
Computer Systems
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-132427 (URN); 10.1016/j.future.2017.10.044 (DOI); 000423652200010 (); 2-s2.0-85033772481 (Scopus ID)
Note

Originally published in the thesis in manuscript form.

Available from: 2017-03-13. Created: 2017-03-13. Last updated: 2019-07-02. Bibliographically reviewed.
Rodrigo, G. P., Elmroth, E., Östberg, P.-O. & Ramakrishnan, L. (2018). ScSF: a scheduling simulation framework. In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing. Paper presented at the 21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2017), Orlando, FL, USA, June 2, 2017 (pp. 152-173). Springer, 10773
2018 (English). In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, Springer, 2018, Vol. 10773, pp. 152-173. Conference paper, published paper (peer-reviewed).
Abstract [en]

High-throughput and data-intensive applications, often composed as workflows, are increasingly present in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large, tightly coupled parallel jobs over homogeneous systems. Therefore, there is an urgent need to investigate new scheduling algorithms that can manage the future workloads on HPC systems. However, there is a lack of appropriate models and frameworks to enable development, testing, and validation of new scheduling ideas.

In this paper, we present an open-source scheduler simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case of ScSF to develop new techniques for managing scientific workflows in a batch scheduler, in which the technique was implemented in the framework scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes over two months. The experimental results were analyzed in the framework, showing that the technique minimizes workflows' turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.
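
A batch-scheduler simulation of the kind ScSF orchestrates can be sketched, heavily simplified, as a discrete-event FCFS simulator. This is an illustrative sketch only; ScSF itself drives a real Slurm instance and models far more detail.

```python
# Minimal discrete-event FCFS batch-scheduler simulation: jobs run in
# submission order on a fixed pool of nodes; a heap of finish events
# releases nodes as the clock advances. Turnaround = finish - submit.

import heapq

def simulate_fcfs(jobs, nodes):
    """jobs: list of (submit_time, runtime, nodes_needed) in FCFS order.
    Returns each job's turnaround time."""
    free = nodes
    running = []          # heap of (finish_time, nodes_to_release)
    clock = 0
    turnaround = []
    for submit, runtime, need in jobs:
        clock = max(clock, submit)
        while free < need:              # wait for enough nodes to free up
            finish, released = heapq.heappop(running)
            clock = max(clock, finish)
            free += released
        free -= need
        heapq.heappush(running, (clock + runtime, need))
        turnaround.append(clock + runtime - submit)
    return turnaround

# two 2-node jobs fill a 4-node system; the third must wait for a finish
jobs = [(0, 10, 2), (0, 10, 2), (0, 5, 2)]
print(simulate_fcfs(jobs, nodes=4))  # → [10, 10, 15]
```

ScSF-style experiments sweep generated workloads through such a simulated scheduler and compare turnaround-time distributions across scheduling policies.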

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
slurm, simulation, scheduling, HPC, High Performance Computing, workload, generation, analysis
National subject category
Computer Science
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-132981 (URN); 10.1007/978-3-319-77398-8_9 (DOI); 000444863700009 (); 978-3-319-77397-1 (ISBN); 978-3-319-77398-8 (ISBN)
Conference
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2017), Orlando, FL, USA, June 2, 2017
Research funder
eSSENCE - An eScience Collaboration; Vetenskapsrådet, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Available from: 2017-03-27. Created: 2017-03-27. Last updated: 2018-10-05. Bibliographically reviewed.
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2018). Towards understanding HPC users and systems: a NERSC case study. Journal of Parallel and Distributed Computing, 111, 206-221
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 111, pp. 206-221. Journal article (peer-reviewed). Published.
Abstract [en]

The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs, but HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both at a particular time period and in their evolution over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on a year of workload (2014) and evolution through the systems' lifetime (2010–2014).
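
The characterization clusters jobs by their resource-use features (the keyword list names k-means). A minimal pure-Python k-means over synthetic (runtime, node-count) job features might look like this; the feature choice and data are illustrative, not NERSC's.

```python
# Minimal k-means sketch for job characterization: alternate between
# assigning each job to its nearest center (squared Euclidean distance)
# and recomputing each center as the mean of its group.

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        # update step: each center moves to its group's mean
        centers = [
            tuple(sum(xs) / len(xs) for xs in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

jobs = [(0.1, 1), (0.2, 1), (0.1, 2),      # short, small jobs
        (12.0, 512), (10.0, 600)]          # long, large MPI-style jobs
print(kmeans(jobs, centers=[(0.0, 0.0), (20.0, 1000.0)]))
```

In practice features are normalized first (runtimes and node counts differ by orders of magnitude), and the number of clusters is chosen by a quality criterion; the paper's methodology addresses both.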

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Workload analysis, Supercomputer, HPC, Scheduling, NERSC, Heterogeneity, k-means
National subject category
Computer Science
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-132980 (URN); 10.1016/j.jpdc.2017.09.002 (DOI); 000415028900017 ()
Research funder
eSSENCE - An eScience Collaboration; EU, Horizon 2020, 610711; EU, Horizon 2020, 732667; Vetenskapsrådet, C0590801
Note

Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.

Originally included in the thesis in manuscript form in 2017.

Available from: 2017-03-27. Created: 2017-03-27. Last updated: 2018-06-25. Bibliographically reviewed.
Byrne, J., Svorobej, S., Giannoutakis, K. M., Tzovaras, D., Byrne, P. J., Östberg, P.-O., . . . Lynn, T. (2017). A review of cloud computing simulation platforms and related environments. In: Proceedings of the 7th International Conference on Cloud Computing and Services Science: Volume 1: CLOSER. Paper presented at CLOSER 2017, 7th International Conference on Cloud Computing and Services Science, Porto, Portugal, 24–26 April, 2017 (pp. 679-691). Vol. 1
2017 (English). In: Proceedings of the 7th International Conference on Cloud Computing and Services Science: Volume 1: CLOSER, 2017, Vol. 1, pp. 679-691. Conference paper, published paper (peer-reviewed).
Abstract [en]

Recent years have seen an increasing trend towards the development of Discrete Event Simulation (DES) platforms to support cloud computing related decision making and research. The complexity of cloud environments is increasing with scale and heterogeneity, posing a challenge for the efficient management of cloud applications and data centre resources. The increasing ubiquity of social media, mobile, and cloud computing, combined with the Internet of Things and emerging paradigms such as Edge and Fog Computing, is exacerbating this complexity. Given the scale, complexity, and commercial sensitivity of hyperscale computing environments, the opportunity for experimentation is limited and requires substantial investment of both time and effort. DES provides a low-risk technique for providing decision support for complex hyperscale computing scenarios. In recent years, there has been a significant increase in the development and extension of tools to support DES for cloud computing, resulting in a wide range of tools which vary in terms of their utility and features. Through a review and analysis of available literature, this paper provides an overview and multi-level feature analysis of 33 DES tools for cloud computing environments. This review updates and extends existing reviews to include not only autonomous simulation platforms, but also plugins and extensions for specific cloud computing use cases. It identifies the emergence of CloudSim as a de facto base platform for simulation research and shows a lack of tool support for distributed execution (parallel execution on distributed memory systems).

Keywords
Cloud computing, Cloud simulation tools, Data centre, Fog computing
National subject category
Computer Science
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-135844 (URN); 10.5220/0006373006790691 (DOI); 978-989-758-243-1 (ISBN)
Conference
CLOSER 2017, 7th International Conference on Cloud Computing and Services Science, Porto, Portugal, 24–26 April, 2017
Available from: 2017-06-07. Created: 2017-06-07. Last updated: 2018-06-09. Bibliographically reviewed.
Östberg, P.-O., Byrne, J., Casari, P., Eardley, P., Fernandez Anta, A., Forsman, J., . . . Domaschka, J. (2017). Reliable Capacity Provisioning for Distributed Cloud/Edge/Fog Computing Applications. In: 2017 European Conference on Networks and Communications (EuCNC). Paper presented at the European Conference on Networks and Communications (EuCNC), June 12-15, 2017, Oulu, Finland. IEEE
2017 (English). In: 2017 European Conference on Networks and Communications (EuCNC), IEEE, 2017. Conference paper, published paper (peer-reviewed).
Abstract [en]

The REliable CApacity Provisioning and enhanced remediation for distributed cloud applications (RECAP) project aims to advance cloud and edge computing technology, to develop mechanisms for reliable capacity provisioning, and to make application placement, infrastructure management, and capacity provisioning autonomous, predictable and optimized. This paper presents the RECAP vision for an integrated edge-cloud architecture, discusses the scientific foundation of the project, and outlines plans for toolsets for continuous data collection, application performance modeling, application and component auto-scaling and remediation, and deployment optimization. The paper also presents four use cases from complementing fields that will be used to showcase the advancements of RECAP.

Place, publisher, year, edition, pages
IEEE, 2017
Series
European Conference on Networks and Communications, ISSN 2475-6490
Keywords
Cloud computing, capacity provisioning, application modeling, workload propagation, data collection, analytics, machine learning, simulation, optimization
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-145820 (URN); 10.1109/EuCNC.2017.7980667 (DOI); 000425922900028 (); 978-1-5386-3873-6 (ISBN)
Conference
European Conference on Networks and Communications (EuCNC), June 12-15, 2017, Oulu, Finland
Available from: 2018-08-14. Created: 2018-08-14. Last updated: 2018-08-14. Bibliographically reviewed.
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2016). Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Paper presented at the 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia (pp. 521-526).
2016 (English). In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, pp. 521-526. Conference paper, published paper (peer-reviewed).
Abstract [en]

The high performance computing (HPC) scheduling landscape is changing. Increasingly, there are large scientific computations that include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical, tightly coupled MPI job oriented HPC schedulers. Thus, it is important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the workloads of three current National Energy Research Scientific Computing Center (NERSC) systems in 2014. Finally, we present the results of this analysis, with the observation that heterogeneity might reduce predictability in jobs' wait times.

Series
IEEE-ACM International Symposium on Cluster Cloud and Grid Computing, ISSN 2376-4414
National subject category
Computer Science
Identifiers
urn:nbn:se:umu:diva-126538 (URN); 10.1109/CCGrid.2016.32 (DOI); 000382529800067 (); 978-1-5090-2453-7 (ISBN)
Conference
16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia
Available from: 2016-10-28. Created: 2016-10-10. Last updated: 2018-06-09. Bibliographically reviewed.