Östberg, Per-Olov
Publications (10 of 46)
Le Duc, T., García Leiva, R., Casari, P. & Östberg, P.-O. (2019). Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing: A Survey. ACM Computing Surveys, 52(5), Article ID 94.
Machine Learning Methods for Reliable Resource Provisioning in Edge-Cloud Computing: A Survey
2019 (English). In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 1557-7341, Vol. 52, no. 5, article id 94. Journal article (peer-reviewed). Published.
Abstract [en]

Large-scale software systems are currently designed as distributed entities and deployed in cloud data centers. To overcome the limitations inherent to this type of deployment, applications are increasingly being supplemented with components instantiated closer to the edges of networks—a paradigm known as edge computing. The problem of how to efficiently orchestrate combined edge-cloud applications is, however, incompletely understood, and a wide range of techniques for resource and application management are currently in use.

This article investigates the problem of reliable resource provisioning in joint edge-cloud environments, and surveys technologies, mechanisms, and methods that can be used to improve the reliability of distributed applications in diverse and heterogeneous network environments. Due to the complexity of the problem, special emphasis is placed on solutions to the characterization, management, and control of complex distributed applications using machine learning approaches. The survey is structured around a decomposition of the reliable resource provisioning problem into three categories of techniques: workload characterization and prediction, component placement and system consolidation, and application elasticity and remediation. Survey results are presented along with a problem-oriented discussion of the state-of-the-art. A summary of identified challenges and an outline of future research directions are presented to conclude the article.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Reliability, cloud computing, edge computing, distributed systems, placement, consolidation, autoscaling, remediation, machine learning, optimization
HSV category
Research subject
computer science; computer and systems sciences
Identifiers
urn:nbn:se:umu:diva-163331 (URN), 10.1145/3341145 (DOI), 2-s2.0-85072380854 (Scopus ID)
Available from: 2019-09-16. Created: 2019-09-16. Last updated: 2019-10-09. Bibliographically checked.
Krzywda, J., Ali-Eldin, A., Wadbro, E., Östberg, P.-O. & Elmroth, E. (2019). Power Shepherd: Application Performance Aware Power Shifting. In: The 11th IEEE International Conference on Cloud Computing Technology and Science. Paper presented at The 11th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2019), Sydney, Australia, 11–13 December 2019.
Power Shepherd: Application Performance Aware Power Shifting
2019 (English). In: The 11th IEEE International Conference on Cloud Computing Technology and Science, 2019. Conference paper, published paper (peer-reviewed).
Abstract [en]

The constantly growing power consumption of data centers is a major concern for environmental and economic reasons. Current approaches to reducing the negative consequences of high power consumption focus on limiting peak power consumption. During high-workload periods, the power consumption of highly utilized servers is throttled to stay within the power budget. However, peak power reduction affects the performance of hosted applications and thus leads to Quality of Service violations. In this paper, we introduce Power Shepherd, a hierarchical system for application performance aware power shifting.

Power Shepherd reduces data center operational costs by redistributing the available power among the applications hosted in the cluster. This is achieved by having the cluster controller assign server power budgets, enforcing these budgets using Running Average Power Limit (RAPL), and prioritizing applications within each server by adjusting the CPU scheduling configuration. We implement a prototype of the proposed solution and evaluate it in a real testbed equipped with power meters and running representative cloud applications. Our experiments show that Power Shepherd has the potential to manage a cluster consisting of thousands of servers and to limit the increase of operational costs by a significant amount when the cluster power budget is limited and the system is overutilized. Finally, we identify some outstanding challenges regarding model sensitivity and the fact that this approach in its current form is not beneficial in all situations, e.g., when the system is underutilized.
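The budget-splitting idea in the abstract can be illustrated with a small sketch. This is not the Power Shepherd implementation; the proportional-share heuristic, the server names, and the sensitivity weights are invented for illustration:

```python
def shift_power(cluster_budget_w, servers, min_budget_w=50.0):
    """Split a fixed cluster power budget (watts) across servers in
    proportion to how power-sensitive their hosted applications are.

    servers: dict mapping server name -> sensitivity weight, where a
    higher weight means performance degrades more per watt removed.
    NOTE: hypothetical heuristic for illustration only.
    """
    total_weight = sum(servers.values())
    budgets = {}
    for name, weight in servers.items():
        share = cluster_budget_w * weight / total_weight
        budgets[name] = max(share, min_budget_w)  # never starve a server
    return budgets
```

On Linux, per-socket budgets of this kind could then be enforced through the RAPL power-capping interface exposed under /sys/class/powercap/, which is the enforcement mechanism the paper names.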

HSV category
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-166125 (URN)
Conference
The 11th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2019), Sydney, Australia, 11–13 December 2019
Available from: 2019-12-12. Created: 2019-12-12. Last updated: 2020-01-22.
Krzywda, J., Ali-Eldin, A., Wadbro, E., Östberg, P.-O. & Elmroth, E. (2018). ALPACA: Application Performance Aware Server Power Capping. In: ICAC 2018: 2018 IEEE International Conference on Autonomic Computing (ICAC), Trento, Italy, September 3-7, 2018. Paper presented at 15th IEEE International Conference on Autonomic Computing (ICAC 2018) (pp. 41-50). IEEE Computer Society
ALPACA: Application Performance Aware Server Power Capping
2018 (English). In: ICAC 2018: 2018 IEEE International Conference on Autonomic Computing (ICAC), Trento, Italy, September 3-7, 2018, IEEE Computer Society, 2018, pp. 41-50. Conference paper, published paper (peer-reviewed).
Abstract [en]

Server power capping limits the power consumption of a server so that it does not exceed a specific power budget. This allows data center operators to reduce peak power consumption at the cost of performance degradation of hosted applications. Previous work on server power capping rarely considers the Quality-of-Service (QoS) requirements of consolidated services when enforcing the power budget. In this paper, we introduce ALPACA, a framework that reduces QoS violations and overall application performance degradation for consolidated services. ALPACA avoids unnecessarily high power consumption when it brings no performance gain, and divides the power among the running services in a way that reduces overall QoS degradation when power is scarce. We evaluate ALPACA using four applications: MediaWiki, SysBench, Sock Shop, and CloudSuite's Web Search benchmark. Our experiments show that ALPACA reduces the operational costs of QoS penalties and electricity by up to 40% compared to a non-optimized system.
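The tradeoff ALPACA targets (electricity cost versus QoS penalty under a shared power budget) can be sketched as a toy exhaustive search. The penalty functions, prices, and cap levels below are made up and are not the ALPACA model:

```python
from itertools import product

def choose_caps(services, levels, budget_w, price_per_w):
    """Exhaustively pick one power cap per service so that the summed
    caps stay within the budget and the combined electricity cost plus
    QoS penalty is minimized.

    services: dict name -> penalty(cap_w) -> cost (hypothetical model)
    levels:   allowed per-service cap values in watts
    """
    names = list(services)
    best, best_cost = None, float("inf")
    for caps in product(levels, repeat=len(names)):
        if sum(caps) > budget_w:
            continue  # infeasible: exceeds the shared budget
        cost = sum(c * price_per_w for c in caps)                  # electricity
        cost += sum(services[n](c) for n, c in zip(names, caps))   # QoS penalty
        if cost < best_cost:
            best, best_cost = dict(zip(names, caps)), cost
    return best, best_cost
```

A real framework would replace the exhaustive search and the synthetic penalty curves with measured per-application performance models; the sketch only shows the shape of the optimization problem.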

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Series
IEEE Conference Publication, ISSN 2474-0756
Keywords
power capping, performance degradation, power-performance tradeoffs
HSV category
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-132428 (URN), 10.1109/ICAC.2018.00014 (DOI), 978-1-5386-5139-1 (ISBN)
Conference
15th IEEE International Conference on Autonomic Computing (ICAC 2018)
Available from: 2017-03-13. Created: 2017-03-13. Last updated: 2019-08-07. Bibliographically checked.
Le Duc, T. & Östberg, P.-O. (2018). Application, Workload, and Infrastructure Models for Virtualized Content Delivery Networks Deployed in Edge Computing Environments. In: 2018 27th International Conference on Computer Communication and Networks (ICCCN). Paper presented at The 27th International Conference on Computer Communications and Networks (ICCCN 2018), July 30 – August 2, 2018, Hangzhou, China. IEEE.
Application, Workload, and Infrastructure Models for Virtualized Content Delivery Networks Deployed in Edge Computing Environments
2018 (English). In: 2018 27th International Conference on Computer Communication and Networks (ICCCN), IEEE, 2018. Conference paper, published paper (peer-reviewed).
Abstract [en]

Content Delivery Networks (CDNs) handle a large part of the traffic on the Internet and are of growing importance for the management and operation of coming generations of data-intensive applications. This paper addresses modeling and scaling of content-oriented applications, and presents workload, application, and infrastructure models developed in collaboration with an infrastructure provider operating a large-scale CDN, aimed at improving the performance of content delivery subsystems deployed in wide area networks. Leveraging edge resources for the deployment of content caches has been shown to greatly benefit CDNs. The models are therefore described from an edge computing perspective and are intended to be integrated in network-topology-aware application orchestration and resource management systems.

Place, publisher, year, edition, pages
IEEE, 2018
Keywords
cloud computing, edge computing, edge-cloud, modeling, workload analysis
HSV category
Identifiers
urn:nbn:se:umu:diva-154282 (URN), 10.1109/ICCCN.2018.8487450 (DOI), 000450116600130, 978-1-5386-5156-8 (ISBN), 978-1-5386-5157-5 (ISBN)
Conference
The 27th International Conference on Computer Communications and Networks (ICCCN 2018), July 30 – August 2, 2018, Hangzhou, China
Projects
RECAP
Research funder
EU, Horizon 2020, 732667
Available from: 2018-12-14. Created: 2018-12-14. Last updated: 2018-12-17. Bibliographically checked.
Krzywda, J., Ali-Eldin, A., Carlson, T. E., Östberg, P.-O. & Elmroth, E. (2018). Power-performance tradeoffs in data center servers: DVFS, CPU pinning, horizontal, and vertical scaling. Future Generation Computer Systems, 81, 114-128.
Power-performance tradeoffs in data center servers: DVFS, CPU pinning, horizontal, and vertical scaling
2018 (English). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 81, pp. 114-128. Journal article (peer-reviewed). Published.
Abstract [en]

Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, horizontal scaling, and vertical scaling are four techniques that have been proposed as actuators to control the performance and energy consumption of data center servers. This work investigates the utility of these four actuators and quantifies the power-performance tradeoffs associated with them. Using replicas of the German Wikipedia running on our local testbed, we perform a set of experiments to quantify the influence of DVFS, vertical and horizontal scaling, and CPU pinning on end-to-end response time (average and tail), throughput, and power consumption under different workloads. The results show that DVFS rarely reduces the power consumption of underloaded servers by more than 5%, but it can be used to limit the maximal power consumption of a saturated server by up to 20% (at a cost of performance degradation). CPU pinning reduces the power consumption of underloaded servers (by up to 7%) at the cost of performance degradation, which can be limited by choosing an appropriate CPU pinning scheme. Horizontal and vertical scaling improve both the average and tail response time, but the improvement is not proportional to the amount of resources added. The load balancing strategy has a large impact on the tail response time of horizontally scaled applications.

Keywords
Power-performance tradeoffs, Dynamic Voltage and Frequency Scaling (DVFS), CPU pinning, Horizontal scaling, Vertical scaling
HSV category
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-132427 (URN), 10.1016/j.future.2017.10.044 (DOI), 000423652200010, 2-s2.0-85033772481 (Scopus ID)
Note
Originally published in thesis in manuscript form.
Available from: 2017-03-13. Created: 2017-03-13. Last updated: 2019-07-02. Bibliographically checked.
Gonzalo P., R., Elmroth, E., Östberg, P.-O. & Ramakrishnan, L. (2018). ScSF: a scheduling simulation framework. In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing. Paper presented at 21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSP 2017), Orlando FL, USA, June 2nd, 2017 (pp. 152-173). Springer, Vol. 10773.
ScSF: a scheduling simulation framework
2018 (English). In: Proceedings of the 21st Workshop on Job Scheduling Strategies for Parallel Processing, Springer, 2018, Vol. 10773, pp. 152-173. Conference paper, published paper (peer-reviewed).
Abstract [en]

High-throughput and data-intensive applications, often composed as workflows, are increasingly present in the workloads of current HPC systems. At the same time, trends for future HPC systems point towards more heterogeneous systems with deeper I/O and memory hierarchies. However, current HPC schedulers are designed to support classical large, tightly coupled parallel jobs on homogeneous systems. There is therefore an urgent need to investigate new scheduling algorithms that can manage the future workloads of HPC systems, but there is a lack of appropriate models and frameworks to enable the development, testing, and validation of new scheduling ideas.

In this paper, we present an open-source scheduler simulation framework (ScSF) that covers all the steps of scheduling research through simulation. ScSF provides capabilities for workload modeling, workload generation, system simulation, comparative workload analysis, and experiment orchestration. The simulator is designed to run over a distributed computing infrastructure, enabling testing at scale. We describe in detail a use case in which ScSF is employed to develop new techniques for managing scientific workflows in a batch scheduler; the technique was implemented in the framework's scheduler. For evaluation purposes, 1728 experiments, equivalent to 33 years of simulated time, were run in a deployment of ScSF over a distributed infrastructure of 17 compute nodes during two months. The experimental results were analyzed in the framework to verify that the technique minimizes workflow turnaround time without over-allocating resources. Finally, we discuss lessons learned from our experiences that will help future researchers.
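ScSF itself drives a full Slurm simulator, but the kind of logic such frameworks execute at scale can be caricatured in a few lines. This first-come-first-served sketch (no backfilling; the job format is invented) only illustrates how turnaround times fall out of a scheduling simulation:

```python
def simulate_fcfs(jobs, total_cores):
    """Toy first-come-first-served batch-scheduler simulation.

    jobs: list of (submit_time, runtime, cores) sorted by submit time.
    Returns each job's turnaround time (finish - submit).
    """
    running = []        # (end_time, cores) of currently running jobs
    clock = 0.0         # start time of the most recently started job
    turnaround = []
    for submit, runtime, cores in jobs:
        clock = max(clock, submit)
        while True:
            # drop jobs that have finished by `clock`
            running = [(end, c) for end, c in running if end > clock]
            if total_cores - sum(c for _, c in running) >= cores:
                break
            # not enough idle cores: advance to the next job completion
            clock = min(end for end, _ in running)
        running.append((clock + runtime, cores))
        turnaround.append(clock + runtime - submit)
    return turnaround
```

For example, simulate_fcfs([(0, 10, 4), (0, 5, 4)], total_cores=4) makes the second job wait for the first; a framework like ScSF layers workload generation, a real scheduler, and analysis around a far more faithful version of this loop.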

Place, publisher, year, edition, pages
Springer, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
slurm, simulation, scheduling, HPC, High Performance Computing, workload, generation, analysis
HSV category
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-132981 (URN), 10.1007/978-3-319-77398-8_9 (DOI), 000444863700009, 978-3-319-77397-1 (ISBN), 978-3-319-77398-8 (ISBN)
Conference
21st Workshop on Job Scheduling Strategies for Parallel Processing (JSSP 2017), Orlando FL, USA, June 2nd, 2017
Research funder
eSSENCE - An eScience Collaboration; Swedish Research Council, C0590801
Note
Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.
Available from: 2017-03-27. Created: 2017-03-27. Last updated: 2018-10-05. Bibliographically checked.
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2018). Towards understanding HPC users and systems: a NERSC case study. Journal of Parallel and Distributed Computing, 111, 206-221
Towards understanding HPC users and systems: a NERSC case study
2018 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 111, pp. 206-221. Journal article (peer-reviewed). Published.
Abstract [en]

The high performance computing (HPC) scheduling landscape currently faces new challenges due to changes in the workload. Previously, HPC centers were dominated by tightly coupled MPI jobs; HPC workloads increasingly include high-throughput, data-intensive, and stream-processing applications. As a consequence, workloads are becoming more diverse at both the application and job levels, posing new challenges to classical HPC schedulers. There is a need to understand current HPC workloads and their evolution to facilitate informed future scheduling research and enable efficient scheduling in future HPC systems.

In this paper, we present a methodology to characterize workloads and assess their heterogeneity, both for a particular time period and as they evolve over time. We apply this methodology to the workloads of three systems (Hopper, Edison, and Carver) at the National Energy Research Scientific Computing Center (NERSC). We present the resulting characterization of jobs, queues, heterogeneity, and performance, which includes detailed information on a year of workload (2014) and on the evolution through the systems' lifetimes (2010–2014).
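As a toy companion to the methodology (the record's keywords mention k-means), the basic clustering step used to group jobs by numeric features can be sketched as follows; the feature choice and seed centroids are illustrative and not taken from the paper:

```python
def kmeans(points, centroids, iters=20):
    """Plain Lloyd's algorithm over numeric feature tuples, e.g.
    (runtime_hours, cores) per job. `centroids` seeds the clusters;
    an empty cluster keeps its previous centroid."""
    clusters = []
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # recompute each centroid as the mean of its assigned points
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
    return centroids, clusters
```

Clustering (runtime, cores) tuples this way separates, for example, short narrow jobs from long wide ones; the paper's characterization adds careful feature selection and evaluation on top of such a step.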

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
Workload analysis, Supercomputer, HPC, Scheduling, NERSC, Heterogeneity, k-means
HSV category
Research subject
administrative data processing
Identifiers
urn:nbn:se:umu:diva-132980 (URN), 10.1016/j.jpdc.2017.09.002 (DOI), 000415028900017
Research funder
eSSENCE - An eScience Collaboration; EU, Horizon 2020, 610711; EU, Horizon 2020, 732667; Swedish Research Council, C0590801
Note
Work also supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research (ASCR). We used resources at the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy, both under Contract No. DE-AC02-05CH11231.
Originally included in thesis in manuscript form in 2017.
Available from: 2017-03-27. Created: 2017-03-27. Last updated: 2018-06-25. Bibliographically checked.
Byrne, J., Svorobej, S., Giannoutakis, K. M., Tzovaras, D., Byrne, P. J., Östberg, P.-O., . . . Lynn, T. (2017). A review of cloud computing simulation platforms and related environments. In: Proceedings of the 7th International Conference on Cloud Computing and Services Science: Volume 1: CLOSER. Paper presented at CLOSER 2017, 7th International Conference on Cloud Computing and Services Science, Porto, Portugal, 24–26 April, 2017 (pp. 679-691), Vol. 1.
A review of cloud computing simulation platforms and related environments
2017 (English). In: Proceedings of the 7th International Conference on Cloud Computing and Services Science: Volume 1: CLOSER, 2017, Vol. 1, pp. 679-691. Conference paper, published paper (peer-reviewed).
Abstract [en]

Recent years have seen an increasing trend towards the development of Discrete Event Simulation (DES) platforms to support cloud computing related decision making and research. The complexity of cloud environments is increasing with scale and heterogeneity, posing a challenge for the efficient management of cloud applications and data centre resources. The increasing ubiquity of social media, mobile and cloud computing, combined with the Internet of Things and emerging paradigms such as Edge and Fog Computing, is exacerbating this complexity. Given the scale, complexity, and commercial sensitivity of hyperscale computing environments, the opportunity for experimentation is limited and requires substantial investment of both time and effort. DES provides a low-risk technique for decision support in complex hyperscale computing scenarios. In recent years, there has been a significant increase in the development and extension of tools to support DES for cloud computing, resulting in a wide range of tools that vary in their utility and features. Through a review and analysis of available literature, this paper provides an overview and multi-level feature analysis of 33 DES tools for cloud computing environments. This review updates and extends existing reviews to include not only autonomous simulation platforms, but also plugins and extensions for specific cloud computing use cases. The review identifies the emergence of CloudSim as a de facto base platform for simulation research and shows a lack of tool support for distributed execution (parallel execution on distributed-memory systems).

Keywords
Cloud computing, Cloud simulation tools, Data centre, Fog computing
HSV category
Research subject
computer science
Identifiers
urn:nbn:se:umu:diva-135844 (URN), 10.5220/0006373006790691 (DOI), 978-989-758-243-1 (ISBN)
Conference
CLOSER 2017, 7th International Conference on Cloud Computing and Services Science, Porto, Portugal, 24–26 April, 2017
Available from: 2017-06-07. Created: 2017-06-07. Last updated: 2018-06-09. Bibliographically checked.
Östberg, P.-O., Byrne, J., Casari, P., Eardley, P., Fernandez Anta, A., Forsman, J., . . . Domaschka, J. (2017). Reliable Capacity Provisioning for Distributed Cloud/Edge/Fog Computing Applications. In: 2017 European Conference on Networks and Communications (EuCNC). Paper presented at European Conference on Networks and Communications (EuCNC), June 12-15, 2017, Oulu, Finland. IEEE.
Reliable Capacity Provisioning for Distributed Cloud/Edge/Fog Computing Applications
2017 (English). In: 2017 European Conference on Networks and Communications (EuCNC), IEEE, 2017. Conference paper, published paper (peer-reviewed).
Abstract [en]

The REliable CApacity Provisioning and enhanced remediation for distributed cloud applications (RECAP) project aims to advance cloud and edge computing technology, to develop mechanisms for reliable capacity provisioning, and to make application placement, infrastructure management, and capacity provisioning autonomous, predictable and optimized. This paper presents the RECAP vision for an integrated edge-cloud architecture, discusses the scientific foundation of the project, and outlines plans for toolsets for continuous data collection, application performance modeling, application and component auto-scaling and remediation, and deployment optimization. The paper also presents four use cases from complementing fields that will be used to showcase the advancements of RECAP.

Place, publisher, year, edition, pages
IEEE, 2017
Series
European Conference on Networks and Communications, ISSN 2475-6490
Keywords
Cloud computing, capacity provisioning, application modeling, workload propagation, data collection, analytics, machine learning, simulation, optimization
HSV category
Identifiers
urn:nbn:se:umu:diva-145820 (URN), 10.1109/EuCNC.2017.7980667 (DOI), 000425922900028, 978-1-5386-3873-6 (ISBN)
Conference
European Conference on Networks and Communications (EuCNC), June 12-15, 2017, Oulu, Finland
Available from: 2018-08-14. Created: 2018-08-14. Last updated: 2018-08-14. Bibliographically checked.
Rodrigo, G. P., Östberg, P.-O., Elmroth, E., Antypas, K., Gerber, R. & Ramakrishnan, L. (2016). Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). Paper presented at 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia (pp. 521-526).
Towards Understanding Job Heterogeneity in HPC: A NERSC Case Study
2016 (English). In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), 2016, pp. 521-526. Conference paper, published paper (peer-reviewed).
Abstract [en]

The high performance computing (HPC) scheduling landscape is changing. Increasingly, large scientific computations include high-throughput, data-intensive, and stream-processing compute models. These jobs increase workload heterogeneity, which presents challenges for classical HPC schedulers oriented towards tightly coupled MPI jobs. It is thus important to define new analysis methods to understand the heterogeneity of the workload and its possible effect on the performance of current systems. In this paper, we present a methodology to assess job heterogeneity in workloads and scheduling queues. We apply the method to the workloads of three National Energy Research Scientific Computing Center (NERSC) systems in 2014. Finally, we present the results of this analysis, observing that heterogeneity might reduce the predictability of jobs' wait times.

Serie
IEEE-ACM International Symposium on Cluster Cloud and Grid Computing, ISSN 2376-4414
HSV category
Identifiers
urn:nbn:se:umu:diva-126538 (URN), 10.1109/CCGrid.2016.32 (DOI), 000382529800067, 978-1-5090-2453-7 (ISBN)
Conference
16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 16-19, 2016, Cartagena, Colombia
Available from: 2016-10-28. Created: 2016-10-10. Last updated: 2018-06-09. Bibliographically checked.