Tomas, Luis
Publications (10 of 22)
Souza, A., Papadopoulos, A. V., Tomás Bolivar, L., Gilbert, D. & Tordsson, J. (2018). Hybrid Adaptive Checkpointing for Virtual Machine Fault Tolerance. In: Li J., Chandra A., Guo T., Cai Y. (Ed.), Proceedings - 2018 IEEE International Conference on Cloud Engineering, IC2E 2018. Paper presented at 2018 IEEE International Conference on Cloud Engineering (IC2E 2018), 17–20 April 2018, Orlando, Florida, USA (pp. 12-22). Institute of Electrical and Electronics Engineers (IEEE)
2018 (English). In: Proceedings - 2018 IEEE International Conference on Cloud Engineering, IC2E 2018 / [ed] Li J., Chandra A., Guo T., Cai Y., Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 12-22. Conference paper, Published paper (Refereed)
Abstract [en]

Active Virtual Machine (VM) replication is an application-independent and cost-efficient mechanism for high availability and fault tolerance, with several recently proposed implementations based on checkpointing. However, these methods may suffer from large impacts on application latency, excessive resource usage overheads, and/or unpredictable behavior for varying workloads. To address these problems, we propose a hybrid approach that uses a Proportional-Integral (PI) controller to dynamically switch between periodic and on-demand checkpointing. Our mechanism automatically selects the method that minimizes application downtime by adapting itself to changes in workload characteristics. The implementation is based on modifications to QEMU, LibVirt, and OpenStack, to seamlessly provide fault-tolerant VM provisioning and to enable the controller to dynamically select the best checkpointing mode. Our evaluation is based on experiments with a video streaming application, an e-commerce benchmark, and a software development tool. The experiments demonstrate that our adaptive hybrid approach improves both application availability and resource usage compared to static selection of a checkpointing method, with application performance gains and negligible overheads.
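
As an illustration of the control idea only, the following is a minimal sketch of a discrete-time PI controller whose output selects a checkpointing mode. The abstract does not state the controlled signal, the gains, or the switching rule, so the measured workload metric, the setpoint, the gain values, and the `select_mode` threshold below are all hypothetical and are not taken from the paper's QEMU, LibVirt, or OpenStack implementation.

```python
from dataclasses import dataclass


@dataclass
class PIController:
    """Discrete-time PI controller: u = kp * e + ki * sum(e * dt)."""
    kp: float              # proportional gain (hypothetical value)
    ki: float              # integral gain (hypothetical value)
    setpoint: float        # target for the measured workload metric (assumed)
    integral: float = 0.0  # accumulated error over time

    def update(self, measurement: float, dt: float) -> float:
        error = measurement - self.setpoint
        self.integral += error * dt
        return self.kp * error + self.ki * self.integral


def select_mode(control_signal: float) -> str:
    """Assumed switching rule: a positive control signal selects on-demand mode."""
    return "on-demand" if control_signal > 0.0 else "periodic"


# Hypothetical metric samples: low load first, then a sustained burst.
controller = PIController(kp=0.5, ki=0.1, setpoint=10.0)
modes = [select_mode(controller.update(m, dt=1.0)) for m in (4.0, 6.0, 30.0, 30.0)]
print(modes)  # ['periodic', 'periodic', 'on-demand', 'on-demand']
```

The integral term keeps the selection from flipping on a single noisy sample, which is one plausible reason to prefer a PI controller over a plain threshold here.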

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Keywords
Fault Tolerance, Resource Management, Checkpoint, COLO, Control Theory
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:umu:diva-152033 (URN); 10.1109/IC2E.2018.00023 (DOI); 2-s2.0-85048315473 (Scopus ID); 978-1-5386-5009-7 (ISBN); 978-1-5386-5008-0 (ISBN)
Conference
2018 IEEE International Conference on Cloud Engineering (IC2E 2018), 17–20 April 2018, Orlando, Florida, USA
Available from: 2018-09-24. Created: 2018-09-24. Last updated: 2019-01-24. Bibliographically approved.
Kostentinos Tesfatsion, S., Proaño, J., Tomás, L., Caminero, B., Carrión, C. & Tordsson, J. (2018). Power and Performance Optimization in FPGA-accelerated Clouds. Concurrency and Computation, 30(18), Article ID e4526.
2018 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 30, no 18, article id e4526. Article in journal (Other academic), Published
Abstract [en]

Energy management has become increasingly necessary in data centers to address all energy-related costs, including capital costs, operating expenses, and environmental impacts. Heterogeneous systems with mixed hardware architectures provide both throughput and processing efficiency for different specialized application types and thus have a potential for significant energy savings. However, the presence of multiple and different processing elements increases the complexity of resource assignment. In this paper, we propose a system for efficient resource management in heterogeneous clouds. The proposed approach maps applications' requirements to different resources, reducing power usage with minimal impact on performance. We propose a technique that combines the scheduling of custom hardware accelerators, in our case Field-Programmable Gate Arrays (FPGAs), with an optimized resource allocation technique for commodity servers. We consider an energy-aware scheduling technique that uses both the applications' performance and their deadlines to control the assignment of FPGAs to the applications that would consume the most energy. Once the scheduler has performed the mapping between a VM and an FPGA, an optimizer handles the remaining VMs in the server, using vertical scaling and CPU frequency adaptation to reduce energy consumption while maintaining the required performance. Our evaluation, using interactive and data-intensive applications, assesses the effectiveness of the proposed solution in saving energy while maintaining application performance, obtaining up to a 32% improvement in the performance-energy ratio on a mix of multimedia and e-commerce applications.

Place, publisher, year, edition, pages
John Wiley & Sons, 2018
Keywords
cloud computing, energy efficiency, FPGA-aware
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-121092 (URN); 10.1002/cpe.4526 (DOI); 000442575600010 ()
Funder
Swedish Research Council
Available from: 2016-05-26. Created: 2016-05-26. Last updated: 2019-01-15. Bibliographically approved.
Goumas, G., Nikas, K., Lakew, E. B., Kotselidis, C., Attwood, A., Elmroth, E., . . . Koziris, N. (2017). ACTiCLOUD: Enabling the Next Generation of Cloud Applications. In: Lee, K; Liu, L (Ed.), 2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017). Paper presented at 37th IEEE International Conference on Distributed Computing Systems (ICDCS), JUN 05-08, 2017, Atlanta, GA (pp. 1836-1845). IEEE Computer Society
2017 (English). In: 2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017) / [ed] Lee, K; Liu, L, IEEE Computer Society, 2017, p. 1836-1845. Conference paper, Published paper (Refereed)
Abstract [en]

Despite their proliferation as a dominant computing paradigm, cloud computing systems lack effective mechanisms to manage their vast amounts of resources efficiently. Resources are stranded and fragmented, ultimately limiting cloud systems' applicability to large classes of critical applications that pose non-moderate resource demands. Eliminating current technological barriers to actual fluidity and scalability of cloud resources is essential to strengthen cloud computing's role as a critical cornerstone for the digital economy. ACTiCLOUD proposes a novel cloud architecture that breaks the existing scale-up and share-nothing barriers and enables the holistic management of physical resources both at the local cloud site and at distributed levels. Specifically, it makes advancements in the cloud resource management stacks by extending state-of-the-art hypervisor technology beyond the physical server boundary and localized cloud management system to provide holistic resource management within a rack, within a site, and across distributed cloud sites. On top of this, ACTiCLOUD will adapt and optimize system libraries and runtimes (e.g., JVM) as well as ACTiCLOUD-native applications, which are extremely demanding and critical classes of applications that currently face severe difficulties in matching their resource requirements to state-of-the-art cloud offerings.

Place, publisher, year, edition, pages
IEEE Computer Society, 2017
Series
IEEE International Conference on Distributed Computing Systems, ISSN 1063-6927
Keywords
cloud computing, resource management, in-memory databases, resource disaggregation, scale-up, rackscale hypervisor
National Category
Computer Systems
Identifiers
urn:nbn:se:umu:diva-142014 (URN); 10.1109/ICDCS.2017.252 (DOI); 000412759500173 (); 978-1-5386-1791-5 (ISBN); 978-1-5386-1792-2 (ISBN); 978-1-5386-1793-9 (ISBN)
Conference
37th IEEE International Conference on Distributed Computing Systems (ICDCS), JUN 05-08, 2017, Atlanta, GA
Available from: 2017-11-20. Created: 2017-11-20. Last updated: 2018-06-09. Bibliographically approved.
Lorido-Botran, T., Huerta, S., Tomás, L., Tordsson, J. & Sanz, B. (2017). An unsupervised approach to online noisy-neighbor detection in cloud data centers. Expert systems with applications, 89, 188-204
2017 (English). In: Expert systems with applications, ISSN 0957-4174, E-ISSN 1873-6793, Vol. 89, p. 188-204. Article in journal (Refereed), Published
Abstract [en]

Resource sharing is an inherent characteristic of cloud data centers. Virtual Machines (VMs) and/or containers that are co-located on the same physical server often compete for resources, leading to interference. The noisy-neighbor effect refers to an anomaly caused by a VM/container limiting the resources accessed by another one. Our main contribution is an online, lightweight, and application-agnostic solution for anomaly detection that follows an unsupervised approach. It is based on comparing models for different lags: Dirichlet Process Gaussian Mixture Models to characterize the resource usage profile of the application, and distance measures to score the similarity among models. An alarm is raised when there is an abrupt change in the short-term lag (i.e. a high distance score for short-term models) while the long-term state remains constant. We test the algorithm on different cloud workloads: websites, periodic batch applications, Spark-based applications, and a Memcached server. We are able to detect anomalies in CPU and memory resource usage with up to 82–96% accuracy (recall), depending on the scenario. Compared to other baseline methods, our approach is able to detect anomalies successfully while raising a low number of false positives, even in the case of applications with unusual normal behavior (e.g. periodic). Experiments show that our proposed algorithm is a lightweight and effective solution for detecting the noisy-neighbor effect without any historical information about the application, and that it could potentially be applied to other kinds of anomalies.
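
The alarm rule described in the abstract (short-term models differ sharply while long-term models stay similar) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's method: a single Gaussian per window is swapped in for the Dirichlet Process Gaussian Mixture Model, the Hellinger distance is an assumed choice of distance measure, and the window lengths, threshold, and function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def fit_gaussian(samples):
    """Return (mean, variance); a small floor keeps the variance positive."""
    return fmean(samples), max(pvariance(samples), 1e-6)


def hellinger(p, q):
    """Hellinger distance in [0, 1] between two 1-D Gaussians (mean, variance)."""
    (m1, v1), (m2, v2) = p, q
    bc = math.sqrt(2.0 * math.sqrt(v1 * v2) / (v1 + v2)) * math.exp(
        -((m1 - m2) ** 2) / (4.0 * (v1 + v2)))
    return math.sqrt(max(0.0, 1.0 - bc))


def noisy_neighbor_alarm(usage, short=5, long=50, threshold=0.5):
    """Alarm when consecutive short windows differ but long windows do not."""
    if len(usage) < 2 * long:
        return False  # not enough history to build both long-term models
    short_dist = hellinger(fit_gaussian(usage[-2 * short:-short]),
                           fit_gaussian(usage[-short:]))
    long_dist = hellinger(fit_gaussian(usage[-2 * long:-long]),
                          fit_gaussian(usage[-long:]))
    return short_dist > threshold and long_dist <= threshold


# Hypothetical CPU usage traces (percent): steady load, and the same load
# followed by a five-sample spike.
steady = [40, 60] * 50
spike = [40, 60] * 47 + [40] + [90, 92, 90, 92, 90]
print(noisy_neighbor_alarm(steady), noisy_neighbor_alarm(spike))  # False True
```

A bounded distance makes one threshold usable for both window lengths; an unbounded measure would need separate tuning per lag.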

Place, publisher, year, edition, pages
Elsevier, 2017
Keywords
Anomaly detection, Virtual machine, Cloud computing, DPGMM, Noisy-neighbor effect, Similarity distances
National Category
Other Computer and Information Science
Identifiers
urn:nbn:se:umu:diva-138402 (URN); 10.1016/j.eswa.2017.07.038 (DOI); 000411420200016 ()
Available from: 2017-08-22. Created: 2017-08-22. Last updated: 2018-06-09. Bibliographically approved.
Tesfatsion, S. K., Tomás, L. & Tordsson, J. (2017). OptiBook: Optimal Resource Booking for Energy-efficient Datacenters. In: 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS). Paper presented at 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), Vilanova i la Geltrú, Spain, June 14-16, 2017. IEEE Communications Society
2017 (English). In: 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), IEEE Communications Society, 2017. Conference paper, Published paper (Refereed)
Abstract [en]

A lack of energy proportionality, low resource utilization, and interference in virtualized infrastructure make the cloud a challenging target environment for improving energy efficiency. In this paper we present OptiBook, a system that improves energy proportionality and/or resource utilization to optimize performance and energy efficiency. OptiBook shares servers between latency-sensitive services and batch jobs, overbooks the system in a controllable manner, uses vertical (CPU and DVFS) scaling for prioritized virtual machines, and applies performance isolation techniques such as CPU pinning and quota enforcement as well as online resource tuning to effectively improve energy efficiency. Our evaluations show that, on average, OptiBook improves performance per watt by 20% and reduces energy consumption by 9% while minimizing SLO violations.

Place, publisher, year, edition, pages
IEEE Communications Society, 2017
National Category
Computer Systems
Identifiers
urn:nbn:se:umu:diva-145492 (URN); 10.1109/IWQoS.2017.7969135 (DOI); 000428199300029 (); 978-1-5386-2704-4 (ISBN); 978-1-5386-2705-1 (ISBN)
Conference
2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), Vilanova i la Geltrú, Spain, June 14-16, 2017
Available from: 2018-03-07. Created: 2018-03-07. Last updated: 2018-06-09. Bibliographically approved.
Dürango, J., Tärneberg, W., Tomas, L., Tordsson, J., Kihl, M. & Maggio, M. (2016). A control theoretical approach to non-intrusive geo-replication for cloud services. In: 2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC). Paper presented at 55th IEEE Conference on Decision and Control (CDC), Las Vegas, NV, DEC 12-14, 2016 (pp. 1649-1656). IEEE
2016 (English). In: 2016 IEEE 55TH CONFERENCE ON DECISION AND CONTROL (CDC), IEEE, 2016, p. 1649-1656. Conference paper, Published paper (Refereed)
Abstract [en]

Complete data center failures may occur due to disastrous events such as earthquakes or fires. To attain robustness against such failures and reduce the probability of data loss, data must be replicated in another data center sufficiently geographically separated from the original data center. Implementing geo-replication is expensive, as every data update operation in the original data center must be replicated in the backup. Running the application and the replication service in parallel is cost-effective but creates a trade-off between replication consistency and potential data loss on the one hand, and reduced application performance due to network resource contention on the other. We model this trade-off and provide a control-theoretical solution based on Model Predictive Control (MPC) to dynamically allocate network bandwidth to accommodate the objectives of both replication and application data streams. We evaluate our control solution through simulations emulating the individual services, their traffic flows, and the shared network resource. The MPC solution is able to maintain consistent performance over periods of persistent overload, and is quickly able to indiscriminately recover once the system returns to a stable state. Additionally, the MPC balances the two objectives of consistency and performance according to the proportions specified in the objective function.

Place, publisher, year, edition, pages
IEEE, 2016
Series
IEEE Conference on Decision and Control, ISSN 0743-1546
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-124172 (URN); 10.1109/CDC.2016.7798502 (DOI); 000400048101133 (); 978-1-5090-1837-6 (ISBN)
Conference
55th IEEE Conference on Decision and Control (CDC), Las Vegas, NV, DEC 12-14, 2016
Available from: 2016-07-25. Created: 2016-07-25. Last updated: 2018-06-07. Bibliographically approved.
Saeid Masoumzadeh, S., Hlavacs, H. & Tomas, L. (2016). A Self-Adaptive Performance-Aware Capacity Controller in Overbooked Datacenters. In: Gupta, I; Diao, Y (Ed.), 2016 INTERNATIONAL CONFERENCE ON CLOUD AND AUTONOMIC COMPUTING (ICCAC). Paper presented at IEEE International Conference on Cloud and Autonomic Computing (ICCAC), SEP 12-16, 2016, Augsburg, GERMANY (pp. 12-23). Institute of Electrical and Electronics Engineers (IEEE)
2016 (English). In: 2016 INTERNATIONAL CONFERENCE ON CLOUD AND AUTONOMIC COMPUTING (ICCAC) / [ed] Gupta, I; Diao, Y, Institute of Electrical and Electronics Engineers (IEEE), 2016, p. 12-23. Conference paper, Published paper (Refereed)
Abstract [en]

Interference between co-located VMs may lead to performance fluctuations and degradation, especially in overbooked datacenters. To limit this problem, VMs' access to physical resources needs to be controlled to ensure a certain degree of isolation among them. However, the mapping between virtual and physical resources must be performed dynamically so that it can adapt to changing application requirements, as well as to the different sets of co-located VMs. To address this problem we propose a twofold approach: (1) a Quality of Service (QoS) scheme that provides different isolation levels for VMs with different QoS requirements, and (2) a self-adaptive fuzzy Q-learning capacity controller that proactively readjusts the isolation degree based on application performance. Our evaluation, based on real cloud applications and workloads, demonstrates that the efficient, adaptive mapping between VMs and physical resources reduces the interference between VMs, enables co-locating more VMs, increases overall utilization, and ensures the performance of critical applications while providing more resources to low-QoS applications.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2016
Keywords
Cloud Computing, Fuzzy Q-Learning, Overbooking, Pinning, QoS, VM interference
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-124171 (URN); 10.1109/ICCAC.2016.8 (DOI); 000390252000002 (); 978-1-5090-3536-6 (ISBN)
Conference
IEEE International Conference on Cloud and Autonomic Computing (ICCAC), SEP 12-16, 2016, Augsburg, GERMANY
Available from: 2016-07-25. Created: 2016-07-25. Last updated: 2018-06-07. Bibliographically approved.
Proaño Orellana, J., Caminero, B., Carrión, C., Tomas, L., Kostentinos Tesfatsion, S. & Tordsson, J. (2016). FPGA-Aware Scheduling Strategies at Hypervisor Level in Cloud Environments. Scientific Programming, Article ID 4670271.
2016 (English). In: Scientific Programming, ISSN 1058-9244, E-ISSN 1875-919X, article id 4670271. Article in journal (Refereed), Published
Abstract [en]

Current open issues regarding cloud computing include the support for nontrivial Quality of Service-related Service Level Objectives (SLOs) and reducing the energy footprint of data centers. One strategy that can contribute to both is the integration of accelerators as specialized resources within the cloud system. In particular, Field Programmable Gate Arrays (FPGAs) exhibit an excellent performance/energy consumption ratio that can be harnessed to achieve these goals. In this paper, a multilevel cloud scheduling framework is described, and several FPGA-aware node-level scheduling strategies (applied at the hypervisor level) are explored and analyzed. These strategies are based on the use of a multiobjective metric aimed at providing Quality of Service (QoS) support. Results show how the proposed FPGA-aware scheduling policies increase the number of user requests serviced with their SLOs fulfilled while energy consumption is minimized. In particular, evaluation results of a use case based on a multimedia application show that the proposal can save more than 20% of the total energy compared with other baseline algorithms while fulfilling a higher percentage of Service Level Agreements (SLAs).

National Category
Computer Sciences
Research subject
business data processing
Identifiers
urn:nbn:se:umu:diva-124168 (URN); 10.1155/2016/4670271 (DOI); 000379470300001 ()
Available from: 2016-07-25. Created: 2016-07-25. Last updated: 2018-06-07. Bibliographically approved.
Tomas, L., Masoumzadeh, S. S. & Hlavacs, H. (2016). Self-Adaptive Capacity Controller: A Reinforcement Learning Approach. In: 2016 IEEE INTERNATIONAL CONFERENCE ON AUTONOMIC COMPUTING (ICAC). Paper presented at 13th IEEE International Conference on Autonomic Computing (ICAC), JUL 17-22, 2016, Wurzburg, GERMANY (pp. 233-234). LOS ALAMITOS: IEEE Computer Society
2016 (English). In: 2016 IEEE INTERNATIONAL CONFERENCE ON AUTONOMIC COMPUTING (ICAC), LOS ALAMITOS: IEEE Computer Society, 2016, p. 233-234. Conference paper, Published paper (Refereed)
Abstract [en]

Interference between co-located VMs may lead to performance fluctuations and degradation. To limit this problem, VMs' access to physical resources needs to be controlled to ensure a certain degree of isolation among them. This mapping between virtual and physical resources must be performed dynamically so that it can adapt to changing application requirements, as well as to the different sets of co-located VMs. To address this problem we propose a self-adaptive fuzzy Q-learning capacity controller that proactively readjusts the isolation degree based on application performance. Our evaluation demonstrates a reduction in VM interference and an increase in overall utilization, while still ensuring the performance of critical applications and providing more resources to non-critical applications.

Place, publisher, year, edition, pages
LOS ALAMITOS: IEEE Computer Society, 2016
Series
Proceedings of the International Conference on Autonomic Computing, ISSN 2474-0756
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-130487 (URN); 10.1109/ICAC.2016.51 (DOI); 000390681200035 (); 978-1-5090-1653-2 (ISBN)
Conference
13th IEEE International Conference on Autonomic Computing (ICAC), JUL 17-22, 2016, Wurzburg, GERMANY
Available from: 2017-01-20. Created: 2017-01-20. Last updated: 2018-06-09. Bibliographically approved.
Tomas, L., Saeid Masoumzadeh, S. & Hlavacs, H. (2016). Self-Adaptive Capacity Controller: A Reinforcement Learning Approach. In: 2016 IEEE International Conference on Autonomic Computing (ICAC). Paper presented at ICAC 2016: 13th IEEE International Conference on Autonomic Computing, Wuerzburg. IEEE
2016 (English). In: 2016 IEEE International Conference on Autonomic Computing (ICAC), IEEE, 2016. Conference paper, Published paper (Refereed)
Abstract [en]

Interference between co-located VMs may lead to performance fluctuations and degradation. To limit this problem, VMs' access to physical resources needs to be controlled to ensure a certain degree of isolation among them. This mapping between virtual and physical resources must be performed dynamically so that it can adapt to changing application requirements, as well as to the different sets of co-located VMs. To address this problem we propose a self-adaptive fuzzy Q-learning capacity controller that proactively readjusts the isolation degree based on application performance. Our evaluation demonstrates a reduction in VM interference and an increase in overall utilization, while still ensuring the performance of critical applications and providing more resources to non-critical applications.

Place, publisher, year, edition, pages
IEEE, 2016
National Category
Computer Sciences
Identifiers
urn:nbn:se:umu:diva-124170 (URN); 10.1109/ICAC.2016.51 (DOI); 978-1-5090-1654-9 (ISBN)
Conference
ICAC 2016: 13th IEEE International Conference on Autonomic Computing, Wuerzburg,
Available from: 2016-07-25. Created: 2016-07-25. Last updated: 2019-06-19. Bibliographically approved.