Performance problem diagnosis in cloud infrastructures
2016 (English). Licentiate thesis, comprehensive summary (Other academic)
Cloud datacenters comprise hundreds or thousands of disparate application services, each with stringent performance and availability requirements, sharing a finite set of heterogeneous hardware and software resources. In such a complex environment, performance problems such as slow application response and unplanned downtime have become the norm rather than the exception, resulting in decreased revenue, damaged reputation, and substantial human effort spent on diagnosis. Although causes can be as varied as application issues (e.g., bugs), machine-level failures (e.g., a faulty server), and operator errors (e.g., misconfigurations), recent studies have identified capacity-related issues, such as resource shortage and contention, as the cause of most performance problems on the Internet today. As cloud datacenters become increasingly autonomous, there is a need for automated performance diagnosis systems that can adapt their operation to changes in workload and infrastructure topology. In particular, such systems should be able to detect anomalous performance events, uncover manifestations of capacity bottlenecks, localize the actual root cause(s), and possibly suggest or actuate corrections.
This thesis investigates approaches for diagnosing performance problems in cloud infrastructures. We present the outcome of an extensive survey of existing research on performance diagnosis across diverse systems domains. We also present models and algorithms for detecting anomalies in real-time application performance and for identifying anomalous datacenter resources based on operational metrics and spatial dependencies across datacenter components. Empirical evaluations of our approaches show how they can be used to improve end-user experience and service assurance, and to support root-cause analysis.
Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2016. 28 p.
Report / UMINF, ISSN 0348-0542 ; 16.14
Systems Performance, Performance anomalies, Performance bottlenecks, Cloud infrastructures, Cloud Computing, Cloud Services, Cloud Computing Performance, Performance problems, Performance anomaly detection, Performance bottleneck identification, Performance Root-cause Analysis
Research subject: Computer Systems; Computer Science
Identifiers: URN: urn:nbn:se:umu:diva-120287; ISBN: 978-91-7601-500-1; OAI: oai:DiVA.org:umu-120287; DiVA: diva2:928037
2016-05-24, N430, Naturvetarhuset, Umeå University, Umeå, 10:00 (English)
Casale, Giuliano, Senior Lecturer; Fodor, Viktoria, Professor; Björklund, Henrik, Associate Professor
Elmroth, Erik, Professor
Projects: Cloud Control (C0590801)
Funder: Swedish Research Council, C0590801
List of papers