Performance problem diagnosis in cloud infrastructures
2016 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]
Cloud datacenters comprise hundreds or thousands of disparate application services, each with stringent performance and availability requirements, sharing a finite set of heterogeneous hardware and software resources. The implication of such a complex environment is that performance problems, such as slow application response and unplanned downtime, have become the norm rather than the exception, resulting in decreased revenue, damaged reputation, and substantial human effort in diagnosis. Though causes can be as varied as application issues (e.g., bugs), machine-level failures (e.g., faulty servers), and operator errors (e.g., misconfigurations), recent studies have identified capacity-related issues, such as resource shortage and contention, as the cause of most performance problems on the Internet today. As cloud datacenters become increasingly autonomous, there is a need for automated performance diagnosis systems that can adapt their operation to reflect the changing workload and topology in the infrastructure. In particular, such systems should be able to detect anomalous performance events, uncover manifestations of capacity bottlenecks, localize the actual root cause(s), and possibly suggest or actuate corrections.
This thesis investigates approaches for diagnosing performance problems in cloud infrastructures. We present the outcome of an extensive survey of existing research contributions addressing performance diagnosis in diverse systems domains. We also present models and algorithms for detecting anomalies in real-time application performance and for identifying anomalous datacenter resources based on operational metrics and spatial dependency across datacenter components. Empirical evaluations of our approaches show how they can be used to improve end-user experience and service assurance and to support root-cause analysis.
Place, publisher, year, edition, pages
Umeå: Department of Computing Science, Umeå University, 2016, p. 28
Series
Report / UMINF, ISSN 0348-0542 ; 16.14
Keywords [en]
Systems Performance, Performance anomalies, Performance bottlenecks, Cloud infrastructures, Cloud Computing, Cloud Services, Cloud Computing Performance, Performance problems, Performance anomaly detection, Performance bottleneck identification, Performance Root-cause Analysis
National Category
Computer Systems
Research subject
Computer Systems; Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-120287
ISBN: 978-91-7601-500-1 (print)
OAI: oai:DiVA.org:umu-120287
DiVA, id: diva2:928037
Presentation
2016-05-24, N430, Naturvetarhuset, Umeå University, Umeå, 10:00 (English)
Projects
Cloud Control (C0590801)
Funder
Swedish Research Council, C0590801
2016-05-23 2016-05-13 2021-03-18 Bibliographically approved
List of papers