Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Cloud data centers suffer from hardware failures, unexpected peaks of incoming requests, and wasted energy due to low utilization and a lack of energy proportionality. These problems all lead to resource shortages and, as a result, to application problems such as delays or crashes. A paradigm called Brownout has been designed to counteract these problems by automatically activating or deactivating optional computations in cloud applications. When optional computations are deactivated, the capacity requirement decreases, which keeps response times low enough for the application to remain responsive. Brownout has been shown to avoid overloads successfully; however, response times are often unstable and sometimes spike because of sudden changes in the workload. This master's thesis project contributes an improvement to the existing Brownout paradigm. The goal is to stabilize the response time around a given set-point by taking the number of pending requests into consideration.
We designed and implemented new algorithms to improve Brownout and produced experimental results based on the popular web application benchmark RUBiS. The RUBiS application was modified and deployed on a virtual machine in a Xen environment, where it received requests from emulated clients through a proxy. On this proxy, we first implemented a controller that sets a threshold determining whether optional computations are activated in the RUBiS application. We then investigated machine learning algorithms that use offline training to set correct thresholds. Because offline training is undesirable in real environments, we combined the controller with the machine learning algorithms, for example by using the outputs of the latter as controller feedforward, to obtain satisfactory results.
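The combination of a threshold controller and a feedforward term can be illustrated with a minimal sketch. The abstract does not state the control law, so the integral update, the gain, the clamping bounds, and the rule that optional computations run only while the queue is shorter than the threshold are all assumptions made for illustration. The names `ThresholdController` and `serve_optional` are hypothetical.

```python
class ThresholdController:
    """Hypothetical integral controller that adapts a queue-length threshold.

    Assumption: a response time above the set-point should lower the
    threshold, so that fewer requests get optional computations.
    """

    def __init__(self, setpoint, gain, initial_threshold=10.0, lo=0.0, hi=100.0):
        self.setpoint = setpoint             # target response time (seconds)
        self.gain = gain                     # assumed integral gain
        self.correction = initial_threshold  # accumulated integral term
        self.lo, self.hi = lo, hi            # assumed bounds on the threshold
        self.threshold = initial_threshold

    def update(self, response_time, feedforward=0.0):
        """Return the new threshold from the latest measured response time.

        `feedforward` stands in for the output of a learned model; with the
        default of 0.0 the controller acts on feedback alone.
        """
        error = self.setpoint - response_time
        self.correction += self.gain * error
        # Clamp the output only; anti-windup is omitted to keep the sketch short.
        self.threshold = min(self.hi, max(self.lo, feedforward + self.correction))
        return self.threshold


def serve_optional(queue_length, threshold):
    """Assumed decision rule: run optional code only while the queue is short."""
    return queue_length < threshold
```

In this sketch the proxy would call `update` once per control period and `serve_optional` once per incoming request. Passing a model prediction as `feedforward` lets the feedback term correct only the model's residual error.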
Experimental results showed that the new Brownout algorithms can improve on the initial Brownout by a factor of up to 6. We determined this improvement by taking the 95th percentile response times and comparing how far they are, on average, from a selected set-point. Given this improvement, deciding whether to activate optional computations based on the queue length of pending requests is a good approach to keeping the response time stable around a set-point.
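One way to read the evaluation metric described above is sketched here. It assumes that response times are grouped into measurement windows, that the 95th percentile is taken per window with the nearest-rank method, and that the distance from the set-point is the mean absolute deviation. None of these details is given in the abstract, and the function names are hypothetical.

```python
def percentile_95(samples):
    """95th percentile of one window, using the nearest-rank method (assumed)."""
    ordered = sorted(samples)
    # Integer form of ceil(0.95 * n), which avoids floating-point rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def mean_deviation(windows, setpoint):
    """Average absolute distance of the per-window 95th percentiles from the set-point."""
    p95s = [percentile_95(w) for w in windows]
    return sum(abs(p - setpoint) for p in p95s) / len(p95s)


def improvement_factor(baseline_deviation, new_deviation):
    """Ratio by which the new algorithm reduces the deviation of the baseline."""
    return baseline_deviation / new_deviation
```

Under this reading, an improvement factor of 6 would mean that the 95th percentile response times of a new algorithm lie, on average, six times closer to the set-point than those of the initial Brownout.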
2015. 69 p.