When and How to Retrain Machine Learning-based Cloud Management Systems
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems) ORCID iD: 0000-0002-8097-1143
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems)
Intel Corporation, Germany.
Umeå University, Faculty of Science and Technology, Department of Computing Science. (Autonomous Distributed Systems) ORCID iD: 0000-0002-2633-6798
2022 (English). In: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, 2022, pp. 688-698. Conference paper, published (refereed)
Abstract [en]

Cloud management systems increasingly rely on machine learning (ML) models to predict incoming workload rates, load, and other system behaviors for efficient dynamic resource management. Current state-of-the-art prediction models demonstrate high accuracy, but assume that data patterns remain stable. In production use, however, systems may face hardware upgrades, changes in user behavior, and other events that lead to concept drifts: significant changes in the characteristics of data streams over time. To mitigate prediction deterioration, ML models need to be updated, but the questions of when and how to best retrain these models remain unsolved in the context of cloud management. We present a pilot study that addresses these questions for one of the most common models for adaptive prediction, Long Short-Term Memory (LSTM), using synthetic and real-world workload data. Our analysis of when to retrain explores approaches for detecting when retraining is required, using both concept drift detection and prediction error thresholds, and at what point retraining should actually take place. Our analysis of how to retrain focuses on the data required for retraining, and what proportion should be taken from before and after the need for retraining is detected. We present initial results indicating that retraining of existing models can achieve prediction accuracy close to that of newly trained models at much lower cost, and offer initial advice on how to provide cloud management systems with support for automatic retraining of ML-based methods.
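The two questions the abstract studies can be illustrated with a minimal sketch: a prediction-error-threshold trigger for *when* to retrain, and a mixed pre/post-drift data window for *how* to retrain. All function names, thresholds, and proportions below are hypothetical illustrations, not the paper's actual implementation:

```python
def should_retrain(errors, window=50, factor=2.0, baseline=1.0):
    """Signal retraining when the rolling mean of recent absolute
    prediction errors exceeds `factor` times a reference baseline."""
    recent = list(errors)[-window:]
    if not recent:
        return False
    return sum(recent) / len(recent) > factor * baseline

def retraining_window(history, drift_index, pre_fraction=0.3, size=100):
    """Assemble retraining data as a mix of samples taken from before
    and after the detected drift point (`pre_fraction` from before)."""
    n_pre = int(size * pre_fraction)
    pre = history[max(0, drift_index - n_pre):drift_index]
    post = history[drift_index:drift_index + (size - n_pre)]
    return pre + post

# Stable phase: errors stay near the baseline, so no signal is raised;
# after a concept drift the error level jumps and retraining triggers.
stable = [1.0] * 50
drifted = [1.0] * 10 + [3.5] * 40
```

A real system would combine such an error-threshold trigger with a dedicated drift detector, and would tune the pre/post proportion empirically, as the paper's analysis of retraining data suggests.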

Place, publisher, year, edition, pages
IEEE, 2022, pp. 688-698
Keywords [en]
cloud computing, cloud workload prediction, concept drift, machine learning, time series prediction
National subject category
Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-198541
DOI: 10.1109/IPDPSW55747.2022.00120
ISI: 000855041000086
Scopus ID: 2-s2.0-85136190866
ISBN: 9781665497473 (digital), 9781665497480 (print)
OAI: oai:DiVA.org:umu-198541
DiVA id: diva2:1686277
Conference
2022 IEEE International Parallel and Distributed Processing Symposium, 30 May - 3 June 2022, Lyon, France
Research funder
Knut och Alice Wallenbergs Stiftelse; eSSENCE - An eScience Collaboration
Available from: 2022-08-09 Created: 2022-08-09 Last updated: 2025-05-07 Bibliographically checked
Part of thesis
1. Accurate and low-overhead workload prediction for cloud management
2025 (English). Doctoral thesis, compilation (other academic)
Alternative title [sv]
Noggrann och effektiv prediktering av last för resurshantering i datormoln
Abstract [en]

Cloud computing has transformed the IT landscape by offering users and organizations on-demand access to computing power, storage, data processing, and machine learning resources. Despite the benefits, cloud resource management faces challenges due to the heterogeneous and dynamic nature of workloads. Inefficient provisioning manifests in two critical forms: underprovisioning leads to degraded Quality of Service (QoS) and unmet Service-Level Agreements (SLAs), while overprovisioning results in unnecessary energy consumption and high operational costs. With the current rise of AI and machine learning innovations, machine learning-based workload prediction for resource provisioning plays a vital role in predicting future scenarios and identifying new occurrences, enabling service providers to prepare ahead of time. However, various challenges are associated with machine learning-based workload prediction.

This thesis addresses the challenges of machine learning-based workload prediction in cloud environments, including data drift due to dynamic workloads, high computational overhead, and storage overhead. Firstly, cloud workloads are dynamic, and models trained with old historical data can become obsolete over time. We addressed the challenge of accurate prediction and data drift by incorporating machine learning and streaming data processing algorithms to assist adaptive prediction. Secondly, constantly training and updating deep learning models adds significant computational overhead to the cloud infrastructure. We addressed this problem by proposing a solution that incorporates a knowledge base repository with transfer learning-based adaptation. Moreover, we explored the tradeoff between model accuracy and computational overhead. Finally, we propose a data compression mechanism that leverages an autoencoder to reduce the storage overhead resulting from the continuous generation of monitoring data in cloud management systems.

Our findings reveal that the proposed methods significantly improve machine learning-based cloud management systems. Extensive evaluation using real-world datasets shows that the proposed methods facilitate accurate predictions, even in the face of ever-changing patterns in cloud workloads. Moreover, the methods reduce computational overhead by leveraging existing knowledge, and highlight the tradeoff required to balance prediction accuracy against computational overhead.
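The knowledge-base-with-transfer-learning idea described above admits many designs; one very simple starting point is to pick the stored base model whose workload signature is closest to the new workload, then fine-tune it instead of training from scratch. The profile names, signatures, and distance measure below are purely hypothetical illustrations, not the thesis's method:

```python
def pick_base_model(knowledge_base, workload_signature):
    """Return the name of the stored model whose workload signature
    (here: a tuple of summary statistics, e.g. mean and std of the
    request rate) is closest in Euclidean distance to the new workload.
    The chosen model would then be adapted via transfer learning."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(knowledge_base,
               key=lambda name: dist(knowledge_base[name], workload_signature))

# Hypothetical repository: profile name -> (mean rate, std of rate).
kb = {"bursty": (10.0, 5.0), "steady": (10.0, 0.5), "periodic": (20.0, 3.0)}
```

For a new workload with mean 11 and standard deviation 0.6, the lookup would select the "steady" profile as the closest base model to fine-tune.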

Place, publisher, year, edition, pages
Umeå: Umeå University, 2025. p. 38
Series
Report / UMINF, ISSN 0348-0542 ; 25.09
National subject category
Computer Science
Identifiers
URN: urn:nbn:se:umu:diva-238533
ISBN: 978-91-8070-713-8, 978-91-8070-712-1
Public defence
2025-05-30, 13:15, MIT.A.121, MIT-huset, Umeå (English)
Available from: 2025-05-09 Created: 2025-05-07 Last updated: 2025-05-08 Bibliographically checked

Open Access in DiVA

Full text is not available in DiVA

Other links

Publisher's full text | Scopus

Authors

Kidane, Lidia; Townend, Paul; Elmroth, Erik
