When and How to Retrain Machine Learning-based Cloud Management Systems
Kidane, Lidia. Umeå University, Faculty of Science and Technology, Department of Computing Science (Autonomous Distributed Systems). ORCID iD: 0000-0002-8097-1143
Townend, Paul. Umeå University, Faculty of Science and Technology, Department of Computing Science (Autonomous Distributed Systems).
Intel Corporation, Germany.
Elmroth, Erik. Umeå University, Faculty of Science and Technology, Department of Computing Science (Autonomous Distributed Systems). ORCID iD: 0000-0002-2633-6798
2022 (English). In: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, 2022, p. 688-698. Conference paper, published paper (refereed).
Abstract [en]

Cloud management systems increasingly rely on machine learning (ML) models to predict incoming workload rates, load, and other system behaviors for efficient dynamic resource management. Current state-of-the-art prediction models demonstrate high accuracy, but assume that data patterns remain stable. In production use, however, systems may face hardware upgrades, changes in user behavior, etc., that lead to concept drifts - significant changes in the characteristics of data streams over time. To mitigate prediction deterioration, ML models need to be updated - but the questions of when and how to best retrain these models are unsolved in the context of cloud management. We present a pilot study that addresses these questions for one of the most common models for adaptive prediction - Long Short-Term Memory (LSTM) - using synthetic and real-world workload data. Our analysis of when to retrain explores approaches for detecting when retraining is required, using both concept drift detection and prediction error thresholds, and at what point retraining should actually take place. Our analysis of how to retrain focuses on the data required for retraining, and what proportion should be taken from before and after the need for retraining is detected. We present initial results indicating that retraining of existing models can achieve prediction accuracy close to that of newly trained models but at much lower cost, and give initial advice on how to provide cloud management systems with support for automatic retraining of ML-based methods.
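
The abstract describes two mechanisms: a trigger for deciding when retraining is needed (concept drift detection or a prediction error threshold), and a recipe for how to retrain (mixing data from before and after the detected drift). The record contains no code, so the sketch below is a minimal illustration under stated assumptions, not the paper's implementation: Page-Hinkley is one standard drift test used here as a stand-in, and the names should_retrain, window, factor, and the mixing ratio alpha are introduced for illustration only.

```python
import numpy as np

class PageHinkley:
    """One standard concept drift test (Page-Hinkley), used as a stand-in
    detector here: it alarms when the cumulative deviation of the error
    signal from its running mean exceeds lambda_."""
    def __init__(self, delta=0.005, lambda_=50.0):
        self.delta, self.lambda_ = delta, lambda_
        self.n, self.mean = 0, 0.0
        self.cum, self.cum_min = 0.0, 0.0

    def update(self, x):
        self.n += 1
        self.mean += (x - self.mean) / self.n      # running mean of errors
        self.cum += x - self.mean - self.delta     # cumulative deviation
        self.cum_min = min(self.cum_min, self.cum)
        return (self.cum - self.cum_min) > self.lambda_   # drift alarm

def should_retrain(errors, window=50, factor=2.0):
    """Error-threshold trigger (hypothetical parameters): flag retraining
    when the mean prediction error over the most recent `window` steps
    exceeds `factor` times the mean error over the preceding history."""
    errors = np.asarray(errors)
    if errors.size < 2 * window:
        return False                               # not enough history yet
    recent = errors[-window:].mean()               # error after possible drift
    baseline = errors[:-window].mean()             # error before the window
    return recent > factor * baseline

def build_retraining_set(pre_drift, post_drift, alpha=0.5, size=1000):
    """Mix training samples from before and after the detected drift point.
    `alpha` is the proportion drawn from post-drift data - the knob whose
    best setting the paper's "how to retrain" analysis investigates."""
    pre_drift, post_drift = np.asarray(pre_drift), np.asarray(post_drift)
    n_post = int(alpha * size)
    idx_pre = np.random.choice(len(pre_drift), size - n_post, replace=True)
    idx_post = np.random.choice(len(post_drift), n_post, replace=True)
    return np.concatenate([pre_drift[idx_pre], post_drift[idx_post]])
```

In a deployment loop, either trigger would consume the LSTM's per-step prediction errors; once a trigger fires, the existing model is fine-tuned on the mixed set rather than trained from scratch, which is the cost saving the abstract reports.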

Place, publisher, year, edition, pages
IEEE, 2022. p. 688-698
Keywords [en]
cloud computing, cloud workload prediction, concept drift, machine learning, time series prediction
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-198541
DOI: 10.1109/IPDPSW55747.2022.00120
ISI: 000855041000086
Scopus ID: 2-s2.0-85136190866
ISBN: 9781665497473 (electronic)
ISBN: 9781665497480 (print)
OAI: oai:DiVA.org:umu-198541
DiVA, id: diva2:1686277
Conference
2022 IEEE International Parallel and Distributed Processing Symposium, 30 May-3 June 2022, Lyon, France
Funder
Knut and Alice Wallenberg Foundation; eSSENCE - An eScience Collaboration
Available from: 2022-08-09. Created: 2022-08-09. Last updated: 2023-09-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Kidane, Lidia; Townend, Paul; Elmroth, Erik
