Enhancing machine learning performance in dynamic cloud environments with auto-adaptive models
Umeå University, Faculty of Science and Technology, Department of Computing Science. (ADSlab) ORCID iD: 0000-0002-9156-3364
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-9842-7840
Umeå University, Faculty of Science and Technology, Department of Computing Science. ORCID iD: 0000-0002-2633-6798
2024 (English). In: The 15th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2024), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Autonomous resource management is essential for large-scale cloud data centers, where Machine Learning (ML) enables intelligent decision-making. However, shifts in data patterns within operational streams pose significant challenges to sustaining model accuracy and system efficiency.

This paper proposes an auto-adaptive ML approach to mitigate the impact of data drift in cloud systems. A knowledge base of distinct time-series batches and corresponding ML models is constructed and clustered using the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm. When model performance degrades, the system uses Dynamic Time Warping (DTW) to retrieve matching hyperparameters from the knowledge base and apply them to the deployed model, optimizing inference accuracy on new data streams.
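
The retrieval step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the knowledge-base layout, the hyperparameter names, the accuracy threshold argument, and the function names are all hypothetical, and the HDBSCAN clustering stage is omitted, so the lookup is a plain linear scan using a textbook DTW distance.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook O(len(a) * len(b)) DTW with absolute difference as local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # step in a only
                                 cost[i][j - 1],      # step in b only
                                 cost[i - 1][j - 1])  # step in both
    return cost[n][m]


# Hypothetical knowledge base: each entry pairs a stored time-series batch
# with the hyperparameters of the model trained on it (names are assumptions).
KNOWLEDGE_BASE = [
    {"batch": [1.0, 1.0, 1.1, 1.0, 0.9],
     "hyperparameters": {"learning_rate": 0.01, "window": 24}},
    {"batch": [1.0, 3.0, 0.5, 4.0, 0.2],
     "hyperparameters": {"learning_rate": 0.1, "window": 6}},
]


def retrieve_hyperparameters(new_batch, knowledge_base):
    """Return the hyperparameters of the stored batch closest to new_batch under DTW."""
    best = min(knowledge_base,
               key=lambda entry: dtw_distance(new_batch, entry["batch"]))
    return best["hyperparameters"]


def adapt_if_degraded(accuracy, threshold, new_batch, current_params):
    """Keep current hyperparameters unless accuracy fell below the threshold."""
    if accuracy >= threshold:
        return current_params
    return retrieve_hyperparameters(new_batch, KNOWLEDGE_BASE)
```

In a clustered knowledge base, the scan would presumably be restricted to cluster representatives rather than every stored batch; that refinement is left out here for brevity.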

Experiments with two real-world cloud data traces, representing both stable and highly fluctuating environments, demonstrate that the proposed approach maintains high model accuracy (over 89%) while minimizing retraining costs. Specifically, for the Wikipedia trace with frequent data drift, retraining overhead is reduced by 22.9% compared to drift detection-based retraining and by 97% compared to incremental retraining. In stable environments, like the Google cluster trace, retraining costs decrease by 96.3% and 88.9% relative to the same two baselines, respectively.

Place, publisher, year, edition, pages
2024.
Keywords [en]
Autonomous Cloud, Self-Adaptation, Cloud Operational Data, Data Drift, Machine Learning, Retraining
National Category
Computer Systems; Computer Sciences
Research subject
Computer Science; Computer Systems
Identifiers
URN: urn:nbn:se:umu:diva-231945; OAI: oai:DiVA.org:umu-231945; DiVA, id: diva2:1923286
Conference
The 15th IEEE International Conference on Cloud Computing Technology and Science (CloudCom 2024), Khalifa University, Abu Dhabi, United Arab Emirates, 9-11 Dec, 2024
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Available from: 2024-12-21 Created: 2024-12-21 Last updated: 2025-01-02

Open Access in DiVA

No full text in DiVA

Authority records

Nguyen, Chanh; Bhuyan, Monowar; Elmroth, Erik
