Autonomous resource management is essential for large-scale cloud data centers, where Machine Learning~(ML) enables intelligent decision-making. However, shifts in the statistical patterns of operational data streams, i.e., data drift, pose significant challenges to sustaining model accuracy and system efficiency.
This paper proposes an auto-adaptive ML approach to mitigate the impact of data drift in cloud systems. A knowledge base is constructed from distinct time-series batches and their corresponding ML models, with batches clustered using the Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN) algorithm. When model performance degrades, the system uses Dynamic Time Warping (DTW) to retrieve the hyperparameters of the most similar historical batch from the knowledge base and apply them to the deployed model, restoring inference accuracy on new data streams.
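The following is a minimal sketch of this lookup mechanism, not the paper's implementation: the `KnowledgeBase` class, the summary clustering setup, and the hyperparameter format are illustrative assumptions. It pairs scikit-learn's HDBSCAN (available from version 1.3, here over a precomputed DTW distance matrix) with a plain dynamic-programming DTW.

```python
import numpy as np
from sklearn.cluster import HDBSCAN


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic-time-warping distance."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return float(cost[n, m])


class KnowledgeBase:
    """Illustrative store of (time-series batch, hyperparameters) pairs.

    Batches are clustered offline by HDBSCAN over pairwise DTW distances;
    on detected degradation, the DTW-closest stored batch supplies the
    hyperparameters to apply to the deployed model.
    """

    def __init__(self, batches, hyperparams, min_cluster_size=2):
        self.batches = [np.asarray(b, dtype=float) for b in batches]
        self.hyperparams = list(hyperparams)
        # Pairwise DTW distance matrix (symmetric, zero diagonal).
        n = len(self.batches)
        dist = np.zeros((n, n))
        for i in range(n):
            for j in range(i + 1, n):
                dist[i, j] = dist[j, i] = dtw_distance(self.batches[i],
                                                       self.batches[j])
        # Cluster batches; noise points receive the label -1.
        self.labels = HDBSCAN(min_cluster_size=min_cluster_size,
                              metric="precomputed").fit_predict(dist)

    def retrieve(self, new_batch):
        """Return the hyperparameters of the DTW-closest stored batch."""
        new_batch = np.asarray(new_batch, dtype=float)
        best = min(range(len(self.batches)),
                   key=lambda i: dtw_distance(new_batch, self.batches[i]))
        return self.hyperparams[best]


# Hypothetical usage: on drift, look up hyperparameters instead of retraining.
kb = KnowledgeBase(
    batches=[[1.0, 1.1, 0.9, 1.2], [5.0, 5.2, 4.8, 5.1], [1.1, 0.8, 1.0, 1.3]],
    hyperparams=[{"lr": 0.01}, {"lr": 0.001}, {"lr": 0.02}],
)
print(kb.retrieve([0.9, 1.0, 1.1, 1.0]))  # -> {'lr': 0.01}
```

The quadratic pairwise DTW computation here is acceptable for a modest knowledge base; a production variant would presumably use a windowed or lower-bounded DTW to keep retrieval cheap relative to retraining.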
Experiments with two real-world cloud data traces, representing both stable and highly fluctuating environments, demonstrate that the proposed approach maintains high model accuracy (over 89\%) while minimizing retraining costs. Specifically, for the Wikipedia trace with frequent data drift, retraining overhead is reduced by 22.9\% compared to drift detection-based retraining and by 97\% compared to incremental retraining. In stable environments, such as the Google cluster trace, retraining costs decrease by 96.3\% and 88.9\%, respectively.