Abstract
The rapid expansion of cloud computing has driven an exponential increase in data-center energy consumption, posing significant environmental and economic challenges. Effective resource provisioning relies on accurate workload forecasting to enable proactive auto-scaling. However, modern cloud workloads exhibit extreme non-stationarity and complex temporal dependencies that overwhelm traditional methods such as ARIMA and standard RNNs. While Transformers have shown promise in sequence modeling, their quadratic computational complexity hinders real-time deployment for long-horizon forecasting. In this paper, we propose EcoFormer, a resource-efficient time-series forecasting framework. EcoFormer introduces a Probabilistic Sparse Attention mechanism that selects only the most dominant queries, reducing complexity to $\mathcal{O}(L \log L)$. Furthermore, we incorporate a Green-Regularized Loss that explicitly penalizes over-provisioning during idle periods. We provide a theoretical bound on the approximation error of our sparse attention matrix. Extensive experiments on the Google Cluster Trace dataset demonstrate that EcoFormer reduces Mean Absolute Error (MAE) by 14.5\% compared to standard Transformers and achieves an estimated energy saving of 18.2\% in simulated auto-scaling scenarios.
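
The abstract only names the dominant-query selection; the following NumPy sketch illustrates one plausible reading of such a mechanism (the function name `probsparse_attention` and the sampling factor `c` are illustrative, not from the paper). Each query is scored by how far its maximum attention logit exceeds its mean logit, only the top $u \approx c \ln L$ queries receive full softmax attention, and the remaining "lazy" queries fall back to the mean of the values. Note that this naive version still forms the full score matrix to rank queries, so it illustrates the selection step rather than the asymptotic saving; an efficient implementation would estimate the ranking from a sampled subset of keys.

```python
import numpy as np

def probsparse_attention(Q, K, V, c=5):
    """Illustrative dominant-query sparse attention (assumed mechanism).

    Scores each query with M(q) = max_j(q.k_j/sqrt(d)) - mean_j(q.k_j/sqrt(d)),
    applies softmax attention only for the top-u queries (u ~ c*ln(L)), and
    replaces the output of the remaining queries with the mean of V.
    """
    L, d = Q.shape
    # Full score matrix for clarity; a real implementation would sample keys.
    scores = Q @ K.T / np.sqrt(d)                      # (L, L)
    M = scores.max(axis=1) - scores.mean(axis=1)       # query "dominance"
    u = min(L, int(np.ceil(c * np.log(L))))            # number of active queries
    top = np.argsort(-M)[:u]                           # indices of dominant queries
    out = np.tile(V.mean(axis=0), (L, 1))              # lazy queries -> mean of values
    s = scores[top]
    w = np.exp(s - s.max(axis=1, keepdims=True))       # stable softmax
    w /= w.sum(axis=1, keepdims=True)
    out[top] = w @ V                                   # full attention for top queries
    return out
```

With `c=5` and `L=32`, only 18 of the 32 queries are attended in full, while the rest reuse the cheap mean-of-values output.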
