Key Takeaways
- Outperform traditional cloud setups by adopting the GreenScale framework to balance high-speed performance with lower operating costs and reduced carbon emissions.
- Implement a multi-objective scheduling process that tracks real-time energy availability and workload shifts to ensure consistent service levels.
- Reduce team burnout and maintenance stress by using automated scaling that handles unpredictable traffic spikes without manual intervention.
- Shift computing tasks to times when renewable energy is at its peak to transform your cloud infrastructure into a truly sustainable operation.
Abstract
Autoscaling is the primary mechanism for controlling the performance and cost of cloud-native systems. Current autoscalers focus chiefly on resource-usage optimization or service-level objectives, with cost sometimes treated as a secondary concern. Although energy consumption matters for operational, financial, and environmental reasons, it remains the most neglected factor in autoscaling decisions. This paper introduces GreenScale, a multi-objective autoscaling framework that simultaneously optimizes SLA compliance, cloud resource cost, and energy efficiency for Kubernetes applications. GreenScale formulates autoscaling as a Pareto optimization problem, generating and analyzing non-dominated scaling actions based on real-time telemetry and energy proxies derived from infrastructure utilization. Implemented as a Kubernetes controller, GreenScale is evaluated across different workload patterns against CPU-based autoscaling and cost-aware baselines. Experimental results show that GreenScale cuts energy usage by up to 27% while retaining the same level of SLA compliance and reducing scaling instability. These outcomes indicate that energy-conscious autoscaling is both realistic and necessary for sustainable cloud operations.
1. Introduction
The shift toward fully cloud-based architectures has increased the need for autoscaling that handles variable workloads while delivering consistent performance. Today, the de facto approach to elasticity combines the Kubernetes Horizontal Pod Autoscaler (HPA) with cloud-provider scaling mechanisms. These systems generally optimize performance-related metrics, such as CPU utilization or request latency, and some variants also account for cost through reactive downscaling or budget constraints.
Nevertheless, power consumption has remained implicit and uncontrolled in autoscaling decisions. This omission is becoming more problematic: cloud providers have started to expose carbon and energy reporting, enterprises have set sustainability targets, and energy has become a significant share of total operating costs. Autoscaling driven by performance criteria often over-provisions, while cost-sensitive methods risk SLA breaches. Neither addresses energy efficiency directly.
The GreenScale framework paves the way for more sustainable scaling. It is an autoscaler that treats energy efficiency as a first-class objective alongside SLA compliance and cost. GreenScale generates a set of candidate scaling decisions, evaluates them against real-time indicators and energy forecasts, and selects Pareto-optimal options according to configured preferences. Unlike existing autoscalers that rely on single-objective or rule-based approaches, GreenScale manages multiple objectives methodically throughout the autoscaling process.
2. Background and Motivation
2.1 Autoscaling in Cloud-Native Systems
Autoscaling mechanisms such as the Kubernetes HPA trigger scaling actions when rule-based thresholds are breached. Although simple and effective, these methods have several drawbacks:
- They are based on indirect performance proxies.
- They are reactive rather than predictive.
- They disregard cross-objective trade-offs.
Cost-aware autoscaling extends these models by taking cloud prices into consideration, but it still does not address energy usage.
2.2 Energy as a First-Class Constraint
Energy consumption in cloud environments depends on resource utilization, workload characteristics, and hardware efficiency. Although direct power measurements are seldom available to tenants, energy proxies derived from CPU and memory utilization offer dependable relative comparisons. Ignoring energy in autoscaling causes unnecessary consumption and works against sustainability goals.
2.3 Research Gap
Traditional autoscalers address only one dominant objective. Very little work targets multi-objective autoscaling that jointly satisfies SLA, cost, and energy-efficiency requirements. This paper bridges that gap.
3. Problem Statement
We consider a cloud-native application running in a Kubernetes environment, consisting of one or more services subject to uncertain workload demand.
Given:
- Telemetry monitoring for application performance behavior in real-time,
- Cloud pricing details,
- Infrastructure energy properties,
The aim is to find autoscaling actions that jointly optimize:
- SLA compliance,
- cloud resource cost,
- energy consumption,
while avoiding instability and scaling oscillations.
4. GreenScale Overview
4.1 System Architecture
The architecture of GreenScale consists of five key components:
- Telemetry Collector
Collects latency, throughput, CPU and memory usage, and replica counts.
- Energy Estimator
Estimates energy consumption using utilization-based power proxies.
- Objective Evaluator
Computes SLA-violation, cost, and energy scores for each candidate action.
- Pareto Optimization Engine
Identifies non-dominated scaling activities.
- Scaling Actuator
Applies filtered actions via Kubernetes APIs with stabilization constraints.
GreenScale is implemented as a Kubernetes controller operating in a continuous control loop: it evaluates telemetry, predicts demand, estimates energy impact, and selects Pareto-optimal scaling actions.
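The control loop described above can be sketched in a few lines of Python. This is an illustrative outline only; the function names, candidate set, and the toy stub models are assumptions, not the paper's actual implementation.

```python
# Minimal sketch of one GreenScale control-loop iteration.
# All names and the +/-1 candidate set are illustrative assumptions.

def control_step(metrics, forecast, evaluate, select):
    """Observe -> predict -> score candidates -> pick a scaling action."""
    demand = forecast(metrics)                      # predicted load
    current = metrics["replicas"]
    # Candidate actions: scale by -1, 0, or +1 replicas, never below 1.
    candidates = [r for r in (current - 1, current, current + 1) if r >= 1]
    scored = {r: evaluate(r, demand) for r in candidates}
    return select(scored)                           # Pareto + policy layer

# Toy usage with stub models (the real system would plug in telemetry,
# an LSTM predictor, the energy estimator, and the Pareto engine):
metrics = {"replicas": 3, "cpu_util": 0.8}
action = control_step(
    metrics,
    forecast=lambda m: m["cpu_util"] * 100,         # trivial predictor
    evaluate=lambda r, d: {"sla": d / r, "cost": r, "energy": r * 0.9},
    # Toy policy: weigh only SLA pressure and cost for selection.
    select=lambda s: min(s, key=lambda r: s[r]["sla"] + s[r]["cost"]),
)
```

In the real controller this step would run once per control interval, with the actuator applying `action` through the Kubernetes API.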
5. Multi-Objective Optimization Model
This section presents the mathematical formulation, the learning models, and the algorithms behind GreenScale. The aim is to give GreenScale mathematical rigor while keeping it implementable in industry settings.
5.1 Decision Variables
The main decision variable is how many replicas should be allocated to a given service for a control interval.
5.2 Workload Prediction Models
GreenScale uses predictive models with the aim of minimizing reactive behavior.
| Model | Description | Accuracy (MAPE) |
| --- | --- | --- |
| ARIMA | Statistical baseline for seasonal workloads | 12–15% |
| LSTM | Recurrent neural network for bursty traffic | 7–9% |
| Prophet | Trend-aware forecasting with holidays | 10–12% |
LSTM models consistently achieved the lowest prediction error under bursty conditions and were used as the default predictor.
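For clarity, the MAPE metric used in the table above can be computed as follows. The helper function and toy values are illustrative, not the paper's evaluation code.

```python
# Mean Absolute Percentage Error (MAPE), as reported in the table above.

def mape(actual, predicted):
    """MAPE over paired observations, in percent; actuals must be nonzero."""
    assert len(actual) == len(predicted) and all(a != 0 for a in actual)
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, predicted)) / len(actual)

# Toy example: a forecast that is off by 10% on every point -> MAPE = 10%.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 440.0]
error = mape(actual, predicted)  # 10.0
```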
5.3 Objective Functions
GreenScale optimizes three objectives:
- SLA Objective: Minimize SLA violation rate or error budget consumption.
- Cost Objective: Minimize cloud resource cost per interval.
- Energy Objective: Minimize estimated energy consumption.
Formally, the problem is defined as:
min_x { f_sla(x), f_cost(x), f_energy(x) }
subject to operational constraints such as minimum and maximum replica counts and scaling rate limits. While GreenScale does not rely on full online RL training in production, it adopts RL concepts:
- State: current load, replica count, SLA error budget, energy estimate.
- Action: scale up or down by Δ replicas.
- Reward: weighted improvement in SLA, cost, and energy.
Offline-trained Q-value approximations guide action ranking, improving convergence and stability.
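The three objective functions can be sketched with simple proxies. The capacity, price, and power constants below are illustrative assumptions, not values from the paper:

```python
# Illustrative sketch of the three GreenScale objectives for a candidate
# replica count. Constants (capacity, price, node power) are assumptions.

def objectives(replicas, predicted_rps, sla_capacity_rps=50.0,
               cost_per_replica=0.05, node_power_w=200.0):
    load_per_replica = predicted_rps / replicas
    f_sla = max(0.0, load_per_replica - sla_capacity_rps)  # violation pressure
    f_cost = replicas * cost_per_replica                   # $ per interval
    util = min(1.0, load_per_replica / sla_capacity_rps)
    f_energy = replicas * util * node_power_w              # proxy watts
    return f_sla, f_cost, f_energy

# Constraint handling: keep only candidates within replica bounds.
def feasible(candidates, lo=1, hi=10):
    return [r for r in candidates if lo <= r <= hi]

f = objectives(4, 100.0)  # -> (0.0, 0.2, 400.0)
```

Each candidate action gets such a triple, and the Pareto engine then compares triples rather than a single weighted score.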
5.4 Pareto Optimization
Rather than collapsing the goals into a single weighted objective, GreenScale selects actions from the Pareto-optimal frontier. A policy layer then applies organizational preferences, for example favoring SLA-dominant or energy-dominant actions.
To improve results further, GreenScale uses machine learning to refine the SLA and energy objective functions over time.
| Algorithm | Role |
| --- | --- |
| Fast Non-Dominated Sorting | Pareto frontier construction |
| Policy-Based Selection | SLA-first, cost-first, or energy-first |
(Pareto-optimal scaling actions balance SLA compliance, cost, and energy consumption without collapsing trade-offs into a single weighted objective.)
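Non-dominated filtering, the core of Pareto frontier construction, can be sketched as follows. The pairwise version below illustrates the dominance test; the paper's fast non-dominated sorting is an asymptotically better variant of the same idea, and the sample numbers are invented:

```python
# Pareto (non-dominated) filtering over candidate scaling actions, each
# scored as an (sla, cost, energy) triple; all three are minimized.

def dominates(a, b):
    """a dominates b if a is <= in every objective and < in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(actions):
    """actions: dict mapping action -> (sla, cost, energy) triple."""
    return {
        act: obj for act, obj in actions.items()
        if not any(dominates(other, obj)
                   for other in actions.values() if other != obj)
    }

# Toy example: action 3 is worse than action 2 on every objective,
# so only actions 1 and 2 survive on the frontier.
actions = {2: (0.0, 0.10, 300.0), 3: (0.0, 0.15, 420.0), 1: (5.0, 0.05, 180.0)}
front = pareto_front(actions)
```

The policy layer (SLA-first, cost-first, or energy-first) then picks a single action from `front` according to organizational preferences.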
6. Energy Modeling
Direct power measurements are typically unavailable in public cloud environments. GreenScale therefore employs energy proxies, computed as:
- CPU utilization × node power envelope,
- Adjusted by idle-to-active power ratios,
- Normalized per replica and control interval.
Energy ≈ CPU_util × Node_Power × Time
While absolute energy values may be approximate, relative comparisons across scaling actions remain consistent, which is sufficient for optimization and evaluation.
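A minimal sketch of this proxy, including the idle-to-active adjustment, is shown below. The power envelope and idle ratio are illustrative assumptions, not measured values:

```python
# Utilization-based energy proxy per control interval, as described above.
# node_power_w and idle_ratio are illustrative assumptions.

def energy_proxy_wh(cpu_util, node_power_w=300.0, idle_ratio=0.4,
                    replicas=1, interval_hours=0.5):
    """Approximate energy in watt-hours for one control interval.

    Idle power is drawn regardless of load; the remainder scales
    linearly with CPU utilization.
    """
    active_w = node_power_w * (idle_ratio + (1.0 - idle_ratio) * cpu_util)
    return active_w * replicas * interval_hours

e = energy_proxy_wh(cpu_util=0.5, replicas=2)  # 210.0 Wh
```

Because every candidate action is scored with the same constants, the absolute error of the proxy cancels out in relative comparisons, which is all the Pareto engine needs.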
7. Implementation
GreenScale is realized using:
- Kubernetes Custom Controllers,
- Prometheus
- A lightweight optimization engine running at configurable intervals.
Scaling actions are rate-limited to prevent oscillations and have stabilization windows to provide safe operation.
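The rate limiting and stabilization described above can be sketched as a small filter in front of the actuator. The parameter names and default values are illustrative assumptions:

```python
# Sketch of rate limiting with a stabilization window; max_step and
# cooldown_s are illustrative defaults, not values from the paper.
import time

class Stabilizer:
    def __init__(self, max_step=2, cooldown_s=60.0, clock=time.monotonic):
        self.max_step = max_step        # max replica change per decision
        self.cooldown_s = cooldown_s    # minimum seconds between actions
        self.clock = clock
        self.last_action_at = -float("inf")

    def filter(self, current, desired):
        """Return the replica count actually applied, or None to skip."""
        now = self.clock()
        if now - self.last_action_at < self.cooldown_s:
            return None                 # still inside stabilization window
        step = max(-self.max_step, min(self.max_step, desired - current))
        if step == 0:
            return None
        self.last_action_at = now
        return current + step

# Usage with a fake clock: a jump from 3 to 10 replicas is clamped to +2.
t = [0.0]
s = Stabilizer(max_step=2, cooldown_s=60.0, clock=lambda: t[0])
applied = s.filter(current=3, desired=10)  # 5
```

Clamping the step size bounds how fast the system can churn, and the cooldown suppresses the oscillations that threshold-based autoscalers are prone to.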
8. Evaluation
8.1 Experimental Setup
- Platform: Kubernetes cluster
- Workloads: Stateless web service and bursty API workload
- Baselines:
- CPU-based HPA
- Cost-aware autoscaling
- GreenScale
8.2 Metrics
- SLA violation minutes
- p95 latency
- Cloud cost per hour
- Estimated energy consumption
- Scaling stability
8.3 Results
Across all workloads, GreenScale achieved:
- Up to 27% reduction in energy consumption
- Comparable or reduced SLA violations
- Lower scaling oscillation rates than HPA
- Neutral or reduced cloud cost
These results demonstrate that energy-aware autoscaling can improve sustainability without sacrificing performance.
(GreenScale achieves significant reductions in energy consumption while maintaining comparable SLA compliance and improving scaling stability.)
9. Discussion
Our results highlight several insights:
- Performance-only autoscaling tends to over-provision under bursty workloads.
- Energy-aware optimization reduces unnecessary scaling churn.
- Pareto-based selection provides flexibility across operational priorities.
GreenScale is particularly effective for workloads with moderate elasticity and well-defined SLAs.
10. Threats to Validity
- Energy estimates rely on proxies rather than direct measurements.
- Workloads may not represent all production scenarios.
- Results may vary across cloud providers and hardware types.
11. Related Work
Autoscaling, cost optimization, and energy-efficient computing have each been addressed individually in prior work. GreenScale jointly optimizes all three objectives within a unified framework and provides an empirical evaluation in cloud-native environments.
12. Conclusion
This paper introduced GreenScale, a multi-objective autoscaling framework that balances SLA compliance, cost, and energy efficiency. Through Pareto optimization and practical energy modeling, GreenScale demonstrates that sustainable autoscaling is achievable without compromising performance. As energy considerations become increasingly critical in cloud computing, GreenScale provides a foundation for next generation autoscaling systems.
References
1. Lorido-Botran, T., Miguel-Alonso, J., & Lozano, J. (2014). A Review of Auto-Scaling Techniques for Elastic Applications in Cloud Environments. Journal of Grid Computing, 12(4), 559–592. → Foundational survey on autoscaling strategies and limitations.
2. Herbst, N. R., Kounev, S., & Reussner, R. (2013). Elasticity in Cloud Computing: What It Is, and What It Is Not. Proceedings of the 10th International Conference on Autonomic Computing (ICAC). → Clarifies elasticity concepts and motivates multi-objective control.
3. Mao, M., & Humphrey, M. (2012). A Performance Study on the VM Startup Time in the Cloud. IEEE International Conference on Cloud Computing. → Highlights why reactive scaling introduces lag and inefficiency.
4. Beloglazov, A., & Buyya, R. (2012). Energy Efficient Resource Management in Virtualized Cloud Data Centers. Future Generation Computer Systems, 28(5), 755–768. → Seminal work on energy-aware cloud resource management.
5. Deb, K. (2001). Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons. → Authoritative reference for Pareto optimization and dominance concepts.
6. Hellerstein, J. L., Diao, Y., Parekh, S., & Tilbury, D. M. (2004). Feedback Control of Computing Systems. John Wiley & Sons. → Control-theoretic foundations for autoscaling and system stability.
7. Burns, B., Grant, B., Oppenheimer, D., Brewer, E., & Wilkes, J. (2016). Borg, Omega, and Kubernetes. ACM Queue, 14(1). → Architectural background for Kubernetes-based scaling systems.
Author Name: Venkata Raghavendra Swamy Gudipati



