Kubernetes Cost Optimization: Reduce Cloud Spend by 40%
Kubernetes makes it easy to deploy applications but equally easy to overspend. We have helped clients reduce their Kubernetes costs by 40-60% through systematic optimization. This guide shares the strategies that deliver the biggest impact, with actual Terraform and Helm configurations you can use today.
Right-Sizing Workloads
Over-provisioning is the single largest source of Kubernetes waste. Most teams request 3-5x more CPU and memory than their workloads actually use. Here is how to fix it:
Step 1: Measure Actual Usage
# Install metrics-server first if it is not already present (kubectl top depends on it)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Quick check: actual usage per pod, highest CPU first
kubectl top pods -n production --sort-by=cpu
# Export the requested resources to compare against the usage above
# (reads the first container of each pod only)
kubectl get pods -n production -o json | \
  jq '.items[] | {
    name: .metadata.name,
    cpu_request: .spec.containers[0].resources.requests.cpu,
    mem_request: .spec.containers[0].resources.requests.memory
  }'
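The commands above print quantities as strings such as 250m or 512Mi, which cannot be compared directly. The following is a minimal sketch of converting them to numbers and computing how many times larger a request is than observed usage. The function names are hypothetical, and only the common suffixes are handled.

```python
import re

# Suffixes used by Kubernetes memory quantities (binary and decimal).
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_cpu(quantity: str) -> float:
    """Convert a CPU quantity such as '250m' or '2' to cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as '512Mi' or '1G' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([A-Za-z]*)", quantity)
    if match is None:
        raise ValueError(f"unrecognized memory quantity: {quantity!r}")
    number, unit = match.groups()
    if unit and unit not in _MEMORY_UNITS:
        raise ValueError(f"unsupported memory unit: {unit!r}")
    return int(float(number) * _MEMORY_UNITS.get(unit, 1))


def overprovision_ratio(requested: float, used: float) -> float:
    """Return how many times larger the request is than observed usage."""
    if used <= 0:
        raise ValueError("usage must be positive")
    return requested / used


if __name__ == "__main__":
    # Hypothetical pod: requests 1 core, uses 250m.
    ratio = overprovision_ratio(parse_cpu("1"), parse_cpu("250m"))
    print(f"CPU request is {ratio:.1f}x actual usage")
```

A ratio in the 3-5x range matches the over-provisioning pattern described above and marks the workload as a right-sizing candidate.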
Step 2: Install Vertical Pod Autoscaler (VPA)
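VPA analyzes historical resource usage and recommends CPU and memory requests. It is not part of a default Kubernetes installation, so install it in the cluster first, then create a VPA object for each workload you want analyzed: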
# vpa-recommendation.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off" # Start with "Off" to get recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi
Key Takeaway
Start VPA in recommendation mode ("Off"), not auto-update mode. Review its suggestions for 1-2 weeks before letting it auto-adjust. At CodingAlphas, we found VPA's recommendations reduced resource requests by 60% on average while keeping P99 latency stable.
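When reviewing recommendations, it helps to see how the minAllowed and maxAllowed bounds constrain them and how much a change would save. This is a rough sketch, not VPA's actual implementation. The function names are hypothetical and the values are CPU millicores.

```python
def clamp_recommendation(uncapped: int, min_allowed: int, max_allowed: int) -> int:
    """Cap a raw recommendation to the bounds set in the resource policy."""
    return max(min_allowed, min(uncapped, max_allowed))


def request_reduction_percent(current: int, recommended: int) -> float:
    """Percentage by which the request shrinks if the recommendation is applied."""
    if current <= 0:
        raise ValueError("current request must be positive")
    return (current - recommended) * 100 / current


if __name__ == "__main__":
    # Bounds taken from the policy above: 100m to 2000m CPU.
    # The raw recommendation and current request are made-up inputs.
    recommended = clamp_recommendation(400, 100, 2000)
    print(recommended, request_reduction_percent(1000, recommended))
```

If a recommendation sits at one of the bounds for the whole review period, the bound itself is probably worth revisiting.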
Autoscaling Strategies
Scale with demand instead of provisioning for peak capacity:
# hpa-advanced.yaml - Scale on multiple metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300 # Prevent thrashing
      policies:
        - type: Percent
          value: 25 # Scale down max 25% at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100 # Scale up aggressively
          periodSeconds: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
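To reason about the thresholds in this manifest, it helps to work through the replica calculation by hand. The sketch below follows the formula in the Kubernetes documentation, desired = ceil(current replicas x current metric / target metric), with the default 10% tolerance. It is a simplification: it ignores the stabilization windows and rate policies, and the function name is hypothetical.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int = 2,
    max_replicas: int = 20,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a single metric."""
    ratio = current_utilization / target_utilization
    # Inside the tolerance band the HPA leaves the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # The result is always bounded by minReplicas and maxReplicas.
    return max(min_replicas, min(desired, max_replicas))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against the 70% target.
    print(desired_replicas(4, 90, 70))
```

When several metrics are configured, as with CPU and memory here, the HPA evaluates each one and uses the largest resulting replica count.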
KEDA for Event-Driven Scaling
For workloads driven by message queues, KEDA scales pods based on queue depth rather than CPU:
# keda-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: background-worker
  minReplicaCount: 0 # Scale to zero when idle!
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: tasks
        host: amqp://rabbitmq.default.svc.cluster.local
        # mode/value replace the deprecated queueLength field in KEDA 2.x
        mode: QueueLength
        value: "5" # 1 pod per 5 queue messages
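The effect of the trigger target of 5 messages per pod can be estimated with simple arithmetic. This is an approximation rather than KEDA's implementation, since KEDA hands the metric to an HPA that applies its own tolerance and scaling policies. The function name is hypothetical.

```python
import math


def replicas_for_queue(
    queue_depth: int,
    messages_per_replica: int = 5,
    min_replicas: int = 0,
    max_replicas: int = 50,
) -> int:
    """Estimate the worker count for a given queue depth."""
    if queue_depth < 0:
        raise ValueError("queue depth cannot be negative")
    wanted = math.ceil(queue_depth / messages_per_replica)
    # Bounded by minReplicaCount and maxReplicaCount from the ScaledObject.
    return max(min_replicas, min(wanted, max_replicas))


if __name__ == "__main__":
    for depth in (0, 1, 23, 1000):
        print(depth, replicas_for_queue(depth))
```

An empty queue yields zero workers, which is where the cost saving comes from. Very deep queues are capped at the configured maximum.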
Kubecost Dashboard and Autoscaling in Action
Using Kubecost for cost visibility alongside KEDA for event-driven scaling, we helped clients reduce idle resource costs by 45% in the first quarter while maintaining P99 latency SLAs through intelligent autoscaling policies.
Spot and Preemptible Instances
Spot instances (AWS), Spot VMs (GCP, formerly preemptible VMs), and Spot VMs (Azure) offer discounts of roughly 60-90% for interruptible compute. The key is using them safely:
Rules for Spot Instance Usage
- Stateless workloads only: Web servers, API gateways, workers — anything that can restart without data loss.
- Multiple instance types: Diversify across instance types and availability zones to reduce interruption risk.
- Graceful termination: Handle the 2-minute termination notice on AWS (GCP and Azure give only about 30 seconds) to drain connections and finish in-flight work.
- Mixed node pools: Keep a baseline of on-demand instances for critical workloads; burst onto spot for variable load.
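The mixed-pool rule can be made concrete with a quick cost comparison. This sketch assumes a hypothetical on-demand price of $0.10 per node-hour and a 70% spot discount. Both are illustrative inputs, not quoted prices.

```python
def blended_hourly_cost(
    on_demand_nodes: int,
    spot_nodes: int,
    on_demand_price: float,
    spot_discount: float,
) -> float:
    """Hourly cost of a pool that mixes on-demand and spot nodes."""
    spot_price = on_demand_price * (1 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


def savings_percent(
    on_demand_nodes: int,
    spot_nodes: int,
    on_demand_price: float,
    spot_discount: float,
) -> float:
    """Savings compared with running every node on demand."""
    all_on_demand = (on_demand_nodes + spot_nodes) * on_demand_price
    blended = blended_hourly_cost(
        on_demand_nodes, spot_nodes, on_demand_price, spot_discount
    )
    return (all_on_demand - blended) / all_on_demand * 100


if __name__ == "__main__":
    # 2 on-demand baseline nodes plus 8 spot nodes (hypothetical prices).
    print(f"{savings_percent(2, 8, 0.10, 0.70):.0f}% saved")
```

Because the on-demand baseline is paid at full price, the overall saving is always lower than the headline spot discount.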
Cloud Provider Comparison: EKS vs GKE vs AKS
Kubernetes itself is the same everywhere, but the managed service experience and pricing differ significantly:
| Feature | AWS EKS | GCP GKE | Azure AKS |
|---|---|---|---|
| Control plane cost | $73/mo per cluster | ~$73/mo per cluster (free-tier credit covers one zonal or Autopilot cluster) | Free tier: $0; Standard tier: ~$73/mo per cluster |
| Autopilot/serverless | Fargate (per pod) | GKE Autopilot | Virtual nodes |
| Spot discount | Up to 90% | 60-91% | Up to 90% |
| Built-in cost tools | Cost Explorer | GKE cost allocation | Cost Management |
| Best for | Large enterprise, AWS-heavy | K8s-native teams, cost-sensitive workloads | Microsoft stack, hybrid |
Key Takeaway
GKE Autopilot is often the most cost-efficient option for teams that do not want to manage nodes: you pay for the resources your pods request rather than for whole nodes. For AWS shops, Karpenter (not Cluster Autoscaler) is the modern choice for node provisioning.
Terraform and Helm: Infrastructure as Code
Codify your cost optimization policies so they persist across cluster recreations:
# terraform/eks-optimized.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "production"
  cluster_version = "1.29"
  # Note: the module also needs vpc_id and subnet_ids, omitted here for brevity

  # Cost optimization: Use managed node groups with mixed instances
  eks_managed_node_groups = {
    # Baseline: On-demand for critical workloads
    baseline = {
      instance_types = ["m6i.large", "m6a.large"]
      capacity_type  = "ON_DEMAND"
      min_size       = 2
      max_size       = 4
      desired_size   = 2
      labels = {
        workload-type = "critical"
      }
    }
    # Burst: Spot instances for variable workloads
    spot = {
      instance_types = ["m6i.large", "m6a.large", "m5.large", "m5a.large"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 20
      desired_size   = 2
      labels = {
        workload-type = "burstable"
      }
      # Only pods that tolerate this taint are scheduled onto spot nodes
      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}
Helm Chart for Cost Monitoring Stack
# helm/values-kubecost.yaml
# Key paths vary between chart versions; check them against your chart's values.yaml
kubecostProductConfigs:
  clusterName: "production"
  currencyCode: "USD"
  defaultModelPricing:
    enabled: true
global:
  notifications:
    alertConfigs:
      enabled: true
      alerts:
        # Alert when daily spend exceeds budget
        - type: budget
          threshold: 100 # $100/day
          window: daily
          aggregation: cluster
        # Alert on efficiency drops
        - type: efficiency
          efficiencyThreshold: 0.5 # Alert if efficiency drops below 50%
          window: 48h
Cost Visibility with Kubecost
You cannot optimize what you cannot see. Kubecost provides namespace-level cost attribution and efficiency metrics:
Key Metrics to Track
- CPU efficiency: Actual usage vs requested. Target: 60-80%. Below 40% means you are over-provisioned.
- Memory efficiency: Same metric for memory. Target: 70-85%.
- Cost per namespace: Enables team-level accountability and chargeback.
- Idle cost: Resources allocated but unused. This is your primary optimization target.
- Network cost: Cross-zone and cross-region traffic charges — often a hidden cost driver.
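The efficiency and idle-cost metrics above reduce to simple ratios. The sketch below is not Kubecost's implementation. It applies the CPU thresholds from this list (target 60-80%, over-provisioned below 40%) and uses a hypothetical price of $0.03 per core-hour.

```python
def efficiency(used: float, requested: float) -> float:
    """Fraction of the requested resource that is actually used."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return used / requested


def classify_cpu_efficiency(value: float) -> str:
    """Bucket a CPU efficiency value using the targets listed above."""
    if value < 0.40:
        return "over-provisioned"
    if value < 0.60:
        return "below target"
    if value <= 0.80:
        return "on target"
    return "above target"


def idle_cost(
    requested: float, used: float, price_per_unit_hour: float, hours: float
) -> float:
    """Cost of capacity that is requested but not used."""
    return max(requested - used, 0) * price_per_unit_hour * hours


if __name__ == "__main__":
    # Hypothetical workload: 4 cores requested, 1 core used, over a 730-hour month.
    eff = efficiency(1.0, 4.0)
    print(classify_cpu_efficiency(eff), round(idle_cost(4.0, 1.0, 0.03, 730), 2))
```

Sorting workloads by idle cost rather than by efficiency alone keeps attention on the waste that costs the most.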
Setting Up Namespace Budgets
# resource-quota.yaml - Enforce team-level resource budgets
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "8" # 8 CPU cores max
    requests.memory: 16Gi # 16 GiB memory max
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10"
    services.loadbalancers: "2"
---
# limit-range.yaml - Default resource limits for containers
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      type: Container
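The two objects work together: the LimitRange fills in a default request for containers that declare none, and the ResourceQuota rejects new pods once the namespace total would exceed the hard limit. The following is a simplified sketch of that check for CPU only, in millicores. The function names are hypothetical.

```python
from typing import Optional, Sequence

QUOTA_CPU_REQUESTS_M = 8000  # requests.cpu: "8" from the ResourceQuota
DEFAULT_CPU_REQUEST_M = 100  # defaultRequest.cpu: 100m from the LimitRange


def effective_request(declared: Optional[int]) -> int:
    """Use the declared request, or the LimitRange default when none is set."""
    return DEFAULT_CPU_REQUEST_M if declared is None else declared


def fits_quota(existing_requests: Sequence[int], new_request: Optional[int]) -> bool:
    """True if admitting the new container keeps the namespace within quota."""
    total = sum(existing_requests) + effective_request(new_request)
    return total <= QUOTA_CPU_REQUESTS_M


if __name__ == "__main__":
    running = [2000, 3000, 2500]  # hypothetical requests already in the namespace
    print(fits_quota(running, 500), fits_quota(running, 600))
```

Without the LimitRange, a pod that omits requests would be rejected outright in a namespace whose quota covers requests, so the pair is usually deployed together.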
FinOps Practices for Kubernetes
FinOps is the practice of bringing financial accountability to cloud spending. Here is how to implement it for Kubernetes:
The FinOps Lifecycle
- Inform: Give teams visibility into their spending with Kubecost dashboards and weekly cost reports.
- Optimize: Right-size workloads, implement autoscaling, and use spot instances. Target 60-80% CPU efficiency and 70-85% memory efficiency.
- Operate: Establish policies (resource quotas, budget alerts) and continuously monitor for waste.
Weekly Cost Review Cadence
- Monday: Review previous week's cost trends and anomalies
- Wednesday: Check idle resources and orphaned assets
- Friday: Review autoscaling performance and adjust thresholds
Key Takeaway
FinOps is a culture change, not just a tooling change. Make cost a first-class engineering metric alongside latency, availability, and error rates. At CodingAlphas, teams that adopted FinOps practices reduced their cloud spend by 45% in the first quarter.
Storage and Network Optimization
Storage
- Use the cheapest storage class that meets your IOPS needs. On GCP, HDD-backed pd-standard is sufficient for many workloads; reserve pd-ssd for databases and I/O-intensive applications. On AWS, gp2 and gp3 are both SSD, and gp3 is cheaper per GB than gp2, so gp3 is usually the better default.
- Delete unused PVCs. Persistent volumes that are not attached to any pod still incur charges.
- Review snapshot retention. Old snapshots are often forgotten but still billed. Set up lifecycle policies.
Network
- Same-zone traffic: Keep tightly-coupled services in the same availability zone. Cross-zone traffic costs $0.01-0.02/GB.
- CDN for static assets: CloudFront or Cloud CDN drastically reduces egress costs for static content.
- Service mesh evaluation: Istio sidecars consume 128-256MB memory per pod. For a 100-pod cluster, that is 12-25GB of memory just for the mesh. Evaluate if the observability benefits justify the cost.
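The network and mesh figures above are easy to check for your own cluster size. This is a small sketch of the arithmetic, with hypothetical function names and a made-up traffic volume.

```python
def sidecar_memory_gib(pod_count: int, per_sidecar_mib: float) -> float:
    """Total memory consumed by mesh sidecars, in GiB."""
    return pod_count * per_sidecar_mib / 1024


def cross_zone_cost(gb_transferred: float, price_per_gb: float) -> float:
    """Monthly charge for traffic that crosses availability zones."""
    return gb_transferred * price_per_gb


if __name__ == "__main__":
    # 100 pods at 128-256 MiB per sidecar, as in the Istio example above.
    low = sidecar_memory_gib(100, 128)
    high = sidecar_memory_gib(100, 256)
    print(f"sidecar overhead: {low:.1f}-{high:.1f} GiB")
    # Hypothetical 5,000 GB of cross-zone traffic at $0.01/GB.
    print(f"cross-zone traffic: ${cross_zone_cost(5000, 0.01):.2f}")
```

Multiplying the sidecar overhead by your per-GiB memory price gives a direct figure to weigh against the mesh's observability benefits.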
Quick Wins Checklist
These optimizations can be applied in a single day and typically yield 15-25% cost reduction:
- Delete unused deployments, pods, and PVCs
- Remove orphaned load balancers and static IPs
- Delete old container images from your registry (keep last 5 tags per service)
- Disable verbose logging in production namespaces
- Schedule non-critical workloads (batch jobs, CI runners) to run during off-peak hours
- Set resource requests on all pods (the scheduler needs them to bin-pack nodes accurately)
- Enable cluster autoscaler scale-down and shorten the delay (the default 10-minute --scale-down-unneeded-time is conservative for bursty clusters; try 5 minutes)
- Review and consolidate Kubernetes namespaces
Conclusion
Kubernetes cost optimization is an ongoing practice, not a one-time project. The combination of right-sizing, autoscaling, spot instances, and FinOps practices typically yields 40-60% cost reduction while actually improving reliability through better resource management.
Recommended Implementation Order
- Week 1: Install Kubecost, establish baseline spend, identify top waste areas.
- Week 2: Right-size the top 10 over-provisioned workloads using VPA recommendations.
- Week 3: Implement HPA for variable-load workloads and KEDA for queue-driven workers.
- Week 4: Add spot instance node pools for stateless workloads.
- Ongoing: Weekly cost reviews, monthly architecture reviews, quarterly reserved instance evaluations.
Need help optimizing your Kubernetes costs? At CodingAlphas, we have cut cloud spend for clients running everything from 3-node startups to 200-node enterprise clusters. Get a quote for a Kubernetes cost audit, or check out our Spring Boot performance guide to optimize your application alongside your infrastructure.
Written by
CodingAlphas Team
The CodingAlphas DevOps team has managed Kubernetes clusters across AWS, GCP, and Azure, helping clients cut cloud spend while improving reliability.