Kubernetes Cost Optimization: Reduce Cloud Spend by 40%

Kubernetes makes it easy to deploy applications but equally easy to overspend. We have helped clients reduce their Kubernetes costs by 40-60% through systematic optimization. This guide shares the strategies that deliver the biggest impact, with actual Terraform and Helm configurations you can use today.

Right-Sizing Workloads

Over-provisioning is the single largest source of Kubernetes waste. Most teams request 3-5x more CPU and memory than their workloads actually use. Here is how to fix it:

Step 1: Measure Actual Usage

# Install metrics-server first if it is not already present (kubectl top depends on it)
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Actual usage per pod, sorted by CPU
kubectl top pods -n production --sort-by=cpu

# Requested resources for every container, to compare against the usage above
kubectl get pods -n production -o json | \
  jq '.items[] | {
    name: .metadata.name,
    containers: [.spec.containers[] | {
      name: .name,
      cpu_request: .resources.requests.cpu,
      mem_request: .resources.requests.memory
    }]
  }'
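
Once you have both sets of numbers, the comparison is simple arithmetic. The following is a minimal Python sketch, not part of any Kubernetes tooling: the function names and the sample pod are hypothetical, and it only handles the common quantity formats printed by kubectl (such as 250m, 2, 512Mi).

```python
from decimal import Decimal

# Binary and decimal suffixes used by Kubernetes memory quantities.
_MEMORY_UNITS = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as "250m" or "2" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" or "1G" to bytes."""
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_MEMORY_UNITS, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * _MEMORY_UNITS[suffix])
    return int(Decimal(quantity))


def overprovision_ratio(requested: int, used: int) -> float:
    """How many times larger the request is than the observed usage."""
    if used <= 0:
        raise ValueError("usage must be positive")
    return requested / used


# Hypothetical pod: requests 1 CPU / 2Gi, but uses 250m / 512Mi.
cpu_ratio = overprovision_ratio(cpu_to_millicores("1"), cpu_to_millicores("250m"))
mem_ratio = overprovision_ratio(memory_to_bytes("2Gi"), memory_to_bytes("512Mi"))
print(f"CPU over-provisioned {cpu_ratio:.1f}x, memory {mem_ratio:.1f}x")
```

A single kubectl top reading is only a snapshot, so treat a ratio like this as a prompt to look at longer-term metrics rather than as a sizing decision on its own.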

Step 2: Install Vertical Pod Autoscaler (VPA)

VPA analyzes historical resource usage and recommends optimal CPU/memory settings:

# vpa-recommendation.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-server-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  updatePolicy:
    updateMode: "Off"  # Start with "Off" to get recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: api-server
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2000m
          memory: 4Gi

Key Takeaway

Start VPA in recommendation mode ("Off"), not auto-update mode. Review its suggestions for 1-2 weeks before letting it auto-adjust. At CodingAlphas, we found VPA's recommendations reduced resource requests by 60% on average while keeping P99 latency stable.

Autoscaling Strategies

Scale with demand instead of provisioning for peak capacity:

# hpa-advanced.yaml - Scale on multiple metrics
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2
  maxReplicas: 20
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # Prevent thrashing
      policies:
        - type: Percent
          value: 25              # Scale down max 25% at a time
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Percent
          value: 100             # Scale up aggressively
          periodSeconds: 30
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
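
To reason about how this configuration behaves, it helps to work through the replica calculation by hand. The sketch below follows the formula documented for the HorizontalPodAutoscaler (desired = ceil(current replicas x current metric / target metric), skipped when the ratio is within a tolerance that defaults to 0.1, with the largest proposal winning when several metrics are configured). It is a simplified model: the function names are hypothetical and the behavior policies and stabilization windows are not simulated.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Replica count proposed by a single metric."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # Close enough to target: no change.
    return math.ceil(current_replicas * ratio)


def hpa_decision(current_replicas: int, metrics: list[tuple[float, float]],
                 min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Take the largest proposal across metrics, clamped to min/max replicas.

    metrics is a list of (current_utilization, target_utilization) pairs.
    """
    proposals = [desired_replicas(current_replicas, cur, tgt) for cur, tgt in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))


# Hypothetical load: 4 replicas at 90% CPU (target 70) and 60% memory (target 80).
print(hpa_decision(4, [(90, 70), (60, 80)]))  # prints 6
```

In this example CPU proposes 6 replicas and memory proposes 3, so the autoscaler scales to 6. Memory would only drive the decision if it produced the larger number.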

KEDA for Event-Driven Scaling

For workloads driven by message queues, KEDA scales pods based on queue depth rather than CPU:

# keda-scaledobject.yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: background-worker
  minReplicaCount: 0    # Scale to zero when idle!
  maxReplicaCount: 50
  triggers:
    - type: rabbitmq
      metadata:
        queueName: tasks
        host: amqp://rabbitmq.default.svc.cluster.local
        mode: QueueLength   # Newer KEDA releases use mode/value in place of the deprecated queueLength field
        value: "5"          # 1 pod per 5 queue messages
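
As a rough mental model for the numbers above, the replica count tracks queue depth divided by the per-pod target of 5 messages, bounded by the minimum and maximum replica counts. The Python sketch below is an approximation under that assumption only: the function name is hypothetical, and the real scaling decision is made by the HPA that KEDA manages, with its own tolerance and stabilization behavior.

```python
import math


def approx_worker_replicas(queue_depth: int, messages_per_pod: int = 5,
                           min_replicas: int = 0, max_replicas: int = 50) -> int:
    """Approximate replica count for a given number of queued messages."""
    if queue_depth <= 0:
        return min_replicas  # Idle queue: fall back to the minimum (zero here).
    wanted = math.ceil(queue_depth / messages_per_pod)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical queue depths.
for depth in (0, 12, 1000):
    print(depth, "messages ->", approx_worker_replicas(depth), "pods")
```

With these settings 12 queued messages map to 3 pods, and anything beyond 250 messages is capped at the 50-pod maximum, which is worth checking against your broker's and database's connection limits.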

Kubecost Dashboard and Autoscaling in Action

Using Kubecost for cost visibility alongside KEDA for event-driven scaling, we helped clients reduce idle resource costs by 45% in the first quarter while maintaining P99 latency SLAs through intelligent autoscaling policies.

Spot and Preemptible Instances

Spot instances (AWS), preemptible VMs (GCP), and spot VMs (Azure) offer 60-80% discounts for interruptible compute. The key is using them safely:

Rules for Spot Instance Usage

Stateless workloads only: Web servers, API gateways, workers — anything that can restart without data loss.
Multiple instance types: Diversify across instance types and availability zones to reduce interruption risk.
Graceful termination: Handle the interruption notice (2 minutes on AWS; roughly 30 seconds on GCP and Azure) to drain connections and finish in-flight work.
Mixed node pools: Keep a baseline of on-demand instances for critical workloads; burst onto spot for variable load.
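
To see what a mixed pool is worth before committing to it, you can estimate the blended cost. This is a back-of-the-envelope Python sketch: the node counts and the $0.10/hour price are hypothetical placeholders, the 70% discount is simply a point inside the 60-80% range quoted above, and it ignores interruptions and the capacity you may need to re-run interrupted work.

```python
def blended_hourly_cost(on_demand_nodes: int, spot_nodes: int,
                        on_demand_price: float, spot_discount: float) -> float:
    """Hourly cost of a pool mixing on-demand and spot nodes of the same size."""
    if not 0.0 <= spot_discount < 1.0:
        raise ValueError("spot_discount must be in [0, 1)")
    spot_price = on_demand_price * (1.0 - spot_discount)
    return on_demand_nodes * on_demand_price + spot_nodes * spot_price


def savings_vs_on_demand(on_demand_nodes: int, spot_nodes: int,
                         on_demand_price: float, spot_discount: float) -> float:
    """Fraction saved compared with running every node on demand."""
    all_on_demand = (on_demand_nodes + spot_nodes) * on_demand_price
    blended = blended_hourly_cost(on_demand_nodes, spot_nodes,
                                  on_demand_price, spot_discount)
    return 1.0 - blended / all_on_demand


# Hypothetical pool: 2 on-demand + 8 spot nodes at $0.10/hour list price.
savings = savings_vs_on_demand(2, 8, on_demand_price=0.10, spot_discount=0.70)
print(f"Estimated savings: {savings:.0%}")
```

Because the on-demand baseline is paid at full price, the overall saving is always lower than the headline spot discount. The smaller the baseline relative to the burst pool, the closer you get to it.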

Cloud Provider Comparison: EKS vs GKE vs AKS

Kubernetes itself is the same everywhere, but the managed service experience and pricing differ significantly:

Feature	AWS EKS	GCP GKE	Azure AKS
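Control plane cost	$0.10/hr (about $73/mo) per cluster	$0.10/hr per cluster; free-tier credit covers one zonal or Autopilot cluster	Free tier; Standard tier $0.10/hr per cluster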
Autopilot/serverless	Fargate (per pod)	GKE Autopilot	Virtual nodes
Spot discount	Up to 90%	60-91%	Up to 90%
Built-in cost tools	Cost Explorer	GKE cost allocation	Cost Management
Best for	Large enterprise, AWS-heavy	K8s-native teams, cost	Microsoft stack, hybrid

Key Takeaway

GKE Autopilot offers the best cost efficiency for most workloads — you pay only for pod resources with no node management overhead. For AWS shops, Karpenter (not Cluster Autoscaler) is the modern choice for node provisioning.

Terraform and Helm: Infrastructure as Code

Codify your cost optimization policies so they persist across cluster recreations:

# terraform/eks-optimized.tf
module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 20.0"

  cluster_name    = "production"
  cluster_version = "1.29"

  # Cost optimization: Use managed node groups with mixed instances
  eks_managed_node_groups = {
    # Baseline: On-demand for critical workloads
    baseline = {
      instance_types = ["m6i.large", "m6a.large"]
      capacity_type  = "ON_DEMAND"
      min_size       = 2
      max_size       = 4
      desired_size   = 2

      labels = {
        workload-type = "critical"
      }
    }

    # Burst: Spot instances for variable workloads
    spot = {
      instance_types = ["m6i.large", "m6a.large", "m5.large", "m5a.large"]
      capacity_type  = "SPOT"
      min_size       = 0
      max_size       = 20
      desired_size   = 2

      labels = {
        workload-type = "burstable"
      }

      taints = [{
        key    = "spot"
        value  = "true"
        effect = "NO_SCHEDULE"
      }]
    }
  }
}

Helm Chart for Cost Monitoring Stack

# helm/values-kubecost.yaml
kubecostProductConfigs:
  clusterName: "production"
  currencyCode: "USD"
  defaultModelPricing:
    enabled: true
  alertConfigs:
    enabled: true
    alerts:
      # Alert when daily spend exceeds budget
      - type: budget
        threshold: 100  # $100/day
        window: daily
        aggregation: cluster
      # Alert on efficiency drops
      - type: efficiency
        threshold: 0.5  # Alert if efficiency drops below 50%
        window: 48h

Cost Visibility with Kubecost

You cannot optimize what you cannot see. Kubecost provides namespace-level cost attribution and efficiency metrics:

Key Metrics to Track

CPU efficiency: Actual usage vs requested. Target: 60-80%. Below 40% means you are over-provisioned.
Memory efficiency: Same metric for memory. Target: 70-85%.
Cost per namespace: Enables team-level accountability and chargeback.
Idle cost: Resources allocated but unused. This is your primary optimization target.
Network cost: Cross-zone and cross-region traffic charges — often a hidden cost driver.
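
The efficiency and idle-cost metrics above reduce to a few lines of arithmetic. The Python sketch below is a simplified illustration, not Kubecost's actual calculation: the namespaces and dollar figures are hypothetical, the thresholds come from the CPU targets listed above, and idle cost is assumed to be proportional to the unused share of requests.

```python
def efficiency(used: float, requested: float) -> float:
    """Actual usage divided by requested resources."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    return used / requested


def classify_cpu(eff: float) -> str:
    """Bucket CPU efficiency using the 60-80% target and the 40% warning line."""
    if eff < 0.40:
        return "over-provisioned"
    if eff < 0.60:
        return "below target"
    if eff <= 0.80:
        return "on target"
    return "above target"  # Assumption: little headroom left, review limits.


def idle_cost(monthly_cost: float, eff: float) -> float:
    """Share of spend attributable to requested-but-unused resources."""
    return monthly_cost * max(0.0, 1.0 - eff)


# Hypothetical namespaces: (CPU cores used, CPU cores requested, monthly USD).
namespaces = {
    "team-alpha": (1.5, 5.0, 1000.0),
    "team-beta": (4.2, 6.0, 1500.0),
}
for name, (used, requested, cost) in namespaces.items():
    eff = efficiency(used, requested)
    print(f"{name}: {eff:.0%} efficient ({classify_cpu(eff)}), "
          f"idle ~${idle_cost(cost, eff):.0f}/mo")
```

Sorting namespaces by idle cost rather than by efficiency alone tends to surface the workloads where right-sizing pays off most.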

Setting Up Namespace Budgets

# resource-quota.yaml - Enforce team-level resource budgets
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-alpha-quota
  namespace: team-alpha
spec:
  hard:
    requests.cpu: "8"       # 8 CPU cores max
    requests.memory: 16Gi   # 16GB memory max
    limits.cpu: "16"
    limits.memory: 32Gi
    persistentvolumeclaims: "10"
    services.loadbalancers: "2"

---
# limit-range.yaml - Default resource limits for pods
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: team-alpha
spec:
  limits:
    - default:
        cpu: 500m
        memory: 512Mi
      defaultRequest:
        cpu: 100m
        memory: 128Mi
      type: Container
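
Containers that omit resource settings receive the LimitRange defaults, and those defaults count against the ResourceQuota. It is worth checking which quota line runs out first. The Python sketch below does that arithmetic with the values from the two manifests above; the dictionary keys and function names are hypothetical, and it assumes every container relies on the defaults.

```python
# Quota and per-container defaults from the manifests above,
# expressed in millicores and MiB so the arithmetic stays in integers.
QUOTA = {
    "requests.cpu": 8_000,
    "requests.memory": 16 * 1024,
    "limits.cpu": 16_000,
    "limits.memory": 32 * 1024,
}
DEFAULTS = {
    "requests.cpu": 100,
    "requests.memory": 128,
    "limits.cpu": 500,
    "limits.memory": 512,
}


def max_default_containers(quota: dict[str, int], per_container: dict[str, int]) -> int:
    """How many default-sized containers fit before any quota line is exhausted."""
    return min(quota[key] // per_container[key] for key in quota)


def binding_constraint(quota: dict[str, int], per_container: dict[str, int]) -> str:
    """The quota line that is exhausted first."""
    return min(quota, key=lambda key: quota[key] // per_container[key])


print(max_default_containers(QUOTA, DEFAULTS))  # prints 32
print(binding_constraint(QUOTA, DEFAULTS))      # prints limits.cpu
```

Here the CPU limit quota caps the namespace at 32 default-sized containers even though the request quota alone would allow 80, so limits and requests need to be budgeted together.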

FinOps Practices for Kubernetes

FinOps is the practice of bringing financial accountability to cloud spending. Here is how to implement it for Kubernetes:

The FinOps Lifecycle

Inform: Give teams visibility into their spending with Kubecost dashboards and weekly cost reports.
Optimize: Right-size workloads, implement autoscaling, and use spot instances. Target 60-70% resource efficiency.
Operate: Establish policies (resource quotas, budget alerts) and continuously monitor for waste.

Weekly Cost Review Cadence

Monday: Review previous week's cost trends and anomalies
Wednesday: Check idle resources and orphaned assets
Friday: Review autoscaling performance and adjust thresholds

Key Takeaway

FinOps is a culture change, not just a tooling change. Make cost a first-class engineering metric alongside latency, availability, and error rates. At CodingAlphas, teams that adopted FinOps practices reduced their cloud spend by 45% in the first quarter.

Storage and Network Optimization

Storage

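Use the cheapest storage class that meets your IOPS needs. On AWS, gp3 is typically cheaper per GB than gp2 and is a sensible default; on GCP, pd-standard (HDD) or pd-balanced is sufficient for most workloads. Reserve high-performance SSD (io2/pd-ssd) for databases and I/O-intensive applications.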
Delete unused PVCs. Persistent volumes that are not attached to any pod still incur charges.
Review snapshot retention. Old snapshots are often forgotten but still billed. Set up lifecycle policies.

Network

Same-zone traffic: Keep tightly-coupled services in the same availability zone. Cross-zone traffic costs $0.01-0.02/GB.
CDN for static assets: CloudFront or Cloud CDN drastically reduces egress costs for static content.
Service mesh evaluation: Istio sidecars consume 128-256MB memory per pod. For a 100-pod cluster, that is roughly 13-26GB of memory just for the mesh. Evaluate if the observability benefits justify the cost.
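
Both the sidecar and the cross-zone figures are easy to scale to your own cluster. The Python sketch below uses the per-pod and per-GB ranges quoted above; the 10,000 GB of monthly cross-zone traffic is a hypothetical placeholder, and the function names are illustrative only.

```python
def sidecar_memory_gb(pod_count: int, mb_per_sidecar: float) -> float:
    """Total memory consumed by mesh sidecars, in GB (1 GB = 1000 MB)."""
    return pod_count * mb_per_sidecar / 1000


def cross_zone_cost_usd(gb_per_month: float, usd_per_gb: float) -> float:
    """Monthly charge for traffic that crosses availability zones."""
    return gb_per_month * usd_per_gb


# 100 pods at 128-256MB per sidecar, as in the estimate above.
low, high = sidecar_memory_gb(100, 128), sidecar_memory_gb(100, 256)
print(f"Sidecar memory for 100 pods: {low:.1f}-{high:.1f} GB")

# Hypothetical traffic volume: 10,000 GB of cross-zone traffic per month.
cheap = cross_zone_cost_usd(10_000, 0.01)
pricey = cross_zone_cost_usd(10_000, 0.02)
print(f"Cross-zone traffic: ${cheap:.0f}-${pricey:.0f} per month")
```

Plugging in your own pod count and traffic volume gives a quick sense of whether either item deserves attention before the larger right-sizing work.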

Quick Wins Checklist

These optimizations can be applied in a single day and typically yield 15-25% cost reduction:

Delete unused deployments, pods, and PVCs
Remove orphaned load balancers and static IPs
Delete old container images from your registry (keep last 5 tags per service)
Disable verbose logging in production namespaces
Schedule non-critical workloads (batch jobs, CI runners) to run during off-peak hours
Set resource requests on all pods (without them the scheduler cannot bin-pack nodes accurately)
Tune cluster autoscaler scale-down (the default 10-minute wait before removing an unneeded node is conservative; consider lowering it to 5 minutes)
Review and consolidate Kubernetes namespaces
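
Some of these items are easy to script. As one example, the "keep last 5 tags per service" rule can be sketched in Python as below. The tag names and push dates are hypothetical, and the function only selects candidates: listing and deleting tags depends on your registry's own API, which is not shown.

```python
from datetime import date


def tags_to_delete(tags: list[tuple[str, date]], keep: int = 5) -> list[str]:
    """Return tag names older than the `keep` most recently pushed tags."""
    newest_first = sorted(tags, key=lambda item: item[1], reverse=True)
    return [name for name, _pushed in newest_first[keep:]]


# Hypothetical service with seven tags pushed on consecutive days.
tags = [(f"v{n}", date(2024, 1, n)) for n in range(1, 8)]
print(tags_to_delete(tags))  # prints ['v2', 'v1']
```

Before deleting anything, confirm that no running workload or rollback plan still references the selected tags.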

Conclusion

Kubernetes cost optimization is an ongoing practice, not a one-time project. The combination of right-sizing, autoscaling, spot instances, and FinOps practices typically yields 40-60% cost reduction while actually improving reliability through better resource management.

Recommended Implementation Order

Week 1: Install Kubecost, establish baseline spend, identify top waste areas.
Week 2: Right-size the top 10 over-provisioned workloads using VPA recommendations.
Week 3: Implement HPA for variable-load workloads and KEDA for queue-driven workers.
Week 4: Add spot instance node pools for stateless workloads.
Ongoing: Weekly cost reviews, monthly architecture reviews, quarterly reserved instance evaluations.

Need help optimizing your Kubernetes costs? At CodingAlphas, we have cut cloud spend for clients running everything from 3-node startups to 200-node enterprise clusters. Get a quote for a Kubernetes cost audit, or check out our Spring Boot performance guide to optimize your application alongside your infrastructure.
