The Problem: Invisible Spending
Most organizations don't know what each team or service costs in Kubernetes. The cloud bill shows "EKS: $50,000/month", but which team? Which workload? What's idle?
Kubecost vs OpenCost
| Feature | OpenCost | Kubecost |
|---|---|---|
| License | Apache 2.0 (CNCF) | Free tier + Enterprise |
| Cost allocation | Yes | Yes |
| Recommendations | No | Yes |
| Alerts | No | Yes |
| Multi-cluster | No | Yes (Enterprise) |
| Savings plans | No | Yes |
| UI | Basic | Full dashboard |
| API | Yes | Yes |
Installation (Kubecost)
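# Add the Kubecost chart repository, aliased "cost-analyzer" to match the chart reference below
helm repo add cost-analyzer https://kubecost.github.io/cost-analyzer/
helm repo update
helm install kubecost cost-analyzer/cost-analyzer \
  --namespace kubecost --create-namespace \
  --set kubecostToken="your-token" \
  --set prometheus.server.enabled=true
Cost Allocation Model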
Total Cluster Cost = Node Cost + Storage Cost + Network Cost
Per-Pod Cost = (CPU Request / Node CPU) × Node CPU Cost/hr
             + (Memory Request / Node Memory) × Node Memory Cost/hr
             + PV Cost × (PV Size / Total PV)
Per-Namespace Cost = Σ Pod Costs in Namespace
Per-Team Cost = Σ Namespace Costs (label: team=X)
Right-Sizing Recommendations
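As a worked example of the per-pod formula, the sketch below prices one container's requests before and after a right-sizing change. Every input is hypothetical: a 4-vCPU, 16 GiB node at $0.20/hr, with the hourly price split evenly between CPU and memory (an assumption; allocation tools derive this split from provider pricing), 730 hours per month, and no PV cost. The request values mirror the example response in this section.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month


def pod_cost_per_hour(cpu_request, mem_request_gib, node_cpu, node_mem_gib,
                      node_cpu_cost_hr, node_mem_cost_hr):
    """Per-pod compute cost: share of node CPU plus share of node memory."""
    cpu_cost = (cpu_request / node_cpu) * node_cpu_cost_hr
    mem_cost = (mem_request_gib / node_mem_gib) * node_mem_cost_hr
    return cpu_cost + mem_cost


# Hypothetical node: 4 vCPU, 16 GiB, $0.20/hr split evenly across CPU and memory.
node = dict(node_cpu=4, node_mem_gib=16,
            node_cpu_cost_hr=0.10, node_mem_cost_hr=0.10)

before = pod_cost_per_hour(2.0, 4.0, **node)   # 2000m CPU, 4Gi memory
after = pod_cost_per_hour(0.35, 1.2, **node)   # 350m CPU, 1.2Gi memory
monthly_savings = (before - after) * HOURS_PER_MONTH

print(f"before=${before:.4f}/hr after=${after:.4f}/hr "
      f"savings=${monthly_savings:.2f}/month")
```

The result depends entirely on the assumed node price and CPU/memory split, so it will not match the figure any particular tool reports.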
# Kubecost API: get savings opportunities
# (the URL is quoted so the shell does not treat "?" as a glob character)
curl "http://kubecost:9090/model/savings/requestSizing?window=7d"
# Response:
{
  "containerName": "api-server",
  "currentCPURequest": "2000m",
  "recommendedCPURequest": "350m",
  "currentMemoryRequest": "4Gi",
  "recommendedMemoryRequest": "1.2Gi",
  "monthlySavings": "$180"
}
Common Waste Patterns
| Pattern | Typical Waste | Fix |
|---|---|---|
| Over-provisioned requests | 40-60% | VPA or right-size manually |
| Idle namespaces | 10-20% | Auto-delete dev envs nightly |
| Orphan PVCs | 5-10% | PVC cleanup CronJob |
| No autoscaling | 20-30% | HPA/KEDA for variable workloads |
| Wrong instance type | 15-25% | Node auto-provisioner (Karpenter) |
| No spot/preemptible | 30-60% | Spot for stateless workloads |
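Of the patterns above, orphan PVCs are among the easiest to detect from cluster state. A minimal sketch, assuming the input dicts follow the item shape of `kubectl get pvc -A -o json` and `kubectl get pods -A -o json`; the function name `find_orphan_pvcs` and the sample data are hypothetical.

```python
def find_orphan_pvcs(pvcs, pods):
    """Return sorted (namespace, name) pairs for PVCs that no pod references."""
    mounted = set()
    for pod in pods:
        ns = pod["metadata"]["namespace"]
        for vol in pod.get("spec", {}).get("volumes", []):
            claim = vol.get("persistentVolumeClaim")
            if claim:
                # A pod can only reference claims in its own namespace.
                mounted.add((ns, claim["claimName"]))
    return sorted(
        (p["metadata"]["namespace"], p["metadata"]["name"])
        for p in pvcs
        if (p["metadata"]["namespace"], p["metadata"]["name"]) not in mounted
    )


# Hypothetical sample data in the shape of the Kubernetes API objects.
pvcs = [
    {"metadata": {"namespace": "dev", "name": "db-data"}},
    {"metadata": {"namespace": "dev", "name": "old-cache"}},
]
pods = [
    {
        "metadata": {"namespace": "dev", "name": "db-0"},
        "spec": {"volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "db-data"}},
            {"name": "tmp", "emptyDir": {}},
        ]},
    },
]

orphans = find_orphan_pvcs(pvcs, pods)
print(orphans)
```

A claim can look orphaned only because its workload is temporarily scaled to zero, so a list like this is better treated as a review queue than as input to automatic deletion.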
Showback Dashboard (Grafana)
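# Prometheus recording rules for cost
# Note: node_cost_per_cpu_hour and node_cost_per_gb_hour are assumed to be per-node price
# metrics; OpenCost/Kubecost export these as node_cpu_hourly_cost and node_ram_hourly_cost.
groups:
  - name: cost-allocation
    rules:
      - record: namespace:cost:hourly
        expr: |
          sum by (namespace) (
            container_cpu_allocation * on(node) group_left()
              node_cost_per_cpu_hour
          ) +
          sum by (namespace) (
            container_memory_allocation_bytes / 1024 / 1024 / 1024
              * on(node) group_left()
              node_cost_per_gb_hour
          )
Quick Wins (First 30 Days)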
- Delete idle workloads: dev/staging environments running 24/7
- Right-size top 10 over-provisioned pods: usually saves 30%+
- Enable HPA: for services with variable traffic
- Spot instances: for stateless, fault-tolerant workloads
- Reserved capacity: for stable baseline (commit 1-3 years)
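To size the first quick win, it helps to compare always-on hours with a working-hours schedule. A rough sketch: the schedule (10 hours a day, 5 days a week) and the $4,000 monthly cost are made-up inputs, and the calculation assumes cost scales linearly with running hours, ignoring storage that persists while an environment is scaled down.

```python
HOURS_PER_WEEK = 24 * 7


def schedule_savings(monthly_cost, hours_per_day, days_per_week):
    """Return (fraction of hours avoided, estimated monthly savings)."""
    running_hours = hours_per_day * days_per_week
    fraction_saved = 1 - running_hours / HOURS_PER_WEEK
    return fraction_saved, monthly_cost * fraction_saved


# Hypothetical dev environment costing $4,000/month when run 24/7.
fraction, saved = schedule_savings(4000.0, hours_per_day=10, days_per_week=5)
print(f"{fraction:.0%} of hours avoided, about ${saved:,.0f}/month")
```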
Cost per Request (Business Metrics)
The ultimate goal is cost per business transaction:
Cost per API request = Namespace cost / Total requests
Cost per order = (Payment ns + Cart ns + Shipping ns) / Orders processed
When you can show "$0.003 per order processed" vs "$50,000 cloud bill," finance understands.
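The unit-cost ratios above reduce to a single division once namespace costs and business counts are in hand. A minimal sketch with hypothetical monthly figures: the namespace names follow the order example, the helper `unit_cost` is illustrative, and in practice the costs would come from an allocation tool and the counts from application metrics.

```python
def unit_cost(namespace_costs, namespaces, units):
    """Sum the cost of the given namespaces and divide by the business unit count."""
    if units <= 0:
        raise ValueError("units must be positive")
    return sum(namespace_costs[ns] for ns in namespaces) / units


# Hypothetical monthly namespace costs in dollars.
monthly_costs = {"payment": 1200.0, "cart": 800.0, "shipping": 1000.0}
orders_processed = 1_000_000  # hypothetical monthly order count

cost_per_order = unit_cost(
    monthly_costs, ["payment", "cart", "shipping"], orders_processed
)
print(f"${cost_per_order:.4f} per order")
```

Costs and counts need to cover the same time window, otherwise the ratio is meaningless.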