The $200K Surprise
A client’s Kubernetes bill jumped from $40K to $200K in three months. Nobody noticed because nobody owned the cost. Five teams shared three clusters with no visibility into who consumed what.
This is the FinOps problem for Kubernetes: shared infrastructure with zero accountability.
Step 1: Cost Visibility with OpenCost
OpenCost is the CNCF project for Kubernetes cost monitoring. It breaks down costs by namespace, label, deployment, or any Kubernetes dimension:
```shell
helm install opencost opencost/opencost \
  --namespace opencost --create-namespace \
  --set opencost.exporter.defaultClusterId=prod-eu \
  --set opencost.ui.enabled=true
```

Query costs via API:
```shell
# Cost by namespace for the last 7 days
curl "http://opencost:9003/allocation/compute?window=7d&aggregate=namespace"
```

```json
{
  "team-payments": {
    "cpuCost": 342.50,
    "ramCost": 128.30,
    "pvCost": 45.00,
    "totalCost": 515.80
  },
  "team-search": {
    "cpuCost": 1205.00,
    "ramCost": 890.40,
    "gpuCost": 2100.00,
    "totalCost": 4195.40
  }
}
```

Now you can answer: “Team Search is spending $4,200/week, half of it on GPU.”
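Turning that payload into a ranking takes only a few lines. A minimal sketch against the response shape above (the `rank_by_cost` and `gpu_share` helpers are illustrative, not part of OpenCost; in practice you would fetch the JSON over HTTP rather than hard-code it):

```python
# Sample payload mirroring the /allocation/compute response shown above.
allocation = {
    "team-payments": {"cpuCost": 342.50, "ramCost": 128.30,
                      "pvCost": 45.00, "totalCost": 515.80},
    "team-search": {"cpuCost": 1205.00, "ramCost": 890.40,
                    "gpuCost": 2100.00, "totalCost": 4195.40},
}

def rank_by_cost(alloc: dict) -> list[tuple[str, float]]:
    """Return (namespace, totalCost) pairs, most expensive first."""
    return sorted(((ns, a["totalCost"]) for ns, a in alloc.items()),
                  key=lambda pair: pair[1], reverse=True)

def gpu_share(entry: dict) -> float:
    """Fraction of a namespace's spend attributable to GPU."""
    return entry.get("gpuCost", 0.0) / entry["totalCost"]

print(rank_by_cost(allocation)[0])                      # most expensive namespace
print(round(gpu_share(allocation["team-search"]), 2))   # GPU share of its spend
```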
Step 2: Cost Allocation Strategy
Label Everything
```yaml
# Require cost labels on all workloads
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-cost-labels
spec:
  validationFailureAction: Enforce
  rules:
    - name: require-team-label
      match:
        resources:
          kinds: ["Deployment", "StatefulSet", "Job"]
      validate:
        message: "All workloads must have team and cost-center labels"
        pattern:
          metadata:
            labels:
              team: "?*"
              cost-center: "?*"
```

Allocation Models
Proportional (recommended):
Team cost = (team resource usage / total cluster usage) × cluster bill
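As arithmetic this is a pro-rata split. A minimal sketch (usage is collapsed to one blended number here; a real chargeback would weight CPU, RAM, and GPU separately, and the figures are illustrative):

```python
def proportional_split(usage: dict[str, float], cluster_bill: float) -> dict[str, float]:
    """Split the cluster bill by each team's share of total usage."""
    total = sum(usage.values())
    return {team: round(cluster_bill * u / total, 2) for team, u in usage.items()}

# Illustrative: team-search uses 4x the resources of team-payments
usage = {"team-payments": 120.0, "team-search": 480.0}
print(proportional_split(usage, cluster_bill=10_000.0))
# → {'team-payments': 2000.0, 'team-search': 8000.0}
```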
Fixed allocation:
Team cost = reserved resources × unit price
(Good for guaranteed capacity)
Hybrid:
Base cost (namespace reservation) + variable (actual usage above base)

Step 3: Right-Sizing Automation
Most Kubernetes workloads are over-provisioned. VPA (Vertical Pod Autoscaler) fixes this:
```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-service
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
```

In my experience, VPA recommendations reduce resource requests by 40-60% on average. That translates directly to cost savings.
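To see what that range means in dollars, price the gap between current requests and the VPA target. A sketch with assumed unit prices (the rates and the example requests below are illustrative, not from any provider's price list):

```python
# Assumed blended monthly rates -- substitute your provider's actual pricing
CPU_PRICE_PER_CORE_MONTH = 25.0
RAM_PRICE_PER_GIB_MONTH = 3.5

def monthly_cost(cpu_cores: float, ram_gib: float) -> float:
    """Monthly cost of a set of resource requests at the assumed rates."""
    return cpu_cores * CPU_PRICE_PER_CORE_MONTH + ram_gib * RAM_PRICE_PER_GIB_MONTH

current = monthly_cost(cpu_cores=2.0, ram_gib=4.0)     # requests as deployed today
rightsized = monthly_cost(cpu_cores=1.0, ram_gib=2.0)  # VPA's recommended target
savings = (current - rightsized) / current
print(f"{savings:.0%} saved")  # → 50% saved
```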
Step 4: Showback Dashboard
Build a Grafana dashboard teams can self-serve. I use the monitoring patterns from Kubernetes Recipes:
Row 1: Executive Summary
- Total monthly spend (actual vs budget)
- Cost trend (30-day rolling)
- Top 5 most expensive namespaces
- Waste score (requested vs used)
Row 2: Per-Team Breakdown
- Cost by team (stacked bar, weekly)
- Resource efficiency (used/requested ratio)
- Cost per request/transaction (unit economics)
Row 3: Optimization Opportunities
- Over-provisioned deployments (>50% idle)
- Idle PVCs (attached but unused)
- Unscaled HPA (always at minimum)

Step 5: Budget Alerts
```yaml
# Prometheus alert for cost spikes
- alert: TeamBudgetExceeded
  expr: |
    sum by (team) (
      opencost_allocation_total_cost_daily
    ) * 30 > on(team) opencost_team_monthly_budget
  for: 1h
  labels:
    severity: warning
  annotations:
    summary: "Team {{ $labels.team }} projected to exceed monthly budget"
    description: "Projected: ${{ $value | humanize }}. Budget: check team allocation."
```

Automating FinOps with Ansible
I deploy the full FinOps stack (OpenCost, VPA, Kyverno policies, Grafana dashboards) across clusters with Ansible:
```yaml
- name: Deploy FinOps stack
  hosts: k8s_clusters
  roles:
    - role: opencost
    - role: vpa-controller
    - role: cost-labels-policy
    - role: finops-dashboards
```

Patterns at Ansible Pilot. Terraform-managed infrastructure cost tagging at Terraform Pilot.
The Cultural Shift
FinOps isn’t a tool — it’s a practice. The goal: teams understand and own their infrastructure costs, like they own their code quality. Visibility drives accountability. Accountability drives optimization.
That $200K surprise? After implementing showback, teams self-optimized to $65K within two months. Nobody likes seeing their name next to the biggest number on the cost dashboard.
