Quick Decision
| Scenario | Use HPA | Use KEDA |
|---|---|---|
| CPU/memory scaling | ✅ | Overkill |
| Scale from/to zero | ❌ | ✅ |
| Queue-based scaling | ❌ | ✅ |
| Cron-based scaling | ❌ | ✅ |
| Custom Prometheus metrics | Complex | ✅ Easy |
| No extra dependencies | ✅ | Requires KEDA operator |

Horizontal Pod Autoscaler (HPA)
HPA is Kubernetes' built-in autoscaling mechanism. It watches resource metrics (CPU, memory) or custom metrics and adjusts the replica count.

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-app
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: web-app
      minReplicas: 2
      maxReplicas: 50
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70
        - type: Resource
          resource:
            name: memory
            target:
              type: Utilization
              averageUtilization: 80
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300
          policies:
            - type: Percent
              value: 10
              periodSeconds: 60
        scaleUp:
          stabilizationWindowSeconds: 0
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15

How HPA Works
- Metrics Server collects CPU/memory from kubelets (every 15s default)
- HPA controller checks metrics (every 15s default)
- Calculates desired replicas: ceil(currentReplicas × (currentMetric / targetMetric))
- Scales the Deployment up or down within the min/max bounds
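
As a worked illustration of the replica formula above, here is a minimal Python sketch. The function name `desired_replicas` and the sample numbers are hypothetical, chosen only to match the 70% CPU target and the 2 to 50 replica bounds from the manifest. The real controller also applies a tolerance (10% by default) and the `behavior` policies before acting, which this sketch deliberately leaves out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas, max_replicas):
    """Apply ceil(currentReplicas * currentMetric / targetMetric), then clamp."""
    raw = math.ceil(current_replicas * (current_metric / target_metric))
    return max(min_replicas, min(max_replicas, raw))


# 4 pods averaging 90% CPU against a 70% target: ceil(5.14) = 6 pods
scale_up = desired_replicas(4, 90, 70, min_replicas=2, max_replicas=50)

# 10 pods averaging 35% CPU against a 70% target: 5 pods
scale_down = desired_replicas(10, 35, 70, min_replicas=2, max_replicas=50)

# 3 pods at 7% CPU would compute to 1, but minReplicas keeps it at 2
floor_hit = desired_replicas(3, 7, 70, min_replicas=2, max_replicas=50)

print(scale_up, scale_down, floor_hit)  # 6 5 2
```

Because the result is rounded up, even a small overshoot of the target adds a whole pod, which is one reason the scale-down stabilization window matters.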
HPA Limitations
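- Cannot scale to zero: minReplicas must be at least 1 (scaling to zero requires the alpha HPAScaleToZero feature gate and an object or external metric)
- Limited metric sources: CPU, memory, or custom metrics via adapters
- Custom metrics are complex: they require an adapter such as Prometheus Adapter or the Datadog Cluster Agent
- No event-driven scaling: purely metric-based, evaluated on a polling interval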
KEDA (Kubernetes Event-Driven Autoscaling)
KEDA extends Kubernetes with 60+ scalers for event-driven autoscaling. It creates and manages HPA objects behind the scenes.

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: order-processor
    spec:
      scaleTargetRef:
        name: order-processor
      pollingInterval: 10
      cooldownPeriod: 300
      minReplicaCount: 0  # Scale to zero!
      maxReplicaCount: 100
      triggers:
        - type: rabbitmq
          metadata:
            queueName: orders
            host: amqp://rabbitmq.default.svc:5672
            queueLength: "5"  # 1 pod per 5 messages
        - type: cron
          metadata:
            timezone: Europe/Amsterdam
            start: "0 8 * * 1-5"
            end: "0 20 * * 1-5"
            desiredReplicas: "3"

KEDA Architecture

    ┌─────────────┐     ┌───────────────┐     ┌────────────────┐
    │   Scaler    │────▶│ KEDA Operator │────▶│      HPA       │
    │ (RabbitMQ)  │     │               │     │ (auto-created) │
    └─────────────┘     └───────────────┘     └────────────────┘
                                │
                                ▼
                        ┌───────────────┐
                        │  Deployment   │
                        │ 0→N replicas  │
                        └───────────────┘

Popular KEDA Scalers
| Scaler | Use Case |
|---|---|
| kafka | Scale on consumer lag |
| rabbitmq | Scale on queue depth |
| aws-sqs-queue | Scale on SQS message count |
| prometheus | Any Prometheus metric |
| cron | Time-based pre-scaling |
| postgresql | Scale on query result |
| redis-streams | Scale on stream length |
| azure-servicebus | Scale on queue or subscription message count |

Scale to Zero
KEDA's killer feature: when no events exist, it scales the deployment to zero replicas:

    spec:
      minReplicaCount: 0
      triggers:
        - type: aws-sqs-queue  # KEDA's trigger type for SQS
          metadata:
            queueURL: https://sqs.eu-west-1.amazonaws.com/123/orders
            queueLength: "1"
            awsRegion: eu-west-1

Cold start latency: the first pod takes 5-30s to become ready (depending on image size and initialization).
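
To make the per-pod targets concrete (`queueLength: "5"` for RabbitMQ, `queueLength: "1"` for SQS), the following Python sketch approximates how a backlog maps to a replica count. It is a simplification under stated assumptions: the function name `keda_replicas` is hypothetical, and the sketch ignores `cooldownPeriod`, activation thresholds, and the HPA behavior that KEDA delegates the 1 to N range to.

```python
import math


def keda_replicas(metric_value, target_per_pod, min_replicas, max_replicas):
    """Approximate replica count for a queue-style trigger.

    Simplified model: no backlog falls back to min_replicas (possibly 0);
    any backlog needs at least one pod, then one pod per target_per_pod units.
    """
    if metric_value <= 0:
        return min_replicas
    wanted = math.ceil(metric_value / target_per_pod)
    return max(min_replicas, 1, min(max_replicas, wanted))


idle = keda_replicas(0, 5, min_replicas=0, max_replicas=100)          # empty queue
small = keda_replicas(12, 5, min_replicas=0, max_replicas=100)        # ceil(12 / 5)
first_msg = keda_replicas(1, 1, min_replicas=0, max_replicas=100)     # zero to one
flood = keda_replicas(10_000, 5, min_replicas=0, max_replicas=100)    # capped

print(idle, small, first_msg, flood)  # 0 3 1 100
```

The zero to one step is where the cold start latency applies: the first message waits for a pod to start, while later messages are handled by pods that are already running.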
Performance Comparison
Scaling Speed
| Metric | HPA | KEDA |
|---|---|---|
| Polling interval | 15s (default) | 10-30s (configurable) |
| Scale-up reaction | 15-45s | 10-40s |
| Scale-down cooldown | 5min (default) | Configurable |
| Zero→1 cold start | N/A | 5-30s |
Resource Overhead
| Component | CPU | Memory |
|---|---|---|
| HPA (built-in) | ~0 | ~0 |
| Metrics Server | 100m | 200Mi |
| KEDA Operator | 100m | 128Mi |
| KEDA Metrics Server | 100m | 128Mi |
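
For a quick sense of the totals, this small Python sketch adds up the figures from the table above. The dictionary simply copies the table; the helper name `total_overhead` is hypothetical, and the numbers are the article's estimates rather than measurements.

```python
# Figures copied from the table above (CPU in millicores, memory in Mi).
OVERHEAD = {
    "Metrics Server": ("100m", "200Mi"),
    "KEDA Operator": ("100m", "128Mi"),
    "KEDA Metrics Server": ("100m", "128Mi"),
}


def total_overhead(components):
    """Sum CPU (millicores) and memory (Mi) for the named components."""
    cpu = sum(int(OVERHEAD[name][0].rstrip("m")) for name in components)
    mem = sum(int(OVERHEAD[name][1][:-2]) for name in components)
    return cpu, mem


hpa_stack = total_overhead(["Metrics Server"])
keda_added = total_overhead(["KEDA Operator", "KEDA Metrics Server"])
everything = total_overhead(list(OVERHEAD))

print(hpa_stack, keda_added, everything)  # (100, 200) (200, 256) (300, 456)
```

By these figures, adopting KEDA adds roughly 200m CPU and 256Mi of memory on top of what a plain HPA setup already uses.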
Production Patterns
Pattern 1: HPA for Web + KEDA for Workers
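
    # Web tier: CPU-based HPA
    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: web-api
    spec:
      scaleTargetRef:  # assumed: a Deployment named like the HPA
        apiVersion: apps/v1
        kind: Deployment
        name: web-api
      minReplicas: 3
      maxReplicas: 30
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 60
    ---
    # Worker tier: KEDA for queue processing
    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: worker
    spec:
      scaleTargetRef:
        name: worker  # assumed: a Deployment named like the ScaledObject
      minReplicaCount: 0
      maxReplicaCount: 50
      triggers:
        - type: kafka
          metadata:
            bootstrapServers: kafka:9092  # placeholder broker address
            topic: events
            consumerGroup: workers
            lagThreshold: "10"

Pattern 2: KEDA with Prometheus for Custom Metrics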

    apiVersion: keda.sh/v1alpha1
    kind: ScaledObject
    metadata:
      name: api-gateway
    spec:
      scaleTargetRef:
        name: api-gateway  # assumed: a Deployment named like the ScaledObject
      minReplicaCount: 2
      maxReplicaCount: 20
      triggers:
        - type: prometheus
          metadata:
            serverAddress: http://prometheus:9090
            query: |
              sum(rate(http_requests_total{service="api"}[2m]))
            threshold: "100"  # Target about 100 req/s per pod

Installation
KEDA via Helm

    helm repo add kedacore https://kedacore.github.io/charts
    helm install keda kedacore/keda --namespace keda --create-namespace

Verify

    kubectl get pods -n keda
    # NAME                                   READY
    # keda-operator-5f8b4b6d4-xxxxx          1/1
    # keda-operator-metrics-apiserver-xxx    1/1

When to Choose
Stick with HPA when:
- CPU/memory metrics are sufficient
- You don't need scale-to-zero
- You want zero additional dependencies
- Simple web applications with predictable load
Choose KEDA when:
- You need scale-to-zero for cost savings
- Event-driven workloads (queues, streams, schedules)
- You want simpler custom metric configuration
- Batch processing or async job workers
- Multi-trigger scaling logic
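
The two checklists above can be condensed into a small decision helper. This is only a sketch of the article's rule of thumb: the function name `choose_autoscaler` and its flag names are hypothetical, and a real decision would also weigh team familiarity and operational cost.

```python
def choose_autoscaler(needs_scale_to_zero=False, event_driven=False,
                      needs_custom_metrics=False, multiple_triggers=False):
    """Return ("KEDA" or "HPA", reasons) following the checklists above."""
    reasons = []
    if needs_scale_to_zero:
        reasons.append("scale-to-zero for cost savings")
    if event_driven:
        reasons.append("event-driven workload (queues, streams, schedules)")
    if needs_custom_metrics:
        reasons.append("simpler custom metric configuration")
    if multiple_triggers:
        reasons.append("multi-trigger scaling logic")
    if reasons:
        return "KEDA", reasons
    # No KEDA-specific need: stay with the built-in autoscaler.
    return "HPA", ["CPU/memory metrics are sufficient, no extra dependencies"]


web_tier = choose_autoscaler()
worker_tier = choose_autoscaler(needs_scale_to_zero=True, event_driven=True)

print(web_tier[0], worker_tier[0])  # HPA KEDA
```

The helper defaults to HPA, mirroring the article's advice to add the KEDA dependency only when a specific need calls for it.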