Pod Disruption Budgets and Rolling Updates: Zero-Downtime Deployments

Pod Disruption Budgets & Zero-Downtime Rolling Updates

Ensure zero downtime during upgrades: PDBs, rolling update strategies, readiness gates, preStop hooks, and graceful shutdown patterns.

Luca Berton

Zero-Downtime Deployments

Deployments should never cause user-facing errors. This requires coordinating:

  1. Rolling update strategy
  2. Pod Disruption Budgets (PDBs)
  3. Readiness probes
  4. Graceful shutdown (preStop hooks)
  5. Connection draining

Rolling Update Strategy

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api-server
spec:
  replicas: 5
  selector:
    matchLabels:
      app: api-server    # Must match the pod template labels
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1        # 1 extra pod during update
      maxUnavailable: 0  # Never reduce below desired count
  template:
    metadata:
      labels:
        app: api-server  # Also matched by the PDB selector
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: api
          image: registry.example.com/api-server:1.0.0  # Placeholder image
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          lifecycle:
            preStop:
              exec:
                command: ["sh", "-c", "sleep 15"]  # Drain connections

Key settings:

  • maxUnavailable: 0 - never reduce capacity during update
  • maxSurge: 1 - only create 1 extra pod at a time (controls rollout speed)
  • preStop: sleep 15 - gives load balancer time to remove pod from rotation
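
To make the effect of these two settings concrete, the bounds they impose can be worked out with a little arithmetic. The following is only a sketch: `rolloutBounds` is a hypothetical helper, not part of any Kubernetes library, and it assumes absolute values rather than percentages.

```go
package main

import "fmt"

// rolloutBounds is a hypothetical helper (not a Kubernetes API). Given the
// desired replica count and absolute maxSurge / maxUnavailable values, it
// returns the fewest pods that must stay available and the most pods that
// may exist at any moment during a rolling update.
func rolloutBounds(replicas, maxSurge, maxUnavailable int) (minAvailable, maxTotal int) {
	minAvailable = replicas - maxUnavailable
	if minAvailable < 0 {
		minAvailable = 0
	}
	maxTotal = replicas + maxSurge
	return minAvailable, maxTotal
}

func main() {
	// Values from the Deployment above: replicas 5, maxSurge 1, maxUnavailable 0.
	minAvail, maxTotal := rolloutBounds(5, 1, 0)
	fmt.Printf("available >= %d, total <= %d\n", minAvail, maxTotal)
	// prints "available >= 5, total <= 6"
}
```

With these values the controller can only add one new pod, wait for it to become ready, and then remove one old pod, which is why the rollout is safe but slow.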

Pod Disruption Budgets

PDBs protect against voluntary disruptions (node drains, cluster upgrades, autoscaler scale-down):

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-server-pdb
spec:
  minAvailable: 3       # Always keep at least 3 pods running
  # OR
  # maxUnavailable: 1   # At most 1 pod unavailable at a time
  selector:
    matchLabels:
      app: api-server
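
| Replicas | PDB Setting       | Effect                                |
|----------|-------------------|---------------------------------------|
| 5        | minAvailable: 3   | 2 pods can be disrupted simultaneously |
| 5        | maxUnavailable: 1 | Only 1 pod disrupted at a time        |
| 3        | minAvailable: 2   | 1 pod at a time                       |
| 1        | minAvailable: 1   | Block all voluntary disruptions ⚠️    |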

Warning: minAvailable: 1 with 1 replica blocks node drains entirely. Don't do this unless intentional.
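
The "Effect" column follows from simple arithmetic on the number of healthy pods. The sketch below reproduces the four rows; both function names are hypothetical, and it assumes every replica is healthy when the disruption is requested.

```go
package main

import "fmt"

// allowedWithMinAvailable is a hypothetical helper: how many healthy pods can
// be evicted right now without dropping below minAvailable.
func allowedWithMinAvailable(healthy, minAvailable int) int {
	if allowed := healthy - minAvailable; allowed > 0 {
		return allowed
	}
	return 0
}

// allowedWithMaxUnavailable is the equivalent for maxUnavailable: the budget
// shrinks by every pod that is already unhealthy.
func allowedWithMaxUnavailable(replicas, healthy, maxUnavailable int) int {
	alreadyDown := replicas - healthy
	if allowed := maxUnavailable - alreadyDown; allowed > 0 {
		return allowed
	}
	return 0
}

func main() {
	// Rows from the table above, assuming every replica is currently healthy.
	fmt.Println(allowedWithMinAvailable(5, 3))      // 5 replicas, minAvailable: 3   -> 2
	fmt.Println(allowedWithMaxUnavailable(5, 5, 1)) // 5 replicas, maxUnavailable: 1 -> 1
	fmt.Println(allowedWithMinAvailable(3, 2))      // 3 replicas, minAvailable: 2   -> 1
	fmt.Println(allowedWithMinAvailable(1, 1))      // 1 replica,  minAvailable: 1   -> 0
}
```

The last row returns 0, which is exactly the case the warning describes: no eviction is ever allowed, so a drain of that node cannot complete.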

The Complete Graceful Shutdown Flow

1. Pod marked for termination (the terminationGracePeriodSeconds countdown starts here)
2. Removed from Service endpoints (async, in parallel with steps 3-4)
3. preStop hook executes (sleep 15)
4. SIGTERM sent to container
5. Application finishes in-flight requests
6. Container exits (or is killed with SIGKILL when terminationGracePeriodSeconds expires)

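// Go graceful shutdown
package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}
	go srv.ListenAndServe() // Returns http.ErrServerClosed once Shutdown is called

	sig := make(chan os.Signal, 1)
	signal.Notify(sig, syscall.SIGTERM)
	<-sig // Block main until SIGTERM arrives (after the preStop hook finishes)
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil { // Finish in-flight requests
		log.Printf("graceful shutdown failed: %v", err)
	}
}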
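
The preStop sleep and the shutdown timeout both count against terminationGracePeriodSeconds, so it is worth checking that they fit. This is a minimal sketch of that check; `shutdownSlack` is a hypothetical helper, and the numbers are the ones used in this article.

```go
package main

import (
	"fmt"
	"time"
)

// shutdownSlack is a hypothetical helper. It returns how much of the
// termination grace period is left after the preStop hook and the
// application's own shutdown timeout have both run to completion.
// A negative result means the container risks being force-killed.
func shutdownSlack(grace, preStop, shutdownTimeout time.Duration) time.Duration {
	return grace - preStop - shutdownTimeout
}

func main() {
	// Values used in this article: 60s grace period, 15s preStop sleep,
	// 30s timeout passed to srv.Shutdown.
	slack := shutdownSlack(60*time.Second, 15*time.Second, 30*time.Second)
	fmt.Println("slack:", slack) // prints "slack: 15s"
}
```

If the slack goes negative, either raise terminationGracePeriodSeconds or shorten one of the two phases.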

Readiness Gates (Advanced)

spec:
  readinessGates:
    - conditionType: "target-health.elbv2.k8s.aws/my-target-group"

The pod isn't "ready" until the AWS ALB target group confirms it's healthy. This prevents routing to pods that haven't registered with the load balancer yet.
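
The rule behind readiness gates can be modelled in a few lines. This is a simplified illustration, not the kubelet's actual code; `podReady` and its parameters are hypothetical names.

```go
package main

import "fmt"

// podReady models the rule in simplified form: a pod counts as Ready only
// when its containers are ready AND every condition listed under
// readinessGates has status "True".
func podReady(containersReady bool, gates []string, conditions map[string]string) bool {
	if !containersReady {
		return false
	}
	for _, gate := range gates {
		if conditions[gate] != "True" { // A missing condition counts as not ready
			return false
		}
	}
	return true
}

func main() {
	gates := []string{"target-health.elbv2.k8s.aws/my-target-group"}

	// Containers pass their probes, but the load balancer has not reported yet.
	fmt.Println(podReady(true, gates, map[string]string{})) // false

	// The condition has been set to "True" for this pod.
	fmt.Println(podReady(true, gates, map[string]string{gates[0]: "True"})) // true
}
```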

Node Drain Procedure

# Cordon (prevent new scheduling)
kubectl cordon node-1

# Drain (evict pods respecting PDBs)
kubectl drain node-1 \
  --ignore-daemonsets \
  --delete-emptydir-data \
  --grace-period=60 \
  --timeout=300s

# If PDB blocks drain:
# "Cannot evict pod as it would violate the pod's disruption budget"
# Wait for other replicas to become ready, then retry
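
The "wait, then retry" step can also be automated. The sketch below shows the retry loop in isolation, under stated assumptions: `errPDBBlocked` and `evictWithRetry` are hypothetical names, and the `evict` callback stands in for a real call to the Kubernetes Eviction API.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errPDBBlocked is a hypothetical sentinel standing in for the API server's
// refusal to evict a pod because of a disruption budget.
var errPDBBlocked = errors.New("eviction would violate the pod's disruption budget")

// evictWithRetry calls evict until it succeeds, fails with a different error,
// or the attempts run out. evict is an assumed callback; a real version
// would call the Kubernetes Eviction API.
func evictWithRetry(evict func() error, attempts int, wait time.Duration) error {
	var err error
	for i := 0; i < attempts; i++ {
		if err = evict(); err == nil || !errors.Is(err, errPDBBlocked) {
			return err
		}
		time.Sleep(wait) // Give replacement pods time to become ready
	}
	return fmt.Errorf("gave up after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	// Fake evict: blocked twice by the PDB, then allowed.
	evict := func() error {
		calls++
		if calls < 3 {
			return errPDBBlocked
		}
		return nil
	}
	err := evictWithRetry(evict, 5, 10*time.Millisecond)
	fmt.Println(calls, err) // prints "3 <nil>"
}
```

Only the PDB refusal is retried; any other error is returned immediately so that real failures are not hidden by the loop.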

Anti-Patterns

| Anti-Pattern                     | Risk                                   | Fix                   |
|----------------------------------|----------------------------------------|-----------------------|
| No PDB                           | All pods evicted simultaneously        | Always create PDB     |
| No readiness probe               | Traffic to unready pods                | Add HTTP probe        |
| No preStop hook                  | Connections dropped during termination | Add sleep 15          |
| terminationGracePeriod too short | Force-killed during drain              | Set 60s+              |
| maxUnavailable: 50%              | Half capacity during update            | Use maxUnavailable: 1 |
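
These checks are mechanical enough to run as a small lint step. The following is a sketch only: `rolloutConfig` and `findAntiPatterns` are hypothetical names, the struct is not a Kubernetes type, and the 60-second and 50% thresholds are taken from the table above.

```go
package main

import "fmt"

// rolloutConfig is a hypothetical, flattened summary of one workload's
// settings. It is not a Kubernetes type.
type rolloutConfig struct {
	HasPDB                bool
	HasReadinessProbe     bool
	HasPreStopHook        bool
	GracePeriodSeconds    int
	MaxUnavailablePercent int // 0 if maxUnavailable is an absolute number
}

// findAntiPatterns returns one message per anti-pattern from the table above.
func findAntiPatterns(c rolloutConfig) []string {
	var found []string
	if !c.HasPDB {
		found = append(found, "no PDB")
	}
	if !c.HasReadinessProbe {
		found = append(found, "no readiness probe")
	}
	if !c.HasPreStopHook {
		found = append(found, "no preStop hook")
	}
	if c.GracePeriodSeconds < 60 {
		found = append(found, "terminationGracePeriodSeconds below 60")
	}
	if c.MaxUnavailablePercent >= 50 {
		found = append(found, "maxUnavailable is 50% or more")
	}
	return found
}

func main() {
	risky := rolloutConfig{HasReadinessProbe: true, GracePeriodSeconds: 30, MaxUnavailablePercent: 50}
	for _, msg := range findAntiPatterns(risky) {
		fmt.Println("anti-pattern:", msg)
	}
}
```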
