Why run your AI agent on Kubernetes?
Most people run OpenClaw on a single machine: a Raspberry Pi, a laptop, a small VPS. That works great for personal use. But what happens when you want:
- High availability: your agent stays online even if a node goes down
- GPU scheduling: route inference workloads to GPU nodes while the gateway runs on cheap CPUs
- Multi-agent orchestration: run multiple specialized agents with different models
- Persistent memory: agent memory survives pod restarts and node failures
- Integration with existing infrastructure: your agent lives alongside your monitoring, logging, and secrets management
That is when Kubernetes makes sense. And if you are already running a cluster, adding OpenClaw is straightforward.
Architecture overview
OpenClaw on Kubernetes follows a simple pattern:
+--------------------------------------------+
|             Kubernetes Cluster             |
|                                            |
|  +-------------+    +-----------------+    |
|  |  OpenClaw   |    |    OpenClaw     |    |
|  |   Gateway   |    |   Agent Pods    |    |
|  | (1 replica) |    | (1-N replicas)  |    |
|  +------+------+    +--------+--------+    |
|         |                    |             |
|  +------v--------------------v--------+    |
|  |       PersistentVolumeClaim        |    |
|  |     (agent memory + workspace)     |    |
|  +------------------------------------+    |
|                                            |
|  +------------------------------------+    |
|  |       Ingress / LoadBalancer       |    |
|  |    (Discord/Telegram webhooks)     |    |
|  +------------------------------------+    |
+--------------------------------------------+

The gateway is the brain: it handles messaging channels, scheduling, and agent lifecycle. Agent pods do the actual work: processing messages, running tools, and maintaining memory.
Option 1: k3s single-node (simplest path)
If you are running a homelab or a small VPS, k3s gives you a full Kubernetes API with minimal overhead:
# Install k3s
curl -sfL https://get.k3s.io | sh -
# Create namespace
kubectl create namespace openclaw
# Create persistent storage for agent memory
kubectl apply -f - <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-data
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
EOF

Then deploy OpenClaw:
# openclaw-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 1
  selector:
    matchLabels:
      app: openclaw
  template:
    metadata:
      labels:
        app: openclaw
    spec:
      containers:
        - name: openclaw
          image: ghcr.io/openclaw/openclaw:latest
          ports:
            - containerPort: 3000
          volumeMounts:
            - name: data
              mountPath: /home/node/.openclaw
          env:
            - name: OPENCLAW_MODEL
              value: "anthropic/claude-sonnet-4"
          envFrom:
            - secretRef:
                name: openclaw-secrets
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: openclaw-data

# Create secrets
kubectl create secret generic openclaw-secrets \
  --namespace openclaw \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-... \
  --from-literal=DISCORD_TOKEN=...
# Deploy
kubectl apply -f openclaw-deployment.yaml

Option 2: Multi-node cluster with GPU scheduling
For teams running inference locally, you want GPU workloads on GPU nodes and the gateway on regular nodes:
# Node labels
# GPU node: kubectl label node gpu-node-1 accelerator=nvidia
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 1
  selector: # required; must match the pod template labels
    matchLabels:
      app: openclaw
  template:
    metadata:
      labels:
        app: openclaw
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: accelerator
                    operator: DoesNotExist # keep the gateway off GPU nodes
      containers:
        - name: openclaw
          image: ghcr.io/openclaw/openclaw:latest
          resources:
            requests:
              memory: "512Mi"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: local-inference
  namespace: openclaw
spec:
  replicas: 1
  selector: # required; must match the pod template labels
    matchLabels:
      app: local-inference
  template:
    metadata:
      labels:
        app: local-inference
    spec:
      nodeSelector:
        accelerator: nvidia # schedule only on labeled GPU nodes
      containers:
        - name: ollama
          image: ollama/ollama:latest
          resources:
            limits:
              nvidia.com/gpu: 1
          ports:
            - containerPort: 11434

This pairs well with the NVIDIA GPU Operator for automated driver and runtime management. If several agents need GPU access, GPU sharing with time-slicing lets them share a single GPU.
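For the gateway to reach the inference pods, they need a stable in-cluster address. A minimal sketch of a ClusterIP Service follows. It assumes the inference pods carry the label app: local-inference (an illustrative label; add it to the pod template if it is not there) and that Ollama listens on port 11434, as in the manifest above.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: local-inference
  namespace: openclaw
spec:
  type: ClusterIP
  selector:
    app: local-inference   # assumed pod label, must match the inference pod template
  ports:
    - name: http
      port: 11434
      targetPort: 11434
```

Inside the cluster the endpoint would then resolve as http://local-inference.openclaw.svc.cluster.local:11434. How OpenClaw is pointed at that URL depends on its own configuration, which is not covered here.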
Persistent memory with PVCs
OpenClaw agents build memory over time: daily notes, long-term memories, workspace files. Losing this on a pod restart defeats the purpose. PersistentVolumeClaims ensure memory survives:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-workspace
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn # or your CSI driver
  resources:
    requests:
      storage: 10Gi

For multi-agent setups where multiple pods need read access to shared knowledge bases, consider ReadWriteMany volumes backed by NFS or a distributed filesystem.
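As a rough sketch of the shared-volume case, a ReadWriteMany claim could look like the following. The claim name, the storage class nfs-client, and the size are all assumptions for illustration; substitute whatever RWX-capable storage class your cluster actually provides.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openclaw-shared-knowledge   # hypothetical name
  namespace: openclaw
spec:
  accessModes:
    - ReadWriteMany                 # mountable by pods on different nodes
  storageClassName: nfs-client      # assumption: replace with an RWX-capable class
  resources:
    requests:
      storage: 20Gi                 # illustrative size
```

Agents that only consume the knowledge base can mount the claim with readOnly: true in their volumeMounts, which leaves a single writer responsible for updates.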
Backup strategy
Agent memory is irreplaceable. Set up automated backups:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: openclaw-backup
  namespace: openclaw
spec:
  schedule: "0 */6 * * *" # Every 6 hours
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              image: alpine:latest
              command:
                - /bin/sh
                - -c
                - |
                  tar czf /backup/openclaw-$(date +%Y%m%d-%H%M).tar.gz \
                    /data/workspace /data/memory
              volumeMounts:
                - name: data
                  mountPath: /data
                  readOnly: true
                - name: backup
                  mountPath: /backup
          volumes:
            - name: data
              persistentVolumeClaim:
                # ReadWriteOnce: the job pod must land on the same node as the gateway
                claimName: openclaw-data
            - name: backup
              persistentVolumeClaim:
                # separate PVC; must be created before the first run
                claimName: openclaw-backups
          restartPolicy: OnFailure

Exposing your agent to messaging channels
OpenClaw connects to Discord, Telegram, WhatsApp, and other channels. For webhook-based channels, you need an Ingress:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openclaw-ingress
  namespace: openclaw
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  tls:
    - hosts:
        - agent.yourdomain.com
      secretName: openclaw-tls
  rules:
    - host: agent.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openclaw-gateway
                port:
                  number: 3000

For Discord and most messaging platforms, OpenClaw uses outbound WebSocket connections, so no ingress is needed. The gateway connects to Discord's API directly from inside the cluster.
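The Ingress above routes to a Service named openclaw-gateway on port 3000, which none of the manifests in this article defines. A minimal sketch of that Service, assuming the gateway pods are labeled app: openclaw and listen on container port 3000 as in the Option 1 Deployment:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: openclaw-gateway
  namespace: openclaw
  labels:
    app: openclaw
spec:
  selector:
    app: openclaw        # matches the pod labels from the Option 1 Deployment
  ports:
    - name: http
      port: 3000         # port referenced by the Ingress backend
      targetPort: 3000   # containerPort of the gateway
```

Without a Service like this, the Ingress has no backend to route to and the ingress controller will typically answer with a 503.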
Monitoring with Prometheus
Add observability to track your agent's behavior:
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: openclaw
  namespace: openclaw
spec:
  selector:
    matchLabels:
      app: openclaw
  endpoints:
    - port: metrics
      interval: 30s

Key metrics to watch:
- Message processing latency: how fast your agent responds
- Token usage: track API costs per agent
- Memory volume usage: prevent storage exhaustion
- Pod restarts: catch stability issues early
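Two of these can be covered with metrics that most clusters already collect, independent of anything OpenClaw exports. The sketch below assumes the Prometheus Operator is installed and that kubelet and kube-state-metrics are being scraped. The rule name, thresholds, and severities are illustrative, and your Prometheus instance may need extra labels on the PrometheusRule to pick it up. Latency and token usage would need application-level metrics, which are not shown here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: openclaw-alerts          # hypothetical name
  namespace: openclaw
spec:
  groups:
    - name: openclaw.rules
      rules:
        # Memory volume usage: fires when any PVC in the namespace passes 85%
        - alert: OpenClawVolumeAlmostFull
          expr: |
            kubelet_volume_stats_used_bytes{namespace="openclaw"}
              / kubelet_volume_stats_capacity_bytes{namespace="openclaw"} > 0.85
          for: 15m
          labels:
            severity: warning
          annotations:
            summary: "PVC {{ $labels.persistentvolumeclaim }} is more than 85% full"
        # Pod restarts: fires on more than 3 container restarts within an hour
        - alert: OpenClawPodRestarting
          expr: increase(kube_pod_container_status_restarts_total{namespace="openclaw"}[1h]) > 3
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} restarted more than 3 times in the last hour"
```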
Multi-agent patterns
Kubernetes makes it easy to run multiple specialized agents:
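# Research agent: uses a larger model, more memory
# (selector and labels are required for a valid Deployment; the label values are illustrative)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-research
  namespace: openclaw
spec:
  selector:
    matchLabels:
      app: openclaw-research
  template:
    metadata:
      labels:
        app: openclaw-research
    spec:
      containers:
        - name: openclaw
          image: ghcr.io/openclaw/openclaw:latest
          env:
            - name: OPENCLAW_MODEL
              value: "anthropic/claude-opus-4"
            - name: OPENCLAW_AGENT
              value: "research"
---
# Operations agent: faster model, tool-heavy
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-ops
  namespace: openclaw
spec:
  selector:
    matchLabels:
      app: openclaw-ops
  template:
    metadata:
      labels:
        app: openclaw-ops
    spec:
      containers:
        - name: openclaw
          image: ghcr.io/openclaw/openclaw:latest
          env:
            - name: OPENCLAW_MODEL
              value: "anthropic/claude-sonnet-4"
            - name: OPENCLAW_AGENT
              value: "ops"

Each agent gets its own workspace, memory, and channel connections. They can communicate through shared volumes or message passing.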
Security hardening
Running AI agents in a cluster requires careful RBAC:
apiVersion: v1
kind: ServiceAccount
metadata:
  name: openclaw
  namespace: openclaw
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: openclaw-role
  namespace: openclaw
rules:
  - apiGroups: [""]
    resources: ["pods", "services"]
    verbs: ["get", "list"] # Read-only access within the namespace
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: openclaw-netpol
  namespace: openclaw
spec:
  podSelector:
    matchLabels:
      app: openclaw
  policyTypes:
    - Egress
  egress:
    # Allow outbound HTTP/HTTPS to any destination (API calls, messaging)
    - ports:
        - port: 443
          protocol: TCP
        - port: 80
          protocol: TCP
    # Allow DNS; without this rule the pods cannot resolve API hostnames
    - ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP

For production deployments, also consider:
- Pod Security Standards: restrict capabilities
- Secret rotation: use external-secrets-operator for API keys
- Image scanning: scan OpenClaw images before deployment
- Kyverno policies: enforce security baselines
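One gap in the RBAC manifests above: a Role grants nothing until it is bound to a subject. A minimal sketch of the missing RoleBinding, connecting openclaw-role to the openclaw ServiceAccount (the binding name is hypothetical):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: openclaw-rolebinding     # hypothetical name
  namespace: openclaw
subjects:
  - kind: ServiceAccount
    name: openclaw
    namespace: openclaw
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: openclaw-role
```

The pods also need to run under that account, which means setting serviceAccountName: openclaw in the Deployment's pod spec. Otherwise they use the namespace's default ServiceAccount and the binding has no effect on them.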
Going deeper with Kubernetes
If you are new to Kubernetes and want to build a solid foundation before deploying agents, my book Kubernetes Recipes covers everything from cluster setup to production patterns: networking, storage, security, and observability. It is the practical companion for running real workloads (like OpenClaw) on Kubernetes.
For GPU-specific Kubernetes patterns, check out:
- NVIDIA GPU Operator setup guide
- GPU time-slicing for shared workloads
- Slurm to Kubernetes GPU migration
My take
Running OpenClaw on Kubernetes is overkill for a single personal agent on a Raspberry Pi. But for teams running multiple agents, integrating with GPU infrastructure, or wanting enterprise-grade reliability for their AI assistant, Kubernetes provides the right abstraction. The agent becomes just another workload in your cluster, backed by the same monitoring, scaling, and security you use for everything else.
The beauty of the cloud native approach: your AI agent benefits from everything the Kubernetes ecosystem has already solved.