The Orchestration Problem
You've got 200 edge devices running AI models. How do you update models, monitor health, handle failures, and scale? If your answer is "SSH into each one," you're going to have a bad time.
Kubernetes solves this at the edge, but not vanilla Kubernetes. You need lightweight, edge-aware distributions.
K3s: Kubernetes for Constrained Devices
K3s strips Kubernetes down to a single ~70MB binary. It runs on ARM64, needs as little as 512MB of RAM, and supports air-gapped installations. For edge AI, this matters:
```bash
# Install K3s on an edge device
curl -sfL https://get.k3s.io | sh -

# Deploy an AI inference service
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vision-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vision-inference
  template:
    metadata:
      labels:
        app: vision-inference
    spec:
      containers:
      - name: inference
        image: registry.internal/vision-model:v2.3
        resources:
          limits:
            nvidia.com/gpu: 1
        ports:
        - containerPort: 8080
EOF
```

KubeEdge: Cloud-Edge Coordination
KubeEdge extends Kubernetes to the edge with offline autonomy. The edge node keeps running even when disconnected from the cloud control plane:
```yaml
# CloudCore (in your central cluster)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudcore
  namespace: kubeedge
spec:
  selector:
    matchLabels:
      app: cloudcore
  template:
    metadata:
      labels:
        app: cloudcore
    spec:
      containers:
      - name: cloudcore
        image: kubeedge/cloudcore:v1.18
        ports:
        - containerPort: 10000  # WebSocket
        - containerPort: 10002  # QUIC
```

The killer feature: EdgeMesh handles service discovery across edge nodes without requiring each node to have a public IP.
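The offline autonomy comes from EdgeCore caching Kubernetes metadata on the device. A sketch of the relevant `edgecore.yaml` fragment; treat the field names as illustrative, since they vary between KubeEdge releases:

```yaml
# Illustrative edgecore.yaml fragment (exact fields vary by KubeEdge version)
apiVersion: edgecore.config.kubeedge.io/v1alpha2
kind: EdgeCore
modules:
  metaManager:
    metaServer:
      enable: true  # serve cached cluster metadata locally while disconnected
```

With the local meta server enabled, pods on the edge node restart and reschedule from the cached state even when the WebSocket/QUIC tunnel to CloudCore is down.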
Real-World Architecture
Here's the pattern I deploy for manufacturing clients:
```
┌─────────────────────────────────┐
│  Central Cloud Cluster (AKS)    │
│  - Model registry               │
│  - Training pipelines           │
│  - Fleet management dashboard   │
│  - KubeEdge CloudCore           │
└──────────┬──────────────────────┘
           │ WebSocket/QUIC
    ┌──────┴───────┐
    │   Factory    │  × 12 locations
    │  ┌────────┐  │
    │  │ K3s    │  │
    │  │ Node 1 ├──┼── Camera line A (YOLOv8)
    │  │        │  │
    │  │ Node 2 ├──┼── Camera line B (YOLOv8)
    │  │        │  │
    │  │ Node 3 ├──┼── Sensor fusion (custom model)
    │  └────────┘  │
    └──────────────┘
```

Model Updates Without Downtime
The biggest edge AI challenge isn't inference; it's updates. Here's my rolling update strategy:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: defect-detection
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: defect-detection
  template:
    metadata:
      labels:
        app: defect-detection
    spec:
      containers:
      - name: model
        image: registry.internal/defect-v3.1:int8
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30  # Model loading time
```

Key: `maxUnavailable: 0` ensures no gap in inference during updates. The readiness probe waits for the model to load before routing traffic.
Monitoring Edge AI Fleet
Prometheus + Grafana works at the edge, but bandwidth matters. Use edge-side aggregation:
```yaml
# Only send summaries to central Prometheus, not raw metrics
- job_name: 'edge-inference'
  scrape_interval: 60s
  metrics_path: /metrics/summary
  static_configs:
  - targets: ['inference:8080']
```

Track these metrics per node:
- Inference latency (p50, p95, p99)
- Model accuracy (via periodic validation)
- GPU/NPU utilization
- Queue depth (are we keeping up?)
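One way to keep raw metrics off the WAN is to compute the latency quantiles with recording rules on an edge-local Prometheus, so only the precomputed series travel upstream. A sketch; the metric name `inference_latency_seconds` is a placeholder for whatever histogram your inference server actually exports:

```yaml
# rules.yaml on the edge-local Prometheus (metric name is a placeholder)
groups:
- name: edge_inference_summary
  interval: 60s
  rules:
  - record: job:inference_latency_seconds:p95
    expr: histogram_quantile(0.95, sum by (le) (rate(inference_latency_seconds_bucket[5m])))
  - record: job:inference_latency_seconds:p99
    expr: histogram_quantile(0.99, sum by (le) (rate(inference_latency_seconds_bucket[5m])))
```

The central Prometheus then federates or remote-reads only the `job:*` recorded series instead of every histogram bucket.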
Lessons Learned
- Test offline resilience: unplug the network cable and verify the edge node keeps running
- Pre-pull container images: don't rely on pulling 2GB images over factory Wi-Fi
- Hardware watchdogs: edge devices crash, so automatic reboot and recovery is non-negotiable
- Canary deployments: update 5% of nodes first, verify accuracy, then roll out fleet-wide
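For the pre-pull point, a common pattern is a DaemonSet whose only job is to get the model image into every node's cache before the rolling update starts. A sketch, not a production manifest; it assumes the model image contains a `true` binary, and uses the public pause image as a placeholder main container:

```yaml
# Pre-pull DaemonSet: the init container pulls the 2GB model image on every
# node and exits; the pause container just keeps the pod alive.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: prepull-defect-model
spec:
  selector:
    matchLabels:
      app: prepull-defect-model
  template:
    metadata:
      labels:
        app: prepull-defect-model
    spec:
      initContainers:
      - name: pull-model
        image: registry.internal/defect-v3.1:int8
        command: ["true"]  # exit immediately; the image pull is the point
      containers:
      - name: pause
        image: registry.k8s.io/pause:3.9
```

Deploy this a few hours before the rollout; the subsequent `RollingUpdate` then starts new pods from the local image cache instead of the factory Wi-Fi.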
Kubernetes at the edge isn't a stretch; it's the natural evolution. If you're managing more than 10 edge AI devices, you need orchestration. K3s and KubeEdge give you that without the overhead.
