
Kubernetes at the Edge: Running AI Workloads with KubeEdge and K3s

Luca Berton • 2 min read
#edge-ai #kubernetes #kubeedge #k3s #orchestration #platform-engineering

The Orchestration Problem

You’ve got 200 edge devices running AI models. How do you update models, monitor health, handle failures, and scale? If your answer is “SSH into each one,” you’re going to have a bad time.

Kubernetes solves this at the edge, but not vanilla Kubernetes. You need lightweight, edge-aware distributions.

K3s: Kubernetes for Constrained Devices

K3s strips Kubernetes down to a single ~70MB binary. It runs on ARM64, needs as little as 512MB of RAM, and supports air-gapped installations. For edge AI, this matters:

# Install K3s on an edge device
curl -sfL https://get.k3s.io | sh -

# Deploy an AI inference service
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vision-inference
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vision-inference
  template:
    metadata:
      labels:
        app: vision-inference
    spec:
      containers:
      - name: inference
        image: registry.internal/vision-model:v2.3
        resources:
          limits:
            nvidia.com/gpu: 1
        ports:
        - containerPort: 8080
EOF
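That air-gap support is worth a concrete sketch. The steps below follow the documented K3s air-gap flow (pre-staged images tarball plus `INSTALL_K3S_SKIP_DOWNLOAD`); the release version and filenames are placeholders you would pin for your own fleet:

```shell
# On an internet-connected machine: grab the K3s binary and the
# airgap images tarball for your architecture (arm64 here).
curl -Lo k3s https://github.com/k3s-io/k3s/releases/download/v1.30.4%2Bk3s1/k3s-arm64
curl -Lo k3s-airgap-images-arm64.tar.zst \
  https://github.com/k3s-io/k3s/releases/download/v1.30.4%2Bk3s1/k3s-airgap-images-arm64.tar.zst
curl -Lo install.sh https://get.k3s.io

# Copy all three to the edge device, then:
sudo mkdir -p /var/lib/rancher/k3s/agent/images/
sudo cp k3s-airgap-images-arm64.tar.zst /var/lib/rancher/k3s/agent/images/
sudo install -m 755 k3s /usr/local/bin/k3s

# Run the install script fully offline
INSTALL_K3S_SKIP_DOWNLOAD=true sh install.sh
```

K3s loads any tarballs it finds under `agent/images/` into containerd on startup, so your inference images can be pre-staged the same way.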

KubeEdge: Cloud-Edge Coordination

KubeEdge extends Kubernetes to the edge with offline autonomy. The edge node keeps running even when disconnected from the cloud control plane:

# CloudCore (in your central cluster)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudcore
  namespace: kubeedge
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudcore
  template:
    metadata:
      labels:
        app: cloudcore
    spec:
      containers:
      - name: cloudcore
        image: kubeedge/cloudcore:v1.18
        ports:
        - containerPort: 10000  # WebSocket
        - containerPort: 10002  # QUIC

The killer feature: EdgeMesh handles service discovery across edge nodes without requiring each node to have a public IP.
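Enrolling an edge node is a two-step dance with keadm, KubeEdge’s CLI. A sketch, with placeholder IP, token, and node name (port 10000 is CloudCore’s WebSocket port from the Deployment above):

```shell
# On the cloud side: generate a join token
keadm gettoken

# On the edge node: connect it to the cloud control plane
keadm join \
  --cloudcore-ipport=203.0.113.10:10000 \
  --token=<token-from-gettoken> \
  --edgenode-name=factory-7-node-1
```

After the join, the node shows up in `kubectl get nodes` on the central cluster and keeps serving its pods even if the tunnel drops.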

Real-World Architecture

Here’s the pattern I deploy for manufacturing clients:

┌─────────────────────────────────┐
│  Central Cloud Cluster (AKS)    │
│  - Model registry               │
│  - Training pipelines           │
│  - Fleet management dashboard   │
│  - KubeEdge CloudCore           │
└──────────┬──────────────────────┘
           │ WebSocket/QUIC
    ┌──────┴──────┐
    │   Factory   │   × 12 locations
    │  ┌────────┐ │
    │  │ K3s    │ │
    │  │ Node 1 │─┼── Camera line A (YOLOv8)
    │  │        │ │
    │  │ Node 2 │─┼── Camera line B (YOLOv8)
    │  │        │ │
    │  │ Node 3 │─┼── Sensor fusion (custom model)
    │  └────────┘ │
    └─────────────┘

Model Updates Without Downtime

The biggest edge AI challenge isn’t inference; it’s updates. Here’s my rolling update strategy:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: defect-detection
spec:
  replicas: 2
  selector:
    matchLabels:
      app: defect-detection
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  template:
    metadata:
      labels:
        app: defect-detection
    spec:
      containers:
      - name: model
        image: registry.internal/defect-v3.1:int8
        readinessProbe:
          httpGet:
            path: /health
            port: 8080
          initialDelaySeconds: 30  # Model loading time

Key: maxUnavailable: 0 ensures no gap in inference during updates. The readiness probe waits for the model to load before routing traffic.
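In practice the update itself is a one-liner, with rollback kept one command away. Names match the Deployment above; the new image tag is illustrative:

```shell
# Push the new model image tag to the Deployment
kubectl set image deployment/defect-detection \
  model=registry.internal/defect-v3.2:int8

# Watch the surge pod come up and pass its readiness probe
kubectl rollout status deployment/defect-detection

# If accuracy tanks, roll back immediately
kubectl rollout undo deployment/defect-detection
```

Note that `maxSurge: 1` means the node needs headroom for one extra pod (including its GPU) during the transition.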

Monitoring Edge AI Fleet

Prometheus + Grafana works at the edge, but bandwidth matters. Use edge-side aggregation:

# Only send summaries to central Prometheus, not raw metrics
- job_name: 'edge-inference'
  scrape_interval: 60s
  metrics_path: /metrics/summary
  static_configs:
  - targets: ['inference:8080']

Track these metrics per node:

  • Inference latency (p50, p95, p99)
  • Model accuracy (via periodic validation)
  • GPU/NPU utilization
  • Queue depth (are we keeping up?)

Lessons Learned

  1. Test offline resilience: unplug the network cable and verify the edge node keeps running
  2. Pre-pull container images: don’t rely on pulling 2GB images over factory Wi-Fi
  3. Hardware watchdogs: edge devices crash, so automatic reboot and recovery is non-negotiable
  4. Canary deployments: update 5% of nodes first, verify accuracy, then roll out fleet-wide
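A minimal way to do canaries with plain Kubernetes primitives is a node label plus a second Deployment pinned to it. The label, node, and Deployment names here are illustrative:

```shell
# Label ~5% of edge nodes as the canary group
kubectl label node factory-7-node-1 deploy-ring=canary

# Pin the canary Deployment to those nodes via a nodeSelector
kubectl patch deployment defect-detection-canary --type=merge \
  -p '{"spec":{"template":{"spec":{"nodeSelector":{"deploy-ring":"canary"}}}}}'

# Once accuracy holds on the canary ring, update the fleet-wide Deployment
kubectl set image deployment/defect-detection \
  model=registry.internal/defect-v3.2:int8
```

The accuracy check itself is the periodic validation job from the monitoring section, scoped to the canary nodes.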

Kubernetes at the edge isn’t a stretch; it’s the natural evolution. If you’re managing more than 10 edge AI devices, you need orchestration. K3s and KubeEdge give you that without the overhead.


Luca Berton

AI & Cloud Advisor with 18+ years experience. Author of 8 technical books, creator of Ansible Pilot, and instructor at CopyPasteLearn Academy. Speaker at KubeCon EU & Red Hat Summit 2026.
