Part 9 of a 10-part series on running AI workloads on Kubernetes in production.
Not a tools conference, an execution conference
My main takeaway from KubeCon + CloudNativeCon Europe 2026 in Amsterdam (23-26 March 2026) is that the conversation has moved beyond “should we use Kubernetes?” to “how do we run the next generation of AI and platform workloads with operational maturity?”
KubeCon 2026 felt less like a tools conference and more like an execution conference. The technology exists. The ecosystem is mature. What teams still lack is a clean operating model that connects platform engineering, AI workloads, and governance.
Three themes stood out.
Theme 1: Inference is the center of gravity
One of the clearest messages across sessions was that training is episodic, but inference is continuous. That matters because it shifts the platform conversation from “how do I run occasional large jobs?” to “how do I run always-on, latency-sensitive, cost-sensitive AI systems?”
This is where Kubernetes starts to become a real operating platform rather than just a scheduling layer. The autoscaling challenges I wrote about earlier in this series (model load time, token-level latency, GPU memory constraints) were recurring themes across multiple talks.
Inference at scale requires:
- Custom autoscaling beyond simple HPA
- Model serving infrastructure that handles version management, canary deployments, and graceful rollover
- Cost-aware scheduling that balances latency SLOs with GPU costs
- Token economics that make business sense at millions of requests per day
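To make the first point concrete, here is a minimal Python sketch of a replica calculation driven by token throughput instead of CPU utilization. It assumes you have measured how many tokens per second one replica can sustain while meeting its latency SLO. The `ScalingPolicy` fields, the 20% headroom, and the one-replica-at-a-time scale-down are hypothetical choices, not the behavior of any specific autoscaler. Whether logic like this lives in a custom controller or is fed to HPA through custom metrics is a separate decision.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    # Hypothetical knobs; calibrate them against your own load tests.
    tokens_per_sec_per_replica: float  # sustainable throughput of one replica at the latency SLO
    headroom: float = 0.2              # spare capacity to absorb spikes while new replicas load the model
    min_replicas: int = 1
    max_replicas: int = 32
    max_scale_down_step: int = 1       # shed replicas slowly; reloading a model is expensive


def desired_replicas(policy: ScalingPolicy, observed_tokens_per_sec: float,
                     current_replicas: int) -> int:
    """Return the replica count for the observed token demand."""
    needed = observed_tokens_per_sec * (1 + policy.headroom) / policy.tokens_per_sec_per_replica
    target = math.ceil(needed)
    # Limit scale-down speed: long model load times make flapping costly.
    if target < current_replicas:
        target = max(target, current_replicas - policy.max_scale_down_step)
    # Clamp to the configured bounds.
    return max(policy.min_replicas, min(policy.max_replicas, target))
```

With 2,500 tokens/s per replica, 9,000 tokens/s of demand yields 5 replicas (9,000 × 1.2 / 2,500 = 4.32, rounded up). If demand then drops to 1,000 tokens/s while 6 replicas are running, the policy steps down to 5 rather than straight to 1, because every replica removed may have to reload the model later.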
Theme 2: AI-specific primitives are maturing
The CNCF landscape is no longer just container runtime, networking, and observability. The stack is stretching toward:
- AI workload acceleration: projects around GPU scheduling, inference optimization, and model serving
- Policy engines: admission control, resource governance, and compliance enforcement for AI workloads
- Delivery patterns: CI/CD for models that treats model artifacts as first-class deployment units
- Scheduling intelligence: beyond bin-packing to workload-aware, cost-aware, energy-aware placement
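The scheduling-intelligence point can be illustrated with a filter-then-score sketch. It follows the same overall shape as the Filter and Score phases of the Kubernetes scheduling framework, but it is a standalone illustration, not a scheduler plugin. The node fields, prices, power figures, and the 0.7/0.3 weights are all hypothetical, and GPU memory is simplified to a single free-capacity number per node.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_gpu_mem_gb: float   # simplified: free GPU memory on the node
    hourly_cost: float       # hypothetical $/hour for the node's GPUs
    watts_per_gpu: float     # hypothetical power draw estimate


@dataclass
class Workload:
    gpu_mem_gb: float        # memory needed for model weights plus KV cache


def pick_node(nodes: list[Node], workload: Workload,
              cost_weight: float = 0.7, energy_weight: float = 0.3) -> Optional[Node]:
    """Filter nodes that fit, then return the one with the lowest weighted score."""
    # Filter: discard nodes that cannot hold the model.
    fits = [n for n in nodes if n.free_gpu_mem_gb >= workload.gpu_mem_gb]
    if not fits:
        return None
    # Normalize each dimension to [0, 1] so the weights are comparable.
    max_cost = max(n.hourly_cost for n in fits) or 1.0
    max_watts = max(n.watts_per_gpu for n in fits) or 1.0

    def score(n: Node) -> float:
        return cost_weight * n.hourly_cost / max_cost + energy_weight * n.watts_per_gpu / max_watts

    # Score: lower is better.
    return min(fits, key=score)
```

With hypothetical nodes offering 80, 40, and 24 GB free, a workload needing 30 GB skips the 24 GB node in the filter step and lands on the 40 GB node, which is cheaper and draws less power than the 80 GB one. The design point is the separation: the filter guarantees the workload fits, and the weights encode business priorities (cost versus energy) independently of fit.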
The ecosystem is building a proper control plane for AI workloads, not just repurposing old cloud-native patterns. This is significant because it means teams no longer have to build everything from scratch.
Theme 3: Production scale is the benchmark
The Uber Michelangelo examples were particularly telling:
- Thousands of models in production
- Tens of millions of predictions per second
- Large serving fleets including GPUs
- Operational maturity with automated deployment, monitoring, and rollback
This reframes the discussion. The question is no longer whether Kubernetes can support AI at scale. The question is whether teams can build the operating model, platform abstractions, and cost controls required to do it well.
What this means for engineering leaders
If you are leading an AI infrastructure initiative, KubeCon 2026 sends three clear signals:
1. Stop building custom infrastructure
The primitives exist. Kubernetes + GPU operator + inference servers + observability stack + policy engine gives you 80% of what you need. The remaining 20% is your operating model, team contracts, and business-specific integration.
2. Invest in the operating model
The gap is no longer technology availability; it is execution. Platform, SRE, and ML teams need clear contracts. Cost visibility needs to reach team level. Governance needs to be automated, not manual.
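Automated governance usually means a policy engine rejecting non-compliant workloads at admission time. The Python sketch below shows the kind of rule such a policy might encode, written as a plain function over a pod manifest dict so it runs anywhere; in a cluster you would express it in your policy engine of choice or a validating admission webhook. The `team` label key and the per-team GPU caps are hypothetical conventions; `nvidia.com/gpu` is the resource name exposed by the NVIDIA device plugin. Requiring a team label on every GPU pod is also what makes team-level cost visibility possible.

```python
GPU_RESOURCE = "nvidia.com/gpu"  # resource name exposed by the NVIDIA device plugin


def gpu_count(pod: dict) -> int:
    """Sum GPU limits across a pod's containers (init containers ignored in this sketch)."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        # Quantities may appear as ints (YAML) or strings (API objects).
        total += int(str(limits.get(GPU_RESOURCE, 0)))
    return total


def validate_gpu_pod(pod: dict, max_gpus_per_team: dict[str, int]) -> list[str]:
    """Return policy violations; an empty list means the pod may be admitted."""
    gpus = gpu_count(pod)
    if gpus == 0:
        return []  # this hypothetical policy only applies to GPU workloads
    violations = []
    team = pod.get("metadata", {}).get("labels", {}).get("team")
    if not team:
        violations.append("GPU pods must carry a 'team' label for cost attribution")
    else:
        cap = max_gpus_per_team.get(team, 0)  # unknown teams get no GPUs
        if gpus > cap:
            violations.append(f"team {team!r} may request at most {cap} GPUs per pod, got {gpus}")
    return violations
```

Returning a list of reasons rather than a bare boolean mirrors how admission denials are surfaced to users: the team that submitted the workload sees exactly which rule it broke.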
3. Plan for inference-dominant workloads
Training gets the headlines. Inference pays the bills. Design your platform for always-on serving workloads, not just episodic training jobs. That means investing in autoscaling, observability, and cost-per-request economics.
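Cost-per-request economics comes down to simple arithmetic, shown here as a small Python function. The GPU price, throughput, and utilization are inputs you would measure yourself; none of the example values come from the conference talks. The key assumption is that always-on serving capacity is billed by the hour whether or not it is busy, so utilization sits in the denominator.

```python
def cost_per_1k_requests(gpu_hourly_usd: float, gpus_per_replica: int,
                         requests_per_sec: float, utilization: float) -> float:
    """Serving cost per 1,000 requests for one replica.

    requests_per_sec: measured throughput of the replica at full load.
    utilization: fraction of wall-clock time the replica actually serves at
    that rate; idle capacity is still paid for.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    hourly_cost = gpu_hourly_usd * gpus_per_replica
    requests_per_hour = requests_per_sec * 3600 * utilization
    return hourly_cost / requests_per_hour * 1000
```

At a hypothetical $2.00 per GPU-hour, one GPU per replica, and 10 requests per second at full load, 50% utilization works out to about $0.11 per 1,000 requests; at 100% utilization it halves to about $0.056. That is why autoscaling and cost visibility belong in the same conversation.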
My KubeCon
Beyond the industry trends, KubeCon Amsterdam was a personal milestone. I presented to a packed room on multi-tenant GPUs on bare metal, co-MCed Cloud Native Rejekts with Julia Hahn, and gave away signed copies of Kubernetes Recipes at the vCluster booth.
The 13,350 attendees and 19.9 million cloud-native developers are not just impressive numbers: they represent a community that has matured from “let’s try containers” to “let’s industrialize AI.” That is a meaningful shift.
Next: AI Platform First 90 Days: A Pragmatic Roadmap. Previous: Architecture Decisions Hardest to Reverse.