On Stage at KubeCon EU 2026: Multi-Tenant GPUs on Bare Metal

The moment it became real

There is a moment, right before you start speaking, when the room goes quiet. The slides are loaded. The microphone is on. Thousands of eyes are on you. And you realize: this is the biggest Cloud Native stage in the world, and they are here to listen.

Luca Berton on stage at KubeCon Europe 2026 next to the title slide: Multi-tenant GPUs on Bare Metal OpenShift AI

Multi-tenant GPUs on Bare Metal OpenShift AI: A GitOps Blueprint from the Trenches. That is not just a talk title — it is six months of production experience compressed into 35 minutes.

Setting the scene

The Amsterdam backdrop on the KubeCon stage felt fitting. A city built on engineering, water management, and collaboration — just like the infrastructure we were about to discuss.

Wide shot of the KubeCon stage with the presentation title and Amsterdam-themed graphics

The presentation covered the full journey: from a bare-metal Dell AI Factory cluster with NVIDIA H200 GPUs to a production-ready, multi-tenant AI platform managed entirely through GitOps.

The problem: “It runs” is not enough

The first section hit the pain point that every GPU infrastructure team knows: getting GPUs to work is easy. Getting them to work safely for multiple teams is the real challenge.

Noisy neighbors hoarding GPU memory. Queue explosions where jobs starve. MIG partitions that fit one workload perfectly and cripple another. Driver drift across nodes. SR-IOV misconfigured, RDMA failures killing distributed training.

We needed a framework. We needed guardrails. We needed GitOps.

Luca Berton speaking at the podium, explaining the need for a GitOps-driven framework

The framework: Safe, Fair, Efficient

Every decision in our multi-tenant GPU infrastructure filters through three lenses:

The Framework slide showing Safe, Fair, and Efficient pillars with Dell Technologies branding

Safe — Blast radius equals zero. One team cannot break another. Namespace isolation, network policies, resource quotas — all enforced at the platform level.
Fair — Contention is deterministic. No more “random wins” in GPU scheduling. Run:AI fair-share policies ensure every team gets their allocated GPU time, with preemption rules that are transparent and predictable.
Efficient — Outcomes per GPU-hour, not raw utilization. A GPU running at 100% utilization on a poorly optimized model is not efficient. We measure useful work delivered per GPU-hour.

This framework is not theoretical — it is how we run production AI workloads at Dell Technologies.

Showing the code

The most engaged moment of the talk was when I showed real configuration. Not abstract architecture diagrams — actual SR-IOV NicClusterPolicy YAML, actual OpenShift annotations, actual code that runs in our clusters.

Luca Berton at the podium showing SR-IOV legacy configuration code on screen

The audience leaned in. This is what engineers come to KubeCon for — not slides about strategy, but real configuration they can take back to their teams on Monday morning. The openshift.io/sriovlegacy annotations, the network operator settings, the specific parameters that make RDMA work on bare metal.

The full picture

Wide shot of Luca Berton presenting on the main KubeCon CloudNativeCon Europe 2026 stage

Standing on that stage, with the KubeCon and CloudNativeCon logos behind me, presenting work that my team and I built from scratch on bare metal — this was the culmination of everything.

From the Dell AI Factory hardware selection to the NVIDIA GPU Operator configuration, from the SR-IOV network isolation to the Run:AI scheduling policies, from the ArgoCD GitOps pipelines to the monitoring and alerting stack — every layer was designed, tested, debugged, and battle-hardened in production.

What the audience took away

Based on the questions after the talk and the conversations in the hallway, here is what resonated most:

MIG is not one-size-fits-all. Different workloads need different GPU partitioning strategies. We shared our decision matrix for when to use MIG vs. time-slicing vs. full GPU dedication.
SR-IOV configuration is critical and fragile. The specific annotations and NicClusterPolicy settings I showed were immediately useful to teams struggling with RDMA on bare metal.
GitOps for GPU infrastructure works. Managing NVIDIA operators, network operators, and scheduling policies through ArgoCD gives you reproducibility, audit trails, and rollback — essential for production GPU clusters.
The Safe/Fair/Efficient framework is portable. Whether you run OpenShift, vanilla Kubernetes, or a managed service, the three-lens approach to multi-tenancy decisions applies universally.
Bare metal is back. For serious AI workloads, the performance overhead of virtualization matters. Bare metal with proper isolation gives you cloud-like multi-tenancy without the performance tax.

Get the slides and go deeper

Download the full slide deck (PDF)

For deep dives into specific topics covered in the talk:

Want to discuss multi-tenant GPU infrastructure for your organization? Book a consultation.

More from KubeCon 2026: packed room recap, Cloud Native Rejekts MC experience, leaders and companies shaping the ecosystem.

On Stage at KubeCon EU 2026: Multi-Tenant GPUs on Bare Metal

The moment it became real

Setting the scene

The problem: “It runs” is not enough

The framework: Safe, Fair, Efficient

Showing the code

The full picture

What the audience took away

Get the slides and go deeper

Related Articles

Differential Privacy: How Math Protects Your Privacy

GLM-5.2 744B: Sparse Attention Meets Efficient MoE

Reliable AI Agents in Java with LangChain4J — Workshop

AI Gateway on Kubernetes: Route and Load-Balance LLM Traffic