The moment it became real
There is a moment, right before you start speaking, when the room goes quiet. The slides are loaded. The microphone is on. Thousands of eyes are on you. And you realize: this is the biggest Cloud Native stage in the world, and they are here to listen.
![]()
Multi-tenant GPUs on Bare Metal OpenShift AI: A GitOps Blueprint from the Trenches. That is not just a talk title β it is six months of production experience compressed into 35 minutes.
Setting the scene
The Amsterdam backdrop on the KubeCon stage felt fitting. A city built on engineering, water management, and collaboration β just like the infrastructure we were about to discuss.
![]()
The presentation covered the full journey: from a bare-metal Dell AI Factory cluster with NVIDIA H200 GPUs to a production-ready, multi-tenant AI platform managed entirely through GitOps.
The problem: βIt runsβ is not enough
The first section hit the pain point that every GPU infrastructure team knows: getting GPUs to work is easy. Getting them to work safely for multiple teams is the real challenge.
Noisy neighbors hoarding GPU memory. Queue explosions where jobs starve. MIG partitions that fit one workload perfectly and cripple another. Driver drift across nodes. SR-IOV misconfigured, RDMA failures killing distributed training.
We needed a framework. We needed guardrails. We needed GitOps.
![]()
The framework: Safe, Fair, Efficient
Every decision in our multi-tenant GPU infrastructure filters through three lenses:
![]()
Safe β Blast radius equals zero. One team cannot break another. Namespace isolation, network policies, resource quotas β all enforced at the platform level.
Fair β Contention is deterministic. No more βrandom winsβ in GPU scheduling. Run:AI fair-share policies ensure every team gets their allocated GPU time, with preemption rules that are transparent and predictable.
Efficient β Outcomes per GPU-hour, not raw utilization. A GPU running at 100% utilization on a poorly optimized model is not efficient. We measure useful work delivered per GPU-hour.
This framework is not theoretical β it is how we run production AI workloads at Dell Technologies.
Showing the code
The most engaged moment of the talk was when I showed real configuration. Not abstract architecture diagrams β actual SR-IOV NicClusterPolicy YAML, actual OpenShift annotations, actual code that runs in our clusters.
![]()
The audience leaned in. This is what engineers come to KubeCon for β not slides about strategy, but real configuration they can take back to their teams on Monday morning. The openshift.io/sriovlegacy annotations, the network operator settings, the specific parameters that make RDMA work on bare metal.
The full picture
![]()
Standing on that stage, with the KubeCon and CloudNativeCon logos behind me, presenting work that my team and I built from scratch on bare metal β this was the culmination of everything.
From the Dell AI Factory hardware selection to the NVIDIA GPU Operator configuration, from the SR-IOV network isolation to the Run:AI scheduling policies, from the ArgoCD GitOps pipelines to the monitoring and alerting stack β every layer was designed, tested, debugged, and battle-hardened in production.
What the audience took away
Based on the questions after the talk and the conversations in the hallway, here is what resonated most:
MIG is not one-size-fits-all. Different workloads need different GPU partitioning strategies. We shared our decision matrix for when to use MIG vs. time-slicing vs. full GPU dedication.
SR-IOV configuration is critical and fragile. The specific annotations and NicClusterPolicy settings I showed were immediately useful to teams struggling with RDMA on bare metal.
GitOps for GPU infrastructure works. Managing NVIDIA operators, network operators, and scheduling policies through ArgoCD gives you reproducibility, audit trails, and rollback β essential for production GPU clusters.
The Safe/Fair/Efficient framework is portable. Whether you run OpenShift, vanilla Kubernetes, or a managed service, the three-lens approach to multi-tenancy decisions applies universally.
Bare metal is back. For serious AI workloads, the performance overhead of virtualization matters. Bare metal with proper isolation gives you cloud-like multi-tenancy without the performance tax.
Get the slides and go deeper
Download the full slide deck (PDF)
For deep dives into specific topics covered in the talk:
- NVIDIA GPU Operator setup on Kubernetes
- GPU sharing: MIG, MPS, and time-slicing
- SR-IOV NicClusterPolicy configuration
- KubeCon 2026 in numbers
- The community that makes it all possible
Want to discuss multi-tenant GPU infrastructure for your organization? Book a consultation.
More from KubeCon 2026: packed room recap, Cloud Native Rejekts MC experience, leaders and companies shaping the ecosystem.