What a moment at Red Hat Summit 2026.
I had the opportunity to present my session "GPUs Take Flight: Safety-First Multi-Tenant Platform Engineering with NVIDIA and Red Hat OpenShift AI" at Discovery Theater 1 in Atlanta. The room was packed, the energy was high, and the topic clearly resonated.


The Core Question
How do we move AI from experimentation to production while keeping platforms secure, fair, observable, and scalable?
Every enterprise I work with faces the same challenge: GPU hardware is expensive, demand outstrips supply, and multiple teams need access simultaneously. Without platform engineering discipline, you get the Wild West: the loudest team wins, costs spiral, and nobody can explain why inference latency spiked at 3 AM.

What I Covered
I shared practical patterns for building multi-tenant GPU platforms on bare metal OpenShift AI, drawn from real production deployments.
Open Kernel Modules and DMA-BUF

One of the foundational shifts: moving from proprietary kernel modules and nvidia-peermem for GPUDirect RDMA to NVIDIA's open GPU kernel modules and DMA-BUF (an upstream kernel interface, kernel 6.x and above). The legacy approach meant tight coupling and upgrade fragility. The current approach loosens the coupling between the GPU driver stack and the kernel, making upgrades safer.
Both changes reduce your upgrade risk surface, a critical concern when you are running multi-million-dollar GPU clusters.
Fairness: Making Contention Deterministic

This was the slide that generated the most questions. Without explicit rules, the loudest team wins. My approach:
- Per-tenant GPU caps: hard quotas, not just requests
- PriorityClasses: P0 Training, P1 Serving, P2 Batch, P3 Interactive
- Explicit preemption posture: who can evict whom, documented and deterministic
- Scheduling constraints: labels, affinity, taints, tolerations
- KAI Scheduler for GPU-aware scheduling and visibility
The key insight: contention is inevitable on shared GPU infrastructure. The question is whether it is deterministic (platform-engineered) or chaotic (first-come-first-served). Enterprise AI demands the former.
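To make "deterministic" concrete, here is a minimal Python sketch of an admission and preemption decision. It is not how KAI Scheduler or the Kubernetes scheduler work internally; it only illustrates the rules above. The tier names follow the P0 to P3 list, while `Job`, `Decision`, `decide`, the tenant names, and the tie-breaking rule (evict the lowest tier first, then by job name) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Rank 0 is the most important tier, mirroring P0..P3 above.
# (Real Kubernetes PriorityClasses use the opposite convention:
# a higher numeric value means higher priority.)
TIER_RANK = {"training": 0, "serving": 1, "batch": 2, "interactive": 3}


@dataclass(frozen=True)
class Job:
    name: str
    tenant: str
    tier: str
    gpus: int


@dataclass(frozen=True)
class Decision:
    action: str          # "admit", "queue", or "reject"
    victims: tuple = ()  # names of jobs to evict, in eviction order
    reason: str = ""


def decide(job, running, tenant_caps, cluster_gpus):
    """Decide what happens to `job`, given the currently running jobs."""
    # Rule 1: the per-tenant cap is a hard limit, checked before anything else.
    used_by_tenant = sum(j.gpus for j in running if j.tenant == job.tenant)
    if used_by_tenant + job.gpus > tenant_caps.get(job.tenant, 0):
        return Decision("reject", reason="tenant GPU cap exceeded")

    # Rule 2: if there is free capacity, nobody needs to be evicted.
    free = cluster_gpus - sum(j.gpus for j in running)
    if job.gpus <= free:
        return Decision("admit", reason="capacity available")

    # Rule 3: only strictly lower tiers may be evicted. Sorting by
    # (lowest tier first, then name) makes the outcome reproducible.
    candidates = sorted(
        (j for j in running if TIER_RANK[j.tier] > TIER_RANK[job.tier]),
        key=lambda j: (-TIER_RANK[j.tier], j.name),
    )
    victims = []
    for victim in candidates:
        if free >= job.gpus:
            break
        victims.append(victim.name)
        free += victim.gpus

    if free < job.gpus:
        return Decision("queue", reason="not enough lower-priority GPUs to reclaim")
    return Decision("admit", tuple(victims), "preempted lower-priority jobs")


if __name__ == "__main__":
    running = [
        Job("nb-1", "team-a", "interactive", 2),
        Job("etl-1", "team-b", "batch", 4),
        Job("svc-1", "team-a", "serving", 2),
    ]
    caps = {"team-a": 4, "team-b": 8}
    print(decide(Job("train-1", "team-b", "training", 3), running, caps, 8))
```

The point of the sketch is that the same inputs always produce the same victims, so the preemption posture can be written down, reviewed, and explained after the fact.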
GitOps Tenant Bootstrap Bundles

Every new tenant gets a complete bootstrap bundle deployed via Argo CD:
- Namespace with resource boundaries
- RBAC with least-privilege roles
- NetworkPolicy for east-west isolation
- Quotas for GPU, CPU, and memory limits
- HAProxy VIP for inference endpoint routing
One Kustomize build per tenant, deployed via GitOps. Tenant definitions live in config/overlays/prod/tenants/. Auditable, reviewable, reproducible. No tickets. No manual provisioning.
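As a rough picture of what one bundle contains, the Python sketch below renders a few of the objects as plain dictionaries. The talk used Kustomize overlays, not Python, so this is an illustration of the bundle's shape and not the actual tooling. The `tenant-<name>` namespace convention, the `platform.example.com/tenant` label, the group name, and the choice of the built-in `edit` ClusterRole are all assumptions; a real bundle would bind a narrower custom role and would also need rules for the inference endpoint routing, which is omitted here.

```python
import json


def tenant_bundle(tenant, gpu_quota, cpu_limit, memory_limit):
    """Return a list of Kubernetes manifests (as dicts) for one tenant."""
    ns = f"tenant-{tenant}"  # hypothetical naming convention
    labels = {"platform.example.com/tenant": tenant}  # hypothetical label key

    namespace = {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": ns, "labels": labels},
    }
    # Bind a tenant group to a role inside the tenant namespace only.
    # "edit" is a built-in ClusterRole used here as a stand-in.
    role_binding = {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": "tenant-edit", "namespace": ns},
        "subjects": [{
            "kind": "Group",
            "name": f"{tenant}-devs",
            "apiGroup": "rbac.authorization.k8s.io",
        }],
        "roleRef": {
            "kind": "ClusterRole",
            "name": "edit",
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }
    # East-west isolation: pods accept ingress only from pods in the
    # same namespace.
    network_policy = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "allow-same-namespace", "namespace": ns},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress"],
            "ingress": [{"from": [{"podSelector": {}}]}],
        },
    }
    # Extended resources such as GPUs are quota'd with the "requests." prefix.
    quota = {
        "apiVersion": "v1",
        "kind": "ResourceQuota",
        "metadata": {"name": "tenant-quota", "namespace": ns},
        "spec": {"hard": {
            "requests.nvidia.com/gpu": str(gpu_quota),
            "limits.cpu": cpu_limit,
            "limits.memory": memory_limit,
        }},
    }
    return [namespace, role_binding, network_policy, quota]


if __name__ == "__main__":
    for manifest in tenant_bundle("team-a", 4, "64", "256Gi"):
        print(json.dumps(manifest, indent=2))
```

Whatever tool renders the bundle, the design choice is the same: every tenant gets the same set of objects from one reviewed definition, so a new tenant is a pull request and not a ticket.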
The Full Architecture Stack
The session walked through the complete platform stack:
- GPU Operator: automated driver lifecycle; the NVIDIA GPU Operator manages node-level GPU software
- Time-Slicing and MIG: sharing strategies for different workload profiles
- Network Operator: SR-IOV and RDMA for high-bandwidth GPU-to-GPU communication
- KAI Scheduler: topology-aware placement that understands NVLink, NVSwitch, and PCIe hierarchies
- Observability: DCGM Exporter metrics, Prometheus alerts, Grafana dashboards for GPU utilization, memory, temperature, and power
- Platform guardrails: admission webhooks that enforce GPU request patterns, prevent over-allocation, and validate tenant configurations
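The guardrail layer is the easiest of these to show in a few lines. Below is a minimal Python sketch of the validation logic a validating admission webhook could run against a Pod. The specific rules, the `MAX_GPUS_PER_POD` value, and the `platform.example.com/tenant` label are assumptions for illustration, not the policies from the session. The `AdmissionReview` response shape (`uid`, `allowed`, `status.message`) follows the Kubernetes `admission.k8s.io/v1` API; the HTTPS server and TLS setup a real webhook needs are left out.

```python
GPU = "nvidia.com/gpu"
MAX_GPUS_PER_POD = 8                          # assumed platform limit
TENANT_LABEL = "platform.example.com/tenant"  # hypothetical label key


def gpu_violations(pod):
    """Return a list of human-readable policy violations for a Pod dict."""
    problems = []
    labels = pod.get("metadata", {}).get("labels") or {}
    total = 0

    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})
        limit = resources.get("limits", {}).get(GPU)
        request = resources.get("requests", {}).get(GPU)

        if limit is None:
            if request is not None:
                problems.append(f"{name}: GPU request without a matching limit")
            continue
        if not str(limit).isdigit() or int(limit) < 1:
            problems.append(f"{name}: GPU limit must be a positive integer")
            continue
        # Extended resources cannot be overcommitted: request must equal limit.
        if request is not None and str(request) != str(limit):
            problems.append(f"{name}: GPU request must equal GPU limit")
        total += int(limit)

    if total > 0 and TENANT_LABEL not in labels:
        problems.append(f"GPU pods must carry the {TENANT_LABEL} label")
    if total > MAX_GPUS_PER_POD:
        problems.append(f"pod asks for {total} GPUs, maximum is {MAX_GPUS_PER_POD}")
    return problems


def review_response(review):
    """Build the AdmissionReview response for an incoming AdmissionReview."""
    request = review["request"]
    problems = gpu_violations(request["object"])
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"message": "; ".join(problems)}
    return {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "response": response,
    }
```

Keeping the rules in a pure function like `gpu_violations` means they can be unit tested without a cluster, which matters when a bad rule can block every GPU workload on the platform.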
Enterprise AI Needs Platform Engineering
The core message I left the audience with: enterprise AI needs more than GPUs. It needs platform engineering discipline.
The GPU is the easy part. The hard part is building the platform around it: the scheduling, the isolation, the observability, the upgrade path, the cost attribution, the compliance audit trail. That is where platform engineering transforms AI from a science experiment into a production capability.
Thank You
Thank you to everyone who joined, asked questions, and continued the conversation afterward. Several attendees stayed for 20+ minutes of Q&A, diving into specific topics like GPU memory oversubscription, MIG vs time-slicing trade-offs, and how to handle spot-instance-style preemption for batch training jobs.
This is exactly why I love the Red Hat community: deep technical curiosity, practical enterprise focus, and a shared belief that open source is the path to production AI.
Summit Reflections
Red Hat Summit 2026: what a week. Atlanta delivered. Across the keynote, sessions, labs, booths, and conversations, a few themes kept coming back:
AI is becoming a platform discipline. GPUs are expensive, contested, and business-critical. Without guardrails, quotas, scheduling, tenant isolation, and visibility, the loudest workload wins.
Open source is becoming the foundation for enterprise AI. Across RHEL AI, InstructLab, Granite models, OpenShift AI, llm-d, Kubernetes, GitOps, and automation, the momentum is clear: customers want control, transparency, portability, and production-grade patterns.
The ecosystem matters more than ever. Red Hat, NVIDIA, Dell Technologies, IBM, Intel, AMD, cloud providers, partners, customers, and the open-source community are all solving different parts of the same puzzle: how to make AI practical at enterprise scale.
The conversation is shifting from "Can we use AI?" to "How do we run AI responsibly, efficiently, and at scale?" That is where platform engineering lives, and that is exactly what this talk was about.

A highlight of the week: catching up with Chris Wright, CTO of Red Hat, whose vision for open-source AI infrastructure is shaping everything from InstructLab to OpenShift AI.