
GPUs Take Flight: Multi-Tenant Platform Engineering at

What a moment at Red Hat Summit 2026.

I had the opportunity to present my session “GPUs Take Flight: Safety-First Multi-Tenant Platform Engineering with NVIDIA and Red Hat OpenShift AI” at Discovery Theater 1 in Atlanta. The room was packed, the energy was high, and the topic clearly resonated.

Luca Berton presenting to a packed Discovery Theater at Red Hat Summit 2026 — Red Hat AI stage

Packed audience at Discovery Theater 1 — attendees sitting on the floor, headphones on, for the Multi-tenant GPUs on Bare Metal OpenShift AI session

The Core Question

How do we move AI from experimentation to production while keeping platforms secure, fair, observable, and scalable?

Every enterprise I work with faces the same challenge: GPU hardware is expensive, demand outstrips supply, and multiple teams need access simultaneously. Without platform engineering discipline, you get the Wild West — the loudest team wins, costs spiral, and nobody can explain why inference latency spiked at 3 AM.

Full audience view at Discovery Theater 1 — attendees with headsets engaged in the session

What I Covered

I shared practical patterns for building multi-tenant GPU platforms on bare metal OpenShift AI, drawn from real production deployments.

Open Kernel Modules and DMA-BUF

Slide: Open kernel modules + DMA-BUF — before (legacy) vs after (current) comparison

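One of the foundational shifts: moving from the proprietary NVIDIA kernel modules plus nvidia-peermem for GPUDirect RDMA to the open-source GPU kernel modules plus DMA-BUF (the upstream kernel's buffer-sharing framework, kernel 6.x and above). In the legacy setup, GPUDirect depended on an extra module, nvidia-peermem, that had to line up with both the GPU driver and the network stack, so every upgrade meant coordinating several tightly coupled pieces. With DMA-BUF, GPU memory is shared with the network adapter through a standard upstream kernel interface instead, which removes that extra dependency and makes upgrades dramatically safer.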

Both changes reduce your upgrade risk surface, a critical concern when you are running multi-million-dollar GPU clusters.

Fairness: Making Contention Deterministic

Slide: Fairness — make contention deterministic with PriorityClasses and KAI Scheduler

This was the slide that generated the most questions. Without explicit rules, the loudest team wins. My approach:

Per-tenant GPU caps — hard quotas, not just requests
PriorityClasses: P0 Training, P1 Serving, P2 Batch, P3 Interactive
Explicit preemption posture — who can evict whom, documented and deterministic
Scheduling constraints — labels, affinity, taints, tolerations
KAI Scheduler for GPU-aware scheduling and visibility
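To make the tier list concrete, here is a minimal Python sketch that renders the four tiers as native Kubernetes PriorityClass objects. The class names, numeric values, and which tiers are allowed to preempt are assumptions for illustration, not the values from the talk. The standard `preemptionPolicy` field is what encodes "who can evict whom"; KAI Scheduler layers its own queue model on top, which this sketch does not cover.

```python
import json

# Hypothetical tier table: name -> (priority value, preemption policy).
# Higher values win contention. Values and policies are illustrative only;
# user-defined classes must stay at or below 1,000,000,000.
TIERS = {
    "p0-training": (100_000, "PreemptLowerPriority"),
    "p1-serving": (90_000, "PreemptLowerPriority"),
    "p2-batch": (50_000, "Never"),
    "p3-interactive": (10_000, "Never"),
}


def priority_class(name: str, value: int, policy: str) -> dict:
    """Render one scheduling.k8s.io/v1 PriorityClass as a plain dict."""
    if policy not in ("PreemptLowerPriority", "Never"):
        raise ValueError(f"unknown preemptionPolicy: {policy}")
    return {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": value,
        "preemptionPolicy": policy,
        "globalDefault": False,
        "description": f"GPU platform tier {name}",
    }


def render_tiers() -> dict:
    """Bundle all tiers into a v1 List, which kubectl apply -f accepts as JSON."""
    items = [priority_class(n, v, p) for n, (v, p) in TIERS.items()]
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(render_tiers(), indent=2))
```

With `Never`, a P2 or P3 pod is still queued ahead of lower-priority pods but never evicts a running one, so only P0 and P1 can preempt in this sketch. Workloads opt in by setting `spec.priorityClassName`.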

The key insight: contention is inevitable on shared GPU infrastructure. The question is whether it is deterministic (platform-engineered) or chaotic (first-come-first-served). Enterprise AI demands the former.
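A toy model helps show what "deterministic" buys you. The Python sketch below is not how KAI Scheduler works internally; it is a hypothetical allocator that admits GPU requests in a fixed order (tier first, then tenant and job name) while enforcing per-tenant caps, so the same set of requests always yields the same outcome regardless of submission order. The P0 to P3 labels follow the list above; the tenants, caps, and job sizes are made up.

```python
from dataclasses import dataclass

# Lower rank = more important, mirroring the P0..P3 tiers.
TIER_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}


@dataclass(frozen=True)
class GpuRequest:
    job: str
    tenant: str
    tier: str
    gpus: int


def allocate(
    requests: list[GpuRequest], capacity: int, tenant_caps: dict[str, int]
) -> tuple[list[str], list[str]]:
    """Admit requests in (tier, tenant, job) order, honoring capacity and caps.

    Returns (admitted job names, pending job names). Assuming job names are
    unique within a tenant, the sort key fully orders the requests, so arrival
    order has no effect on the result.
    """
    admitted: list[str] = []
    pending: list[str] = []
    used: dict[str, int] = {}
    free = capacity
    for req in sorted(requests, key=lambda r: (TIER_RANK[r.tier], r.tenant, r.job)):
        tenant_used = used.get(req.tenant, 0)
        under_cap = tenant_used + req.gpus <= tenant_caps.get(req.tenant, 0)
        if under_cap and req.gpus <= free:
            admitted.append(req.job)
            used[req.tenant] = tenant_used + req.gpus
            free -= req.gpus
        else:
            pending.append(req.job)  # over the tenant cap, or no free GPUs left
    return admitted, pending


if __name__ == "__main__":
    reqs = [
        GpuRequest("notebook", "team-a", "P3", 2),
        GpuRequest("train", "team-a", "P0", 4),
        GpuRequest("serve", "team-b", "P1", 2),
        GpuRequest("batch", "team-b", "P2", 4),
    ]
    caps = {"team-a": 6, "team-b": 4}
    print(allocate(reqs, capacity=8, tenant_caps=caps))
    print(allocate(list(reversed(reqs)), capacity=8, tenant_caps=caps))
```

Both calls print `(['train', 'serve', 'notebook'], ['batch'])`. The P2 batch job waits because team-b would exceed its 4-GPU cap, not because it arrived late, and the rule that decided it is written down rather than emergent.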

GitOps Tenant Bootstrap Bundles

Slide: Safety — tenant bootstrap bundle deployed via Argo CD with Kustomize

Every new tenant gets a complete bootstrap bundle deployed via Argo CD:

Namespace with resource boundaries
RBAC with least-privilege roles
NetworkPolicy for east-west isolation
Quotas for GPU, CPU, and memory limits
HAProxy VIP for inference endpoint routing

One Kustomize build per tenant, deployed via GitOps. Tenant definitions live in config/overlays/prod/tenants/. Auditable, reviewable, reproducible. No tickets. No manual provisioning.
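The wiring between a tenant directory under config/overlays/prod/tenants/ and Argo CD could look roughly like this Python sketch. The per-tenant file names, repository URL, branch, and AppProject are assumptions. Two details are standard behavior: openshift-gitops is where the Red Hat OpenShift GitOps operator runs its default Argo CD instance, and Argo CD builds any source path containing a kustomization.yaml with Kustomize.

```python
import json

TENANTS_DIR = "config/overlays/prod/tenants"  # path used in the talk


def kustomization(files: list[str]) -> dict:
    """kustomization.yaml body listing one tenant's manifests (file names are hypothetical)."""
    return {
        "apiVersion": "kustomize.config.k8s.io/v1beta1",
        "kind": "Kustomization",
        "resources": files,
    }


def argo_application(tenant: str, repo_url: str, revision: str = "main") -> dict:
    """Argo CD Application that syncs one tenant overlay into the cluster."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        "metadata": {"name": f"tenant-{tenant}", "namespace": "openshift-gitops"},
        "spec": {
            "project": "default",  # assumption: real setups often use a per-tenant AppProject
            "source": {
                "repoURL": repo_url,
                "targetRevision": revision,
                "path": f"{TENANTS_DIR}/{tenant}",  # Argo CD runs kustomize build here
            },
            "destination": {"server": "https://kubernetes.default.svc", "namespace": tenant},
            # Prune objects removed from Git and revert manual drift in the cluster.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


if __name__ == "__main__":
    files = ["namespace.yaml", "rbac.yaml", "networkpolicy.yaml", "quota.yaml", "vip.yaml"]
    print(json.dumps(kustomization(files), indent=2))
    app = argo_application("team-a", "https://git.example.com/platform/gpu-tenants.git")
    print(json.dumps(app, indent=2))
```

In this model, onboarding a tenant is a pull request that adds one directory and one Application, so reviewers see the exact quota and RBAC diff before anything reaches the cluster.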

The Full Architecture Stack

The session walked through the complete platform stack:

GPU Operator — automated driver lifecycle: the NVIDIA GPU Operator manages node-level GPU software such as the driver, container toolkit, and device plugin
Time-Slicing and MIG — sharing strategies for different workload profiles
Network Operator — SR-IOV and RDMA for high-bandwidth GPU-to-GPU communication across nodes
KAI Scheduler — topology-aware placement that understands NVLink, NVSwitch, and PCIe hierarchies
Observability — DCGM Exporter metrics, Prometheus alerts, Grafana dashboards for GPU utilization, memory, temperature, and power
Platform guardrails — admission webhooks that enforce GPU request patterns, prevent over-allocation, and validate tenant configurations
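The guardrail item is the easiest to sketch in code. Below is a minimal, framework-free Python function that takes an admission.k8s.io/v1 AdmissionReview for a Pod and returns the response a validating webhook would send back. The specific rules (a per-pod GPU ceiling and a required GPU priority tier) and their values are hypothetical examples; serving it over HTTPS and registering a ValidatingWebhookConfiguration are left out.

```python
GPU = "nvidia.com/gpu"
# Hypothetical platform policy values.
MAX_GPUS_PER_POD = 8
GPU_TIERS = {"p0-training", "p1-serving", "p2-batch", "p3-interactive"}


def pod_gpus(pod: dict) -> int:
    """Sum GPU limits across a pod's regular containers (init containers ignored here)."""
    total = 0
    for container in pod.get("spec", {}).get("containers", []):
        limits = container.get("resources", {}).get("limits", {})
        total += int(limits.get(GPU, 0))  # GPUs are requested via limits
    return total


def violations(pod: dict) -> list[str]:
    """Return human-readable policy violations; an empty list means admit."""
    gpus = pod_gpus(pod)
    if gpus == 0:
        return []  # CPU-only pods are out of scope for these GPU rules
    problems = []
    if gpus > MAX_GPUS_PER_POD:
        problems.append(f"pod requests {gpus} GPUs, limit is {MAX_GPUS_PER_POD}")
    tier = pod["spec"].get("priorityClassName")
    if tier not in GPU_TIERS:
        problems.append(f"GPU pods must use one of {sorted(GPU_TIERS)}, got {tier!r}")
    return problems


def review(admission_review: dict) -> dict:
    """Build the AdmissionReview response for a validating webhook."""
    request = admission_review["request"]
    problems = violations(request["object"])
    response = {"uid": request["uid"], "allowed": not problems}
    if problems:
        response["status"] = {"code": 403, "message": "; ".join(problems)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


if __name__ == "__main__":
    pod = {"spec": {"priorityClassName": "p2-batch",
                    "containers": [{"name": "trainer", "resources": {"limits": {GPU: "16"}}}]}}
    print(review({"request": {"uid": "demo-uid", "object": pod}}))
```

The example pod is denied with the message "pod requests 16 GPUs, limit is 8". Keeping the policy in a pure function like `violations` makes it easy to unit-test separately from the webhook's HTTP plumbing.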

Enterprise AI Needs Platform Engineering

The core message I left the audience with: enterprise AI needs more than GPUs. It needs platform engineering discipline.

The GPU is the easy part. The hard part is building the platform around it — the scheduling, the isolation, the observability, the upgrade path, the cost attribution, the compliance audit trail. That is where platform engineering transforms AI from a science experiment into a production capability.

Thank You

Thank you to everyone who joined, asked questions, and continued the conversation afterward. Several attendees stayed for 20+ minutes of Q&A, diving into specific topics like GPU memory oversubscription, MIG vs time-slicing trade-offs, and how to handle spot-instance-style preemption for batch training jobs.

This is exactly why I love the Red Hat community: deep technical curiosity, practical enterprise focus, and a shared belief that open source is the path to production AI.

Summit Reflections

Red Hat Summit 2026 — what a week. Atlanta delivered. Across the keynote, sessions, labs, booths, and conversations, a few themes kept coming back:

AI is becoming a platform discipline. GPUs are expensive, contested, and business-critical. Without guardrails, quotas, scheduling, tenant isolation, and visibility, the loudest workload wins.

Open source is becoming the foundation for enterprise AI. Across RHEL AI, InstructLab, Granite models, OpenShift AI, llm-d, Kubernetes, GitOps, and automation, the momentum is clear: customers want control, transparency, portability, and production-grade patterns.

The ecosystem matters more than ever. Red Hat, NVIDIA, Dell Technologies, IBM, Intel, AMD, cloud providers, partners, customers, and the open-source community are all solving different parts of the same puzzle: how to make AI practical at enterprise scale.

The conversation is shifting from “Can we use AI?” to “How do we run AI responsibly, efficiently, and at scale?” That is where platform engineering lives — and that is exactly what this talk was about.

Luca Berton with Chris Wright, CTO of Red Hat, at Red Hat Summit 2026

A highlight of the week: catching up with Chris Wright, CTO of Red Hat, whose vision for open-source AI infrastructure is shaping everything from InstructLab to OpenShift AI.


Related Articles

Gemini Spark vs OpenClaw: Who Holds Your Personal Agent?

Zaya1-8B: The Most Interesting Local LLM Since DeepSeek-R1

GBrain Tutorial

KADC 2026: openFuyao Forum
