
Accelerating OpenShift networking with SR-IOV and RDMA (NVIDIA Network Operator)

Luca Berton
#openshift #kubernetes #nvidia #network-operator #sriov #rdma #gpu #hpc #roce #infiniband #multus #cni


If you run GPU training, HPC, or latency-sensitive workloads on OpenShift, the default pod networking model (overlay + kernel networking path) can become the bottleneck. Two technologies help you get closer to bare-metal behavior inside pods:

  - SR-IOV (Single Root I/O Virtualization), which carves a physical NIC into virtual functions (VFs) that pods can use directly.
  - RDMA (Remote Direct Memory Access), which moves data between hosts with minimal kernel and CPU involvement.

When you combine them and manage the host stack with the NVIDIA Network Operator, you get a repeatable, Kubernetes-native way to unlock a high-performance data plane for distributed AI and HPC.


What SR-IOV is doing for your pods

SR-IOV lets one physical NIC port (PF) present multiple VFs. Each VF behaves like its own NIC function and can be attached to a pod via Multus as a secondary interface.

Why that matters:

  - Traffic bypasses the overlay and the host’s software switching path, so throughput gets close to line rate.
  - Latency and jitter drop because packets no longer traverse the kernel networking stack.
  - Each VF has its own hardware queues, so performance is more predictable and better isolated per pod.

Think of SR-IOV as “give the pod a real NIC personality.”


What RDMA is doing for your workload

RDMA changes the rules of networking by enabling direct memory access semantics for data movement. In practice, it can deliver:

  - Lower latency, because data moves between application buffers without a trip through the kernel.
  - Higher sustained throughput thanks to zero-copy transfers.
  - Dramatically lower CPU usage, since the host CPU no longer copies and processes every packet.

Those CPU savings are a big deal for GPU clusters: if the node burns CPU on networking, you often pay twice, with slower training/inference and fewer CPU cycles left for input pipelines.

RDMA is commonly used with:

  - InfiniBand, where RDMA is the native transport.
  - RoCE (RDMA over Converged Ethernet), which brings RDMA semantics to Ethernet fabrics.


Why SR-IOV + RDMA is the sweet spot

Using RDMA on top of an SR-IOV VF is a popular design because you get both:

  - A dedicated, hardware-backed interface per pod (the VF), with isolation from noisy neighbors.
  - Kernel-bypass data movement over that interface (RDMA), with low latency and low CPU overhead.

This is especially valuable for:

  - Distributed GPU training, where collective communication over RoCE or InfiniBand often limits scaling.
  - MPI-style HPC jobs that are sensitive to latency and jitter.
  - Data and storage paths that need high throughput without stealing CPU from the workload.

In short: SR-IOV gives you the lane; RDMA makes the lane extremely fast.


Where the NVIDIA Network Operator fits

The NVIDIA Network Operator is the “make it boring” part of this story.

Instead of manually installing and maintaining the networking stack across nodes (drivers, RDMA components, device plugins, and related configuration), the operator helps you manage it declaratively and consistently at cluster scale.

In real-world operations, that translates to:

  - Consistent driver and RDMA stack versions across all nodes, driven from a single custom resource.
  - Far less hand-crafted, per-node configuration that drifts over time.
  - Simpler day-2 operations: upgrades, node replacements, and scale-outs follow the same declarative path.

You still need OpenShift networking pieces (like Multus and the SR-IOV Network Operator), but NVIDIA’s operator handles the NVIDIA/Mellanox-focused RDMA enablement and exposure.
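
As an illustration, the operator is configured through a NicClusterPolicy custom resource. The sketch below is a minimal, assumed example: the image name, version, and comments are placeholders you would replace with values validated for your cluster, and the exact fields vary by operator version.

apiVersion: mellanox.com/v1alpha1
kind: NicClusterPolicy
metadata:
  name: nic-cluster-policy
spec:
  # Rolls the NVIDIA DOCA/OFED driver container out to nodes with supported NICs
  ofedDriver:
    image: doca-driver
    repository: nvcr.io/nvidia/mellanox
    version: 24.10-0.7.0.0-0   # placeholder; use a version supported on your OpenShift release
  # Other components (RDMA device plugins, secondary-network helpers) can be enabled
  # in this same spec depending on your design.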


Reference architecture (high level)

A common pattern looks like this:

  1. Cluster network (default CNI)
    Used for normal pod-to-service traffic, API calls, image pulls, etc.

  2. High-performance secondary network (Multus)
    A NetworkAttachmentDefinition (NAD) attaches an SR-IOV VF to pods.

  3. RDMA enabled on the VF
    Pods that request the VF can use RDMA-capable libraries (depending on your stack and workload).

This keeps Kubernetes networking sane while providing a dedicated fast path for the workloads that need it.
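
To make step 2 concrete, here is an assumed sketch of an SriovNetwork object; with the OpenShift SR-IOV Network Operator, this generates the NetworkAttachmentDefinition for you. The names, namespace, and IPAM range are placeholders:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetwork
metadata:
  name: sriov-net
  namespace: openshift-sriov-network-operator
spec:
  # Must match the resourceName defined in your SriovNetworkNodePolicy
  resourceName: sriov_vf
  # Namespace in which the generated NetworkAttachmentDefinition is created
  networkNamespace: ml-workloads
  ipam: |
    {
      "type": "whereabouts",
      "range": "192.168.10.0/24"
    }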


Prerequisites checklist

Before you start, verify these basics:

  - RDMA-capable NICs in the worker nodes (for example NVIDIA ConnectX adapters).
  - SR-IOV enabled in NIC firmware and in BIOS/UEFI, including IOMMU/VT-d.
  - The SR-IOV Network Operator and the NVIDIA Network Operator installed on the cluster.
  - For RoCE, a fabric configured for lossless Ethernet (PFC/ECN); for InfiniBand, a working subnet manager.
  - IP planning for the secondary network, including MTU consistency end to end.


How it looks in OpenShift objects

You usually express the “plumbing” in two layers:

1) Platform layer (cluster admins)

Create and manage:

  - SriovNetworkNodePolicy objects that create VFs on selected nodes and advertise them as a named resource.
  - SriovNetwork objects (or NetworkAttachmentDefinitions) that define how pods attach to those VFs.
  - The NVIDIA Network Operator configuration (NicClusterPolicy) that rolls out the driver and RDMA components.
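
A minimal, assumed sketch of the node policy side might look like this (the NIC name, node selector, and VF count are placeholders for your hardware):

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodePolicy
metadata:
  name: rdma-vf-policy
  namespace: openshift-sriov-network-operator
spec:
  resourceName: sriov_vf            # the name VFs are advertised under to the scheduler
  nodeSelector:
    feature.node.kubernetes.io/network-sriov.capable: "true"
  numVfs: 8
  nicSelector:
    pfNames: ["ens1f0"]             # placeholder physical function name
  deviceType: netdevice
  isRdma: true                      # expose the VFs as RDMA-capable devices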

2) Workload layer (app teams)

Request:

  - The secondary network, via the k8s.v1.cni.cncf.io/networks annotation pointing at the NAD.
  - One or more VFs, via the resource name exposed by the device plugin.

Here’s an intentionally simplified sketch of what a workload request often resembles:

apiVersion: v1
kind: Pod
metadata:
  name: rdma-workload
  annotations:
    # Attach the secondary SR-IOV network defined by the NetworkAttachmentDefinition
    k8s.v1.cni.cncf.io/networks: sriov-net
spec:
  containers:
  - name: app
    image: your-image
    resources:
      limits:
        # Request one VF using the resource name advertised by the device plugin
        example.com/sriov_vf: "1"

Your actual resource name and NAD will differ, but the pattern is the same: “attach network + request VF.”


Performance best practices that actually move the needle

If you want SR-IOV/RDMA to pay off, focus on the things that usually dominate results:

  - Topology alignment: keep the GPU, NIC, and CPU cores on the same NUMA node (Topology Manager with a Guaranteed QoS pod, as in the sketch below).
  - A lossless fabric for RoCE: missing or misconfigured PFC/ECN quietly wrecks RDMA performance.
  - MTU: enable jumbo frames end to end if the fabric supports them.
  - CPU isolation and hugepages for jitter-sensitive workloads.
  - GPUDirect RDMA where supported, so data moves NIC-to-GPU without staging in host memory.
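
As a sketch of the topology-alignment point, a pod that requests the VF together with pinned CPUs, memory, and hugepages in equal requests and limits lands in the Guaranteed QoS class, which lets the CPU and Topology Managers align everything on one NUMA node. The resource names and sizes below are illustrative, reusing the placeholders from the earlier example:

apiVersion: v1
kind: Pod
metadata:
  name: rdma-training-worker
  annotations:
    k8s.v1.cni.cncf.io/networks: sriov-net
spec:
  containers:
  - name: trainer
    image: your-image
    resources:
      # requests equal to limits -> Guaranteed QoS, enabling CPU pinning and NUMA alignment
      requests:
        cpu: "16"
        memory: 64Gi
        hugepages-1Gi: 8Gi
        nvidia.com/gpu: "1"
        example.com/sriov_vf: "1"
      limits:
        cpu: "16"
        memory: 64Gi
        hugepages-1Gi: 8Gi
        nvidia.com/gpu: "1"
        example.com/sriov_vf: "1"
    volumeMounts:
    - name: hugepages
      mountPath: /dev/hugepages
  volumes:
  - name: hugepages
    emptyDir:
      medium: HugePages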


Common gotchas and troubleshooting cues

These are the issues that tend to burn time:

  - No VFs appear on the node: SR-IOV or IOMMU disabled in BIOS/firmware, or the policy’s nicSelector doesn’t match the actual NIC.
  - Resource name mismatch: the pod requests a resource that no device plugin advertises, so it stays Pending.
  - The NAD lives in a different namespace than the pod, so the network annotation can’t resolve it.
  - RoCE works at low load but drops under pressure because PFC/ECN isn’t configured on the switches.
  - Driver or firmware mismatches after node or operator upgrades.


When to use SR-IOV+RDMA vs alternatives

Use SR-IOV + RDMA when you need:

  - The lowest practical latency and the highest sustained throughput between pods on different nodes.
  - Multi-node GPU training or MPI jobs whose scaling is limited by communication.
  - To give CPU cycles back to the workload instead of spending them on packet processing.

Consider alternatives when:

  - The workload isn’t network-bound and the default CNI already meets its requirements.
  - You need maximum scheduling flexibility and can’t dedicate NIC resources per pod.
  - The hardware or fabric doesn’t support SR-IOV or RDMA, and upgrading isn’t justified by the gains.


Wrap-up

SR-IOV and RDMA are about giving the right workloads a fast lane: fewer copies, lower jitter, lower CPU overhead, and better scaling under real distributed traffic. The NVIDIA Network Operator helps you operationalize this at scale—so the cluster stays manageable while your GPUs spend more time doing actual compute instead of waiting on the network.
