
GPU Sharing on Kubernetes Guide

Comprehensive comparison of NVIDIA GPU sharing strategies on Kubernetes: Multi-Instance GPU, Multi-Process Service, and time-slicing. When to use each approach.

Luca Berton · 1 min read

Stop Wasting Your GPUs

A single NVIDIA A100 costs $10K+, yet most organizations run it at around 30% utilization. GPU sharing lets multiple workloads share one physical GPU, dramatically improving efficiency.

The Three Approaches

1. Time-Slicing

The simplest approach, where multiple pods take turns using the full GPU:

# NVIDIA device plugin config
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        resources:
        - name: nvidia.com/gpu
          replicas: 4  # 4 pods share each GPU

Pros: Simple, no special hardware. Cons: No memory isolation (one pod can OOM-kill others) and no performance guarantees.
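With the ConfigMap above applied, pods request the shared GPU exactly as they would a dedicated one. A minimal sketch (pod name and image tag are illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: cuda-worker  # illustrative name
spec:
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/gpu: 1  # one time-sliced replica, not a whole physical GPU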

2. Multi-Instance GPU (MIG)

Hardware-level partitioning on A100/A30/H100:

# Enable MIG mode
nvidia-smi -i 0 -mig 1

# Create GPU instances (A100 80GB example; profile 19 = 1g.10gb)
nvidia-smi mig -cgi 19,19,19,19,19,19,19 -i 0  # 7x 1g.10gb instances

# Create default compute instances on each GPU instance
nvidia-smi mig -cci -i 0

Kubernetes sees each MIG instance as a separate resource:

resources:
  limits:
    nvidia.com/mig-1g.10gb: 1  # Request one MIG slice

Pros: Hardware isolation (memory + compute). Guaranteed performance. Cons: Only A100/A30/H100. Fixed partitioning: you can't repartition without stopping the workloads on that GPU.
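A complete pod spec targeting a MIG slice might look like this (pod name and image tag are illustrative; the mig-1g.10gb resource appears on the node once the device plugin runs with a MIG strategy enabled):

apiVersion: v1
kind: Pod
metadata:
  name: mig-inference  # illustrative name
spec:
  containers:
  - name: app
    image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
    resources:
      limits:
        nvidia.com/mig-1g.10gb: 1  # one 10GB hardware-isolated slice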

3. Multi-Process Service (MPS)

CUDA-level sharing with better concurrency than time-slicing:

# Enable MPS in device plugin
sharing:
  mps:
    renameByDefault: false
    resources:
    - name: nvidia.com/gpu
      replicas: 10  # GPU memory is divided evenly across the replicas

Pros: Better utilization than time-slicing, some memory isolation. Cons: Less isolation than MIG; a shared failure domain means one bad CUDA kernel can affect all clients.
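As with time-slicing, workloads request the plain GPU resource; with the config above, each granted replica corresponds to roughly one tenth of the card's memory. A sketch of the container's resource stanza:

resources:
  limits:
    nvidia.com/gpu: 1  # one of the 10 MPS replicas on the card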

When to Use What

| Scenario | Recommendation |
|---|---|
| Development/testing | Time-slicing (simple, flexible) |
| Production inference (A100/H100) | MIG (hardware isolation) |
| Production inference (T4/V100) | MPS (best available) |
| Training | Don't share: training needs the full GPU |
| Multi-tenant | MIG only (isolation requirement) |

NVIDIA GPU Operator Setup

helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.enabled=true \
  --set mig.strategy=mixed \
  --set devicePlugin.config.name=nvidia-device-plugin
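If the nvidia Helm repo is not yet configured, add it first; after the install, it is worth confirming the operator pods come up and that the node advertises GPU resources (the node name below is a placeholder):

helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update

# All operator pods should reach Running/Completed
kubectl get pods -n gpu-operator

# Check the GPU resources advertised on a node
kubectl describe node <gpu-node> | grep nvidia.com/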

Monitoring GPU Utilization

# Prometheus rules for GPU waste detection
groups:
- name: gpu-efficiency
  rules:
  - alert: GPUUnderutilized
    expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h]) < 20
    for: 4h
    labels:
      severity: warning
    annotations:
      summary: "GPU {{ $labels.gpu }} utilized at {{ $value }}% for 4+ hours"
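Compute utilization alone can mislead; pairing it with a memory check catches oversized MIG slices too. A sketch using the standard DCGM exporter framebuffer metrics (DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE, in MiB); the threshold is an assumption to tune:

  - alert: GPUMemoryUnderutilized
    expr: avg_over_time(DCGM_FI_DEV_FB_USED[1h])
          / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE) < 0.2
    for: 4h
    labels:
      severity: warning
    annotations:
      summary: "GPU {{ $labels.gpu }} using under 20% of its memory"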

Key Takeaways

  1. Audit first: measure current GPU utilization before choosing a sharing strategy
  2. MIG for production: the only option with true hardware isolation
  3. Time-slicing for dev: simple and good enough for non-production
  4. Never share training GPUs: training needs dedicated, uninterrupted access
  5. Monitor constantly: GPU utilization should be a dashboard metric

Need to optimize your GPU infrastructure? I help organizations maximize GPU utilization on Kubernetes. Let's connect.
