
GPU Sharing on Kubernetes: MIG, MPS, and Time-Slicing Compared

Luca Berton 1 min read
#gpu#kubernetes#nvidia#mig#ai-infrastructure

## 🎮 Stop Wasting Your GPUs

A single NVIDIA A100 costs upwards of $10,000, yet many organizations run their GPUs at around 30% utilization. GPU sharing lets multiple workloads share one physical GPU, dramatically improving efficiency.
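A rough back-of-envelope makes the waste concrete. The purchase price and three-year amortization window below are illustrative assumptions, not quoted figures:

```shell
#!/bin/sh
# Illustrative numbers: price and amortization window are assumptions.
PRICE=10000              # USD per A100 (assumed)
HOURS=$((3 * 365 * 24))  # 3-year amortization = 26280 hours
UTIL=30                  # observed utilization in percent

awk -v p="$PRICE" -v h="$HOURS" -v u="$UTIL" 'BEGIN {
  printf "raw cost per GPU-hour:    $%.2f\n", p / h
  printf "cost per useful GPU-hour: $%.2f\n", p / h / (u / 100)
}'
# raw cost per GPU-hour:    $0.38
# cost per useful GPU-hour: $1.27
```

At 30% utilization, every hour of real work effectively costs more than three times the raw hourly rate.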

The Three Approaches

1. Time-Slicing

The simplest approach — multiple pods take turns using the full GPU:

# NVIDIA device plugin config
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        resources:
        - name: nvidia.com/gpu
          replicas: 4  # 4 pods share each GPU

Pros: Simple, no special hardware. Cons: No memory isolation — one pod can OOM-kill others. No performance guarantees.
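On the consuming side, pods request a time-sliced GPU exactly as they would a dedicated one; with `replicas: 4` in the plugin config, up to four such pods can land on each physical GPU. The pod name and image below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker        # illustrative name
spec:
  containers:
  - name: app
    image: my-inference:latest  # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1       # one time-sliced replica, not a whole GPU
```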

2. Multi-Instance GPU (MIG)

Hardware-level partitioning on A100/A30/H100:

# Enable MIG mode
nvidia-smi -i 0 -mig 1

# Create GPU instances plus their default compute instances (A100 80GB example)
# Profile ID 19 = 1g.10gb on the 80GB card; -C creates the compute instances in one step
nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C -i 0  # 7x 1g.10gb instances

Kubernetes sees each MIG instance as a separate resource:

resources:
  limits:
    nvidia.com/mig-1g.10gb: 1  # Request one MIG slice

Pros: Hardware isolation (memory + compute). Guaranteed performance. Cons: Only A100/A30/H100. Fixed partitioning — can’t change without stopping workloads.
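Once the device plugin picks up the partitions (using the `mixed` MIG strategy), each slice is advertised as a node-level extended resource. A quick way to confirm — the node name here is illustrative:

```shell
# List MIG-backed extended resources advertised by the node
kubectl describe node gpu-node-1 | grep -i mig
# Expect entries along the lines of: nvidia.com/mig-1g.10gb: 7
```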

3. Multi-Process Service (MPS)

CUDA-level sharing with better concurrency than time-slicing:

# Enable MPS in the device plugin config
sharing:
  mps:
    renameByDefault: false
    resources:
    - name: nvidia.com/gpu
      replicas: 10  # device memory is split equally across the 10 replicas

Pros: Better utilization than time-slicing, some memory isolation. Cons: Less isolation than MIG. Shared failure domain — one bad CUDA kernel affects all.
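One way to sanity-check MPS from inside a scheduled pod: the device plugin points CUDA clients at the MPS control daemon through environment variables, so their presence confirms the pod is MPS-attached. The pod name is illustrative, and the exact variable names are an assumption worth verifying against your plugin version:

```shell
# Look for MPS-related environment injected by the device plugin
kubectl exec inference-worker -- env | grep CUDA_MPS
```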

When to Use What

| Scenario | Recommendation |
| --- | --- |
| Development/testing | Time-slicing (simple, flexible) |
| Production inference (A100/H100) | MIG (hardware isolation) |
| Production inference (T4/V100) | MPS (best available) |
| Training | Don't share — training needs full GPU |
| Multi-tenant | MIG only (isolation requirement) |

NVIDIA GPU Operator Setup

helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.enabled=true \
  --set mig.strategy=mixed \
  --set devicePlugin.config.name=nvidia-device-plugin
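With the operator's `mig.strategy=mixed` in place, MIG geometry is typically applied declaratively: you label nodes with one of the operator's built-in `mig-parted` profiles instead of running `nvidia-smi` by hand, and the MIG manager reconfigures the GPUs. The node name below is illustrative:

```shell
# Ask the operator's MIG manager to carve the node's GPUs into 1g.10gb slices
kubectl label node gpu-node-1 nvidia.com/mig.config=all-1g.10gb --overwrite

# Watch the reconfiguration state (pending -> success)
kubectl get node gpu-node-1 \
  -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
```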

Monitoring GPU Utilization

# Prometheus rules for GPU waste detection
groups:
- name: gpu-efficiency
  rules:
  - alert: GPUUnderutilized
    expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h]) < 20
    for: 4h
    labels:
      severity: warning
    annotations:
      summary: "GPU {{ $labels.gpu }} utilized at {{ $value }}% for 4+ hours"
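The rule above assumes DCGM exporter metrics are being scraped. To spot-check the raw metric directly — the service and namespace names here are assumptions based on a default GPU Operator install:

```shell
# Port-forward the DCGM exporter and inspect per-GPU utilization
kubectl -n gpu-operator port-forward svc/nvidia-dcgm-exporter 9400:9400 &
curl -s localhost:9400/metrics | grep '^DCGM_FI_DEV_GPU_UTIL'
```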

Key Takeaways

  1. Audit first — measure current GPU utilization before choosing a sharing strategy
  2. MIG for production — the only option with true hardware isolation
  3. Time-slicing for dev — simple and good enough for non-production
  4. Never share training GPUs — training needs dedicated, uninterrupted access
  5. Monitor constantly — GPU utilization should be a dashboard metric

Need to optimize your GPU infrastructure? I help organizations maximize GPU utilization on Kubernetes. Let’s connect.


Luca Berton

AI & Cloud Advisor with 18+ years experience. Author of 8 technical books, creator of Ansible Pilot, and instructor at CopyPasteLearn Academy. Speaker at KubeCon EU & Red Hat Summit 2026.
