A single A100 80GB GPU costs thousands of dollars per month. Most inference workloads use a fraction of that capacity. Multi-Instance GPU (MIG) lets you split one physical GPU into up to seven isolated instances, each with dedicated compute, memory, and memory bandwidth. The NVIDIA GPU Operator automates MIG configuration on Kubernetes.
What Is MIG
MIG is a hardware-level partitioning technology available on NVIDIA A100, A30, H100, and H200 GPUs. Unlike time-sharing or MPS (Multi-Process Service), MIG provides:
- Hardware isolation: each instance has dedicated streaming multiprocessors
- Memory isolation: each instance gets a dedicated memory slice with its own memory controllers
- Error isolation: a fault in one instance does not affect others
- QoS guarantees: each instance gets guaranteed bandwidth
MIG Profiles
The available profiles depend on your GPU model. For A100 80GB:
| Profile | GPU Memory | Compute Fraction | Max Instances per GPU |
|---|---|---|---|
| 1g.10gb | 10 GB | 1/7 | 7 |
| 2g.20gb | 20 GB | 2/7 | 3 |
| 3g.40gb | 40 GB | 3/7 | 2 |
| 4g.40gb | 40 GB | 4/7 | 1 |
| 7g.80gb | 80 GB | 7/7 | 1 |
For H100 80GB:
| Profile | GPU Memory | Compute Fraction | Max Instances per GPU |
|---|---|---|---|
| 1g.10gb | 10 GB | 1/7 | 7 |
| 1g.20gb | 20 GB | 1/7 | 4 |
| 2g.20gb | 20 GB | 2/7 | 3 |
| 3g.40gb | 40 GB | 3/7 | 2 |
| 7g.80gb | 80 GB | 7/7 | 1 |
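As a rough sanity check before writing a partitioning config, the table values can be used to confirm that a mix of profiles does not exceed one GPU's compute and memory. The Python sketch below is only an illustration: `PROFILES` and `fits_on_gpu` are hypothetical names, the numbers are copied from the 80GB tables above, and the check ignores the placement rules the driver enforces, so passing it is necessary but not sufficient.

```python
# Hypothetical helper: checks a profile mix against the 80GB tables above.
# It only sums compute slices and memory; real MIG placement rules are stricter.

# profile name -> (compute slices out of 7, memory in GB), taken from the tables.
# Per those tables, 4g.40gb is listed for A100 only and 1g.20gb for H100 only.
PROFILES = {
    "1g.10gb": (1, 10),
    "1g.20gb": (1, 20),
    "2g.20gb": (2, 20),
    "3g.40gb": (3, 40),
    "4g.40gb": (4, 40),
    "7g.80gb": (7, 80),
}

TOTAL_COMPUTE_SLICES = 7
TOTAL_MEMORY_GB = 80


def fits_on_gpu(mix: dict[str, int]) -> bool:
    """Return True if the requested instances stay within one 80GB GPU's totals."""
    compute = sum(PROFILES[name][0] * count for name, count in mix.items())
    memory = sum(PROFILES[name][1] * count for name, count in mix.items())
    return compute <= TOTAL_COMPUTE_SLICES and memory <= TOTAL_MEMORY_GB


if __name__ == "__main__":
    print(fits_on_gpu({"1g.10gb": 7}))                              # 7 slices, 70 GB
    print(fits_on_gpu({"3g.40gb": 1, "2g.20gb": 1, "1g.10gb": 2}))  # 7 slices, 80 GB
    print(fits_on_gpu({"3g.40gb": 2, "1g.10gb": 1}))                # 7 slices, 90 GB
```

The third mix fails on memory even though its compute slices fit, which is why both totals are checked.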
Enabling MIG with the GPU Operator
Install with MIG Manager
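# Assumes the NVIDIA Helm repository is already added under the name "nvidia".
# mig.strategy=mixed exposes each profile as its own resource (nvidia.com/mig-<profile>).
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator \
  --create-namespace \
  --set mig.strategy=mixed \
  --set migManager.enabled=true \
  --set migManager.config.name=default-mig-parted-config

Define MIG Configuration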
Create a ConfigMap with your MIG partitioning strategy:
apiVersion: v1
kind: ConfigMap
metadata:
  name: default-mig-parted-config
  namespace: gpu-operator
data:
  config.yaml: |
    version: v1
    mig-configs:
      # device-filter entries are the PCI device ID followed by NVIDIA's
      # vendor ID (10DE); adjust the list to the GPU models in your cluster.
      all-1g.10gb:
        - device-filter: ["0x233010DE", "0x232410DE"]
          devices: all
          mig-enabled: true
          mig-devices:
            "1g.10gb": 7
      all-2g.20gb:
        - device-filter: ["0x233010DE", "0x232410DE"]
          devices: all
          mig-enabled: true
          mig-devices:
            "2g.20gb": 3
      all-3g.40gb:
        - device-filter: ["0x233010DE", "0x232410DE"]
          devices: all
          mig-enabled: true
          mig-devices:
            "3g.40gb": 2
      mixed-strategy:
        - device-filter: ["0x233010DE", "0x232410DE"]
          devices: all
          mig-enabled: true
          mig-devices:
            "3g.40gb": 1
            "2g.20gb": 1
            "1g.10gb": 2
      all-disabled:
        - device-filter: ["0x233010DE", "0x232410DE"]
          devices: all
          mig-enabled: false
          mig-devices: {}

Apply MIG Configuration to Nodes
Label nodes with the desired MIG configuration:
# Split all GPUs on this node into 7x 1g.10gb instances
kubectl label node gpu-node-1 nvidia.com/mig.config=all-1g.10gb --overwrite
# Use mixed partitioning on another node
kubectl label node gpu-node-2 nvidia.com/mig.config=mixed-strategy --overwrite
# Disable MIG (use full GPUs)
kubectl label node gpu-node-3 nvidia.com/mig.config=all-disabled --overwrite

The MIG Manager will:
- Drain GPU workloads from the node
- Enable MIG mode on the GPUs
- Create the specified GPU instances
- Update the device plugin to expose new resources
- Uncordon the node
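The sequence above takes time, so automation usually waits for it to finish before scheduling work. The MIG Manager reports progress in the `nvidia.com/mig.config.state` node label, with values such as `pending`, `success`, and `failed`. The Python sketch below polls that label. The function names, the timeout and interval defaults, and the injected `get_labels` callable are illustrative assumptions, and the `kubectl_labels` helper assumes `kubectl` is on the PATH with access to the cluster.

```python
import json
import subprocess
import time
from typing import Callable

STATE_LABEL = "nvidia.com/mig.config.state"


def kubectl_labels(node: str) -> dict:
    """Fetch a node's labels with kubectl (assumes kubectl is configured)."""
    out = subprocess.run(
        ["kubectl", "get", "node", node, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["metadata"].get("labels", {})


def wait_for_mig(node: str,
                 get_labels: Callable[[str], dict] = kubectl_labels,
                 timeout_s: float = 600.0,
                 interval_s: float = 10.0) -> bool:
    """Poll until the state label is 'success'.

    Returns False if the label becomes 'failed' or the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        state = get_labels(node).get(STATE_LABEL)
        if state == "success":
            return True
        if state == "failed":
            return False
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

A call such as `wait_for_mig("gpu-node-1")` after relabeling a node blocks until the new layout is ready or the attempt has clearly failed. Passing `get_labels` as a parameter keeps the polling logic testable without a cluster.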
Scheduling Pods on MIG Instances
With the mixed MIG strategy (mig.strategy=mixed), each profile appears as its own resource type:
kubectl get node gpu-node-1 -o json | jq '.status.allocatable | with_entries(select(.key | startswith("nvidia.com/")))'

Output:
{
  "nvidia.com/mig-1g.10gb": "7",
  "nvidia.com/mig-2g.20gb": "0",
  "nvidia.com/mig-3g.40gb": "0"
}

Request a specific MIG instance in your pod:
apiVersion: v1
kind: Pod
metadata:
  name: inference-small
spec:
  containers:
    - name: inference
      image: nvcr.io/nvidia/tritonserver:24.07-py3
      resources:
        limits:
          nvidia.com/mig-1g.10gb: 1

Mixed MIG Strategies
Inference Cluster
Maximize pod density for small model inference:
# 7 inference pods per GPU, each with 10GB
mig-configs:
  inference-dense:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 7

Training + Inference
Split GPUs between training and inference:
# GPUs 0-1: 2x 3g.40gb each for training; GPUs 2-3: 7x 1g.10gb each for inference
mig-configs:
  training-and-inference:
    - devices: [0, 1]  # First 2 GPUs for training
      mig-enabled: true
      mig-devices:
        "3g.40gb": 2
    - devices: [2, 3]  # Last 2 GPUs for inference
      mig-enabled: true
      mig-devices:
        "1g.10gb": 7

Development Environment
Give each developer a GPU slice:
mig-configs:
  dev-environment:
    - devices: all
      mig-enabled: true
      mig-devices:
        "1g.10gb": 7  # 7 devs per GPU

Monitoring MIG Instances
DCGM Exporter reports per-instance metrics:
# Metrics include MIG instance labels
DCGM_FI_DEV_GPU_UTIL{gpu="0", GPU_I_ID="0", GPU_I_PROFILE="1g.10gb"} 45
DCGM_FI_DEV_FB_USED{gpu="0", GPU_I_ID="0", GPU_I_PROFILE="1g.10gb"} 2048

Create Grafana dashboards that show utilization per MIG instance to identify under-utilized slices.
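Before building dashboards, a quick script can flag idle slices from a scrape of the exporter's metrics output. The Python sketch below parses lines in the format shown above and lists the MIG instances whose utilization falls under a threshold. The function name, the 20% threshold, and the sample text are assumptions for illustration, and the label names follow the example above, so they may differ in your exporter version.

```python
import re

# Matches lines like: DCGM_FI_DEV_GPU_UTIL{gpu="0", GPU_I_ID="0", ...} 45
LINE_RE = re.compile(r'^(?P<name>\w+)\{(?P<labels>[^}]*)\}\s+(?P<value>[-+0-9.eE]+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def underutilized(metrics_text: str,
                  threshold: float = 20.0) -> list[tuple[str, str, str, float]]:
    """Return (gpu, instance id, profile, utilization) for slices below threshold."""
    idle = []
    for line in metrics_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match or match.group("name") != "DCGM_FI_DEV_GPU_UTIL":
            continue  # comments, blank lines, and other metrics are skipped
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if "GPU_I_ID" not in labels:
            continue  # series without a MIG instance label
        util = float(match.group("value"))
        if util < threshold:
            idle.append((labels.get("gpu", ""), labels["GPU_I_ID"],
                         labels.get("GPU_I_PROFILE", ""), util))
    return idle


# Made-up sample in the same shape as the exporter output shown above.
SAMPLE = '''
# Metrics include MIG instance labels
DCGM_FI_DEV_GPU_UTIL{gpu="0", GPU_I_ID="0", GPU_I_PROFILE="1g.10gb"} 45
DCGM_FI_DEV_GPU_UTIL{gpu="0", GPU_I_ID="1", GPU_I_PROFILE="1g.10gb"} 3
DCGM_FI_DEV_FB_USED{gpu="0", GPU_I_ID="0", GPU_I_PROFILE="1g.10gb"} 2048
'''

if __name__ == "__main__":
    for gpu, instance, profile, util in underutilized(SAMPLE):
        print(f"GPU {gpu} instance {instance} ({profile}): {util}% utilized")
```

A single scrape is only a snapshot, so in practice the same check is better applied to an average over a longer window.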
Cost Optimization
MIG directly impacts your GPU cost calculations:
- Without MIG: 1 GPU per inference pod = $2.50/hour per pod (assuming a GPU priced at $2.50/hour)
- With MIG (7x 1g.10gb): 1 GPU serves 7 pods = $0.36/hour per pod
- Cost reduction: 86% for small inference workloads
This assumes the inference model fits in 10GB VRAM. Models like Mistral 7B (quantized to INT4) fit in under 5GB, making 1g.10gb instances more than sufficient.
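The arithmetic behind these figures is simple enough to rerun with your own prices and model sizes. The Python sketch below reproduces the numbers above; the function names are hypothetical, the $2.50/hour price is the example figure from the list, and the weight estimate assumes a nominal 7 billion parameters at 4 bits each, counting weights only (no KV cache, activations, or runtime overhead).

```python
def cost_per_pod(gpu_hourly_usd: float, pods_per_gpu: int) -> float:
    """Hourly cost of one pod when a GPU is shared evenly between pods."""
    return gpu_hourly_usd / pods_per_gpu


def reduction_pct(pods_per_gpu: int) -> float:
    """Percentage saved per pod compared with one pod per GPU."""
    return (1 - 1 / pods_per_gpu) * 100


def int4_weight_gb(params_billion: float) -> float:
    """Approximate weight size in GB at 4 bits (0.5 bytes) per parameter."""
    return params_billion * 0.5


if __name__ == "__main__":
    print(f"${cost_per_pod(2.50, 7):.2f}/hour per pod")     # 2.50 / 7
    print(f"{reduction_pct(7):.0f}% cost reduction")        # (1 - 1/7) * 100
    print(f"{int4_weight_gb(7.0):.1f} GB of INT4 weights")  # 7 * 0.5
```

Leaving headroom above the raw weight size matters, because serving a model also needs memory for the KV cache and the inference runtime.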
Automating with Ansible
Manage MIG configuration across your fleet with Ansible:
---
- name: Configure MIG partitioning
  hosts: localhost
  vars:
    mig_nodes:
      - name: gpu-node-1
        config: all-1g.10gb
      - name: gpu-node-2
        config: mixed-strategy
      - name: gpu-node-3
        config: all-disabled
  tasks:
    - name: Label nodes with MIG config
      kubernetes.core.k8s:
        state: patched
        kind: Node
        name: "{{ item.name }}"
        definition:
          metadata:
            labels:
              nvidia.com/mig.config: "{{ item.config }}"
      loop: "{{ mig_nodes }}"

Final Thoughts
MIG is one of the most effective ways to improve GPU utilization on Kubernetes. If your inference pods use 10-20GB of an 80GB GPU, you are wasting 60-70GB of expensive hardware. The GPU Operator's MIG Manager makes reconfiguration as simple as changing a node label. Start with your inference workloads, where the ROI is immediate and obvious.