The NVIDIA Device Plugin is the bridge between your GPUs and Kubernetes scheduling. It exposes GPUs as nvidia.com/gpu resources that pods can request. Through the GPU Operator, you can configure advanced features like time-slicing, custom resource names, and GPU health monitoring.
Default Behavior
Out of the box, the device plugin:
- Discovers all NVIDIA GPUs on each node
- Exposes them as `nvidia.com/gpu` resources
- Allocates whole GPUs to pods (1 GPU = 1 resource unit)
- Runs health checks on allocated GPUs
```bash
# Check GPU resources on a node
kubectl describe node gpu-node-1 | grep -A 5 "Allocatable"
# nvidia.com/gpu: 8
```

Device Plugin ConfigMap
Create a ConfigMap to customize device plugin behavior:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config
  namespace: gpu-operator
data:
  default: |
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 1
  time-slicing-4: |
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: true
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 4
  time-slicing-10: |
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: true
        failRequestsGreaterThanOne: false
        resources:
          - name: nvidia.com/gpu
            replicas: 10
```

GPU Time-Slicing
Time-slicing lets multiple pods share a single GPU through time-division multiplexing. Unlike MIG, time-slicing does not provide memory isolation: all pods share the full GPU memory space.
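The capacity arithmetic is simple: the device plugin multiplies each physical GPU by the configured replica count when advertising allocatable resources. A quick sketch (illustrative Python, not part of the plugin itself):

```python
def advertised_capacity(physical_gpus: int, replicas: int) -> int:
    """With time-slicing, each physical GPU is advertised as `replicas`
    schedulable resource units. There is no memory partitioning; this is
    purely a multiplier on the allocatable count."""
    return physical_gpus * replicas

# A node with 2 physical GPUs under the time-slicing-4 config
# advertises 8 allocatable units.
print(advertised_capacity(2, 4))  # -> 8
```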
Enable Time-Slicing
```bash
# Label a node to use 4-way time-slicing
kubectl label node dev-gpu-node nvidia.com/device-plugin.config=time-slicing-4

# The node now reports 4x the actual GPU count
# 2 physical GPUs -> 8 allocatable nvidia.com/gpu.shared
```

When to Use Time-Slicing vs MIG
| Feature | Time-Slicing | MIG |
|---|---|---|
| Memory isolation | No (shared) | Yes (dedicated) |
| Compute isolation | No (time-shared) | Yes (dedicated SMs) |
| Error isolation | No | Yes |
| GPU support | All NVIDIA GPUs | Ampere and newer data-center GPUs (A100, A30, H100, etc.) |
| Overhead | Minimal | Minimal |
| Use case | Dev/test, light inference | Production inference, multi-tenant |
Use time-slicing for: development environments, Jupyter notebooks, light inference workloads where memory isolation is not critical.
Use MIG for: production inference, multi-tenant clusters, workloads requiring guaranteed performance.
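The decision logic in the table above can be encoded as a small helper for cluster automation (a hypothetical function, not an NVIDIA API; the criteria are taken from the comparison above):

```python
def recommend_sharing_mode(needs_memory_isolation: bool,
                           production: bool,
                           mig_capable_gpu: bool) -> str:
    """Pick a GPU sharing strategy from the criteria in the table.

    MIG requires both a MIG-capable GPU (A100/A30/H100 class) and a
    workload that needs isolation or guaranteed performance; otherwise
    time-slicing covers dev/test and light inference.
    """
    if (needs_memory_isolation or production) and mig_capable_gpu:
        return "mig"
    return "time-slicing"

print(recommend_sharing_mode(False, False, True))  # dev notebook -> time-slicing
print(recommend_sharing_mode(True, True, True))    # multi-tenant prod -> mig
```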
Time-Slicing Pod Spec
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dev-notebook
spec:
  containers:
    - name: jupyter
      image: nvcr.io/nvidia/pytorch:24.07-py3
      command: ["jupyter", "lab", "--ip=0.0.0.0"]
      resources:
        limits:
          nvidia.com/gpu.shared: 1  # Gets 1/4 of a GPU (with replicas=4)
```

GPU Health Monitoring
Configure health checks that remove unhealthy GPUs from the schedulable pool:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config
  namespace: gpu-operator
data:
  default: |
    version: v1
    flags:
      migStrategy: none
      failOnInitError: true
      nvidiaDriverRoot: /
      gdsEnabled: false
      mpsEnabled: false
    health:
      plugin:
        - name: nvidia.com/gpu
          failOnInitError: true
```

The device plugin runs Xid error monitoring. When a GPU reports specific Xid errors (hardware faults), the plugin marks the GPU as unhealthy and Kubernetes stops scheduling pods to it.
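Conceptually, the health loop works like this. The sketch below is a simplified illustration in Python: the real plugin watches Xid events through NVML, and the critical-Xid set here is a hypothetical subset chosen for the example:

```python
# Xid codes commonly treated as fatal hardware faults (illustrative subset):
# 48 = double-bit ECC error, 63/64 = row-remapping/ECC page retirement events,
# 79 = GPU has fallen off the bus.
CRITICAL_XIDS = {48, 63, 64, 79}

def update_health(healthy_gpus: set, gpu_uuid: str, xid: int) -> set:
    """Remove a GPU from the schedulable pool when it reports a critical Xid."""
    if xid in CRITICAL_XIDS:
        return healthy_gpus - {gpu_uuid}
    return healthy_gpus

pool = {"GPU-a", "GPU-b"}
pool = update_health(pool, "GPU-b", 79)  # GPU-b fell off the bus
print(sorted(pool))  # ['GPU-a']
```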
Custom Resource Naming
Expose different GPU models as different resource types:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: device-plugin-config
  namespace: gpu-operator
data:
  multi-gpu-types: |
    version: v1
    flags:
      migStrategy: none
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu.a100
            replicas: 1
            devices:
              - "GPU-xxxxx-a100-uuid"
          - name: nvidia.com/gpu.t4
            replicas: 4
            devices:
              - "GPU-xxxxx-t4-uuid"
```

This lets you request specific GPU types:
```yaml
resources:
  limits:
    nvidia.com/gpu.a100: 1  # Specifically request an A100
```

GPU Feature Discovery Integration
The GPU Feature Discovery component (deployed by the GPU Operator) labels nodes with GPU attributes:
```bash
kubectl get node gpu-node-1 --show-labels | tr ',' '\n' | grep nvidia
# nvidia.com/gpu.product=NVIDIA-A100-SXM4-80GB
# nvidia.com/gpu.count=8
# nvidia.com/gpu.memory=81920
# nvidia.com/gpu.family=ampere
# nvidia.com/mig.capable=true
# nvidia.com/cuda.driver.major=550
# nvidia.com/gpu.compute.major=8
```

Use these labels for targeted scheduling:
```yaml
spec:
  nodeSelector:
    nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
  containers:
    - name: training
      resources:
        limits:
          nvidia.com/gpu: 8
```

Configuring via GPU Operator ClusterPolicy
```yaml
apiVersion: nvidia.com/v1
kind: ClusterPolicy
metadata:
  name: cluster-policy
spec:
  devicePlugin:
    enabled: true
    version: v0.16.2
    config:
      name: device-plugin-config  # Reference the ConfigMap
      default: default            # Default config key
    env:
      - name: PASS_DEVICE_SPECS
        value: "true"
      - name: DEVICE_LIST_STRATEGY
        value: "envvar"
      - name: DEVICE_ID_STRATEGY
        value: "uuid"
```

Node-Level Configuration
Different nodes can use different device plugin configurations:
```bash
# Production nodes: no sharing
kubectl label node prod-gpu-1 nvidia.com/device-plugin.config=default

# Dev nodes: 4-way time-slicing
kubectl label node dev-gpu-1 nvidia.com/device-plugin.config=time-slicing-4

# Notebook nodes: 10-way time-slicing
kubectl label node notebook-gpu-1 nvidia.com/device-plugin.config=time-slicing-10
```

Automating with Ansible
Manage device plugin configuration across clusters with Ansible:
```yaml
---
- name: Configure GPU Device Plugin
  hosts: localhost
  vars:
    gpu_nodes:
      - name: prod-gpu-1
        config: default
      - name: dev-gpu-1
        config: time-slicing-4
  tasks:
    - name: Apply device plugin ConfigMap
      kubernetes.core.k8s:
        state: present
        src: manifests/device-plugin-config.yaml

    - name: Label nodes with config
      kubernetes.core.k8s:
        state: patched
        kind: Node
        name: "{{ item.name }}"
        definition:
          metadata:
            labels:
              nvidia.com/device-plugin.config: "{{ item.config }}"
      loop: "{{ gpu_nodes }}"
```

Final Thoughts
The device plugin is the scheduling layer of your GPU infrastructure on Kubernetes. Getting the configuration right (time-slicing for dev, MIG for production, health monitoring for reliability) is the difference between a GPU cluster that runs efficiently and one that wastes expensive hardware. The GPU Operator makes this a ConfigMap change rather than a node-by-node manual process.

