
GPU Sharing on Kubernetes: MIG, MPS, and Time-Slicing Compared

Luca Berton 1 min read
#gpu#kubernetes#nvidia#mig#ai-infrastructure

## 🎮 Stop Wasting Your GPUs

A single NVIDIA A100 costs upwards of $10,000, yet many organizations run their GPUs at around 30% utilization. GPU sharing lets multiple workloads share one physical GPU, dramatically improving efficiency.
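A rough back-of-envelope makes the waste concrete. The purchase price and three-year amortization window below are illustrative assumptions, not quoted figures:

```shell
#!/bin/sh
# Illustrative numbers: price and amortization window are assumptions.
PRICE=10000              # USD per A100 (assumed)
HOURS=$((3 * 365 * 24))  # 3-year amortization = 26280 hours
UTIL=30                  # observed utilization in percent

awk -v p="$PRICE" -v h="$HOURS" -v u="$UTIL" 'BEGIN {
  printf "raw cost per GPU-hour:    $%.2f\n", p / h
  printf "cost per useful GPU-hour: $%.2f\n", p / h / (u / 100)
}'
# raw cost per GPU-hour:    $0.38
# cost per useful GPU-hour: $1.27
```

At 30% utilization, every hour of real work effectively costs more than three times the raw hourly rate.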

The Three Approaches

1. Time-Slicing

The simplest approach — multiple pods take turns using the full GPU:

# NVIDIA device plugin config
apiVersion: v1
kind: ConfigMap
metadata:
  name: nvidia-device-plugin
  namespace: kube-system
data:
  config.yaml: |
    version: v1
    sharing:
      timeSlicing:
        renameByDefault: false
        resources:
        - name: nvidia.com/gpu
          replicas: 4  # 4 pods share each GPU

Pros: Simple, no special hardware. Cons: No memory isolation — one pod can OOM-kill others. No performance guarantees.
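On the consuming side, pods request a time-sliced GPU exactly as they would a dedicated one; with `replicas: 4` in the plugin config, up to four such pods can land on each physical GPU. The pod name and image below are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: inference-worker        # illustrative name
spec:
  containers:
  - name: app
    image: my-inference:latest  # illustrative image
    resources:
      limits:
        nvidia.com/gpu: 1       # one time-sliced replica, not a whole GPU
```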

2. Multi-Instance GPU (MIG)

Hardware-level partitioning on A100/A30/H100:

# Enable MIG mode
nvidia-smi -i 0 -mig 1

# Create GPU instances plus their default compute instances (A100 80GB example)
# Profile ID 19 = 1g.10gb on the 80GB card; -C creates the compute instances in one step
nvidia-smi mig -cgi 19,19,19,19,19,19,19 -C -i 0  # 7x 1g.10gb instances

Kubernetes sees each MIG instance as a separate resource:

resources:
  limits:
    nvidia.com/mig-1g.10gb: 1  # Request one MIG slice

Pros: Hardware isolation (memory + compute). Guaranteed performance. Cons: Only A100/A30/H100. Fixed partitioning — can’t change without stopping workloads.
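Once the device plugin picks up the partitions (using the `mixed` MIG strategy), each slice is advertised as a node-level extended resource. A quick way to confirm — the node name here is illustrative:

```shell
# List MIG-backed extended resources advertised by the node
kubectl describe node gpu-node-1 | grep -i mig
# Expect entries along the lines of: nvidia.com/mig-1g.10gb: 7
```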

3. Multi-Process Service (MPS)

CUDA-level sharing with better concurrency than time-slicing:

# Enable MPS in the device plugin config
sharing:
  mps:
    renameByDefault: false
    resources:
    - name: nvidia.com/gpu
      replicas: 10  # device memory is split equally across the 10 replicas

Pros: Better utilization than time-slicing, some memory isolation. Cons: Less isolation than MIG. Shared failure domain — one bad CUDA kernel affects all.
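One way to sanity-check MPS from inside a scheduled pod: the device plugin points CUDA clients at the MPS control daemon through environment variables, so their presence confirms the pod is MPS-attached. The pod name is illustrative, and the exact variable names are an assumption worth verifying against your plugin version:

```shell
# Look for MPS-related environment injected by the device plugin
kubectl exec inference-worker -- env | grep CUDA_MPS
```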

When to Use What

| Scenario | Recommendation |
| --- | --- |
| Development/testing | Time-slicing (simple, flexible) |
| Production inference (A100/H100) | MIG (hardware isolation) |
| Production inference (T4/V100) | MPS (best available) |
| Training | Don't share — training needs full GPU |
| Multi-tenant | MIG only (isolation requirement) |

NVIDIA GPU Operator Setup

helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace \
  --set driver.enabled=true \
  --set mig.strategy=mixed \
  --set devicePlugin.config.name=nvidia-device-plugin
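With the operator's `mig.strategy=mixed` in place, MIG geometry is typically applied declaratively: you label nodes with one of the operator's built-in `mig-parted` profiles instead of running `nvidia-smi` by hand, and the MIG manager reconfigures the GPUs. The node name below is illustrative:

```shell
# Ask the operator's MIG manager to carve the node's GPUs into 1g.10gb slices
kubectl label node gpu-node-1 nvidia.com/mig.config=all-1g.10gb --overwrite

# Watch the reconfiguration state (pending -> success)
kubectl get node gpu-node-1 \
  -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'
```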

Monitoring GPU Utilization

# Prometheus rules for GPU waste detection
groups:
- name: gpu-efficiency
  rules:
  - alert: GPUUnderutilized
    expr: avg_over_time(DCGM_FI_DEV_GPU_UTIL[1h]) < 20
    for: 4h
    labels:
      severity: warning
    annotations:
      summary: "GPU {{ $labels.gpu }} utilized at {{ $value }}% for 4+ hours"
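The rule above assumes DCGM exporter metrics are being scraped. To spot-check the raw metric directly — the service and namespace names here are assumptions based on a default GPU Operator install:

```shell
# Port-forward the DCGM exporter and inspect per-GPU utilization
kubectl -n gpu-operator port-forward svc/nvidia-dcgm-exporter 9400:9400 &
curl -s localhost:9400/metrics | grep '^DCGM_FI_DEV_GPU_UTIL'
```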

Key Takeaways

  1. Audit first — measure current GPU utilization before choosing a sharing strategy
  2. MIG for production — the only option with true hardware isolation
  3. Time-slicing for dev — simple and good enough for non-production
  4. Never share training GPUs — training needs dedicated, uninterrupted access
  5. Monitor constantly — GPU utilization should be a dashboard metric

Need to optimize your GPU infrastructure? I help organizations maximize GPU utilization on Kubernetes. Let’s connect.


Luca Berton

AI & Cloud Advisor with 18+ years experience. Author of 8 technical books, creator of Ansible Pilot, and instructor at CopyPasteLearn Academy. Speaker at KubeCon EU & Red Hat Summit 2026.
