
NVIDIA NIM Support Matrix: Every Model × GPU × Profile

The complete NIM LLM support matrix: which models run on which GPUs, the available precision profiles (BF16, FP8, NVFP4, MXFP4), tensor-parallel (TP) configurations, and LoRA support.

Luca Berton
· 5 min read

Before deploying a NIM model, you need to answer three questions: Does NIM support my model? Does it run on my GPU? Which profile should I use?

This article consolidates the official NIM LLM Support Matrix into a single reference with practical guidance.

NIM 2.x Supported Models

NIM LLM 2.0.x ships with model-specific containers for these models:

| Model | Container | Parameters | Precisions | Max TP |
|---|---|---|---|---|
| GPT-OSS 120B | openai/gpt-oss-120b | 120B | MXFP4 | TP8 |
| GPT-OSS 20B | openai/gpt-oss-20b | 20B | MXFP4 | TP8 |
| Llama 3.1 70B Instruct | meta/llama-3.1-70b-instruct | 70B | BF16, FP8, NVFP4 | TP8 |
| Llama 3.1 8B Instruct | meta/llama-3.1-8b-instruct | 8B | BF16, FP8, NVFP4 | TP1 |
| Llama 3.3 70B Instruct | meta/llama-3.3-70b-instruct | 70B | BF16, FP8, NVFP4 | TP8 |
| Nemotron Super 49B v1.5 | nvidia/llama-3.3-nemotron-super-49b-v1.5 | 49B | BF16, FP8, NVFP4 | TP8 |
| Nemotron 3 Nano | nvidia/nemotron-3-nano | Small | BF16, FP8, NVFP4 | TP8 |
| Nemotron 3 Super 120B | nvidia/nemotron-3-super-120b-a12b | 120B (12B active) | BF16, FP8, NVFP4 | TP8 |
| StarCoder2 7B | bigcode/starcoder2-7b | 7B | BF16 | TP2 |

All models support LoRA adapters at every TP level (except StarCoder2 and some NVFP4 combinations).
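To make the table queryable, here is a minimal Python sketch that transcribes it into a dictionary. The `NimModel` class, the helper names, and the assumption that TP sizes step in powers of two up to the listed maximum are illustrative choices, not part of NIM itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NimModel:
    container: str
    precisions: tuple
    max_tp: int


# Precisions shared by the Llama and Nemotron rows in the table above.
_COMMON = ("BF16", "FP8", "NVFP4")

# Transcribed from the table above; the data layout is hypothetical.
MODELS = {
    "GPT-OSS 120B": NimModel("openai/gpt-oss-120b", ("MXFP4",), 8),
    "GPT-OSS 20B": NimModel("openai/gpt-oss-20b", ("MXFP4",), 8),
    "Llama 3.1 70B Instruct": NimModel("meta/llama-3.1-70b-instruct", _COMMON, 8),
    "Llama 3.1 8B Instruct": NimModel("meta/llama-3.1-8b-instruct", _COMMON, 1),
    "Llama 3.3 70B Instruct": NimModel("meta/llama-3.3-70b-instruct", _COMMON, 8),
    "Nemotron Super 49B v1.5": NimModel(
        "nvidia/llama-3.3-nemotron-super-49b-v1.5", _COMMON, 8
    ),
    "Nemotron 3 Nano": NimModel("nvidia/nemotron-3-nano", _COMMON, 8),
    "Nemotron 3 Super 120B": NimModel(
        "nvidia/nemotron-3-super-120b-a12b", _COMMON, 8
    ),
    "StarCoder2 7B": NimModel("bigcode/starcoder2-7b", ("BF16",), 2),
}


def models_with_precision(precision):
    """Return the names of models that list the given precision, sorted."""
    return sorted(n for n, m in MODELS.items() if precision in m.precisions)


def tp_sizes(name):
    """Return candidate TP sizes, assuming powers of two up to the model's max."""
    sizes, tp = [], 1
    while tp <= MODELS[name].max_tp:
        sizes.append(tp)
        tp *= 2
    return sizes
```

For example, `models_with_precision("MXFP4")` returns only the two GPT-OSS models, and `tp_sizes("StarCoder2 7B")` stops at TP2.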

Profile Matrix by Model

Llama 3.1 70B and Llama 3.3 70B Instruct are the workhorse models, with full precision and TP coverage:

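| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| BF16 | ✅ | ✅ | ✅ | ✅ |
| BF16 + LoRA | ✅ | ✅ | ✅ | ✅ |
| FP8 | ✅ | ✅ | ✅ | ✅ |
| FP8 + LoRA | ✅ | ✅ | ✅ | ✅ |
| NVFP4 | ✅ | ✅ | ✅ | ✅ |
| NVFP4 + LoRA | ✅* | ✅ | ✅ | ✅ |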

*Llama 3.3 70B: NVFP4+LoRA not available at TP1.

Recommendation: Use vllm-fp8-tp2-pp1 on 2x A100 80GB or 2x H100 for the best cost-performance ratio.
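A profile name such as vllm-fp8-tp2-pp1 appears to encode the backend, the precision, the tensor-parallel size, and the pipeline-parallel size. The sketch below parses that form and derives the GPU count as TP × PP. The naming pattern is inferred from this single example, so treat the regular expression and the `Profile` type as assumptions rather than an official NIM schema.

```python
import re
from typing import NamedTuple


class Profile(NamedTuple):
    backend: str
    precision: str
    tp: int
    pp: int

    @property
    def gpus(self):
        # GPUs required = tensor-parallel size x pipeline-parallel size.
        return self.tp * self.pp


# Assumed pattern: <backend>-<precision>-tp<N>-pp<M>, inferred from one example.
_PATTERN = re.compile(
    r"^(?P<backend>[a-z0-9]+)-(?P<precision>[a-z0-9]+)-tp(?P<tp>\d+)-pp(?P<pp>\d+)$"
)


def parse_profile(name):
    """Split a profile name into its parts, or raise ValueError if it does not fit."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized profile name: {name!r}")
    return Profile(
        backend=match["backend"],
        precision=match["precision"].upper(),
        tp=int(match["tp"]),
        pp=int(match["pp"]),
    )
```

Parsing the recommended profile gives precision FP8, TP2, PP1, and a requirement of two GPUs, which matches the 2x A100 80GB or H100 sizing above.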

GPT-OSS 120B / 20B (OpenAI Open Models)

MXFP4 only — aggressively quantized for efficiency:

| Precision | TP1 | TP2 | TP4 | TP8 |
|---|---|---|---|---|
| MXFP4 | ✅ | ✅ | ✅ | ✅ |
| MXFP4 + LoRA | ✅ | ✅ | ✅ | ✅ |

Nemotron Super 120B (MoE — 12B Active)

This is a Mixture of Experts model with 120B total but only 12B active parameters. Profile availability varies significantly by GPU:

  • B200/B300/GB200: Full coverage (BF16/FP8/NVFP4, TP1-TP8)
  • H100/H200: BF16 from TP2, FP8 from TP1, NVFP4 limited
  • A100 80GB: BF16 from TP4, FP8 from TP2
  • L40S: FP8 TP8 only, NVFP4 TP4+
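The per-GPU limits in the list above can be expressed as a minimum-TP lookup. This is a sketch only: the `MIN_TP` table and function names are hypothetical, the H100/H200 NVFP4 entry is omitted because the source only says "limited", and "TP8 only" on L40S is encoded as a minimum of 8.

```python
# Minimum TP size per GPU and precision for Nemotron Super 120B,
# transcribed from the list above. Missing keys mean "not listed".
MIN_TP = {
    "B200": {"BF16": 1, "FP8": 1, "NVFP4": 1},
    "B300": {"BF16": 1, "FP8": 1, "NVFP4": 1},
    "GB200": {"BF16": 1, "FP8": 1, "NVFP4": 1},
    "H100": {"BF16": 2, "FP8": 1},  # NVFP4 is "limited", so it is left out
    "H200": {"BF16": 2, "FP8": 1},  # NVFP4 is "limited", so it is left out
    "A100 80GB": {"BF16": 4, "FP8": 2},
    "L40S": {"FP8": 8, "NVFP4": 4},
}


def min_tp(gpu, precision):
    """Return the smallest listed TP size, or None if the combination is not listed."""
    return MIN_TP.get(gpu, {}).get(precision)


def smallest_deployment(gpu):
    """Return the (precision, tp) pair that needs the fewest GPUs, or None."""
    options = MIN_TP.get(gpu)
    if not options:
        return None
    precision = min(options, key=options.get)
    return precision, options[precision]
```

On an A100 80GB node, for instance, `min_tp("A100 80GB", "BF16")` reports TP4, while `smallest_deployment("L40S")` picks NVFP4 at TP4 over FP8 at TP8.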

Llama 3.1 8B Instruct

Single-GPU model — no multi-GPU profiles needed:

| Precision | TP |
|---|---|
| BF16 | TP1 |
| BF16 + LoRA | TP1 |
| FP8 | TP1 |
| FP8 + LoRA | TP1 |
| NVFP4 | TP1 |
| NVFP4 + LoRA | TP1 |

Verified GPU Compatibility

Which Models Run on My GPU?

| GPU | Verified Models |
|---|---|
| B200 | All 9 models |
| B300 SXM6 AC | All 9 models |
| GB200 | All 9 models |
| H200 | GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B, StarCoder2 |
| H200 NVL | Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B |
| H100 80GB HBM3 | All 9 models |
| H100 NVL | Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B |
| GH200 144G HBM3e | GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B |
| GH200 480GB | GPT-OSS 20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano |
| A100 SXM4 80GB | GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B |
| A100 SXM4 40GB | GPT-OSS 120B/20B, Llama 70B/8B, Nemotron Super 49B, Nemotron Nano |
| A10G | GPT-OSS 20B, Llama 70B/3.3 |
| L40S | GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B |
| RTX PRO 6000 Blackwell SE | GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B |
| RTX PRO 4500 Blackwell SE | GPT-OSS 20B, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B |
| GB10 | GPT-OSS 20B, Llama 8B, Nemotron Super 49B, Nemotron Nano |
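The same table can be inverted to answer "which GPUs are verified for my model?". The sketch below is a hand transcription of the rows above. It reads the shorthand "Llama 70B/8B/3.3" as Llama 3.1 70B, Llama 3.1 8B, and Llama 3.3 70B, which is an interpretation, and the set names and `gpus_for` helper are hypothetical.

```python
GPT = {"GPT-OSS 120B", "GPT-OSS 20B"}
# Assumed expansion of the table shorthand "Llama 70B/8B/3.3".
LLAMA = {"Llama 3.1 70B", "Llama 3.1 8B", "Llama 3.3 70B"}
NEMOTRON = {"Nemotron Super 49B", "Nemotron Nano", "Nemotron Super 120B"}
ALL_MODELS = GPT | LLAMA | NEMOTRON | {"StarCoder2"}

# GPU name -> verified models, transcribed from the table above.
VERIFIED = {
    "B200": ALL_MODELS,
    "B300 SXM6 AC": ALL_MODELS,
    "GB200": ALL_MODELS,
    "H200": ALL_MODELS,  # the row spells out all nine models
    "H200 NVL": LLAMA | NEMOTRON,
    "H100 80GB HBM3": ALL_MODELS,
    "H100 NVL": LLAMA | NEMOTRON,
    "GH200 144G HBM3e": GPT | LLAMA | NEMOTRON,
    "GH200 480GB": {"GPT-OSS 20B", "Nemotron Super 49B", "Nemotron Nano"} | LLAMA,
    "A100 SXM4 80GB": GPT | LLAMA | NEMOTRON,
    "A100 SXM4 40GB": GPT
    | {"Llama 3.1 70B", "Llama 3.1 8B", "Nemotron Super 49B", "Nemotron Nano"},
    "A10G": {"GPT-OSS 20B", "Llama 3.1 70B", "Llama 3.3 70B"},
    "L40S": GPT | LLAMA | NEMOTRON,
    "RTX PRO 6000 Blackwell SE": GPT | LLAMA | NEMOTRON,
    "RTX PRO 4500 Blackwell SE": {"GPT-OSS 20B"} | NEMOTRON,
    "GB10": {"GPT-OSS 20B", "Llama 3.1 8B", "Nemotron Super 49B", "Nemotron Nano"},
}


def gpus_for(model):
    """Return the sorted list of GPUs on which the model is listed as verified."""
    return sorted(gpu for gpu, models in VERIFIED.items() if model in models)
```

Calling `gpus_for("StarCoder2")` shows that only the five GPUs verified for all nine models list it.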

Key Observations

Data center Blackwell GPUs (B200, B300, GB200) support every model at every precision that model offers, which makes them the most versatile option.

H100 80GB remains the production workhorse and supports all 9 models. FP8 halves weight memory relative to BF16, which roughly doubles the model size that fits on each GPU.

A100 40GB is limited but functional. Smaller models (8B, 20B) work fine. 70B requires FP8 or NVFP4 quantization.

L40S is the cost-effective inference GPU. It supports most models, but Nemotron Super 120B needs FP8 at TP8 or NVFP4 at TP4 or higher.

GB10 (DGX Spark) is desktop-class. It is verified only for GPT-OSS 20B, Llama 8B, Nemotron Nano, and Nemotron Super 49B.
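The memory reasoning behind these observations can be checked with back-of-the-envelope arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below assumes 2 bytes for BF16, 1 byte for FP8, and 0.5 bytes for the 4-bit formats, and it ignores KV cache, activations, and quantization scale metadata, so the results are lower bounds rather than a guarantee that a NIM profile exists.

```python
import math

# Approximate bytes per parameter; 4-bit formats ignore scale-factor overhead.
BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0, "NVFP4": 0.5, "MXFP4": 0.5}


def weights_gb(params_billions, precision):
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def min_gpus_for_weights(params_billions, precision, gpu_memory_gb):
    """Lower bound on GPU count from weights alone (no KV cache or activations)."""
    return math.ceil(weights_gb(params_billions, precision) / gpu_memory_gb)
```

A 70B model needs about 140 GB of weights in BF16, 70 GB in FP8, and 35 GB in NVFP4. That is why FP8 roughly doubles what fits on an 80 GB GPU, and why a 70B model on A100 40GB is only practical with FP8 or NVFP4 spread across several GPUs.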

Model-Free NIM

The generic nvidia/model-free-nim container supports any vLLM-compatible model, not just the ones listed above. Explicitly validated models:

  • GPT-OSS 20B
  • Apriel Nemotron
  • Codestral

Verified GPUs for model-free NIM:

  • A100 (40GB PCIe, 80GB PCIe, 40GB SXM4, 80GB SXM4)
  • B300 SXM6 AC
  • GH200 480GB
  • H100 (80GB HBM3, NVL, PCIe)
  • H200, H200 NVL
  • RTX PRO 4500 Blackwell SE

For deployment details, see the Model-Free NIM Guide.

NIM 1.x Legacy Models

These models are supported in NIM LLM 1.15 and earlier (not yet migrated to 2.x):

| Model | Container |
|---|---|
| DeepSeek-V3.1 Terminus | deepseek-ai/deepseek-v3.1-terminus |
| DeepSeek-V3.2 Exp | deepseek-ai/deepseek-v32-exp-nim |
| GLM-5 | zai-org/glm-5 |
| MiniMax-M2.5 | minimax-ai/minimax-m25 |
| Nemotron Nano 9B v2 (DGX Spark) | nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark |
| Qwen3 Coder Next | qwen/qwen3-coder-next |
| Qwen3 Next 80B A3B Instruct | qwen/qwen3-next-80b-a3b-instruct |
| Qwen3 Next 80B A3B Thinking | qwen/qwen3-next-80b-a3b-thinking |
| Qwen3 32B | qwen/qwen3-32b |
| Qwen3 32B (DGX Spark) | qwen/qwen3-32b-dgx-spark |
| Riva Translate 4B v1.1 | nvidia/riva-translate-4b-instruct-v1.1 |
| Healthcare Text2SQL (8B) | nvidia/llama-3.1-nemotron-nano-8b-healthcare-text2sql-v1.0 |
| Healthcare Text2SQL (49B) | nvidia/llama-3.3-nemotron-super-49b-healthcare-text2sql-v1.0 |

For 1.x deployment, refer to the NIM LLM 1.15 supported models documentation.

Quick Decision Guide

What GPU do you have?
├── B200/B300/GB200 → Any model, any precision, any TP
├── H100 80GB / H200 → Any model, prefer FP8
├── A100 80GB → Most models, prefer FP8 for 70B+
├── A100 40GB → 8B-20B models only (or FP8/NVFP4 for 70B)
├── L40S → Most models, FP8 recommended, large models need TP8
├── A10G → 20B and 70B only
└── GB10 → 8B, 20B, Nano, and Nemotron Super 49B only

What model do you need?
├── General purpose → Llama 3.3 70B (FP8)
├── Code generation → StarCoder2 7B or model-free with Codestral
├── OpenAI open models → GPT-OSS 20B/120B (MXFP4)
├── NVIDIA optimized → Nemotron Super 49B or 120B
├── Small/edge → Llama 8B or Nemotron Nano
└── Custom/fine-tuned → Model-free NIM

About the Author

I am Luca Berton, AI and Cloud Advisor. I help enterprises select the right GPU and model configuration for their inference workloads. Book a consultation.

Frequently Asked Questions

Which GPUs are supported by NVIDIA NIM?

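The NIM LLM 2.x matrix lists verified GPUs across Blackwell (B200, B300, GB200, RTX PRO 6000 and 4500 Blackwell SE, GB10), Hopper (H100, H200, GH200), Ampere (A100 40/80GB, A10G), and Ada (L40S). Model availability varies by GPU memory.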

Can I run NIM on consumer GPUs like RTX 4090?

NIM is designed for data center GPUs. Consumer GPUs are not officially supported, though smaller models may work with vLLM directly.
