NVIDIA NIM Support Matrix 2026: Models, GPUs & Profiles

Before deploying a NIM model, you need to answer three questions: Does NIM support my model? Does it run on my GPU? Which profile should I use?

This article consolidates the official NIM LLM Support Matrix into a single reference with practical guidance.

NIM 2.x Supported Models

NIM LLM 2.0.x ships with model-specific containers for these models:

Model	Container	Parameters	Precisions	Max TP
GPT-OSS 120B	`openai/gpt-oss-120b`	120B	MXFP4	TP8
GPT-OSS 20B	`openai/gpt-oss-20b`	20B	MXFP4	TP8
Llama 3.1 70B Instruct	`meta/llama-3.1-70b-instruct`	70B	BF16, FP8, NVFP4	TP8
Llama 3.1 8B Instruct	`meta/llama-3.1-8b-instruct`	8B	BF16, FP8, NVFP4	TP1
Llama 3.3 70B Instruct	`meta/llama-3.3-70b-instruct`	70B	BF16, FP8, NVFP4	TP8
Nemotron Super 49B v1.5	`nvidia/llama-3.3-nemotron-super-49b-v1.5`	49B	BF16, FP8, NVFP4	TP8
Nemotron 3 Nano	`nvidia/nemotron-3-nano`	Small	BF16, FP8, NVFP4	TP8
Nemotron 3 Super 120B	`nvidia/nemotron-3-super-120b-a12b`	120B (12B active)	BF16, FP8, NVFP4	TP8
StarCoder2 7B	`bigcode/starcoder2-7b`	7B	BF16	TP2
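If you script deployment checks, the table can live as plain data. Below is a minimal sketch of a hypothetical helper (not part of any NIM tooling) that encodes the 2.x table and checks whether a container ships a given precision at a given TP. It assumes every power-of-two TP up to the listed Max TP exists, which is a simplification: the Nemotron Super 120B notes further down show availability also depends on the GPU.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NimModel:
    params: str                  # as written in the table ("Small" for Nemotron 3 Nano)
    precisions: frozenset[str]
    max_tp: int


# Transcribed from the NIM LLM 2.0.x table above.
MODELS: dict[str, NimModel] = {
    "openai/gpt-oss-120b": NimModel("120B", frozenset({"MXFP4"}), 8),
    "openai/gpt-oss-20b": NimModel("20B", frozenset({"MXFP4"}), 8),
    "meta/llama-3.1-70b-instruct": NimModel("70B", frozenset({"BF16", "FP8", "NVFP4"}), 8),
    "meta/llama-3.1-8b-instruct": NimModel("8B", frozenset({"BF16", "FP8", "NVFP4"}), 1),
    "meta/llama-3.3-70b-instruct": NimModel("70B", frozenset({"BF16", "FP8", "NVFP4"}), 8),
    "nvidia/llama-3.3-nemotron-super-49b-v1.5": NimModel("49B", frozenset({"BF16", "FP8", "NVFP4"}), 8),
    "nvidia/nemotron-3-nano": NimModel("Small", frozenset({"BF16", "FP8", "NVFP4"}), 8),
    "nvidia/nemotron-3-super-120b-a12b": NimModel("120B (12B active)", frozenset({"BF16", "FP8", "NVFP4"}), 8),
    "bigcode/starcoder2-7b": NimModel("7B", frozenset({"BF16"}), 2),
}


def is_listed(container: str, precision: str, tp: int) -> bool:
    """True if `container` ships `precision` and `tp` is a power of two <= Max TP.

    Simplifying assumption: every power-of-two TP up to Max TP is available.
    """
    model = MODELS.get(container)
    if model is None:
        return False
    is_pow2 = tp >= 1 and (tp & (tp - 1)) == 0
    return precision.upper() in model.precisions and is_pow2 and tp <= model.max_tp


print(is_listed("meta/llama-3.1-8b-instruct", "FP8", 2))   # Max TP for 8B is 1
print(is_listed("bigcode/starcoder2-7b", "FP8", 1))        # StarCoder2 is BF16 only
```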

Every model except StarCoder2 supports LoRA adapters at every TP level, apart from a few NVFP4 combinations (for example, NVFP4 + LoRA is not available for Llama 3.3 70B at TP1).

Profile Matrix by Model

Llama 3.1 / 3.3 70B (Most Popular)

The workhorse model. Full precision and TP coverage:

Precision	TP1	TP2	TP4	TP8
BF16	✅	✅	✅	✅
BF16 + LoRA	✅	✅	✅	✅
FP8	✅	✅	✅	✅
FP8 + LoRA	✅	✅	✅	✅
NVFP4	✅	✅	✅	✅
NVFP4 + LoRA	✅*	✅	✅	✅

*Llama 3.3 70B: NVFP4+LoRA not available at TP1.
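The footnoted cell is the kind of exception that is easy to miss when automating profile selection. A minimal sketch that mirrors the 70B matrix cell by cell; the function name is hypothetical and it only reflects the listings shown above.

```python
LLAMA_70B = {"meta/llama-3.1-70b-instruct", "meta/llama-3.3-70b-instruct"}
PRECISIONS = ("BF16", "FP8", "NVFP4")
TP_LEVELS = (1, 2, 4, 8)


def llama70b_profile_listed(container: str, precision: str, tp: int, lora: bool = False) -> bool:
    """Every cell of the 70B matrix is listed except Llama 3.3 70B NVFP4 + LoRA at TP1."""
    if container not in LLAMA_70B:
        raise ValueError(f"not a Llama 70B container: {container}")
    precision = precision.upper()
    if precision not in PRECISIONS or tp not in TP_LEVELS:
        return False
    # The footnoted exception.
    if container == "meta/llama-3.3-70b-instruct" and precision == "NVFP4" and lora and tp == 1:
        return False
    return True


def count_listed(container: str) -> int:
    # 3 precisions x 2 (with/without LoRA) x 4 TP levels = 24 cells per model.
    return sum(
        llama70b_profile_listed(container, p, t, lora)
        for p in PRECISIONS
        for t in TP_LEVELS
        for lora in (False, True)
    )


print(count_listed("meta/llama-3.1-70b-instruct"), count_listed("meta/llama-3.3-70b-instruct"))
```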

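Recommendation: Use the vllm-fp8-tp2-pp1 profile (FP8 weights, tensor parallelism 2, pipeline parallelism 1) on two A100 80GB or two H100 GPUs for the best cost-performance ratio. A100 has no native FP8 tensor cores, so on A100 FP8 mainly saves memory and bandwidth, while H100 also gains FP8 compute throughput.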
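The recommended profile name appears to encode engine, precision, tensor parallelism, and pipeline parallelism. The parser below assumes an `<engine>-<precision>-tp<N>-pp<M>` layout, inferred from this single example rather than from NIM documentation; any extra trailing fields are ignored. It also derives the GPU count the profile implies (TP × PP).

```python
import re
from typing import NamedTuple


class Profile(NamedTuple):
    engine: str
    precision: str
    tp: int
    pp: int

    @property
    def gpus(self) -> int:
        # Each of the `pp` pipeline stages is split across `tp` GPUs.
        return self.tp * self.pp


# Assumed layout: <engine>-<precision>-tp<N>-pp<M>[-anything else]
_PROFILE_RE = re.compile(
    r"^(?P<engine>[a-z0-9_]+)-(?P<precision>[a-z0-9]+)-tp(?P<tp>\d+)-pp(?P<pp>\d+)(?:-.*)?$"
)


def parse_profile(name: str) -> Profile:
    m = _PROFILE_RE.match(name.lower())
    if m is None:
        raise ValueError(f"unrecognized profile name: {name!r}")
    return Profile(m["engine"], m["precision"].upper(), int(m["tp"]), int(m["pp"]))


p = parse_profile("vllm-fp8-tp2-pp1")
print(p.engine, p.precision, p.tp, p.pp, p.gpus)
```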

GPT-OSS 120B / 20B (OpenAI Open Models)

MXFP4 only — aggressively quantized for efficiency:

Precision	TP1	TP2	TP4	TP8
MXFP4	✅	✅	✅	✅
MXFP4 + LoRA	✅	✅	✅	✅
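MXFP4 (the OCP Microscaling FP4 format) stores 4-bit E2M1 values in blocks of 32 that share one 8-bit power-of-two scale, so each weight costs 4 + 8/32 = 4.25 bits. A rough footprint estimate, assuming for simplicity that every parameter is stored in MXFP4 and using the table's nominal parameter counts; in practice some layers may stay in higher precision, and KV cache and activations come on top.

```python
def mxfp4_bits_per_weight(block_size: int = 32, element_bits: int = 4, scale_bits: int = 8) -> float:
    # One shared E8M0 scale per block of `block_size` FP4 (E2M1) elements.
    return element_bits + scale_bits / block_size


def weight_gb(params: float, bits_per_weight: float) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes); excludes KV cache and activations."""
    return params * bits_per_weight / 8 / 1e9


bits = mxfp4_bits_per_weight()
for name, params in [("GPT-OSS 120B", 120e9), ("GPT-OSS 20B", 20e9)]:
    print(f"{name}: ~{weight_gb(params, bits):.1f} GB in MXFP4 "
          f"vs ~{weight_gb(params, 16):.0f} GB in BF16")
```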

Nemotron Super 120B (MoE — 12B Active)

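This is a Mixture of Experts model with 120B total parameters, of which only 12B are active per token. All 120B must still be held in GPU memory (about 240 GB of weights in BF16, 120 GB in FP8), so profile availability varies significantly by GPU: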

B200/B300/GB200: Full coverage (BF16/FP8/NVFP4, TP1-TP8), subject to per-GPU memory (BF16 at TP1 needs room for roughly 240 GB of weights)
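H100/H200: BF16 from TP2, FP8 from TP1, NVFP4 limited (by weight size alone, those minimums fit only the 141 GB H200; an 80 GB H100 needs at least TP4 for BF16 and TP2 for FP8)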
A100 80GB: BF16 from TP4, FP8 from TP2
L40S: FP8 TP8 only, NVFP4 TP4+
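Availability varies so much because an MoE model must keep every expert resident in memory even though each token only passes through a subset of them. The sketch below uses the common approximation of 2 FLOPs per active parameter per token for compute, and a weights-only estimate to find the smallest power-of-two TP whose combined memory holds the weights. GPU memory sizes are inputs you supply; KV cache, activations, and runtime overhead are ignored, so real minimums can be higher, and NIM may not publish a profile at every TP that fits.

```python
from typing import Optional

BYTES_PER_PARAM = {"BF16": 2.0, "FP8": 1.0}


def weights_gb(total_params: float, precision: str) -> float:
    # Memory scales with TOTAL parameters: every expert must be resident.
    return total_params * BYTES_PER_PARAM[precision] / 1e9


def flops_per_token(active_params: float) -> float:
    # Compute scales with ACTIVE parameters (~2 FLOPs per parameter per token).
    return 2 * active_params


def min_tp_for_weights(total_params: float, precision: str, gpu_mem_gb: float,
                       max_tp: int = 8) -> Optional[int]:
    """Smallest power-of-two TP whose combined memory holds the weights alone."""
    need = weights_gb(total_params, precision)
    tp = 1
    while tp <= max_tp:
        if tp * gpu_mem_gb >= need:
            return tp
        tp *= 2
    return None


TOTAL, ACTIVE = 120e9, 12e9
print(f"BF16 weights: {weights_gb(TOTAL, 'BF16'):.0f} GB, FP8: {weights_gb(TOTAL, 'FP8'):.0f} GB")
print(f"Per-token compute vs a dense 120B: {flops_per_token(ACTIVE) / flops_per_token(TOTAL):.0%}")
for gpu, mem in [("80 GB GPU", 80), ("141 GB GPU", 141)]:
    for prec in ("BF16", "FP8"):
        print(gpu, prec, "-> TP >=", min_tp_for_weights(TOTAL, prec, mem))
```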

Llama 3.1 8B Instruct

Single-GPU model — no multi-GPU profiles needed:

Precision	TP1
BF16	✅
BF16 + LoRA	✅
FP8	✅
FP8 + LoRA	✅
NVFP4	✅
NVFP4 + LoRA	✅
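A weights-only estimate shows why one GPU is enough. NVFP4 stores 4-bit values in 16-element blocks that share an FP8 (E4M3) scale, about 4.5 bits per weight (the sketch ignores the single per-tensor FP32 scale). The figures exclude KV cache, which grows with batch size and context length.

```python
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "FP8": 8.0,
    "NVFP4": 4 + 8 / 16,  # 4-bit E2M1 values + one FP8 E4M3 scale per 16 weights
}


def footprint_gb(params: float) -> dict[str, float]:
    """Weights-only size per precision in GB (1 GB = 1e9 bytes); no KV cache."""
    return {p: params * bits / 8 / 1e9 for p, bits in BITS_PER_WEIGHT.items()}


for precision, gb in footprint_gb(8e9).items():
    print(f"Llama 3.1 8B {precision:>5}: ~{gb:.1f} GB")
```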

Verified GPU Compatibility

Which Models Run on My GPU?

GPU	Verified Models
B200	All 9 models
B300 SXM6 AC	All 9 models
GB200	All 9 models
H200	GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B, StarCoder2
H200 NVL	Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B
H100 80GB HBM3	All 9 models
H100 NVL	Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B
GH200 144G HBM3e	GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B
GH200 480GB	GPT-OSS 20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano
A100 SXM4 80GB	GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B
A100 SXM4 40GB	GPT-OSS 120B/20B, Llama 70B/8B, Nemotron Super 49B, Nemotron Nano
A10G	GPT-OSS 20B, Llama 70B/3.3
L40S	GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B
RTX PRO 6000 Blackwell SE	GPT-OSS 120B/20B, Llama 70B/8B/3.3, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B
RTX PRO 4500 Blackwell SE	GPT-OSS 20B, Nemotron Super 49B, Nemotron Nano, Nemotron Super 120B
GB10	GPT-OSS 20B, Llama 8B, Nemotron Super 49B, Nemotron Nano
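The table is easier to query in the other direction ("which GPUs are verified for model X?") once it is inverted. A minimal sketch; the short model keys are hypothetical labels for the nine 2.x models, and the GPU rows are transcribed from the table above.

```python
from collections import defaultdict

ALL9 = ("gpt-oss-120b", "gpt-oss-20b", "llama-3.1-70b", "llama-3.1-8b", "llama-3.3-70b",
        "nemotron-super-49b", "nemotron-3-nano", "nemotron-3-super-120b", "starcoder2-7b")
LLAMA = ("llama-3.1-70b", "llama-3.1-8b", "llama-3.3-70b")
NEMO = ("nemotron-super-49b", "nemotron-3-nano", "nemotron-3-super-120b")
GPT_OSS = ("gpt-oss-120b", "gpt-oss-20b")

VERIFIED: dict[str, tuple[str, ...]] = {
    "B200": ALL9,
    "B300 SXM6 AC": ALL9,
    "GB200": ALL9,
    "H200": ALL9,
    "H200 NVL": LLAMA + NEMO,
    "H100 80GB HBM3": ALL9,
    "H100 NVL": LLAMA + NEMO,
    "GH200 144G HBM3e": GPT_OSS + LLAMA + NEMO,
    "GH200 480GB": ("gpt-oss-20b",) + LLAMA + ("nemotron-super-49b", "nemotron-3-nano"),
    "A100 SXM4 80GB": GPT_OSS + LLAMA + NEMO,
    "A100 SXM4 40GB": GPT_OSS + ("llama-3.1-70b", "llama-3.1-8b",
                                 "nemotron-super-49b", "nemotron-3-nano"),
    "A10G": ("gpt-oss-20b", "llama-3.1-70b", "llama-3.3-70b"),
    "L40S": GPT_OSS + LLAMA + NEMO,
    "RTX PRO 6000 Blackwell SE": GPT_OSS + LLAMA + NEMO,
    "RTX PRO 4500 Blackwell SE": ("gpt-oss-20b",) + NEMO,
    "GB10": ("gpt-oss-20b", "llama-3.1-8b", "nemotron-super-49b", "nemotron-3-nano"),
}

# Invert once: model -> GPUs, in table order.
BY_MODEL: dict[str, list[str]] = defaultdict(list)
for gpu, models in VERIFIED.items():
    for m in models:
        BY_MODEL[m].append(gpu)


def gpus_for(model: str) -> list[str]:
    return list(BY_MODEL.get(model, []))


print(gpus_for("starcoder2-7b"))
```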

Key Observations

Blackwell GPUs (B200, B300, GB200) support every model in every precision it ships in, making them the most versatile option.

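H100 80GB remains the production workhorse and supports all 9 models. FP8 stores each weight in 1 byte instead of BF16's 2, halving weight memory; the freed memory goes to KV cache or lets the same model run on fewer GPUs.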
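To see what halving weight memory buys, compare the memory left for KV cache when Llama 70B runs on two 80 GB GPUs. This is a weights-only sketch with nominal sizes; activation and runtime overhead are ignored, so real headroom is smaller.

```python
def headroom_gb(params: float, bytes_per_param: float, gpus: int, gpu_mem_gb: float) -> float:
    """Memory left across all GPUs after loading the weights (negative if they don't fit)."""
    weights = params * bytes_per_param / 1e9
    return gpus * gpu_mem_gb - weights


for precision, bpp in [("BF16", 2.0), ("FP8", 1.0)]:
    left = headroom_gb(70e9, bpp, gpus=2, gpu_mem_gb=80)
    print(f"Llama 70B {precision} on 2x80 GB: ~{left:.0f} GB left for KV cache")
```

Here FP8 leaves about 90 GB instead of about 20 GB, so when BF16 weights nearly fill the GPUs, the gain in KV-cache room (and therefore concurrency or context length) can be well beyond 2×.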

A100 40GB is limited but functional. Smaller models (8B, 20B) work fine. 70B requires FP8 or NVFP4 quantization.

L40S is a cost-effective inference GPU. It supports most models, but the largest (Nemotron Super 120B) needs TP8 with FP8, or TP4 and up with NVFP4.

GB10 (DGX Spark) is desktop-class. It is verified only for smaller models: Llama 3.1 8B, GPT-OSS 20B, Nemotron Nano, and Nemotron Super 49B.

Model-Free NIM

The generic `nvidia/model-free-nim` container supports any vLLM-compatible model, not just the ones listed above. Explicitly validated models:

GPT-OSS 20B
Apriel Nemotron
Codestral

Verified GPUs for model-free NIM:

A100 (40GB PCIe, 80GB PCIe, 40GB SXM4, 80GB SXM4)
B300 SXM6 AC
GH200 480GB
H100 (80GB HBM3, NVL, PCIe)
H200, H200 NVL
RTX PRO 4500 Blackwell SE

For deployment details, see the Model-Free NIM Guide.

NIM 1.x Legacy Models

These models are supported in NIM LLM 1.15 and earlier (not yet migrated to 2.x):

Model	Container
DeepSeek-V3.1 Terminus	`deepseek-ai/deepseek-v3.1-terminus`
DeepSeek-V3.2 Exp	`deepseek-ai/deepseek-v32-exp-nim`
GLM-5	`zai-org/glm-5`
MiniMax-M2.5	`minimax-ai/minimax-m25`
Nemotron Nano 9B v2 (DGX Spark)	`nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark`
Qwen3 Coder Next	`qwen/qwen3-coder-next`
Qwen3 Next 80B A3B Instruct	`qwen/qwen3-next-80b-a3b-instruct`
Qwen3 Next 80B A3B Thinking	`qwen/qwen3-next-80b-a3b-thinking`
Qwen3 32B	`qwen/qwen3-32b`
Qwen3 32B (DGX Spark)	`qwen/qwen3-32b-dgx-spark`
Riva Translate 4B v1.1	`nvidia/riva-translate-4b-instruct-v1.1`
Healthcare Text2SQL (8B)	`nvidia/llama-3.1-nemotron-nano-8b-healthcare-text2sql-v1.0`
Healthcare Text2SQL (49B)	`nvidia/llama-3.3-nemotron-super-49b-healthcare-text2sql-v1.0`
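When a pipeline pulls containers by name, it can be worth flagging the ones that still require the 1.x line. A hypothetical check built from the legacy list above; it assumes the image reference ends in `<org>/<model>` with an optional `:tag`, and ignores any registry prefix in front of that.

```python
LEGACY_1X = frozenset({
    "deepseek-ai/deepseek-v3.1-terminus",
    "deepseek-ai/deepseek-v32-exp-nim",
    "zai-org/glm-5",
    "minimax-ai/minimax-m25",
    "nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark",
    "qwen/qwen3-coder-next",
    "qwen/qwen3-next-80b-a3b-instruct",
    "qwen/qwen3-next-80b-a3b-thinking",
    "qwen/qwen3-32b",
    "qwen/qwen3-32b-dgx-spark",
    "nvidia/riva-translate-4b-instruct-v1.1",
    "nvidia/llama-3.1-nemotron-nano-8b-healthcare-text2sql-v1.0",
    "nvidia/llama-3.3-nemotron-super-49b-healthcare-text2sql-v1.0",
})


def needs_nim_1x(image: str) -> bool:
    """True if the image's <org>/<model> is on the 1.x-only list."""
    # Drop a ":tag" only if it is in the last path segment (a registry
    # host may also contain ":port").
    last = image.rsplit("/", 1)[-1]
    name = image.rsplit(":", 1)[0] if ":" in last else image
    org_model = "/".join(name.split("/")[-2:])
    return org_model in LEGACY_1X


print(needs_nim_1x("qwen/qwen3-32b"), needs_nim_1x("meta/llama-3.1-8b-instruct"))
```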

For 1.x deployment, refer to the NIM LLM 1.15 supported models documentation.

Quick Decision Guide

What GPU do you have?
├── B200/B300/GB200 → Any model, any supported precision and TP
├── H100 80GB / H200 → Any model, prefer FP8 (NVL variants: Llama and Nemotron only)
├── A100 80GB → Most models, prefer FP8 for 70B+
├── A100 40GB → Best for 8B-20B; 49B, 70B (FP8/NVFP4), and GPT-OSS 120B also verified
├── L40S → Most models, FP8 recommended, large models need TP8
├── A10G → GPT-OSS 20B and Llama 70B only
└── GB10 → 8B, 20B, Nano, and Nemotron Super 49B only

What model do you need?
├── General purpose → Llama 3.3 70B (FP8)
├── Code generation → StarCoder2 7B or model-free with Codestral
├── OpenAI open-weight models → GPT-OSS 20B/120B (MXFP4)
├── NVIDIA optimized → Nemotron Super 49B or 120B
├── Small/edge → Llama 8B or Nemotron Nano
└── Custom/fine-tuned → Model-free NIM (or LoRA adapters on a supported model)
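The model half of the guide maps naturally to an ordered candidate list per use case, and filtering that list against the models verified on your GPU combines both trees into one lookup. The use-case keys and model labels are hypothetical, and the caller supplies the verified set (for example, one row of the GPU compatibility table).

```python
from typing import Iterable, Optional

# Ordered candidates per use case, transcribed from the model tree above.
CANDIDATES: dict[str, tuple[str, ...]] = {
    "general": ("llama-3.3-70b",),
    "code": ("starcoder2-7b", "model-free:codestral"),
    "gpt-oss": ("gpt-oss-20b", "gpt-oss-120b"),
    "nvidia-optimized": ("nemotron-super-49b", "nemotron-3-super-120b"),
    "small-edge": ("llama-3.1-8b", "nemotron-3-nano"),
    "custom": ("model-free",),
}


def pick_model(use_case: str, verified_on_gpu: Iterable[str]) -> Optional[str]:
    """First candidate for `use_case` that is verified on the caller's GPU, else None."""
    verified = set(verified_on_gpu)
    for model in CANDIDATES[use_case]:
        if model in verified:
            return model
    return None


# GB10 row of the compatibility table, using the same hypothetical labels.
GB10 = {"gpt-oss-20b", "llama-3.1-8b", "nemotron-super-49b", "nemotron-3-nano"}
print(pick_model("small-edge", GB10))
print(pick_model("general", GB10))
```

Returning None instead of silently falling back makes it explicit when the guide's choice for a use case is not verified on your hardware.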

About the Author

I am Luca Berton, AI and Cloud Advisor. I help enterprises select the right GPU and model configuration for their inference workloads.

Frequently Asked Questions

Which GPUs are supported by NVIDIA NIM?

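For NIM LLM 2.x, the verified GPUs in the matrix above are B200, B300, GB200, H200 (including NVL), H100 (80GB HBM3 and NVL), GH200, A100 (40/80GB), A10G, L40S, RTX PRO 6000 and 4500 Blackwell SE, and GB10. Model availability varies by GPU memory and architecture.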

Can I run NIM on consumer GPUs like RTX 4090?

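NIM targets data center and professional GPUs; the 2.x matrix also covers RTX PRO Blackwell workstation cards and the GB10 (DGX Spark). Consumer GPUs such as the RTX 4090 are not officially supported, though smaller models may work with vLLM directly.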
