Skip to main content
πŸŽ“ Claude Code Masterclass Learn AI-assisted development on Udemy β€” plus the companion book on Leanpub & Amazon. Start Learning
NVIDIA NIM container vLLM versions and package contents
AI

Which vLLM Version Is Inside Your NIM Container? (2026)

What vLLM version ships inside NVIDIA NIM containers like nvcr.io/nim/deepseek-ai/deepseek-r1? Here is how to inspect NIM container contents, check vLLM.

LB
Luca Berton
Β· 3 min read

Why Check vLLM Versions Inside NIM Containers?

When deploying NVIDIA NIM containers like nvcr.io/nim/deepseek-ai/deepseek-r1, you need to know the exact vLLM version for:

  • Compatibility β€” matching vLLM features (speculative decoding, chunked prefill) to your workload
  • Bug tracking β€” knowing which vLLM release to check for known issues
  • Custom model support β€” understanding which model architectures are supported
  • Security audits β€” cataloging all package versions for CVE scanning

Inspect NIM Container Contents

Method 1: Run the Container and Check

# Pull the NIM container
docker pull nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Check vLLM version
docker run --rm --entrypoint python3 \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "import vllm; print(f'vLLM: {vllm.__version__}')"

Method 2: Full Package Inventory

docker run --rm --entrypoint bash \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "pip list 2>/dev/null | grep -E 'vllm|torch|cuda|nvidia|transformers|triton'"

Expected output (versions vary by NIM release):

vllm                     0.8.x
torch                    2.6.x+cu124
nvidia-cuda-runtime-cu12 12.4.x
transformers             4.48.x
triton                   3.2.x

Method 3: Check Without GPU

If you do not have a GPU on the machine where you are inspecting:

docker run --rm --entrypoint cat \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  /opt/nim/etc/nim-release.json 2>/dev/null || \
docker run --rm --entrypoint bash \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "pip show vllm 2>/dev/null | head -5"

Method 4: Inspect Without Pulling (Skopeo)

# Inspect image labels without downloading
skopeo inspect docker://nvcr.io/nim/deepseek-ai/deepseek-r1:latest | \
  jq '.Labels'

NIM Container Architecture

Every NIM container follows this stack:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  NIM API Layer (OpenAI-compat)  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  NIM Orchestrator               β”‚
β”‚  (model download, profile pick) β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  vLLM Inference Engine          β”‚
β”‚  (tensor parallel, KV cache)    β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  PyTorch + CUDA                 β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚  NVIDIA Base Image              β”‚
β”‚  (CUDA toolkit, cuDNN, NCCL)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The NIM container is not a thin wrapper around vLLM. It adds:

  • Automatic model profile selection based on detected GPU
  • Model download and caching from NGC or local storage
  • OpenAI-compatible API with /v1/chat/completions and /v1/completions
  • Health checks and readiness probes for Kubernetes
  • Metrics endpoint for Prometheus monitoring

Common NIM Containers and Their Contents

DeepSeek-R1

docker pull nvcr.io/nim/deepseek-ai/deepseek-r1:latest

Key details:

  • Base model: DeepSeek-R1 (671B MoE, 37B active parameters)
  • Inference engine: vLLM with MoE expert parallelism
  • Tensor parallelism: 8 (requires 8 GPUs for full model)
  • Profiles: FP8 and BF16 precision options
  • Minimum GPU: 8x H100 80GB or 8x A100 80GB

Qwen3-32B (DGX Spark Profile)

docker pull nvcr.io/nim/qwen/qwen3-32b:latest

Key details:

  • Optimized profile: qwen3-32b-dgx-spark for DGX Spark systems
  • Precision: NVFP4 for memory-efficient deployment
  • Single GPU: Runs on 1x H100 or A100 with NVFP4 quantization

IBM Granite 3B Code Instruct

docker pull nvcr.io/nim/ibm/granite-3b-code-instruct:latest

Key details:

  • Small model: 3B parameters, runs on smaller GPUs
  • Context length: Check model card (not 2K or 128K variants)
  • Use case: Code generation and completion

Check vLLM CLI Options Available

NIM containers expose vLLM configuration through environment variables:

# List all vLLM-related environment variables
docker run --rm --entrypoint env \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest | \
  grep -i vllm

# Check available vLLM override options
docker run --rm --entrypoint bash \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "python3 -m vllm.entrypoints.openai.api_server --help 2>/dev/null | head -50"

Common vLLM overrides in NIM:

# Override max model length
docker run --gpus all \
  -e NIM_MAX_MODEL_LEN=8192 \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Override tensor parallelism
docker run --gpus all \
  -e NIM_TENSOR_PARALLEL_SIZE=4 \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Enable chunked prefill
docker run --gpus all \
  -e NIM_VLLM_ENABLE_CHUNKED_PREFILL=true \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest

Security Scanning NIM Containers

For enterprise deployment, scan the full container:

# Trivy scan
trivy image nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Grype scan
grype nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Generate SBOM
syft nvcr.io/nim/deepseek-ai/deepseek-r1:latest -o spdx-json > nim-deepseek-sbom.json

Troubleshooting Container Issues

Container exits immediately

# Check logs
docker logs <container_id>

# Common cause: no GPU detected
# Fix: ensure --gpus all is passed
docker run --gpus all nvcr.io/nim/deepseek-ai/deepseek-r1:latest

Wrong vLLM version for your workload

NIM pins vLLM to a tested version. You cannot upgrade vLLM inside NIM β€” instead:

  1. Use the latest NIM tag for newer vLLM versions
  2. Check NIM release notes for vLLM version per release
  3. For custom vLLM, use the NIM model-free approach

CUDA version mismatch

# Check CUDA version inside container
docker run --rm --entrypoint nvidia-smi \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Check host CUDA driver
nvidia-smi

NIM containers require a host CUDA driver compatible with the container’s CUDA toolkit version. Generally, a newer host driver supports older container CUDA versions.


Deploying NIM in production? I help teams design GPU infrastructure, optimize model serving, and implement multi-node inference at scale.

Book an AI Infrastructure Assessment β†’

Frequently Asked Questions

How do I check the vLLM version inside a NIM container?

Run: docker exec container_id pip show vllm | grep Version. Alternatively, check container logs at startup where vLLM prints its version.

Which vLLM version does NIM use for DeepSeek-R1?

NIM containers for DeepSeek-R1 typically bundle vLLM 0.6.x or newer. Check the NGC catalog page for your specific container tag.

Free 30-min AI & Cloud consultation

Book Now