Which vLLM Version Is Inside Your NIM Container? (2026)

Why Check vLLM Versions Inside NIM Containers?

When deploying NVIDIA NIM containers like nvcr.io/nim/deepseek-ai/deepseek-r1, you need to know the exact vLLM version for:

Compatibility — matching vLLM features (speculative decoding, chunked prefill) to your workload
Bug tracking — knowing which vLLM release to check for known issues
Custom model support — understanding which model architectures are supported
Security audits — cataloging all package versions for CVE scanning

Inspect NIM Container Contents

Method 1: Run the Container and Check

# Pull the NIM container
docker pull nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Check vLLM version
docker run --rm --entrypoint python3 \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "import vllm; print(f'vLLM: {vllm.__version__}')"
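
Once you have the version string, you may want to gate a feature on it. The sketch below compares two dotted versions with `sort -V`. The function name `version_ge`, the placeholder version `0.8.4`, and the minimum `0.6.0` are illustrative assumptions, not values taken from NIM or vLLM documentation.

```shell
# version_ge A B: succeed when dotted version A is greater than or equal to B.
# Relies on `sort -V` (version sort, available in GNU coreutils).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

reported="0.8.4"   # placeholder; in practice, capture vllm.__version__ from the container
required="0.6.0"   # assumed minimum; check the vLLM release notes for your feature

if version_ge "$reported" "$required"; then
  echo "vLLM $reported meets minimum $required"
else
  echo "vLLM $reported is older than $required"
fi
```

Plain string comparison would rank 0.10.0 below 0.9.2, which is why the sketch uses version sort. Pre-release or local suffixes (for example a `+cu124` tag) are not handled here.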

Method 2: Full Package Inventory

docker run --rm --entrypoint bash \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "pip list 2>/dev/null | grep -E 'vllm|torch|cuda|nvidia|transformers|triton'"

Expected output (versions vary by NIM release):

vllm                     0.8.x
torch                    2.6.x+cu124
nvidia-cuda-runtime-cu12 12.4.x
transformers             4.48.x
triton                   3.2.x
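
If you only need one package from that inventory, an exact match on the name column is less noisy than a broad grep pattern. The following is a minimal sketch: `pkg_version` is a hypothetical helper, and the sample text only imitates the two-column layout of `pip list` with made-up versions.

```shell
# pkg_version NAME: read `pip list` output on stdin and print NAME's version.
# Compares the first column to NAME exactly, ignoring case.
pkg_version() {
  awk -v want="$1" 'tolower($1) == tolower(want) { print $2; exit }'
}

# Sample input in the shape `pip list` prints; the versions are invented.
sample='Package                  Version
------------------------ -----------
torch                    2.6.0+cu124
transformers             4.48.1
vllm                     0.8.4'

printf '%s\n' "$sample" | pkg_version vllm
printf '%s\n' "$sample" | pkg_version torch
```

Against a real image, the same helper could read from the container: run the Method 2 command with plain `pip list 2>/dev/null` and pipe the result into `pkg_version vllm`.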

Method 3: Check Without GPU

If the machine you are inspecting from has no GPU, read the release metadata file first and fall back to pip show when that file is absent:

docker run --rm --entrypoint cat \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  /opt/nim/etc/nim-release.json 2>/dev/null || \
docker run --rm --entrypoint bash \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "pip show vllm 2>/dev/null | head -5"

Method 4: Inspect Without Pulling (Skopeo)

# Inspect image labels without downloading
skopeo inspect docker://nvcr.io/nim/deepseek-ai/deepseek-r1:latest | \
  jq '.Labels'

NIM Container Architecture

A vLLM-backed NIM container follows this stack (some NIM profiles run on TensorRT-LLM instead, so confirm the engine for your tag):

┌─────────────────────────────────┐
│  NIM API Layer (OpenAI-compat)  │
├─────────────────────────────────┤
│  NIM Orchestrator               │
│  (model download, profile pick) │
├─────────────────────────────────┤
│  vLLM Inference Engine          │
│  (tensor parallel, KV cache)    │
├─────────────────────────────────┤
│  PyTorch + CUDA                 │
├─────────────────────────────────┤
│  NVIDIA Base Image              │
│  (CUDA toolkit, cuDNN, NCCL)    │
└─────────────────────────────────┘

The NIM container is not a thin wrapper around vLLM. It adds:

Automatic model profile selection based on detected GPU
Model download and caching from NGC or local storage
OpenAI-compatible API with /v1/chat/completions and /v1/completions
Health checks and readiness probes for Kubernetes
Metrics endpoint for Prometheus monitoring
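
A Prometheus metrics endpoint returns plain text, so a value can be pulled out with standard tools. The sketch below assumes the text exposition format with no spaces inside label values and no trailing timestamps. `metric_values` is a hypothetical helper, and the metric names in the sample are invented for illustration, not taken from NIM.

```shell
# metric_values NAME: read Prometheus text-format metrics on stdin and print
# the value of every sample whose metric name equals NAME.
metric_values() {
  awk -v name="$1" '
    /^#/ { next }                  # skip HELP and TYPE comment lines
    {
      metric = $1
      sub(/[{].*/, "", metric)     # drop the label set, if any
      if (metric == name) print $2
    }'
}

# Invented sample in the exposition format; real metric names will differ.
sample='# HELP num_requests_running Number of requests currently running.
# TYPE num_requests_running gauge
num_requests_running{model_name="demo"} 3
gpu_cache_usage_perc{model_name="demo"} 0.42'

printf '%s\n' "$sample" | metric_values num_requests_running
```

With a running container you would pipe the endpoint output into the helper, for example from `curl -s` against the metrics URL. The host, port, and path depend on your deployment, so take them from the NIM documentation for your release.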

Common NIM Containers and Their Contents

DeepSeek-R1

docker pull nvcr.io/nim/deepseek-ai/deepseek-r1:latest

Key details:

Base model: DeepSeek-R1 (671B MoE, 37B active parameters)
Inference engine: vLLM with MoE expert parallelism
Tensor parallelism: 8 (requires 8 GPUs for full model)
Profiles: FP8 and BF16 precision options
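Minimum GPU: confirm against the NIM support matrix for your tag. At FP8, 671B parameters need roughly 671 GB for weights alone, which exceeds the 640 GB available on an 8x 80GB node, and A100 has no native FP8 support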

Qwen3-32B (DGX Spark Profile)

docker pull nvcr.io/nim/qwen/qwen3-32b:latest

Key details:

Optimized profile: qwen3-32b-dgx-spark for DGX Spark systems
Precision: NVFP4 for memory-efficient deployment
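Single GPU: the DGX Spark profile targets the single Blackwell-generation GPU in DGX Spark. NVFP4 is a Blackwell format, so check the model card for the profiles offered on H100 or A100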

IBM Granite 3B Code Instruct

docker pull nvcr.io/nim/ibm/granite-3b-code-instruct:latest

Key details:

Small model: 3B parameters, runs on smaller GPUs
Context length: check the model card (this tag is not the 2K or 128K variant)
Use case: Code generation and completion

Check vLLM CLI Options Available

NIM containers expose vLLM configuration through environment variables:

# List all vLLM-related environment variables
docker run --rm --entrypoint env \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest | \
  grep -i vllm

# Check available vLLM override options
docker run --rm --entrypoint bash \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "python3 -m vllm.entrypoints.openai.api_server --help 2>/dev/null | head -50"
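
The `grep -i vllm` filter above only shows variables whose name or value contains "vllm", while the overrides used in this article carry a `NIM_` prefix. A minimal sketch that keeps both prefixes follows; `engine_env` is a hypothetical helper and the sample variable names are invented. Note that `env` only lists variables baked into the image, not every variable the NIM runtime is able to read.

```shell
# engine_env: read `env` output on stdin and print, sorted, only the
# variables whose names start with NIM_ or VLLM_.
engine_env() {
  grep -E '^(NIM|VLLM)_[A-Za-z0-9_]*=' | sort
}

# Invented sample environment for illustration.
sample='PATH=/usr/bin
VLLM_EXAMPLE_FLAG=1
HOME=/root
NIM_EXAMPLE_SETTING=abc'

printf '%s\n' "$sample" | engine_env
```

To use it on a real image, pipe the output of the `--entrypoint env` command above into `engine_env` instead of `grep -i vllm`.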

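Common vLLM overrides in NIM (variable names differ between NIM releases, so confirm each one against the configuration reference for your tag):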

# Override max model length
docker run --gpus all \
  -e NIM_MAX_MODEL_LEN=8192 \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Override tensor parallelism
docker run --gpus all \
  -e NIM_TENSOR_PARALLEL_SIZE=4 \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Enable chunked prefill
docker run --gpus all \
  -e NIM_VLLM_ENABLE_CHUNKED_PREFILL=true \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest

Security Scanning NIM Containers

For enterprise deployment, scan the full container:

# Trivy scan
trivy image nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Grype scan
grype nvcr.io/nim/deepseek-ai/deepseek-r1:latest

# Generate SBOM
syft nvcr.io/nim/deepseek-ai/deepseek-r1:latest -o spdx-json > nim-deepseek-sbom.json

Troubleshooting Container Issues

Container exits immediately

# Check logs
docker logs <container_id>

# Common cause: no GPU detected
# Fix: ensure --gpus all is passed
docker run --gpus all nvcr.io/nim/deepseek-ai/deepseek-r1:latest

Wrong vLLM version for your workload

NIM pins vLLM to a tested version. You cannot upgrade vLLM inside NIM — instead:

Use the latest NIM tag for newer vLLM versions
Check NIM release notes for vLLM version per release
For custom vLLM, use the NIM model-free approach

CUDA version mismatch

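# Check the CUDA version PyTorch was built against inside the container
# (nvidia-smi is only injected when --gpus is passed, and it reports the
# driver's CUDA version rather than the container toolkit version)
docker run --rm --entrypoint python3 \
  nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
  -c "import torch; print(torch.version.cuda)"

# Check the host driver and the highest CUDA version it supports
nvidia-smi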

NIM containers require a host CUDA driver compatible with the container’s CUDA toolkit version. Generally, a newer host driver supports older container CUDA versions.
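
The rule of thumb above can be sketched as a small check. It assumes the usual `nvidia-smi` header line containing `CUDA Version: X.Y`. `driver_cuda_version` and `driver_covers` are hypothetical helpers, and the comparison is a deliberate simplification: CUDA minor-version compatibility and forward-compatibility packages can relax it.

```shell
# driver_cuda_version: read `nvidia-smi` output on stdin and print the
# CUDA version reported in its header.
driver_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# driver_covers HOST CONTAINER: succeed when HOST is >= CONTAINER.
driver_covers() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Sample header line; on a real host, pipe `nvidia-smi` into the parser.
header='| NVIDIA-SMI 550.54.15    Driver Version: 550.54.15    CUDA Version: 12.4     |'
host_cuda="$(printf '%s\n' "$header" | driver_cuda_version)"
container_cuda="12.4"   # placeholder; read it from the package inventory

if driver_covers "$host_cuda" "$container_cuda"; then
  echo "host driver (CUDA $host_cuda) covers container CUDA $container_cuda"
else
  echo "host driver (CUDA $host_cuda) is older than container CUDA $container_cuda"
fi
```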


Frequently Asked Questions

How do I check the vLLM version inside a NIM container?

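For a container that is already running, use docker exec container_id pip show vllm | grep Version, replacing container_id with the ID from docker ps. For an image that is not running, use Method 1 above. Alternatively, check the container logs at startup, where vLLM prints its version.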

Which vLLM version does NIM use for DeepSeek-R1?

NIM containers for DeepSeek-R1 typically bundle vLLM 0.6.x or newer. Check the NGC catalog page for your specific container tag.
