Why Check vLLM Versions Inside NIM Containers?
When deploying NVIDIA NIM containers like nvcr.io/nim/deepseek-ai/deepseek-r1, you need to know the exact vLLM version for:
- Compatibility: matching vLLM features (speculative decoding, chunked prefill) to your workload
- Bug tracking: knowing which vLLM release to check for known issues
- Custom model support: understanding which model architectures are supported
- Security audits: cataloging all package versions for CVE scanning
Inspect NIM Container Contents
Method 1: Run the Container and Check
# Pull the NIM container
docker pull nvcr.io/nim/deepseek-ai/deepseek-r1:latest
# Check vLLM version
docker run --rm --entrypoint python3 \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
-c "import vllm; print(f'vLLM: {vllm.__version__}')"
Method 2: Full Package Inventory
docker run --rm --entrypoint bash \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
-c "pip list 2>/dev/null | grep -E 'vllm|torch|cuda|nvidia|transformers|triton'"
Expected output (versions vary by NIM release):
vllm 0.8.x
torch 2.6.x+cu124
nvidia-cuda-runtime-cu12 12.4.x
transformers 4.48.x
triton 3.2.x
Method 3: Check Without GPU
If you do not have a GPU on the machine where you are inspecting:
docker run --rm --entrypoint cat \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
/opt/nim/etc/nim-release.json 2>/dev/null || \
docker run --rm --entrypoint bash \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
-c "pip show vllm 2>/dev/null | head -5"
Method 4: Inspect Without Pulling (Skopeo)
# Inspect image labels without downloading
skopeo inspect docker://nvcr.io/nim/deepseek-ai/deepseek-r1:latest | \
jq '.Labels'
NIM Container Architecture
Every NIM container follows this stack:
┌──────────────────────────────────┐
│  NIM API Layer (OpenAI-compat)   │
├──────────────────────────────────┤
│  NIM Orchestrator                │
│  (model download, profile pick)  │
├──────────────────────────────────┤
│  vLLM Inference Engine           │
│  (tensor parallel, KV cache)     │
├──────────────────────────────────┤
│  PyTorch + CUDA                  │
├──────────────────────────────────┤
│  NVIDIA Base Image               │
│  (CUDA toolkit, cuDNN, NCCL)     │
└──────────────────────────────────┘
The NIM container is not a thin wrapper around vLLM. It adds:
- Automatic model profile selection based on detected GPU
- Model download and caching from NGC or local storage
- OpenAI-compatible API with /v1/chat/completions and /v1/completions
- Health checks and readiness probes for Kubernetes
- Metrics endpoint for Prometheus monitoring
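The readiness probe mentioned above can also be polled from a script before sending traffic. The following is a minimal sketch, not NIM's own tooling: the function name wait_ready is hypothetical, and the endpoint path /v1/health/ready and port 8000 in the usage comment are assumptions that should be checked against the documentation for your NIM release.

```shell
# wait_ready URL [TRIES] [DELAY_SECONDS]
# Poll URL until it returns an HTTP success status, or give up after TRIES attempts.
# Hypothetical helper; the URL you pass in is an assumption about your deployment.
wait_ready() {
  url="$1"
  tries="${2:-30}"
  delay="${3:-10}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl exit non-zero on HTTP errors; -s silences progress output
    if curl -sf -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Example (assumed path and port):
#   wait_ready "http://localhost:8000/v1/health/ready" 60 10 && echo "NIM is ready"
```

Large models can take many minutes to download and load, so a generous number of attempts is usually safer than a short timeout.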
Common NIM Containers and Their Contents
DeepSeek-R1
docker pull nvcr.io/nim/deepseek-ai/deepseek-r1:latest
Key details:
- Base model: DeepSeek-R1 (671B MoE, 37B active parameters)
- Inference engine: vLLM with MoE expert parallelism
- Tensor parallelism: 8 (requires 8 GPUs for full model)
- Profiles: FP8 and BF16 precision options
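- Minimum GPU: verify against the NIM support matrix for your tag. 8x 80GB GPUs (H100 or A100) provide 640 GB in total, which is less than the roughly 671 GB the weights alone occupy at FP8, so a single 8-GPU node generally needs larger-memory GPUs or a multi-node deployment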
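A quick back-of-the-envelope check helps when matching a precision profile to available GPU memory. The sketch below estimates weight memory only, from parameter count and bits per parameter. The helper name weights_gb is hypothetical, and the result ignores KV cache, activations, and runtime overhead, so treat it as a lower bound rather than a sizing guarantee.

```shell
# weights_gb PARAMS_BILLIONS BITS_PER_PARAM
# Approximate weight size in decimal GB (1 GB = 1e9 bytes).
# Weights only: KV cache, activations, and framework overhead are not included.
weights_gb() {
  echo $(( $1 * $2 / 8 ))
}

weights_gb 671 8    # FP8:  prints 671
weights_gb 671 16   # BF16: prints 1342
```

Compare the result with the combined memory of the GPUs you plan to use, and leave headroom for the KV cache.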
Qwen3-32B (DGX Spark Profile)
docker pull nvcr.io/nim/qwen/qwen3-32b:latest
Key details:
- Optimized profile: qwen3-32b-dgx-spark for DGX Spark systems
- Precision: NVFP4 for memory-efficient deployment
- Single GPU: Runs on 1x H100 or A100 with NVFP4 quantization
IBM Granite 3B Code Instruct
docker pull nvcr.io/nim/ibm/granite-3b-code-instruct:latest
Key details:
- Small model: 3B parameters, runs on smaller GPUs
- Context length: Check model card (not 2K or 128K variants)
- Use case: Code generation and completion
Check vLLM CLI Options Available
NIM containers expose vLLM configuration through environment variables:
# List all vLLM-related environment variables
docker run --rm --entrypoint env \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest | \
grep -i vllm
# Check available vLLM override options
docker run --rm --entrypoint bash \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
-c "python3 -m vllm.entrypoints.openai.api_server --help 2>/dev/null | head -50"
Common vLLM overrides in NIM:
# Override max model length
docker run --gpus all \
-e NIM_MAX_MODEL_LEN=8192 \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest
# Override tensor parallelism
docker run --gpus all \
-e NIM_TENSOR_PARALLEL_SIZE=4 \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest
# Enable chunked prefill
docker run --gpus all \
-e NIM_VLLM_ENABLE_CHUNKED_PREFILL=true \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest
Security Scanning NIM Containers
For enterprise deployment, scan the full container:
# Trivy scan
trivy image nvcr.io/nim/deepseek-ai/deepseek-r1:latest
# Grype scan
grype nvcr.io/nim/deepseek-ai/deepseek-r1:latest
# Generate SBOM
syft nvcr.io/nim/deepseek-ai/deepseek-r1:latest -o spdx-json > nim-deepseek-sbom.json
Troubleshooting Container Issues
Container exits immediately
# Check logs
docker logs <container_id>
# Common cause: no GPU detected
# Fix: ensure --gpus all is passed
docker run --gpus all nvcr.io/nim/deepseek-ai/deepseek-r1:latest
Wrong vLLM version for your workload
NIM pins vLLM to a tested version. You cannot upgrade vLLM inside NIM; instead:
- Use the latest NIM tag for newer vLLM versions
- Check NIM release notes for vLLM version per release
- For custom vLLM, use the NIM model-free approach
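To decide whether a given NIM tag is new enough, you can compare its bundled vLLM version against the minimum your workload needs. This is a minimal sketch that relies on GNU-style sort -V for version ordering. The function name version_ge and the 0.8.0 threshold in the usage comment are illustrative assumptions, and suffixes such as .post1 or +cu124 may not order the way you expect.

```shell
# version_ge A B: succeed when version A is greater than or equal to version B.
# Relies on `sort -V` (version sort), available in GNU coreutils and BusyBox.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Example (the threshold 0.8.0 is an arbitrary illustration):
#   installed=$(docker run --rm --entrypoint python3 \
#     nvcr.io/nim/deepseek-ai/deepseek-r1:latest \
#     -c "import vllm; print(vllm.__version__)")
#   version_ge "$installed" 0.8.0 || echo "vLLM $installed is too old; try a newer NIM tag"
```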
CUDA version mismatch
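# Check the driver and CUDA version visible inside the container
# (nvidia-smi is injected by the NVIDIA Container Toolkit, so --gpus all is required)
docker run --rm --gpus all --entrypoint nvidia-smi \
nvcr.io/nim/deepseek-ai/deepseek-r1:latest
# Check host CUDA driver
nvidia-smi
NIM containers require a host NVIDIA driver that is compatible with the container's CUDA toolkit version. Generally, a newer host driver supports older container CUDA versions. Note that nvidia-smi reports the highest CUDA version the driver supports, not the toolkit version installed in the image.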
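The compatibility rule can be turned into a small scripted check. The sketch below parses the "CUDA Version" field from the nvidia-smi header and compares it with a container toolkit version that you supply, for example 12.4 taken from a package such as nvidia-cuda-runtime-cu12. Both function names are hypothetical, and the comparison is deliberately simplified: it does not account for CUDA minor-version compatibility or forward-compatibility packages.

```shell
# Read nvidia-smi output on stdin and print the "CUDA Version: X.Y" value.
smi_cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# driver_supports HOST_CUDA CONTAINER_CUDA (both as MAJOR.MINOR)
# Succeed when the driver's maximum CUDA version is at least the container's.
driver_supports() {
  host_major=${1%%.*}; host_minor=${1#*.}
  ctr_major=${2%%.*};  ctr_minor=${2#*.}
  [ "$host_major" -gt "$ctr_major" ] ||
    { [ "$host_major" -eq "$ctr_major" ] && [ "$host_minor" -ge "$ctr_minor" ]; }
}

# Example (12.4 is an assumed container toolkit version):
#   host=$(nvidia-smi | smi_cuda_version)
#   driver_supports "$host" 12.4 || echo "Host driver (CUDA $host) may be too old"
```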
Frequently Asked Questions
How do I check the vLLM version inside a NIM container?
Run: docker exec container_id pip show vllm | grep Version. Alternatively, check container logs at startup where vLLM prints its version.
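When the version needs to feed a script or an inventory file rather than a terminal, it helps to strip the "Version:" label. This is a small sketch with a hypothetical helper name, pip_show_version, that reads pip show output on standard input and prints only the version string.

```shell
# Read `pip show <package>` output on stdin and print only the version string.
pip_show_version() {
  awk -F': ' '$1 == "Version" { print $2; exit }'
}

# Example (container_id is a placeholder for your running container):
#   docker exec container_id pip show vllm | pip_show_version
```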
Which vLLM version does NIM use for DeepSeek-R1?
NIM containers for DeepSeek-R1 typically bundle vLLM 0.6.x or newer. Check the NGC catalog page for your specific container tag.