IBM Granite model family
IBM’s Granite models are open-source, enterprise-focused LLMs trained on curated, legally compliant datasets. They are a core part of RHEL AI and IBM watsonx.
Here is the complete context length and specification reference.
Granite 3.x model specifications
| Model | Parameters | Context Length | License | Notes |
|---|---|---|---|---|
| granite-3.2-8b-instruct | 8B | 128,000 tokens | Apache 2.0 | Latest instruct model, thinking mode |
| granite-3.2-8b-instruct-preview | 8B | 128,000 tokens | Apache 2.0 | Preview with enhanced reasoning |
| granite-3.2-3b-instruct | 3B | 128,000 tokens | Apache 2.0 | Compact instruct model |
| granite-3.2-2b-instruct | 2B | 128,000 tokens | Apache 2.0 | Edge deployment target |
| granite-3.2-1b-instruct | 1B | 128,000 tokens | Apache 2.0 | Smallest instruct variant |
| granite-3.1-8b-instruct | 8B | 128,000 tokens | Apache 2.0 | Stable release |
| granite-3.1-8b-base | 8B | 128,000 tokens | Apache 2.0 | Base model for fine-tuning |
| granite-3.1-3b-instruct | 3B | 128,000 tokens | Apache 2.0 | Compact, function calling |
| granite-3.1-2b-instruct | 2B | 128,000 tokens | Apache 2.0 | Edge and mobile |
| granite-3.1-1b-instruct | 1B | 128,000 tokens | Apache 2.0 | Tiny but capable |
| granite-3.0-8b-instruct | 8B | 4,096 tokens | Apache 2.0 | Original release |
| granite-3.0-8b-base | 8B | 4,096 tokens | Apache 2.0 | Original base |
| granite-3.0-3b-instruct | 3B | 4,096 tokens | Apache 2.0 | Original compact |
Granite code models
| Model | Parameters | Context Length | License |
|---|---|---|---|
| granite-3.2-8b-instruct (code mode) | 8B | 128,000 tokens | Apache 2.0 |
| granite-code-34b | 34B | 8,192 tokens | Apache 2.0 |
| granite-code-20b | 20B | 8,192 tokens | Apache 2.0 |
| granite-code-8b | 8B | 4,096 tokens | Apache 2.0 |
| granite-code-3b | 3B | 2,048 tokens | Apache 2.0 |
Key details by model
granite-8b-base context length
The ibm-granite/granite-8b-base (Granite 3.0) has a 4,096 token context length. If you need longer context, upgrade to granite-3.1-8b-base or granite-3.2-8b-instruct which support 128,000 tokens.
from transformers import AutoModelForCausalLM, AutoTokenizer
model_name = "ibm-granite/granite-3.2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Check max context length
print(tokenizer.model_max_length) # 128000granite-3b-instruct context length
The ibm-granite/granite-3b-instruct (Granite 3.0) has a 4,096 token context length. The Granite 3.1 and 3.2 versions of the 3B model support 128,000 tokens.
granite-8b-instruct context length
The ibm-granite/granite-8b-instruct (Granite 3.0) has a 4,096 token context length. Upgrade to granite-3.2-8b-instruct for 128,000 token context.
Deploying Granite on NVIDIA NIM
Granite models are available through NVIDIA NIM:
docker run -d --gpus all \
-e NGC_API_KEY=$NGC_API_KEY \
-p 8000:8000 \
nvcr.io/nim/ibm/granite-3.1-8b-instruct:latestNIM profile selection for Granite:
| Model | GPU | Profile | Precision |
|---|---|---|---|
| granite-3.2-8b-instruct | A100 80GB | default | BF16 |
| granite-3.2-8b-instruct | L40S | default | FP8 |
| granite-3.2-8b-instruct | A10G | default | FP8 |
| granite-3.2-3b-instruct | T4 | default | FP16 |
Deploying Granite on RHEL AI
Granite is the default model family for RHEL AI with InstructLab:
# Download Granite teacher model
ilab model download --repository ibm-granite/granite-3.2-8b-instruct
# Serve the model
ilab model serve --model-path models/granite-3.2-8b-instruct
# Fine-tune with InstructLab
ilab data generate --model models/granite-3.2-8b-instruct
ilab model train --model-path models/granite-3.2-8b-instructGranite vs other models
| Metric | Granite 3.2 8B | Llama 3.1 8B | Mistral 7B |
|---|---|---|---|
| Context length | 128K | 128K | 32K |
| License | Apache 2.0 | Llama 3.1 | Apache 2.0 |
| Function calling | Yes | Yes | Yes |
| Code generation | Strong | Strong | Good |
| RAG optimized | Yes | No | No |
| Training data transparency | Full | Partial | Partial |
| Enterprise indemnity (via IBM) | Yes | No | No |
Granite’s differentiator is legal compliance — trained on curated datasets with full provenance, making it the safest choice for regulated industries.