IBM Granite Context Length and Model Specs (2026)

IBM Granite model family

IBM’s Granite models are open-source, enterprise-focused LLMs trained on curated, legally compliant datasets. They are a core part of RHEL AI and IBM watsonx.

Here is the complete context length and specification reference.

Granite 3.x model specifications

Model	Parameters	Context Length	License	Notes
granite-3.2-8b-instruct	8B	128,000 tokens	Apache 2.0	Latest instruct model, thinking mode
granite-3.2-8b-instruct-preview	8B	128,000 tokens	Apache 2.0	Preview with enhanced reasoning
granite-3.2-3b-instruct	3B	128,000 tokens	Apache 2.0	Compact instruct model
granite-3.2-2b-instruct	2B	128,000 tokens	Apache 2.0	Edge deployment target
granite-3.2-1b-instruct	1B	128,000 tokens	Apache 2.0	Smallest instruct variant
granite-3.1-8b-instruct	8B	128,000 tokens	Apache 2.0	Stable release
granite-3.1-8b-base	8B	128,000 tokens	Apache 2.0	Base model for fine-tuning
granite-3.1-3b-instruct	3B	128,000 tokens	Apache 2.0	Compact, function calling
granite-3.1-2b-instruct	2B	128,000 tokens	Apache 2.0	Edge and mobile
granite-3.1-1b-instruct	1B	128,000 tokens	Apache 2.0	Tiny but capable
granite-3.0-8b-instruct	8B	4,096 tokens	Apache 2.0	Original release
granite-3.0-8b-base	8B	4,096 tokens	Apache 2.0	Original base
granite-3.0-3b-instruct	3B	4,096 tokens	Apache 2.0	Original compact

Granite code models

Model	Parameters	Context Length	License
granite-3.2-8b-instruct (code mode)	8B	128,000 tokens	Apache 2.0
granite-code-34b	34B	8,192 tokens	Apache 2.0
granite-code-20b	20B	8,192 tokens	Apache 2.0
granite-code-8b	8B	4,096 tokens	Apache 2.0
granite-code-3b	3B	2,048 tokens	Apache 2.0

Key details by model

granite-8b-base context length

The ibm-granite/granite-8b-base (Granite 3.0) has a 4,096 token context length. If you need longer context, upgrade to granite-3.1-8b-base or granite-3.2-8b-instruct which support 128,000 tokens.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ibm-granite/granite-3.2-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Check max context length
print(tokenizer.model_max_length)  # 128000

granite-3b-instruct context length

The ibm-granite/granite-3b-instruct (Granite 3.0) has a 4,096 token context length. The Granite 3.1 and 3.2 versions of the 3B model support 128,000 tokens.

granite-8b-instruct context length

The ibm-granite/granite-8b-instruct (Granite 3.0) has a 4,096 token context length. Upgrade to granite-3.2-8b-instruct for 128,000 token context.

Deploying Granite on NVIDIA NIM

Granite models are available through NVIDIA NIM:

docker run -d --gpus all \
  -e NGC_API_KEY=$NGC_API_KEY \
  -p 8000:8000 \
  nvcr.io/nim/ibm/granite-3.1-8b-instruct:latest

NIM profile selection for Granite:

Model	GPU	Profile	Precision
granite-3.2-8b-instruct	A100 80GB	default	BF16
granite-3.2-8b-instruct	L40S	default	FP8
granite-3.2-8b-instruct	A10G	default	FP8
granite-3.2-3b-instruct	T4	default	FP16

Deploying Granite on RHEL AI

Granite is the default model family for RHEL AI with InstructLab:

# Download Granite teacher model
ilab model download --repository ibm-granite/granite-3.2-8b-instruct

# Serve the model
ilab model serve --model-path models/granite-3.2-8b-instruct

# Fine-tune with InstructLab
ilab data generate --model models/granite-3.2-8b-instruct
ilab model train --model-path models/granite-3.2-8b-instruct

Granite vs other models

Metric	Granite 3.2 8B	Llama 3.1 8B	Mistral 7B
Context length	128K	128K	32K
License	Apache 2.0	Llama 3.1	Apache 2.0
Function calling	Yes	Yes	Yes
Code generation	Strong	Strong	Good
RAG optimized	Yes	No	No
Training data transparency	Full	Partial	Partial
Enterprise indemnity (via IBM)	Yes	No	No

Granite’s differentiator is legal compliance — trained on curated datasets with full provenance, making it the safest choice for regulated industries.