Anthropic’s Claude pricing now has a clear ladder: Fable for long-running agents, Opus for complex agentic coding and enterprise work, Sonnet for balanced production workloads, and Haiku for fast low-cost execution.
The important point is that Claude cost is not just “input price plus output price.” Prompt caching, batch processing, web search, code execution, managed agent runtime, service tier, and data residency can all change the real bill.
Below is the practical breakdown for teams planning API usage in 2026.
Claude API Pricing Table
Current Claude API token pricing:
| Model | Best fit | Input | Output | Cache write | Cache read |
|---|---|---|---|---|---|
| Fable 5 | Long-running agents | $10 / MTok | $50 / MTok | $12.50 / MTok | $1 / MTok |
| Opus 4.8 | Complex agentic coding and enterprise work | $5 / MTok | $25 / MTok | $6.25 / MTok | $0.50 / MTok |
| Sonnet 4.6 | Balanced intelligence, cost, and speed | $3 / MTok | $15 / MTok | $3.75 / MTok | $0.30 / MTok |
| Haiku 4.5 | Fastest, most cost-efficient model | $1 / MTok | $5 / MTok | $1.25 / MTok | $0.10 / MTok |
MTok means one million tokens.
The price spread matters. Fable 5 output is 10x the cost of Haiku 4.5 output. Sonnet 4.6 output is 40% cheaper than Opus 4.8 output. For high-volume applications, model routing is not a minor optimization. It is the main cost control.
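The ratios quoted above follow directly from the output column of the pricing table. A minimal sketch of that arithmetic, where the helper names `price_ratio` and `percent_cheaper` are illustrative only:

```python
# Output prices in USD per million tokens, copied from the pricing table.
OUTPUT_PRICE = {
    "Fable 5": 50,
    "Opus 4.8": 25,
    "Sonnet 4.6": 15,
    "Haiku 4.5": 5,
}


def price_ratio(model_a: str, model_b: str) -> float:
    """How many times more expensive model_a output is than model_b output."""
    return OUTPUT_PRICE[model_a] / OUTPUT_PRICE[model_b]


def percent_cheaper(cheaper: str, pricier: str) -> float:
    """Percentage saved on output tokens by choosing `cheaper` over `pricier`."""
    gap = OUTPUT_PRICE[pricier] - OUTPUT_PRICE[cheaper]
    return 100 * gap / OUTPUT_PRICE[pricier]


print(price_ratio("Fable 5", "Haiku 4.5"))        # 10.0
print(percent_cheaper("Sonnet 4.6", "Opus 4.8"))  # 40.0
```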
Model Selection: What Each Claude Tier Is For
Fable 5
Fable 5 is positioned for next-generation intelligence in long-running agents. That makes it relevant when the workload is not a single prompt-response exchange but a sustained process: planning, tool use, memory, file inspection, retries, and multi-step execution.
Use Fable only where the agent’s success rate justifies the premium. It is the most expensive tier in the pricing table, and the premium is felt most in output-heavy workflows, where the $50 / MTok output rate dominates the bill.
Opus 4.8
Opus 4.8 is the premium choice for complex agentic coding and enterprise work. At list rates it costs half as much as Fable 5, but still about 1.7x Sonnet 4.6 and 5x Haiku 4.5.
Use Opus for tasks where failure is expensive:
- architecture review
- large refactors
- multi-repository code analysis
- production incident investigation
- complex tool-using agents
- regulated enterprise workflows
The trap is using Opus for every request. If your application sends routine classification, extraction, summarization, or routing tasks to Opus, you are probably overpaying: Haiku 4.5 is listed at one fifth of the Opus token price.
Sonnet 4.6
Sonnet 4.6 is the practical default for many production applications. It is positioned as the balance between intelligence, cost, and speed.
Use Sonnet for:
- customer support assistants
- internal copilots
- coding assistants with moderate complexity
- document analysis
- workflow automation
- agent loops where quality matters but every step does not need the top model
For many teams, the right architecture is Sonnet by default, Opus for escalation, and Haiku for simple stages.
Haiku 4.5
Haiku 4.5 is the cost-control model. It is the cheapest current Claude model in the table and should handle simple, high-volume tasks before they reach more expensive models.
Good Haiku workloads include:
- intent detection
- spam or abuse pre-filtering
- title generation
- extraction from short structured text
- first-pass summarization
- routing decisions
- lightweight transformations
Haiku is also useful as a guardrail model in front of Sonnet or Opus. If it can confidently answer or route the request, you avoid paying premium-model prices.
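The guardrail idea can be sketched as a cascade that tries the cheapest tier first and escalates only when the answer looks unreliable. This is a sketch under stated assumptions, not a Claude API feature: `ask_model` is a hypothetical callable standing in for your own client code, and the confidence score and the 0.8 threshold are invented for illustration.

```python
from typing import Callable, Sequence, Tuple

# Cheapest first, following the tiering described above.
CASCADE: Sequence[str] = ("Haiku 4.5", "Sonnet 4.6", "Opus 4.8")

# Hypothetical signature: (model, prompt) -> (answer, confidence in [0, 1]).
AskModel = Callable[[str, str], Tuple[str, float]]


def answer_with_escalation(
    prompt: str,
    ask_model: AskModel,
    threshold: float = 0.8,
) -> Tuple[str, str]:
    """Return (model_used, answer), escalating while confidence is too low.

    If no tier clears the threshold, the last (most capable) answer is returned.
    """
    answer = ""
    model = CASCADE[0]
    for model in CASCADE:
        answer, confidence = ask_model(model, prompt)
        if confidence >= threshold:
            break  # Good enough: stop before paying for a pricier tier.
    return model, answer


def fake_ask_model(model: str, prompt: str) -> Tuple[str, float]:
    """Stub for illustration: Haiku is only 'confident' on short prompts."""
    confident = len(prompt) < 40 or model != "Haiku 4.5"
    return f"{model} answer", 0.9 if confident else 0.3


print(answer_with_escalation("Is this spam?", fake_ask_model)[0])  # Haiku 4.5
```

Note that an escalated request pays for the failed cheap attempt as well, so the cascade only saves money when the cheap tier resolves a large share of traffic.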
Prompt Caching Changes the Economics
Prompt caching is one of the most important cost levers in Claude API deployments.
The current pricing structure charges a premium for writing tokens into the cache (1.25x the normal input rate), then a much lower rate when those cached tokens are read again (0.1x the normal input rate). For example:
| Model | Normal input | Cache write | Cache read |
|---|---|---|---|
| Opus 4.8 | $5 / MTok | $6.25 / MTok | $0.50 / MTok |
| Sonnet 4.6 | $3 / MTok | $3.75 / MTok | $0.30 / MTok |
| Haiku 4.5 | $1 / MTok | $1.25 / MTok | $0.10 / MTok |
This is ideal when requests share a stable prefix:
- system prompts
- tool definitions
- repository instructions
- policy documents
- product documentation
- long few-shot examples
- agent operating instructions
If every request repeats the same 40,000-token instruction block, prompt caching can cut a large part of the input bill, because each cached read of that block is billed at a tenth of the normal input rate. If every prompt is unique, caching helps much less.
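To put numbers on the 40,000-token case, the sketch below compares resending the prefix at the normal input rate with writing it to the cache once and reading it afterwards, using the Sonnet 4.6 rates from the table. The request count of 1,000 is an illustrative assumption, as is the simplification that the cache entry stays valid for the whole run; only the shared prefix is costed, not per-request text or output.

```python
MTOK = 1_000_000

# Sonnet 4.6 rates in USD per million tokens, from the table above.
INPUT, CACHE_WRITE, CACHE_READ = 3.00, 3.75, 0.30


def prefix_cost_uncached(prefix_tokens: int, requests: int) -> float:
    """Cost of resending the shared prefix at the normal input rate."""
    return prefix_tokens * requests * INPUT / MTOK


def prefix_cost_cached(prefix_tokens: int, requests: int) -> float:
    """One cache write, then a cache read on every later request.

    Assumes the cache entry stays valid for the whole run.
    """
    if requests < 1:
        return 0.0
    write = prefix_tokens * CACHE_WRITE / MTOK
    reads = prefix_tokens * (requests - 1) * CACHE_READ / MTOK
    return write + reads


uncached = prefix_cost_uncached(40_000, 1_000)
cached = prefix_cost_cached(40_000, 1_000)
print(f"uncached ${uncached:.2f}, cached ${cached:.2f}")
# uncached $120.00, cached $12.14
```

Under these assumptions caching already pays off on the second request, since one write plus one read ($3.75 + $0.30 per MTok) costs less than two normal reads ($6.00 per MTok).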
Batch Processing Can Cut Cost by 50%
Anthropic advertises batch processing as a 50% savings option. The tradeoff is latency: batch is for asynchronous workloads that can wait.
Good candidates:
- offline document summarization
- enrichment jobs
- nightly classification
- eval runs
- synthetic test generation
- log analysis
- dataset labeling
Bad candidates:
- live chat
- interactive coding sessions
- incident response
- real-time user workflows
- agent loops waiting on immediate tool decisions
Batch processing is one of the easiest savings when your workload is not latency-sensitive.
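A rough way to size the opportunity is to estimate what share of token spend could tolerate asynchronous processing. The sketch below assumes the 50% saving applies evenly to whatever portion of the bill is moved to batch; the function name and the example figures are hypothetical.

```python
BATCH_DISCOUNT = 0.50  # The 50% saving quoted above.


def blended_token_cost(list_price_cost: float, batch_share: float) -> float:
    """Token bill after moving `batch_share` (0.0 to 1.0) of spend to batch.

    Assumption: the discount applies evenly to the batched portion of the bill.
    """
    if not 0.0 <= batch_share <= 1.0:
        raise ValueError("batch_share must be between 0 and 1")
    batched = list_price_cost * batch_share * (1 - BATCH_DISCOUNT)
    realtime = list_price_cost * (1 - batch_share)
    return batched + realtime


# Example: a $3,000 monthly token bill with 40% of it eligible for batch.
print(f"${blended_token_cost(3_000, 0.4):,.2f}")  # $2,400.00
```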
Platform Feature Pricing
Claude Platform features add separate line items beyond token usage.
| Feature | Current listed cost | Practical note |
|---|---|---|
| Managed Agents | $0.08 per active runtime session-hour | Runtime cost sits on top of standard token rates |
| Web search | $10 / 1K searches | Token costs for processing search results still apply |
| Code execution | 50 free hours daily per organization, then $0.05 per container-hour | Useful for analysis, but track long-running containers |
These costs are easy to miss in prototypes because the token bill is more visible. In production, feature pricing should be included in unit economics.
For example, an agent that runs for hours with many tool calls can accumulate both token usage and active runtime session-hours. A research assistant that performs frequent web search can add search costs before the model even processes the retrieved content.
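These line items are simple to estimate once usage is measured. The sketch below uses the listed rates from the table; the usage figures are made up for illustration, and it assumes the same container hours every day with the free allowance resetting daily and not carrying over.

```python
MANAGED_AGENT_PER_SESSION_HOUR = 0.08   # USD per active runtime session-hour
WEB_SEARCH_PER_1K = 10.00               # USD per 1,000 searches
CODE_EXEC_FREE_HOURS_PER_DAY = 50       # free container-hours per organization
CODE_EXEC_PER_CONTAINER_HOUR = 0.05     # USD per container-hour after that


def feature_costs(
    session_hours: float,
    searches: int,
    container_hours_per_day: float,
    days: int = 30,
) -> dict:
    """Estimate non-token feature costs for a period of `days` days."""
    agents = session_hours * MANAGED_AGENT_PER_SESSION_HOUR
    search = searches / 1_000 * WEB_SEARCH_PER_1K
    # Only hours above the daily free allowance are billed.
    billable = max(0.0, container_hours_per_day - CODE_EXEC_FREE_HOURS_PER_DAY)
    code_exec = billable * days * CODE_EXEC_PER_CONTAINER_HOUR
    return {
        "managed_agents": round(agents, 2),
        "web_search": round(search, 2),
        "code_execution": round(code_exec, 2),
        "total": round(agents + search + code_exec, 2),
    }


# Hypothetical month: 2,000 session-hours, 30,000 searches, 80 container-hours/day.
print(feature_costs(2_000, 30_000, 80))
```

Token costs for the search results and agent turns are not included here and would be added on top.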
US-Only Inference and Fast Mode
Two multipliers matter for enterprise cost planning:
- US-only inference is listed at 1.1x pricing for input and output tokens.
- Fast mode for Opus 4.8 is listed at 2x standard pricing.
US-only inference may be required for data residency or customer commitments. Fast mode can make sense when latency has direct business value, but it should not be the default for background tasks.
The rule is simple: attach multipliers to specific workloads, not entire environments.
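One way to keep multipliers scoped is to record them per workload rather than per environment. The sketch below is illustrative: the workload names and base costs are hypothetical, and stacking the two multipliers multiplicatively is an assumption, since the listed pricing quoted here does not say how they combine.

```python
US_ONLY_MULTIPLIER = 1.1
FAST_MODE_MULTIPLIER = 2.0  # Listed for Opus 4.8 only.


def adjusted_token_cost(
    base_cost: float,
    model: str,
    us_only: bool = False,
    fast_mode: bool = False,
) -> float:
    """Apply the listed multipliers to one workload's base token cost."""
    if fast_mode and model != "Opus 4.8":
        raise ValueError("fast mode is only listed for Opus 4.8")
    cost = base_cost
    if us_only:
        cost *= US_ONLY_MULTIPLIER
    if fast_mode:
        # Assumption: if both apply, they are multiplied together.
        cost *= FAST_MODE_MULTIPLIER
    return cost


# Hypothetical workloads: only the regulated one needs US-only inference.
scoped = (
    adjusted_token_cost(5_000, "Opus 4.8", us_only=True)
    + adjusted_token_cost(3_000, "Sonnet 4.6")
)
blanket = (
    adjusted_token_cost(5_000, "Opus 4.8", us_only=True)
    + adjusted_token_cost(3_000, "Sonnet 4.6", us_only=True)
)
print(f"scoped ${scoped:,.2f} vs environment-wide ${blanket:,.2f}")
# scoped $8,500.00 vs environment-wide $8,800.00
```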
A Practical Claude Cost Architecture
For most production teams, the cleanest design is a routing layer:
| Workload | Suggested default |
|---|---|
| Simple classification, extraction, routing | Haiku 4.5 |
| General assistant and document work | Sonnet 4.6 |
| Complex coding, reasoning, and enterprise review | Opus 4.8 |
| Long-running high-stakes agents | Fable 5 |
| Offline bulk jobs | Batch processing |
| Repeated long prompts | Prompt caching |
The goal is not always to use the cheapest model. The goal is to use the cheapest model that clears the quality bar for that stage of the workflow.
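The table above can be expressed as a small lookup that a routing layer consults before each call. This is a minimal sketch: the workload labels, the `Route` type, and `choose_route` are hypothetical names, and a real router would likely classify requests rather than rely on a caller-supplied label.

```python
from dataclasses import dataclass

# Suggested defaults, mirroring the routing table above.
DEFAULT_MODEL = {
    "simple": "Haiku 4.5",               # classification, extraction, routing
    "general": "Sonnet 4.6",             # assistants and document work
    "complex": "Opus 4.8",               # complex coding and enterprise review
    "long_running_agent": "Fable 5",     # long-running high-stakes agents
}


@dataclass(frozen=True)
class Route:
    model: str
    use_batch: bool   # offline bulk jobs
    use_cache: bool   # repeated long prompts


def choose_route(
    workload: str,
    offline: bool = False,
    repeated_long_prompt: bool = False,
) -> Route:
    """Pick a default model plus batch and caching flags for one workload."""
    if workload not in DEFAULT_MODEL:
        raise ValueError(f"unknown workload type: {workload!r}")
    return Route(
        model=DEFAULT_MODEL[workload],
        use_batch=offline,
        use_cache=repeated_long_prompt,
    )


print(choose_route("simple", offline=True))
print(choose_route("general", repeated_long_prompt=True))
```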
Example Monthly Cost Calculation
Assume a workload uses:
- 500 million input tokens
- 100 million output tokens
- no prompt caching
- no batch processing
At current listed rates:
| Model | Input cost | Output cost | Total |
|---|---|---|---|
| Haiku 4.5 | $500 | $500 | $1,000 |
| Sonnet 4.6 | $1,500 | $1,500 | $3,000 |
| Opus 4.8 | $2,500 | $2,500 | $5,000 |
| Fable 5 | $5,000 | $5,000 | $10,000 |
This simplified example shows why routing matters. The same token volume can be a $1,000 Haiku workload or a $10,000 Fable workload.
Real bills will differ because output ratios, cache hit rates, service tiers, feature usage, and regional settings all change the final number.
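The table above can be reproduced with a few lines of arithmetic, which also makes it easy to plug in your own token volumes. The prices come from the pricing table; the function name `monthly_token_cost` is illustrative, and the calculation ignores caching, batch, feature costs, and multipliers, just like the example.

```python
# (input, output) prices in USD per million tokens, from the pricing table.
PRICES = {
    "Haiku 4.5": (1, 5),
    "Sonnet 4.6": (3, 15),
    "Opus 4.8": (5, 25),
    "Fable 5": (10, 50),
}


def monthly_token_cost(model: str, input_mtok: int, output_mtok: int) -> int:
    """List-price token cost in USD for volumes given in millions of tokens."""
    input_price, output_price = PRICES[model]
    return input_mtok * input_price + output_mtok * output_price


# 500 million input tokens and 100 million output tokens, as in the example.
for name in PRICES:
    print(f"{name}: ${monthly_token_cost(name, 500, 100):,}")
# Haiku 4.5: $1,000
# Sonnet 4.6: $3,000
# Opus 4.8: $5,000
# Fable 5: $10,000
```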
How To Reduce Claude API Spend
Start with the highest-leverage controls:
- Route simple tasks to Haiku before using Sonnet, Opus, or Fable.
- Use Sonnet as the default for balanced production workloads.
- Reserve Opus and Fable for tasks where quality or autonomy justifies the premium.
- Cache stable system prompts, tool schemas, and long context blocks.
- Move offline jobs to batch processing.
- Track web search, code execution, and managed agent runtime separately from token usage.
- Measure cost per successful task, not only cost per token.
- Keep fast mode and US-only inference scoped to workloads that require them.
Cost optimization should not make the system worse. The right metric is business outcome per dollar: resolved ticket, completed coding task, reviewed document, classified record, or successful agent run.
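Cost per successful task is the cost per attempt divided by the success rate, which is why a cheaper model is not automatically the cheaper choice. The figures below are invented purely to illustrate the calculation and are not measured results for any Claude model.

```python
def cost_per_success(cost_per_attempt: float, success_rate: float) -> float:
    """Expected spend per successful task, given a success rate in (0, 1]."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate


# Hypothetical numbers: a cheap tier that fails half the time versus
# a pricier tier that succeeds on 90% of attempts.
cheap = cost_per_success(cost_per_attempt=0.02, success_rate=0.50)
pricier = cost_per_success(cost_per_attempt=0.03, success_rate=0.90)
print(f"cheap tier: ${cheap:.4f}, pricier tier: ${pricier:.4f}")
# cheap tier: $0.0400, pricier tier: $0.0333
```

With these made-up inputs the pricier tier wins per successful task, even though it costs more per attempt.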
Key Takeaways
Claude pricing in 2026 is a tiered architecture problem.
Haiku 4.5 is the cheapest model and should handle simple, high-volume work. Sonnet 4.6 is the balanced default for many production applications. Opus 4.8 is the premium choice for complex coding and enterprise reasoning. Fable 5 is the expensive long-running agent tier.
Prompt caching and batch processing are not optional details. They can materially change the economics of a Claude deployment.
The best production pattern is model routing plus caching plus workload-specific service choices. Without that, teams can easily pay premium-model prices for work that a cheaper model could have handled.
