What is the cheapest current Claude API model?

Based on Anthropic's current pricing page, Haiku 4.5 is the lowest-cost current Claude model at $1 per million input tokens and $5 per million output tokens.

Which Claude model is best for coding agents?

For complex agentic coding and enterprise work, Anthropic positions Opus 4.8 as the premium choice. Sonnet 4.6 is usually the better cost-performance default when you need strong capability without Opus-level pricing. Fable 5 costs more than both and is positioned for long-running agents.

How much does Claude prompt caching cost?

Prompt cache writes cost more than normal input tokens, while cache reads are cheaper. On the current pricing page, Sonnet 4.6 cache writes are $3.75 per million tokens and cache reads are $0.30 per million tokens.

Claude API Pricing 2026

Anthropic’s Claude pricing now has a clear ladder: Fable for long-running agents, Opus for complex agentic coding and enterprise work, Sonnet for balanced production workloads, and Haiku for fast low-cost execution.

The important point is that Claude cost is not just “input price plus output price.” Prompt caching, batch processing, web search, code execution, managed agent runtime, service tier, and data residency can all change the real bill.

Below is the practical breakdown for teams planning API usage in 2026.

Claude API Pricing Table

Current Claude API token pricing:

Model	Best fit	Input	Output	Cache write	Cache read
Fable 5	Long-running agents	$10 / MTok	$50 / MTok	$12.50 / MTok	$1 / MTok
Opus 4.8	Complex agentic coding and enterprise work	$5 / MTok	$25 / MTok	$6.25 / MTok	$0.50 / MTok
Sonnet 4.6	Balanced intelligence, cost, and speed	$3 / MTok	$15 / MTok	$3.75 / MTok	$0.30 / MTok
Haiku 4.5	Fastest, most cost-efficient model	$1 / MTok	$5 / MTok	$1.25 / MTok	$0.10 / MTok

MTok means one million tokens.

The price spread matters. Fable 5 output is 10x the cost of Haiku 4.5 output. Sonnet 4.6 output is 40% cheaper than Opus 4.8 output. For high-volume applications, model routing is not a minor optimization. It is the main cost control.
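The pricing table maps directly onto data that a cost estimator can use. Below is a minimal Python sketch. The dictionary keys (`"fable-5"`, `"opus-4.8"`, and so on) are labels chosen for this article, not official API model identifiers. `request_cost` is a hypothetical helper that ignores caching, batch, and multipliers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """USD per million tokens (MTok), as listed in the pricing table."""
    input: float
    output: float
    cache_write: float
    cache_read: float


# Keys are article labels, not official API model IDs.
PRICES = {
    "fable-5": Rates(input=10.00, output=50.00, cache_write=12.50, cache_read=1.00),
    "opus-4.8": Rates(input=5.00, output=25.00, cache_write=6.25, cache_read=0.50),
    "sonnet-4.6": Rates(input=3.00, output=15.00, cache_write=3.75, cache_read=0.30),
    "haiku-4.5": Rates(input=1.00, output=5.00, cache_write=1.25, cache_read=0.10),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Standard cost of one request in USD (no caching, batch, or multipliers)."""
    rates = PRICES[model]
    return (input_tokens * rates.input + output_tokens * rates.output) / 1_000_000


# Fable 5 output relative to Haiku 4.5 output ($50 vs $5 per MTok).
fable_vs_haiku = PRICES["fable-5"].output / PRICES["haiku-4.5"].output
# How much cheaper Sonnet 4.6 output is than Opus 4.8 output ($15 vs $25).
sonnet_discount = 1 - PRICES["sonnet-4.6"].output / PRICES["opus-4.8"].output

print(f"Fable/Haiku output ratio: {fable_vs_haiku:.0f}x")
print(f"Sonnet output discount vs Opus: {sonnet_discount:.0%}")
print(f"One Sonnet request, 2,000 in / 500 out: ${request_cost('sonnet-4.6', 2_000, 500):.4f}")
```

The same table also shows a fixed pattern. Every cache write is 1.25x that model's input rate, and every cache read is 0.1x.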

Model Selection: What Each Claude Tier Is For

Fable 5

Fable 5 is positioned for next-generation intelligence in long-running agents. That makes it relevant when the workload is not a single prompt-response exchange but a sustained process: planning, tool use, memory, file inspection, retries, and multi-step execution.

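Use Fable only where the agent’s success rate justifies the premium. It is the most expensive tier in the current pricing table, and the gap is widest on output-heavy workflows: Fable 5 output costs $50 / MTok, twice the Opus 4.8 rate and ten times the Haiku 4.5 rate.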

Opus 4.8

Opus 4.8 is the premium choice for complex agentic coding and enterprise work. It is cheaper than Fable 5 but still significantly more expensive than Sonnet and Haiku.

Use Opus for tasks where failure is expensive:

architecture review
large refactors
multi-repository code analysis
production incident investigation
complex tool-using agents
regulated enterprise workflows

The trap is using Opus for every request. If your application sends routine classification, extraction, summarization, or routing tasks to Opus, you are probably overpaying.

Sonnet 4.6

Sonnet 4.6 is the practical default for many production applications. It is positioned as the balance between intelligence, cost, and speed.

Use Sonnet for:

customer support assistants
internal copilots
coding assistants with moderate complexity
document analysis
workflow automation
agent loops where quality matters but every step does not need the top model

For many teams, the right architecture is Sonnet by default, Opus for escalation, and Haiku for simple stages.

Haiku 4.5

Haiku 4.5 is the cost-control model. It is the cheapest current Claude model in the table and should handle simple, high-volume tasks before they reach more expensive models.

Good Haiku workloads include:

intent detection
spam or abuse pre-filtering
title generation
extraction from short structured text
first-pass summarization
routing decisions
lightweight transformations

Haiku is also useful as a guardrail model in front of Sonnet or Opus. If it can confidently answer or route the request, you avoid paying premium-model prices.
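A guardrail like this can be sketched as a two-step router. A cheap triage pass labels each request, and the router trusts the label only when the triage pass is confident. In this minimal Python sketch, several pieces are assumptions for illustration:

* `fake_triage` is a hypothetical keyword stand-in for a real Haiku 4.5 call.
* The labels and the 0.8 confidence threshold are invented.
* Low-confidence requests fall back to Sonnet 4.6 as the balanced default.

```python
from typing import Callable, NamedTuple


class Triage(NamedTuple):
    label: str         # "simple", "general", or "complex" (hypothetical labels)
    confidence: float  # 0.0 to 1.0, as reported by the cheap triage pass


# Hypothetical mapping from triage label to the tier that should handle it.
ESCALATION = {"simple": "haiku-4.5", "general": "sonnet-4.6", "complex": "opus-4.8"}


def route(request: str, triage: Callable[[str], Triage], threshold: float = 0.8) -> str:
    """Pick a model tier, trusting the cheap triage label only when confident."""
    result = triage(request)
    if result.confidence < threshold:
        # Not sure what this is: use the balanced default instead of guessing.
        return "sonnet-4.6"
    return ESCALATION.get(result.label, "sonnet-4.6")


def fake_triage(request: str) -> Triage:
    """Toy keyword classifier standing in for a Haiku call, so the example runs."""
    text = request.lower()
    if "refactor" in text or "architecture" in text:
        return Triage("complex", 0.9)
    if len(text.split()) <= 6:
        return Triage("simple", 0.95)
    return Triage("general", 0.6)


for req in [
    "Generate a title for this post",
    "Plan a large refactor of the billing service",
    "Summarize the attached contract and flag unusual clauses for legal",
]:
    print(f"{route(req, fake_triage):<11} <- {req}")
```

In a real deployment, the triage step would call the API and parse a structured label. The threshold is better tuned against logged outcomes than fixed up front.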

Prompt Caching Changes the Economics

Prompt caching is one of the most important cost levers in Claude API deployments.

The current pricing structure charges a higher rate for writing tokens into the cache, then a much lower rate when those cached tokens are read again. For example:

Model	Normal input	Cache write	Cache read
Opus 4.8	$5 / MTok	$6.25 / MTok	$0.50 / MTok
Sonnet 4.6	$3 / MTok	$3.75 / MTok	$0.30 / MTok
Haiku 4.5	$1 / MTok	$1.25 / MTok	$0.10 / MTok

This is ideal when requests share a stable prefix:

system prompts
tool definitions
repository instructions
policy documents
product documentation
long few-shot examples
agent operating instructions

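If every request repeats the same 40,000-token instruction block, prompt caching can cut a large part of the input bill. After the first cache write, each reuse of that block is billed at the cache-read rate, one tenth of the normal input price, as long as the cache entry is still live. If every prompt is unique, caching saves little or nothing, because the cache-write rate is 25% higher than normal input.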
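The arithmetic for a shared prefix is short enough to write out. Below is a minimal Python sketch that applies the cache-write and cache-read ratios from the table above (1.25x and 0.1x of the input rate). It makes one optimistic assumption: every call after the first is a cache hit, so the entry never expires between calls.

```python
def prefix_cost(
    prefix_tokens: int,
    requests: int,
    input_rate: float,
    cached: bool,
    write_multiplier: float = 1.25,
    read_multiplier: float = 0.10,
) -> float:
    """USD cost of sending the same prompt prefix on `requests` calls.

    Rates are USD per million tokens. With caching, the first call writes the
    prefix to the cache and every later call is assumed to be a cache hit.
    """
    if requests <= 0:
        return 0.0
    mtok = prefix_tokens / 1_000_000
    if not cached:
        return requests * mtok * input_rate
    write = mtok * input_rate * write_multiplier
    reads = (requests - 1) * mtok * input_rate * read_multiplier
    return write + reads


# 40,000-token instruction block, 100 requests, Sonnet 4.6 input rate ($3 / MTok).
without_cache = prefix_cost(40_000, 100, 3.00, cached=False)
with_cache = prefix_cost(40_000, 100, 3.00, cached=True)
print(f"uncached ${without_cache:.2f}, cached ${with_cache:.2f}")
```

Under these assumptions, the prefix costs about $12.00 uncached versus about $1.34 cached. Caching already wins at two requests (1.25 + 0.1 = 1.35 units versus 2). With a single request it loses, because of the write premium. In practice, cached entries expire after a period of inactivity, so the real saving depends on the cache hit rate.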

Batch Processing Can Cut Cost by 50%

Anthropic lists batch processing at a 50% saving compared with standard pricing. The tradeoff is latency: batch is for asynchronous workloads that can wait.

Good candidates:

offline document summarization
enrichment jobs
nightly classification
eval runs
synthetic test generation
log analysis
dataset labeling

Bad candidates:

live chat
interactive coding sessions
incident response
real-time user workflows
agent loops waiting on immediate tool decisions

Batch processing is one of the easiest savings when your workload is not latency-sensitive.
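In practice, the batch decision is a per-job flag. Below is a minimal Python sketch under three assumptions:

* The listed 50% saving applies uniformly to input and output tokens.
* Sonnet 4.6 rates are used for both jobs.
* `Job` and `can_wait` are hypothetical names for illustration.

```python
from dataclasses import dataclass

BATCH_DISCOUNT = 0.50  # listed batch-processing saving (assumed to apply to all tokens)


@dataclass
class Job:
    name: str
    input_mtok: float  # millions of input tokens
    output_mtok: float  # millions of output tokens
    can_wait: bool  # True if nobody is waiting interactively for the result


def job_cost(job: Job, input_rate: float, output_rate: float) -> float:
    """USD cost of a job, applying the batch discount only to jobs that can wait."""
    standard = job.input_mtok * input_rate + job.output_mtok * output_rate
    return standard * (1 - BATCH_DISCOUNT) if job.can_wait else standard


jobs = [
    Job("nightly classification", 500, 100, can_wait=True),
    Job("live chat", 500, 100, can_wait=False),
]
for job in jobs:
    # Sonnet 4.6 rates: $3 / MTok input, $15 / MTok output.
    print(f"{job.name:<24} ${job_cost(job, 3.00, 15.00):,.2f}")
```

The two jobs have the same token volume. The only difference is whether anyone is waiting on the result.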

Platform Feature Pricing

Claude Platform features add separate line items beyond token usage.

Feature	Current listed cost	Practical note
Managed Agents	$0.08 per active runtime session-hour	Runtime cost sits on top of standard token rates
Web search	$10 / 1K searches	Token costs for processing search results still apply
Code execution	50 free hours daily per organization, then $0.05 per container-hour	Useful for analysis, but track long-running containers

These costs are easy to miss in prototypes because the token bill is more visible. In production, feature pricing should be included in unit economics.

For example, an agent that runs for hours with many tool calls can accumulate both token usage and active runtime session-hours. A research assistant that performs frequent web search can add search costs before the model even processes the retrieved content.
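These line items can be folded into a daily estimate alongside token spend. Below is a minimal Python sketch using the listed feature prices. It assumes:

* Session-hours and container-hours are metered fractionally.
* The 50 free code-execution hours reset daily per organization.
* The usage figures in the example are hypothetical.

It deliberately excludes the token costs of processing search results.

```python
# Listed Claude Platform feature prices (USD).
MANAGED_AGENT_PER_SESSION_HOUR = 0.08
WEB_SEARCH_PER_1K_SEARCHES = 10.00
CODE_EXEC_FREE_HOURS_PER_DAY = 50
CODE_EXEC_PER_CONTAINER_HOUR = 0.05


def daily_feature_cost(agent_session_hours: float, searches: int, container_hours: float) -> dict:
    """Feature line items for one organization-day, excluding token costs.

    Assumes fractional metering and a free code-execution allowance that
    resets daily per organization.
    """
    agents = agent_session_hours * MANAGED_AGENT_PER_SESSION_HOUR
    search = searches / 1_000 * WEB_SEARCH_PER_1K_SEARCHES
    billable_hours = max(0.0, container_hours - CODE_EXEC_FREE_HOURS_PER_DAY)
    code = billable_hours * CODE_EXEC_PER_CONTAINER_HOUR
    return {
        "managed_agents": agents,
        "web_search": search,
        "code_execution": code,
        "total": agents + search + code,
    }


# Hypothetical day: 200 agent session-hours, 3,000 searches, 80 container-hours.
for item, usd in daily_feature_cost(200, 3_000, 80).items():
    print(f"{item:<15} ${usd:,.2f}")
```

In this made-up day, search ($30.00) outweighs agent runtime ($16.00). Code execution stays small ($1.50) because the first 50 container-hours are free.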

US-Only Inference and Fast Mode

Two multipliers matter for enterprise cost planning:

US-only inference is listed at 1.1x pricing for input and output tokens.
Fast mode for Opus 4.8 is listed at 2x standard pricing.

US-only inference may be required for data residency or customer commitments. Fast mode can make sense when latency has direct business value, but it should not be the default for background tasks.

The rule is simple: attach multipliers to specific workloads, not entire environments.
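One way to keep multipliers scoped is to store them on a per-workload policy rather than in a global setting. Below is a minimal Python sketch. It assumes three things the pricing page listing above does not state:

* Fast mode's 2x applies to both input and output tokens.
* The two multipliers stack multiplicatively when both are enabled.
* The workload names are hypothetical.

```python
from dataclasses import dataclass

US_ONLY_MULTIPLIER = 1.1  # listed for input and output tokens
FAST_MODE_MULTIPLIER = 2.0  # listed for Opus 4.8


@dataclass(frozen=True)
class WorkloadPolicy:
    model: str
    us_only: bool = False
    fast: bool = False


def token_cost(policy: WorkloadPolicy, input_mtok: float, output_mtok: float,
               input_rate: float, output_rate: float) -> float:
    """Token cost in USD after applying only the multipliers this workload enables."""
    base = input_mtok * input_rate + output_mtok * output_rate
    multiplier = 1.0
    if policy.us_only:
        multiplier *= US_ONLY_MULTIPLIER
    if policy.fast:
        if policy.model != "opus-4.8":
            raise ValueError("fast mode is only listed for Opus 4.8")
        multiplier *= FAST_MODE_MULTIPLIER  # assumed to stack with US-only
    return base * multiplier


# Multipliers live on individual workloads, not on the whole environment.
POLICIES = {
    "nightly-report": WorkloadPolicy("opus-4.8"),
    "us-resident-customer": WorkloadPolicy("opus-4.8", us_only=True),
    "live-incident-triage": WorkloadPolicy("opus-4.8", fast=True),
}

for name, policy in POLICIES.items():
    # 10 MTok input and 2 MTok output at Opus 4.8 rates ($5 / $25 per MTok).
    print(f"{name:<22} ${token_cost(policy, 10, 2, 5.00, 25.00):,.2f}")
```

With identical traffic, the three policies cost $100, $110, and $200. The only differences are the multipliers each workload opts into.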

A Practical Claude Cost Architecture

For most production teams, the cleanest design is a routing layer:

Workload	Suggested default
Simple classification, extraction, routing	Haiku 4.5
General assistant and document work	Sonnet 4.6
Complex coding, reasoning, and enterprise review	Opus 4.8
Long-running high-stakes agents	Fable 5
Offline bulk jobs	Batch processing
Repeated long prompts	Prompt caching

The goal is not always to use the cheapest model. The goal is to use the cheapest model that clears the quality bar for that stage of the workflow.
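"Cheapest model that clears the quality bar" can be made operational with two inputs: per-stage eval pass rates, and the lowest-cost model above a threshold. In the minimal Python sketch below, the pass rates are hypothetical placeholders, not benchmark results. Cost per call is computed from the listed token rates.

```python
from typing import Dict, Optional

# (input, output) USD per MTok from the pricing table.
RATES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
    "fable-5": (10.00, 50.00),
}


def cost_per_call(model: str, input_tokens: int, output_tokens: int) -> float:
    """Standard USD cost of one call with the given token profile."""
    in_rate, out_rate = RATES[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


def cheapest_passing(pass_rates: Dict[str, float], bar: float,
                     input_tokens: int, output_tokens: int) -> Optional[str]:
    """Cheapest model whose measured pass rate for this stage meets `bar`."""
    candidates = [m for m, rate in pass_rates.items() if rate >= bar]
    if not candidates:
        return None  # nothing clears the bar: revisit the prompt or the bar
    return min(candidates, key=lambda m: cost_per_call(m, input_tokens, output_tokens))


# Hypothetical eval pass rates per workflow stage (illustrative only).
extraction_scores = {"haiku-4.5": 0.97, "sonnet-4.6": 0.98, "opus-4.8": 0.99}
refactor_scores = {"haiku-4.5": 0.41, "sonnet-4.6": 0.78, "opus-4.8": 0.92, "fable-5": 0.94}

print(cheapest_passing(extraction_scores, 0.95, 1_500, 200))
print(cheapest_passing(refactor_scores, 0.90, 40_000, 8_000))
```

Every listed tier charges five times as much for output as for input. At standard rates, the cost ordering therefore does not depend on the token profile. The profile starts to matter once caching or batch discounts differ between stages.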

Example Monthly Cost Thinking

Assume a workload uses:

500 million input tokens
100 million output tokens
no prompt caching
no batch processing

At current listed rates:

Model	Input cost	Output cost	Total
Haiku 4.5	$500	$500	$1,000
Sonnet 4.6	$1,500	$1,500	$3,000
Opus 4.8	$2,500	$2,500	$5,000
Fable 5	$5,000	$5,000	$10,000

This simplified example shows why routing matters. The same token volume can be a $1,000 Haiku workload or a $10,000 Fable workload.

Real bills will differ because output ratios, cache hit rates, service tiers, feature usage, and regional settings all change the final number.
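The table above can be reproduced in a few lines, which also makes it easy to price a routed mix. Below is a minimal Python sketch. The 70/25/5 traffic split is a hypothetical example. The sketch also assumes each model sees the same input-to-output ratio as the overall workload.

```python
# (input, output) USD per MTok from the pricing table.
RATES = {
    "haiku-4.5": (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.8": (5.00, 25.00),
    "fable-5": (10.00, 50.00),
}


def monthly_cost(model: str, input_mtok: float = 500, output_mtok: float = 100):
    """Return (input_cost, output_cost, total) in USD, with no caching or batch."""
    in_rate, out_rate = RATES[model]
    input_cost = input_mtok * in_rate
    output_cost = output_mtok * out_rate
    return input_cost, output_cost, input_cost + output_cost


def routed_cost(mix: dict, input_mtok: float = 500, output_mtok: float = 100) -> float:
    """Total USD when traffic is split across models by share of tokens."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    return sum(
        monthly_cost(model, input_mtok * share, output_mtok * share)[2]
        for model, share in mix.items()
    )


for model in RATES:
    input_cost, output_cost, total = monthly_cost(model)
    print(f"{model:<11} ${input_cost:>7,.0f} ${output_cost:>7,.0f} ${total:>8,.0f}")

# Hypothetical routing mix: most traffic on Haiku, escalations to Sonnet and Opus.
mix = {"haiku-4.5": 0.70, "sonnet-4.6": 0.25, "opus-4.8": 0.05}
print(f"routed mix  ${routed_cost(mix):,.0f}")
```

With that mix, the same 500M input and 100M output tokens cost about $1,700. All-Sonnet traffic would cost $3,000, and all-Opus $5,000.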

How To Reduce Claude API Spend

Start with the highest-leverage controls:

Route simple tasks to Haiku before using Sonnet, Opus, or Fable.
Use Sonnet as the default for balanced production workloads.
Reserve Opus and Fable for tasks where quality or autonomy justifies the premium.
Cache stable system prompts, tool schemas, and long context blocks.
Move offline jobs to batch processing.
Track web search, code execution, and managed agent runtime separately from token usage.
Measure cost per successful task, not only cost per token.
Keep fast mode and US-only inference scoped to workloads that require them.

Cost optimization should not make the system worse. The right metric is business outcome per dollar: resolved ticket, completed coding task, reviewed document, classified record, or successful agent run.
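Cost per successful task is total spend divided by successful outcomes. The minimal Python sketch below uses hypothetical eval numbers (not measured results) to show how a cheaper model can lose on this metric when its success rate is much lower. `RunStats` is an illustrative name. Feature costs are included so that tool-heavy runs are not undercounted.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    model: str
    attempts: int
    successes: int
    token_cost: float  # USD spent on tokens across all attempts
    feature_cost: float = 0.0  # USD for web search, code execution, agent runtime

    @property
    def total_cost(self) -> float:
        return self.token_cost + self.feature_cost

    @property
    def cost_per_attempt(self) -> float:
        return self.total_cost / self.attempts if self.attempts else float("inf")

    @property
    def cost_per_success(self) -> float:
        return self.total_cost / self.successes if self.successes else float("inf")


# Hypothetical eval of 1,000 coding tasks per model (illustrative numbers only).
runs = [
    RunStats("haiku-4.5", attempts=1_000, successes=200, token_cost=10.00),
    RunStats("sonnet-4.6", attempts=1_000, successes=900, token_cost=30.00),
]
for run in runs:
    print(f"{run.model:<11} per attempt ${run.cost_per_attempt:.4f}"
          f"  per success ${run.cost_per_success:.4f}")
```

In this made-up scenario, Haiku is three times cheaper per attempt. Sonnet is cheaper per completed task ($0.0333 versus $0.05). The sketch still ignores downstream costs of failures, such as human rework.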

Key Takeaways

Claude pricing in 2026 is a tiered architecture problem.

Haiku 4.5 is the cheapest model and should handle simple, high-volume work. Sonnet 4.6 is the balanced default for many production applications. Opus 4.8 is the premium choice for complex coding and enterprise reasoning. Fable 5 is the expensive long-running agent tier.

Prompt caching and batch processing are not optional details. They can materially change the economics of a Claude deployment.

The best production pattern is model routing plus caching plus workload-specific service choices. Without that, teams can easily pay premium-model prices for work that a cheaper model could have handled.
