For developers calling Claude programmatically. You write the system prompt, you choose the model, you configure caching, you decide what enters context. The consumer interfaces make many decisions for you; the API exposes all of them.

What you actually pay for

Per million tokens, as of April 2026, on standard pricing:

Model               Input   Output   Released
Claude Haiku 4.5    $1.00   $5.00    Oct 2025
Claude Sonnet 4.6   $3.00   $15.00   Feb 2026
Claude Opus 4.6     $5.00   $25.00   Feb 2026
Claude Opus 4.7     $5.00   $25.00   Apr 2026

Pricing verified against Anthropic's official documentation and several independent sources on April 25, 2026. Note that Claude Opus 4.7 ships with a new tokenizer that can produce up to 35% more tokens for the same input text on certain content types; the sticker price is unchanged from 4.6, but effective per-task cost can rise.

Three structural facts about this pricing:

  1. Output is 5x input. Across every current model, output tokens cost five times input tokens. This is consistent and worth internalising.
  2. Output becomes input on the next turn. Every output token gets resent as part of conversation history on subsequent turns, billed at the input rate. A 1,000-token verbose response is more expensive than it appears.
  3. Context windows are 1M tokens at standard pricing on current Sonnet 4.6, Opus 4.6, and Opus 4.7. Haiku 4.5 is 200K. The long-context surcharge that earlier models charged above 200K no longer applies on these three models.
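Fact 2 is the one that surprises people, so here is a back-of-envelope cost model for a multi-turn conversation. It assumes the Sonnet 4.6 rates from the table above and hypothetical token counts; the helper name and numbers are illustrative, not an official calculator.

```python
# Illustrative multi-turn cost model at Sonnet 4.6 rates
# ($3/M input, $15/M output). Token counts are hypothetical.

INPUT_PER_M = 3.00    # USD per million input tokens
OUTPUT_PER_M = 15.00  # USD per million output tokens

def conversation_cost(system_tokens, turns):
    """turns: list of (user_tokens, assistant_tokens) pairs.
    Each turn resends the system prompt plus all prior history as input."""
    history = system_tokens
    total = 0.0
    for user_toks, assistant_toks in turns:
        input_toks = history + user_toks
        total += input_toks * INPUT_PER_M / 1e6
        total += assistant_toks * OUTPUT_PER_M / 1e6
        history = input_toks + assistant_toks  # output becomes next turn's input
    return total

# A 1,000-token response costs $0.015 to generate, then $0.003 of input
# on every subsequent turn just to resend it as history:
print(f"${conversation_cost(500, [(200, 1000), (100, 1000), (100, 1000)]):.4f}")
```

Note how the third turn's input bill is larger than the first's even though the user message is shorter: history, not the new message, dominates.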

The two discount stacks

API users have two large multiplicative discounts available, and they stack:

  1. The Batch API bills input and output at 50% of standard rates for asynchronous jobs that can tolerate results arriving within 24 hours.
  2. Prompt caching bills cache reads at 10% of the standard input rate (cache writes carry a modest premium over standard input).

Stacked, a batch request with a cache hit costs as little as 5% of standard non-cached pricing (50% × 10%). For document-processing pipelines, bulk classification, and offline analysis, batches plus caching is the single biggest cost lever available.
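The stacking is plain multiplication. A minimal sketch, assuming the Batch API halves token prices and a prompt-cache read is billed at 10% of the base input rate (check the current docs for exact multipliers):

```python
# The two discounts are independent multipliers on the base rate,
# so they compose by multiplication. Multiplier values are the
# assumptions stated above, not pulled from a live price list.

BASE_INPUT_PER_M = 3.00       # Sonnet 4.6 standard input, USD per million
BATCH_MULTIPLIER = 0.50       # Batch API: 50% of standard price
CACHE_READ_MULTIPLIER = 0.10  # cache hit: 10% of the input rate

effective = BASE_INPUT_PER_M * BATCH_MULTIPLIER * CACHE_READ_MULTIPLIER
print(f"${effective:.2f} per million tokens")           # vs $3.00 standard
print(f"{effective / BASE_INPUT_PER_M:.0%} of standard pricing")
```

The same arithmetic applies to any model in the table; only the base rate changes.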

The two budgets, for API users specifically

The optimisations, ordered by impact

1. Right-size the model to the task

The single highest-leverage architectural decision. Routing simple tasks to Haiku and reserving Sonnet or Opus for tasks that genuinely require stronger reasoning can cut blended costs by 5x or more. A system that uses Opus for everything when Haiku would suffice 70% of the time is paying 5x more than necessary on the majority of its requests. Build a router. Classification, extraction, simple summarisation, routing: Haiku. General application work: Sonnet. Complex multi-step reasoning, agentic coding: Opus.
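The routing rule above is simple enough to express directly. A minimal sketch, assuming requests arrive already tagged with a task type by some upstream classifier; the task taxonomy and model ID strings are hypothetical placeholders, not official identifiers:

```python
# Hypothetical task-to-model router implementing the tiering described
# above: cheap tasks to Haiku, complex reasoning to Opus, the rest to
# Sonnet. Model ID strings are placeholders.

from enum import Enum

class Task(Enum):
    CLASSIFICATION = "classification"
    EXTRACTION = "extraction"
    SUMMARISATION = "summarisation"
    GENERAL = "general"
    AGENTIC_CODING = "agentic_coding"
    MULTI_STEP_REASONING = "multi_step_reasoning"

HAIKU_TASKS = {Task.CLASSIFICATION, Task.EXTRACTION, Task.SUMMARISATION}
OPUS_TASKS = {Task.AGENTIC_CODING, Task.MULTI_STEP_REASONING}

def route(task: Task) -> str:
    """Return the cheapest model tier that can handle the task."""
    if task in HAIKU_TASKS:
        return "claude-haiku-4-5"
    if task in OPUS_TASKS:
        return "claude-opus-4-7"
    return "claude-sonnet-4-6"

print(route(Task.CLASSIFICATION))  # claude-haiku-4-5
```

In production the interesting part is the classifier that assigns the task type, which can itself be a cheap Haiku call; the router stays this simple.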