For developers calling Claude programmatically. You write the system prompt, you choose the model, you configure caching, you decide what enters context. The consumer interfaces make many decisions for you; the API exposes all of them.

What you actually pay for

Per million tokens, as of April 2026, on standard pricing:

Model               Input   Output   Released
Claude Haiku 4.5    $1.00   $5.00    Oct 2025
Claude Sonnet 4.6   $3.00   $15.00   Feb 2026
Claude Opus 4.6     $5.00   $25.00   Feb 2026
Claude Opus 4.7     $5.00   $25.00   Apr 2026

Pricing verified against Anthropic's official documentation and several independent sources on April 25, 2026. Note that Claude Opus 4.7 ships with a new tokenizer that can produce up to 35% more tokens for the same input text on certain content types; the sticker price is unchanged from 4.6, but effective per-task cost can rise.

Three structural facts about this pricing:

  1. Output is 5x input. Across every current model, output tokens cost five times input tokens. This is consistent and worth internalising.
  2. Output becomes input on the next turn. Every output token gets resent as part of conversation history on subsequent turns, billed at the input rate. A 1,000-token verbose response is more expensive than it appears.
  3. Context windows are 1M tokens at standard pricing on current Sonnet 4.6, Opus 4.6, and Opus 4.7. Haiku 4.5 is 200K. The long-context surcharge that earlier models charged above 200K no longer applies on these three models.
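Fact 2 is the one that surprises people, so here is a back-of-envelope cost model for a multi-turn conversation. It assumes the Sonnet 4.6 rates from the table above and hypothetical token counts; the helper name and numbers are illustrative, not an official calculator.

```python
# Illustrative multi-turn cost model at Sonnet 4.6 rates
# ($3/M input, $15/M output). Token counts are hypothetical.

INPUT_PER_M = 3.00    # USD per million input tokens
OUTPUT_PER_M = 15.00  # USD per million output tokens

def conversation_cost(system_tokens, turns):
    """turns: list of (user_tokens, assistant_tokens) pairs.
    Each turn resends the system prompt plus all prior history as input."""
    history = system_tokens
    total = 0.0
    for user_toks, assistant_toks in turns:
        input_toks = history + user_toks
        total += input_toks * INPUT_PER_M / 1e6
        total += assistant_toks * OUTPUT_PER_M / 1e6
        history = input_toks + assistant_toks  # output becomes next turn's input
    return total

# A 1,000-token response costs $0.015 to generate, then $0.003 of input
# on every subsequent turn just to resend it as history:
print(f"${conversation_cost(500, [(200, 1000), (100, 1000), (100, 1000)]):.4f}")
```

Note how the third turn's input bill is larger than the first's even though the user message is shorter: history, not the new message, dominates.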

The two discount stacks

API users have two large multiplicative discounts available, and they stack:

  1. The Batch API bills input and output at 50% of standard rates for asynchronous jobs that can tolerate results arriving within 24 hours.
  2. Prompt caching bills cache reads at 10% of the standard input rate (cache writes carry a modest premium over standard input).

Stacked, a batch request with a cache hit costs as little as 5% of standard non-cached pricing (50% × 10%). For document-processing pipelines, bulk classification, and offline analysis, batches plus caching is the single biggest cost lever available.
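The stacking is plain multiplication. A minimal sketch, assuming the Batch API halves token prices and a prompt-cache read is billed at 10% of the base input rate (check the current docs for exact multipliers):

```python
# The two discounts are independent multipliers on the base rate,
# so they compose by multiplication. Multiplier values are the
# assumptions stated above, not pulled from a live price list.

BASE_INPUT_PER_M = 3.00       # Sonnet 4.6 standard input, USD per million
BATCH_MULTIPLIER = 0.50       # Batch API: 50% of standard price
CACHE_READ_MULTIPLIER = 0.10  # cache hit: 10% of the input rate

effective = BASE_INPUT_PER_M * BATCH_MULTIPLIER * CACHE_READ_MULTIPLIER
print(f"${effective:.2f} per million tokens")           # vs $3.00 standard
print(f"{effective / BASE_INPUT_PER_M:.0%} of standard pricing")
```

The same arithmetic applies to any model in the table; only the base rate changes.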

The two budgets, for API users specifically

The optimisations, ordered by impact

1. Right-size the model to the task

The single highest-leverage architectural decision. Routing simple tasks to Haiku and reserving Sonnet or Opus for tasks that genuinely require stronger reasoning can cut blended costs by 5x or more. A system that uses Opus for everything when Haiku would suffice 70% of the time is paying 5x more than necessary on the majority of its requests. Build a router. Classification, extraction, simple summarisation, routing: Haiku. General application work: Sonnet. Complex multi-step reasoning, agentic coding: Opus.
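The routing rule above is simple enough to express directly. A minimal sketch, assuming requests arrive already tagged with a task type by some upstream classifier; the task taxonomy and model ID strings are hypothetical placeholders, not official identifiers:

```python
# Hypothetical task-to-model router implementing the tiering described
# above: cheap tasks to Haiku, complex reasoning to Opus, the rest to
# Sonnet. Model ID strings are placeholders.

from enum import Enum

class Task(Enum):
    CLASSIFICATION = "classification"
    EXTRACTION = "extraction"
    SUMMARISATION = "summarisation"
    GENERAL = "general"
    AGENTIC_CODING = "agentic_coding"
    MULTI_STEP_REASONING = "multi_step_reasoning"

HAIKU_TASKS = {Task.CLASSIFICATION, Task.EXTRACTION, Task.SUMMARISATION}
OPUS_TASKS = {Task.AGENTIC_CODING, Task.MULTI_STEP_REASONING}

def route(task: Task) -> str:
    """Return the cheapest model tier that can handle the task."""
    if task in HAIKU_TASKS:
        return "claude-haiku-4-5"
    if task in OPUS_TASKS:
        return "claude-opus-4-7"
    return "claude-sonnet-4-6"

print(route(Task.CLASSIFICATION))  # claude-haiku-4-5
```

In production the interesting part is the classifier that assigns the task type, which can itself be a cheap Haiku call; the router stays this simple.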