Claude token efficiency guide

A practical guide for chat users, Claude Code users, and API users.

How to use Claude economically, ecologically, and intelligently, without pretending the three audiences read the same document.

This guide is segmented by how you use Claude.

Click any chapter to navigate. Each chapter is self-contained.

Chapter 1: Claude Chat. You use Claude.ai, Claude Desktop, or the mobile app. You do not write code, you do not call the API. You want better answers, faster, and you want to hit the usage limits less often.
Chapter 2: Claude Code. You use Claude as an agentic coding assistant, in the Desktop panel or the CLI. You hit context exhaustion in the middle of long tasks. Your weekly quota burns down by Wednesday.
Chapter 3: The Anthropic API. You build on top of Claude. You write the system prompt, you choose the model, you decide what to cache. You see your monthly bill and you want it lower without losing quality.
Appendices contain the underlying mental model of how a Claude request is constructed and a glossary of terms.

The token limits problem

You probably came to this document for one of these reasons.

You hit the Claude usage limit, again, and you were halfway through a real task. The reset is in three hours. You are on a paid plan and you do not understand why this keeps happening.

You spent ninety minutes setting Claude Code up on a complex refactor and the session ran out of context just as it was getting somewhere. The agent is now confused about the file it edited thirty turns ago.

Your weekly Pro or Max quota is exhausted by Wednesday. Last week it lasted until Friday. Nothing about your work pattern has changed and Anthropic has not announced a quota cut.

Your monthly API bill came in four times what you expected. You can guess which workload caused it but you cannot prove it, and you cannot tell what to do about it.

Your conversation with Claude has slowed down, started forgetting things, or started giving worse answers than it did at the beginning. You suspect uploading that PDF earlier had something to do with it. You are right.

You suspect that, in money or in time or in both, you are paying for the same context to be re-read on every turn of every conversation. You are right about that too.

What is actually happening

These are not random failures. They are the predictable consequence of how Claude works mechanically.

Contents

The token limits problem

What is actually happening