Chapter 1: Claude Chat

For users of Claude.ai, Claude Desktop, and the Claude mobile app. You do not write code or call the API directly. Your levers are behavioural habits and account configuration.

How a chat message actually works

The model itself has no memory between turns. Zero. Every time you send a message, the entire context is rebuilt from scratch and resent. The model reads it, generates a response, and forgets everything. The next message starts over.

What gets resent on every message:

The Anthropic system prompt (product identity, safety rules, formatting rules, tool rules). You cannot trim this. Estimated at several thousand tokens.
The full list of tool definitions (web search, image search, weather, maps, recipes, file operations, etc.). You cannot trim this in the chat interface.
Your <userPreferences> block, in full, every message. You control this.
Your <userMemories> block, the auto-generated profile Claude builds from past conversations. You control this.
The <available_skills> listing. Each installed skill contributes a name and short description on every message. You control this.
The full schema of every connected MCP app (Google Calendar, Gmail, Slack, etc.). Each connector adds substantial schema weight. You control this.
The complete conversation history. Every prior user message, every prior assistant response, every prior tool result. In full. You control this by starting new conversations.
Any file you have uploaded earlier in the conversation. In full. On every subsequent turn. You control this by uploading less and starting new conversations.
Your current message.
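
To see why long conversations get expensive, here is a minimal Python sketch of the resend pattern described above. Every number in it is a hypothetical placeholder, not a measured value, and the function name tokens_sent_per_turn is invented for illustration. It only models the arithmetic: a fixed overhead plus the growing history is resent on every turn.

```python
def tokens_sent_per_turn(fixed_overhead, turn_sizes):
    """Return the context size sent to the model on each turn.

    fixed_overhead: tokens resent on every message (system prompt, tool
        definitions, preferences, memories, skills, connector schemas).
        The value used below is an assumption, not a measurement.
    turn_sizes: list of (user_tokens, assistant_tokens) pairs, one per turn.
    """
    sent = []
    history = 0  # tokens of all prior user messages and assistant responses
    for user_tokens, assistant_tokens in turn_sizes:
        # Everything is resent: overhead + full history + the new message.
        sent.append(fixed_overhead + history + user_tokens)
        # The message and its reply both become history for the next turn.
        history += user_tokens + assistant_tokens
    return sent


# Hypothetical conversation: 10,000 tokens of fixed overhead, then ten
# turns of 200 tokens in and 800 tokens out.
per_turn = tokens_sent_per_turn(10_000, [(200, 800)] * 10)
total_input = sum(per_turn)

print(per_turn[0])   # first turn: 10200
print(per_turn[-1])  # tenth turn: 19200
print(total_input)   # all ten turns combined: 147000
```

Under these assumed numbers the user typed only 2,000 tokens in total, yet 147,000 tokens were sent across the ten turns, because the overhead and the accumulated history ride along every time.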

Anthropic-side prompt caching reduces the compute cost of resending the static prefix, but it does not reduce context window occupancy. The tokens still sit in the window, still crowd out room for conversation, and still count against your two budgets (context window and quota) in different ways.

The two budgets, for chat users specifically

Context window: per conversation. 200K tokens on Haiku, up to 1M tokens on Sonnet 4.6 and Opus models. Usually enough unless you upload large files or run very long sessions.
Quota: the 5-hour rolling window plus a weekly cap on Pro and Max plans. Shared with Claude Code if you also use that. You feel this one as "I hit the limit again" messages and forced waiting periods.

For most chat users, quota is the binding constraint. Your goal is to fit more useful work inside the windows you have paid for. Context window matters mainly when you upload large files or refuse to start fresh conversations.
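
As a rough illustration of when the context window becomes the constraint, the sketch below estimates how many turns fit before a window fills. The function name turns_until_full and all the figures passed to it (overhead, file size, tokens per turn) are assumptions chosen for round numbers. Only the 200K window size comes from the text above.

```python
def turns_until_full(window, fixed_overhead, file_tokens, tokens_per_turn):
    """Estimate how many whole turns fit in a context window.

    window: context window size in tokens.
    fixed_overhead: tokens present on every message (assumed value).
    file_tokens: tokens of uploaded files, which stay in context once added.
    tokens_per_turn: combined user + assistant tokens added per turn.
    """
    remaining = window - fixed_overhead - file_tokens
    if remaining <= 0:
        return 0  # overhead and files alone already fill the window
    return remaining // tokens_per_turn


# Hypothetical: 200K window, 10,000 tokens of overhead, 1,000 tokens per turn.
without_file = turns_until_full(200_000, 10_000, 0, 1_000)
with_large_file = turns_until_full(200_000, 10_000, 90_000, 1_000)

print(without_file)     # 190
print(with_large_file)  # 100
```

With these assumed figures, a single 90,000-token upload nearly halves the room for conversation, which is why the context window matters mostly for large files and very long sessions.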

The habits that move the needle

These are ordered by impact. The first three are responsible for most of the gains available to a chat user.