For users of Claude.ai, Claude Desktop, and the Claude mobile app. You do not write code or call the API directly. Your levers are behavioural habits and account configuration.

How a chat message actually works

Claude has no memory between turns. Zero. Every time you send a message, the entire context is rebuilt from scratch and resent. The model reads it, generates a response, and forgets everything. The next message starts over.

What gets resent on every message:

Anthropic-side prompt caching reduces the compute cost of resending the static prefix, but it does not reduce context window occupancy. The tokens still sit in the window, still crowd out room for conversation, still apply to your two budgets in different ways.

The two budgets, for chat users specifically

For most chat users, quota is the binding constraint. Your goal is to fit more useful work inside the windows you have paid for. Context window matters mainly when you upload large files or refuse to start fresh conversations.

The habits that move the needle

These are ordered by impact. The first three are responsible for most of the gains available to a chat user.