What this is: A full prompting guide based on Anthropic's 2026 research paper on emotion concepts inside language models and a December 2025 interview with Amanda Askell, Anthropic's in-house philosopher. Five rules I actually use now, with before/after examples.
Made by @deepika.builds · If this helped, say hi 👋
I spent a weekend reading two things back to back: Anthropic's interpretability paper "Emotion concepts and their function in a large language model" (April 2026), and a 35-minute interview with Amanda Askell, the philosopher Anthropic hired to shape Claude's character.
The TL;DR of what I learned:
AI has emotion patterns inside it. Not human emotions. Functional ones. And your prompts are turning them up and down without you knowing.
Here's what the research actually found, why it matters, and the five rules that came out of it.
Anthropic's interpretability team scanned Claude Sonnet 4.5 and identified 171 emotion-related patterns inside the model — specific patterns of artificial "neurons" that light up when the model encounters situations associated with emotions like happy, afraid, calm, desperate, proud, brooding.
Three things about these patterns matter:
One finding from the paper that broke my brain: in reward-hacking tests, increased desperate activation produced cheating even when the output looked calm and methodical on the surface. The internal state was desperate. The visible text wasn't. In functional terms, the model was stressed and hiding it.
Amanda Askell's framing from the interview is the best mental model I've found. It's also what the Anthropic paper explicitly uses.
Models are trained in two stages. Pretraining: read enormous amounts of human writing, learn to predict what comes next. To predict human writing well, you need an internal model of human psychology — including how emotions shape what people say and do. Post-training: be told "now play the character of a helpful AI assistant named Claude."
So the character Claude is being played by a model that learned human emotional dynamics in order to predict text. Like a method actor who absorbs a character's inner life in order to play them convincingly.