Author: Jiawei Wang. First published on Feb 9, 2026. Work done as an intern at Seed.
<aside> 💡
TL;DR
```bibtex
@online{wang2026experience,
  title  = {From Amnesia to Mastery: How Agents Learn Skills In-Context},
  author = {Wang, Jiawei},
  year   = {2026},
  month  = feb,
  url    = {https://www.notion.so/From-Amnesia-to-Mastery-How-Agents-Learn-Skills-In-Context-2ef968f937bc8068945fca6c69e659cf?source=copy_link}
}
```
</aside>
LLM agents are increasingly deployed in complex, multi-step environments such as spreadsheets, coding sandboxes, and web automation. Despite impressive zero-shot capabilities, these agents repeatedly suffer from a key limitation: they do not truly learn from experience.
In practice, an agent may solve a task correctly, encounter a similar task later, and still repeat the same trial-and-error process. Existing mechanisms—long-context memory, retrieval-augmented generation (RAG), or prompt engineering—primarily enable recall, not learning. As a result, agent performance remains brittle, inefficient, and highly sensitive to stochastic variation.
Updating model parameters via continual learning is a natural alternative, but it introduces substantial challenges including catastrophic forgetting, delayed feedback loops, and high operational costs. This motivates an intermediate paradigm:
Can LLM agents acquire reusable skills and strategies in context, without modifying their weights?
This question has sparked a wave of recent research. Works such as Learning on the Job[1], ReasoningBank[2], Evo-Memory[3], and FLEX[4] have pioneered the idea of abstracting agent trajectories into memory. These studies demonstrate that agents can indeed improve by "memorizing" past successes. However, a gap remains between raw recall and structured skill acquisition. Existing approaches often treat experience as a flat collection of trajectories or generic reflections, which can be noisy and hard to generalize to even slightly different contexts. Furthermore, few studies quantify the efficiency of learning: does the agent actually become "smarter" and faster, or does it merely stumble upon the answer with more guidance?
We propose Experience-Driven Learning (EDL) to address this question. EDL distinguishes itself by organizing experience into a fine-grained taxonomy—from atomic tool usage to high-level negative constraints—and rigorously filtering for quality. Through extensive experiments on SpreadsheetBench Verified[5], we show that structured experience does not just improve success rates; it significantly reduces execution steps, proving that the agent is learning efficient strategies rather than just memorizing answers.
In EDL, we do not treat experience as a flat log of history. Instead, we structure it into a fine-grained taxonomy that captures different levels of abstraction and serves as a latent strategy representation. By abstracting away environmental noise, such as specific cell addresses, the taxonomy preserves the intrinsic structure of a strategy, enabling robust semantic alignment and knowledge transfer across diverse task distributions.
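To make this concrete, here is a minimal sketch of the kind of noise abstraction described above, assuming trajectory steps are logged as plain text. The regex patterns and placeholder tokens are illustrative assumptions, not the exact rules used in EDL:

```python
import re

def abstract_trajectory(step: str) -> str:
    """Replace environment-specific details with generic placeholders so the
    remaining text captures the strategy, not the instance."""
    # Cell ranges like "A1:C20" first, then single cells like "B12"
    step = re.sub(r"\b[A-Z]{1,3}\d+:[A-Z]{1,3}\d+\b", "<RANGE>", step)
    step = re.sub(r"\b[A-Z]{1,3}\d+\b", "<CELL>", step)
    # Concrete workbook filenames -> generic token
    step = re.sub(r"\S+\.xlsx\b", "<FILE>", step)
    return step

print(abstract_trajectory("Sum B2:B20 of report.xlsx into B21"))
# -> Sum <RANGE> of <FILE> into <CELL>
```

Applying the range pattern before the single-cell pattern matters; otherwise "A1:C20" would be split into two `<CELL>` tokens and the range semantics would be lost.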
Through iterative experimentation, we identified four complementary types of experience:
| Experience Type | Definition | Why it matters |
|---|---|---|
| 🛠️ Atomic Tool | Fine-grained API usage patterns. | Handles syntax nuances (e.g., openpyxl params). |
| 📋 Procedural Workflow | SOPs for common sub-tasks. | Prevents skipping steps in complex pipelines. |
| 🧠 Meta Strategy | High-level reasoning principles. | Guides how to think and decompose problems. |
| 🚧 Negative Constraint | Explicit warnings on what NOT to do. | Prunes dead-ends based on past failures. |
This taxonomy allows experience to encode not only what to do, but also what not to do, and at what level of abstraction. Examples can be found in Appendix A.
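As a rough illustration of how the four experience types might be represented in code, consider the sketch below. The class and field names are our own assumptions for exposition, not EDL's actual schema:

```python
from dataclasses import dataclass
from enum import Enum

class ExperienceType(Enum):
    ATOMIC_TOOL = "atomic_tool"              # fine-grained API usage patterns
    PROCEDURAL_WORKFLOW = "workflow"         # SOPs for common sub-tasks
    META_STRATEGY = "meta_strategy"          # high-level reasoning principles
    NEGATIVE_CONSTRAINT = "negative"         # explicit warnings on what NOT to do

@dataclass
class Experience:
    type: ExperienceType
    content: str       # the abstracted lesson, free of instance-specific details
    source_task: str   # identifier of the task the experience was mined from

# Example of a negative constraint mined from a failed trajectory
# (task identifier and wording are hypothetical):
exp = Experience(
    type=ExperienceType.NEGATIVE_CONSTRAINT,
    content="Do not overwrite formula cells when filling a column with values.",
    source_task="spreadsheet_fill_043",
)
```

Typing the experiences explicitly lets a retrieval layer filter by abstraction level, for example injecting only meta strategies and negative constraints for unfamiliar tasks.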
<aside> 💡
Why this matters: This taxonomy allows our system to encode both positive guidance (what to do) and negative boundaries (what to avoid). As we show later in the experiments, the Negative Constraints are particularly critical for generalizing to unseen, difficult tasks.
</aside>
As shown in Figure 1, experiences are mined from N sampled trajectories of a baseline agent for each task. Rather than treating these trajectories as flat logs, we employ a multi-stage pipeline to extract structured knowledge based on outcome quality:
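The outcome-based routing in this pipeline can be sketched as follows, under the assumption that each trajectory records its steps and a success flag; `stub_extract` stands in for the LLM-backed extraction stage, and the mode names are hypothetical:

```python
def mine_experiences(trajectories, extract):
    """Route each sampled trajectory to an extraction mode based on outcome:
    successes yield positive knowledge (tools, workflows, strategies);
    failures yield negative constraints."""
    experiences = []
    for t in trajectories:
        mode = "positive" if t["success"] else "negative"
        experiences.extend(extract(mode, t["steps"]))
    return experiences

# Stub standing in for the LLM extraction call.
def stub_extract(mode, steps):
    if mode == "positive":
        return [f"workflow: {steps[0]}"]
    return [f"constraint: avoid '{steps[-1]}'"]

trajs = [
    {"success": True,  "steps": ["open sheet", "sum column"]},
    {"success": False, "steps": ["open sheet", "overwrite formula"]},
]
print(mine_experiences(trajs, stub_extract))
```

In the full pipeline, the extracted candidates would then pass through the quality filtering stage before being added to the experience store.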