📄 Summary: This paper reviews how modern LLM agents are built by externalizing capabilities into memory stores, reusable skills, and interaction protocols rather than modifying model weights. It argues that agent infrastructure matters because it transforms hard cognitive tasks into forms that models can solve more reliably, drawing on the concept of cognitive artifacts.
💡 Key Insight: The future of LLM agents lies in smart infrastructure around the model, not in the model itself.
📄 Read Paper
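The externalization idea above can be made concrete with a minimal sketch: instead of knowledge and procedures living in model weights, the agent consults an external memory store and a skill registry at run time. All class and function names here are hypothetical illustrations, not the paper's API.

```python
# Sketch: capabilities externalized into infrastructure, not weights.
# Names (MemoryStore, SkillRegistry) are illustrative assumptions.

class MemoryStore:
    """External memory: the agent recalls facts instead of 'knowing' them."""
    def __init__(self):
        self._notes = []

    def write(self, note: str) -> None:
        self._notes.append(note)

    def recall(self, query: str) -> list[str]:
        # Naive keyword match stands in for real retrieval.
        return [n for n in self._notes if query.lower() in n.lower()]

class SkillRegistry:
    """Reusable skills: named procedures the agent invokes by name."""
    def __init__(self):
        self._skills = {}

    def register(self, name, fn) -> None:
        self._skills[name] = fn

    def invoke(self, name, *args):
        return self._skills[name](*args)

memory = MemoryStore()
skills = SkillRegistry()
skills.register("add", lambda a, b: a + b)

memory.write("The deploy target is us-east-1")
print(memory.recall("deploy"))     # fact retrieved from storage, not weights
print(skills.invoke("add", 2, 3))  # 5
```

The point of the sketch is the separation of concerns: memory and skills can be inspected, versioned, and improved without touching the model at all.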
📄 Summary: Combee scales prompt learning across parallel agent executions with a principled strategy that prevents quality degradation at high parallelism. This lets LLM agents efficiently learn task-relevant system prompts from large collections of parallel agent traces without any parameter updates.
💡 Key Insight: You can make agents learn and improve faster by letting many of them run simultaneously and learn together.
📄 Read Paper
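The core loop of learning a prompt from parallel traces can be sketched as scoring each candidate system prompt over many concurrent rollouts and keeping the best. This is an illustrative simplification, not Combee's actual algorithm; the scorer and all names are assumptions.

```python
# Sketch: evaluate candidate system prompts over parallel agent traces.
# run_trace is a stand-in for a full agent rollout (illustrative only).
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

def run_trace(prompt: str, task: str) -> float:
    # Toy deterministic score in [0, 1]: word overlap between prompt and task.
    return min(1.0, len(set(prompt.split()) & set(task.split())) / 3)

def learn_prompt(candidates: list[str], tasks: list[str], workers: int = 8) -> str:
    def score(prompt: str) -> float:
        # Parallel rollouts: each task is traced concurrently.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return mean(pool.map(lambda t: run_trace(prompt, t), tasks))
    return max(candidates, key=score)

tasks = ["summarize the report", "summarize the meeting notes"]
best = learn_prompt(["be brief", "summarize the input carefully"], tasks)
print(best)  # "summarize the input carefully"
```

The prompt, not the model, is the thing being optimized: scores from many simultaneous traces are aggregated into one selection decision, which is what makes the approach parallelism-friendly.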
📄 Summary: Tempo is a framework that uses small vision-language models to intelligently compress long videos for MLLMs by identifying and preserving only the most relevant frames in a single forward pass. It solves the context-length bottleneck by treating compression as cross-modal distillation rather than blind downsampling.
💡 Key Insight: Smaller models can act as smart filters to help larger models understand long videos within token limits.
📄 Read Paper
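The relevance-filtering idea can be sketched as scoring every frame against the query with a cheap model and keeping only the top-k within a fixed token budget. The scorer below is a toy tag-overlap stand-in, not Tempo's actual model; all field names are assumptions.

```python
# Sketch: a small model scores frames; only the top-budget frames survive,
# so the large model's context stays fixed regardless of video length.
def select_frames(frames: list[dict], query_terms: list[str], budget: int) -> list[dict]:
    def relevance(frame: dict) -> int:
        # Toy scorer: overlap between frame tags and the query.
        return len(set(frame["tags"]) & set(query_terms))

    ranked = sorted(frames, key=relevance, reverse=True)
    kept = ranked[:budget]
    # Restore temporal order so the downstream MLLM sees a coherent clip.
    return sorted(kept, key=lambda f: f["t"])

frames = [
    {"t": 0, "tags": ["intro"]},
    {"t": 1, "tags": ["goal", "score"]},
    {"t": 2, "tags": ["crowd"]},
    {"t": 3, "tags": ["goal", "replay"]},
]
print(select_frames(frames, ["goal"], budget=2))  # keeps frames t=1 and t=3
```

The contrast with blind downsampling is visible even in this toy: uniform sampling at budget 2 would keep frames 0 and 2 and miss both relevant moments.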
📄 Summary: SkillClaw is a framework that enables multi-user LLM agent systems to collectively evolve their reusable skills over time by aggregating interactions and failure patterns across all users. Instead of skills remaining static, the system learns from heterogeneous user experiences to automatically improve skill definitions and reduce repeated failures.
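The aggregation mechanism described above can be sketched as follows: failure reports from many users accumulate per skill, and once a pattern recurs often enough, its fix is promoted into the shared skill definition. This is an illustrative sketch, not SkillClaw's actual mechanism; all names and the threshold are assumptions.

```python
# Sketch: collective skill evolution via cross-user failure aggregation.
# SharedSkill and its threshold are illustrative, not from the paper.
from collections import Counter

class SharedSkill:
    def __init__(self, name: str, instructions: str):
        self.name = name
        self.instructions = instructions
        self.failures = Counter()  # aggregated across all users

    def report_failure(self, pattern: str, fix: str, threshold: int = 3) -> None:
        self.failures[pattern] += 1
        # Once enough users hit the same failure, patch the skill itself
        # so future invocations by every user avoid it.
        if self.failures[pattern] == threshold and fix not in self.instructions:
            self.instructions += f"\nNote: {fix}"

skill = SharedSkill("export_csv", "Write rows to a CSV file.")
for _ in range(3):  # three different users hit the same bug
    skill.report_failure("bad_encoding", "always open files with utf-8")
print(skill.instructions)
```

The key property is that the improvement lives in the skill definition, so a fix discovered through one user's failures immediately benefits everyone else.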