📄 Summary: This paper reviews how modern LLM agents are built by externalizing capabilities into memory stores, reusable skills, interaction protocols, and infrastructure rather than modifying model weights directly. The work argues that this shift transforms difficult cognitive tasks into forms that models can solve more reliably, using cognitive artifacts as a unifying framework.
💡 Key Insight: The smartest AI agents aren't built by training better models, but by building better external infrastructure around them.
🔗 Read Paper
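The externalization idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the `MemoryStore` and `SkillRegistry` classes and their keyword-matching retrieval are hypothetical stand-ins for the memory stores and reusable skills the summary describes, with the base model left frozen.

```python
class MemoryStore:
    """Episodic memory the agent queries at inference time (no weight updates)."""
    def __init__(self):
        self.entries = []

    def write(self, note: str) -> None:
        self.entries.append(note)

    def recall(self, keyword: str) -> list[str]:
        # Naive substring match stands in for embedding-based retrieval.
        return [e for e in self.entries if keyword in e]


class SkillRegistry:
    """Reusable skills registered as plain callables the agent can invoke."""
    def __init__(self):
        self.skills = {}

    def register(self, name, fn) -> None:
        self.skills[name] = fn

    def invoke(self, name, *args):
        return self.skills[name](*args)


memory = MemoryStore()
skills = SkillRegistry()
skills.register("truncate", lambda text: text[:40] + "...")

memory.write("user prefers concise answers")
print(memory.recall("concise"))  # recalled context goes into the next prompt
```

The point of the sketch is that both capabilities live entirely outside the model: improving the agent means improving these external stores, not retraining.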
📄 Summary: Combee addresses the scalability challenge of prompt learning methods for LLM agents by enabling parallel execution across multiple agent traces without quality degradation. The method efficiently learns system prompts at scale by handling the synchronization challenges that arise when learning from many concurrent agentic executions.
💡 Key Insight: Prompt learning for agents scales best when you parallelize learning across many agent runs rather than improving a single agent sequentially.
🔗 Read Paper
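A toy version of the parallel idea, assuming nothing about Combee's actual algorithm: candidate system prompts are scored across many concurrent agent runs, and scores are synchronized only at aggregation time. The `run_agent` scoring function here is a hypothetical stand-in for a real LLM rollout.

```python
from concurrent.futures import ThreadPoolExecutor

def run_agent(prompt: str, task: str) -> float:
    # Stand-in for an LLM rollout; rewards prompts mentioning the task's verb.
    return 1.0 if task.split()[0] in prompt else 0.0

def evaluate_prompt(prompt: str, tasks: list[str]) -> float:
    # Traces run in parallel; the only synchronization point is aggregation.
    with ThreadPoolExecutor(max_workers=4) as pool:
        scores = list(pool.map(lambda t: run_agent(prompt, t), tasks))
    return sum(scores) / len(scores)

tasks = ["search the web", "search archives", "summarize results"]
candidates = ["You are a helpful agent.", "You are a search-focused agent."]
best = max(candidates, key=lambda p: evaluate_prompt(p, tasks))
print(best)  # the candidate that scored highest across all parallel traces
```

Because each trace is independent, throughput scales with the number of workers; the hard part the summary alludes to is keeping prompt updates consistent when many such evaluations run at once.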
📄 Summary: Tempo proposes using small vision-language models as efficient temporal compressors to adapt multimodal LLMs for hour-long videos while respecting token limits. The approach performs intent-aligned compression in a single forward pass and uses adaptive token allocation to strictly enforce budgets without breaking temporal causality.
💡 Key Insight: Small AI models can act as intelligent filters, keeping only the video moments that matter for understanding rather than blindly sampling frames.
🔗 Read Paper
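The budget-enforcement idea can be sketched independently of Tempo's specifics. In this hypothetical version, each video segment gets tokens proportional to a relevance score (which in the paper would come from the small VLM), the total never exceeds the hard budget, and segments stay in chronological order so temporal causality is preserved.

```python
def allocate_tokens(scores: list[float], budget: int) -> list[int]:
    """Split `budget` tokens across segments proportionally to `scores`."""
    total = sum(scores)
    # Floor each share so the hard budget is never exceeded.
    alloc = [int(budget * s / total) for s in scores]
    # Hand leftover tokens to the highest-scoring segments first.
    leftover = budget - sum(alloc)
    for i in sorted(range(len(scores)), key=lambda i: -scores[i])[:leftover]:
        alloc[i] += 1
    return alloc

# Four chronological segments with stand-in relevance scores.
scores = [1.0, 6.0, 2.0, 1.0]
alloc = allocate_tokens(scores, budget=100)
print(alloc, sum(alloc))  # [10, 60, 20, 10] 100
```

Note the output list keeps the input order: a highly relevant segment gets more tokens, but segments are never reordered, which is the "without breaking temporal causality" constraint.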
📄 Summary: This paper challenges the narrative that supervised fine-tuning only memorizes while reinforcement learning generalizes, showing that reasoning SFT can generalize cross-domain when optimized properly. The work reveals a "dip-and-recovery" pattern where performance temporarily degrades before improving, and demonstrates that data quality and model capability jointly determine generalization success.
💡 Key Insight: SFT failure often comes from stopping training too early: the performance looks bad mid-training but recovers stronger later.
🔗 Read Paper
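The practical consequence of dip-and-recovery is that naive early stopping picks the wrong checkpoint. A minimal sketch, assuming a simple patience-based selection rule (not taken from the paper): with low patience, training halts inside the dip; with enough patience, it survives the dip and finds the recovered, stronger checkpoint.

```python
def best_checkpoint(eval_scores: list[float], patience: int) -> int:
    """Index of the best eval score, stopping only after `patience`
    consecutive epochs without improvement."""
    best_i, since_improve = 0, 0
    for i, s in enumerate(eval_scores):
        if s > eval_scores[best_i]:
            best_i, since_improve = i, 0
        else:
            since_improve += 1
            if since_improve >= patience:
                break  # impatient stopping ends training here
    return best_i

# Illustrative curve: scores dip after epoch 1, then recover past the old peak.
scores = [0.60, 0.62, 0.55, 0.50, 0.58, 0.70, 0.71]
print(best_checkpoint(scores, patience=2))  # 1: stops mid-dip, keeps the weak peak
print(best_checkpoint(scores, patience=4))  # 6: waits out the dip, finds the recovery
```

The eval curve here is fabricated for illustration; the summary's claim is only the qualitative shape, i.e. that mid-training degradation can precede a stronger final model.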