πŸ€– AI Research Digest – 2026-04-13

LLM

Small Vision-Language Models are Smart Compressors for Long Video Understanding

πŸ“„ Summary: This paper introduces Tempo, a framework that uses small vision-language models to intelligently compress long videos before they are processed by larger multimodal models, addressing the token-budget problem that blocks hour-long video understanding. Rather than relying on blind sampling strategies, Tempo performs cross-modal distillation to keep only the most relevant frames while respecting strict token limits and preserving causality.

πŸ’‘ Key Insight: Small models can act as intelligent filters that compress videos while preserving what matters, rather than throwing away frames randomly.
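The paper's actual pipeline (cross-modal distillation) is more involved, but the core idea of relevance-based selection under a hard token budget can be sketched as follows. The function name, scores, and budget below are illustrative assumptions, not from the paper: a small model assigns a relevance score to each frame, and we keep the highest-scoring frames that fit the budget, restored to chronological order so causality is preserved.

```python
def select_frames(scores, tokens_per_frame, token_budget):
    """Greedy selection: keep the highest-relevance frames that fit the
    token budget, then restore temporal order to preserve causality."""
    # Rank frame indices by relevance, highest first
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept, used = [], 0
    for i in ranked:
        if used + tokens_per_frame <= token_budget:
            kept.append(i)
            used += tokens_per_frame
    # Chronological order for the downstream multimodal model
    return sorted(kept)

# Toy relevance scores a small VLM might assign to 8 frames
scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.7, 0.3, 0.6]
print(select_frames(scores, tokens_per_frame=256, token_budget=1024))
# β†’ [1, 3, 5, 7]: the four most relevant frames, in temporal order
```

The contrast with uniform sampling is the point: uniform sampling would keep frames 0, 2, 4, 6 regardless of content, while a learned scorer concentrates the budget on informative frames.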

πŸ”— Read Paper


Automating Database-Native Function Code Synthesis with LLMs

πŸ“„ Summary: DBCooker is an LLM-based system that automatically generates code for database-native functions, addressing the tendency of generic LLMs to hallucinate or miss context critical to database development. The system handles the complex requirements of database function synthesis, including function registration, internal linking, and logic implementation.

πŸ’‘ Key Insight: Generic AI coding assistants fail at specialized domains like databases because they lack domain-specific constraints and context.
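One way to read the insight above: the fix is not a bigger model but supplying the domain-specific constraints in the prompt. The sketch below is a hypothetical prompt-assembly step (not DBCooker's actual interface) showing how registration boilerplate and linking rules could be injected so the model no longer has to guess them.

```python
def build_codegen_prompt(task, registration_stub, linking_rules):
    """Assemble a codegen prompt that supplies database-specific context
    (registration boilerplate, internal linking rules) a generic LLM
    would otherwise hallucinate or omit."""
    sections = [
        f"Task: {task}",
        f"Function registration template:\n{registration_stub}",
        f"Internal linking constraints:\n{linking_rules}",
        "Generate the function body consistent with the above.",
    ]
    return "\n\n".join(sections)

prompt = build_codegen_prompt(
    task="Implement a string-reversal scalar function",
    registration_stub="CREATE FUNCTION reverse_str(text) RETURNS text ...",
    linking_rules="Must link against the engine's internal string utilities",
)
```

The design choice is generic retrieval-augmented prompting; the specific section names and templates are illustrative assumptions.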

πŸ”— Read Paper


Towards Real-world Human Behavior Simulation: Benchmarking Large Language Models on Long-horizon, Cross-scenario, Heterogeneous Behavior Traces

πŸ“„ Summary: OmniBehavior is the first user-simulation benchmark built entirely from real-world data, capturing long-term, cross-scenario behavioral patterns that isolated-scenario benchmarks miss. The research reveals that real human decision-making depends on understanding causal chains across different contexts, not just single-scenario optimization.

πŸ’‘ Key Insight: LLMs need to understand how past events in one context influence decisions in completely different scenarios to simulate realistic human behavior.

πŸ”— Read Paper


ML

Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

πŸ“„ Summary: This paper challenges the assumption that supervised fine-tuning merely memorizes while RL generalizes, showing that cross-domain generalization in reasoning tasks is conditional on optimization dynamics, data quality, and model capability. The authors identify a "dip-and-recovery" pattern: cross-domain performance initially drops during fine-tuning but recovers with longer training, so models evaluated at an early stopping point appear to fail at generalization when they would in fact succeed.