Zhuofeng Li$^{1,\textbf{†}}$, Dongfu Jiang$^{2,\textbf{†}}$, Xueguang Ma$^{2,\textbf{†}}$, Haoxiang Zhang$^{3,\textbf{†}}$, Ping Nie$^{2,\textbf{†}}$, Yuyu Zhang$^{4}$, Kai Zou$^{5}$, Jianwen Xie$^{6}$

Yu Zhang$^{1,♣}$, Wenhu Chen$^{2,♣}$

$^{1}$Texas A&M University $^{2}$University of Waterloo $^{3}$UC San Diego $^{4}$Verdent AI $^{5}$NetMind AI $^{6}$Lambda

$^{\textbf{*}}$: Project Leads; $^{\textbf{†}}$: Core Contributors; $^{♣}$: Corresponding Authors

February 2026


<aside> 🌟

TL;DR

👨‍💻 Github, 🤗 HF Models & Datasets, 🚀 Demo, 💡 Case Study, 🔎 Eval Logs

</aside>



1. Open Source Gaps in Deep Research Agents

Since the release of DeepSeek-R1, the community has shown increasing interest in collecting long-reasoning trajectories from large reasoning models (LRMs) across diverse domains, including OpenThoughts [1], OpenMathReasoning [2], and OpenCodeReasoning [3]. These trajectories are typically used to post-train small- to mid-scale reasoning models via supervised fine-tuning (SFT). For instance, DeepSeek-R1-Distill models rely exclusively on SFT over a large, carefully curated dataset of long reasoning trajectories, achieving state-of-the-art performance [4].

More recently, with the rise of agentic reasoning, deep research agents [5,6]—systems capable of iterative search, evidence aggregation, and multi-step reasoning—have emerged as a key frontier of LLM capabilities. Correspondingly, there is growing emphasis on trajectories that involve tool use, particularly search, which are central to agentic reasoning settings.

In practice, search operations typically depend on proprietary search engine APIs (e.g., Serper). These APIs are costly at scale and inherently non-reproducible due to their black-box implementations, frequently suffering from high latency, outright failures, and inconsistent returns. As a result, they pose a major obstacle to the scalable collection of high-quality, long-horizon research trajectories.
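To illustrate the kind of scaffolding these flaky APIs force on trajectory collection, here is a minimal, hedged sketch of a retry wrapper with exponential backoff. The function name, the injected `call_api` callable, and the backoff policy are all illustrative assumptions, not part of any particular system's implementation:

```python
import random
import time


def search_with_retries(query, call_api, max_retries=3, base_delay=1.0):
    """Call a flaky search backend, retrying on failure.

    `call_api` is any callable mapping a query string to results; it may
    raise on timeouts or transient errors. Retries use exponential
    backoff with jitter, and the last error is re-raised once the retry
    budget is exhausted.
    """
    for attempt in range(max_retries):
        try:
            return call_api(query)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the failure to the caller
            # Exponential backoff with jitter, scaled by base_delay.
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

Even with such a wrapper, the underlying engine can still return different results for the same query on different days, which is exactly the reproducibility problem a fully offline corpus sidesteps.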

Orthogonally, from the perspective of the open-source ecosystem, the availability of such trajectories remains limited, and existing open-source systems still exhibit substantial performance gaps relative to frontier closed-source models. We summarize these gaps as follows:

| Work | Weights | Trajectories | Code | Environment |
| --- | --- | --- | --- | --- |
| Search-R1 [7] | | | | ✅ (Wikipedia) |
| Tongyi DeepResearch [5] | | | | ❌ (API-based) |
| MiroThinker [6] | | | | ❌ (API-based) |
| Ours | | | | |

Taken together, these limitations highlight a fundamental bottleneck in the study of agentic reasoning: how can we synthesize high-quality, long-horizon agentic reasoning trajectories in a low-cost and fully reproducible manner?

To address these challenges, we introduce Open-Researcher, a fully offline, low-cost, and reproducible pipeline for synthesizing long-horizon (100+ turns) deep research trajectories that involve iterative search, evidence aggregation, and multi-step reasoning. All resources are released to the community.
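To make the shape of such a pipeline concrete, here is a hedged sketch of a single trajectory rollout: an agent alternates between issuing search queries against a local, fixed corpus and producing reasoning turns, until it emits a final answer or hits a turn budget. The tool interface (`offline_search`), the `model` action format, and the data classes are illustrative assumptions, not the actual Open-Researcher implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str      # "assistant" (reasoning / tool call) or "tool" (search result)
    content: str


@dataclass
class Trajectory:
    question: str
    turns: list = field(default_factory=list)


def run_episode(question, model, offline_search, max_turns=100):
    """Roll out one long-horizon research trajectory.

    `model(history)` returns either ("search", query) or ("answer", text);
    `offline_search(query)` retrieves from a local, fixed corpus, so every
    rollout is deterministic, reproducible, and free of API costs.
    """
    traj = Trajectory(question)
    for _ in range(max_turns):
        action, payload = model(traj.turns)
        traj.turns.append(Turn("assistant", payload))
        if action == "answer":
            break  # the agent committed to a final answer
        traj.turns.append(Turn("tool", offline_search(payload)))
    return traj
```

Because the search backend is a fixed local corpus, re-running the same model on the same question replays the identical trajectory, which is what makes large-scale SFT data collection both cheap and reproducible.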