Zhuofeng Li$^{1,\textbf{†}}$, Dongfu Jiang$^{2,\textbf{†}}$, Xueguang Ma$^{2,\textbf{†}}$, Haoxiang Zhang$^{3,\textbf{†}}$, Ping Nie$^{2,\textbf{†}}$, Yuyu Zhang$^{4}$, Kai Zou$^{5}$, Jianwen Xie$^{6}$

Yu Zhang$^{1,♣}$, Wenhu Chen$^{2,♣}$

$^{1}$Texas A&M University $^{2}$University of Waterloo $^{3}$UC San Diego $^{4}$Verdent AI $^{5}$NetMind AI $^{6}$Lambda

$^{\textbf{*}}$: Project Leads; $^{\textbf{†}}$: Core Contributors; $^{♣}$: Corresponding Authors

February 2026


<aside> 🌟

TL;DR

👨‍💻 Github, 🤗 HF Models & Datasets, 🚀 Demo, 💡 Case Study, 🔎 Eval Logs

</aside>



1. Open Source Gaps in Deep Research Agents

Since the release of DeepSeek-R1, the community has shown increasing interest in collecting long-reasoning trajectories from large reasoning models (LRMs) across diverse domains, including OpenThoughts [1], OpenMathReasoning [2], and OpenCodeReasoning [3]. These trajectories are typically used to post-train small- to mid-scale reasoning models via supervised fine-tuning (SFT). For instance, DeepSeek-R1-Distill models rely exclusively on SFT over a large, carefully curated dataset of long reasoning trajectories, achieving state-of-the-art performance [4].

More recently, with the rise of agentic reasoning, deep research agents [5,6]—systems capable of iterative search, evidence aggregation, and multi-step reasoning—have emerged as a key frontier of LLM capabilities. Correspondingly, there is growing emphasis on trajectories that involve tool use, particularly search, which are central to agentic reasoning settings.

In practice, search operations typically depend on proprietary search engine APIs (e.g., Serper). These APIs are costly at scale and inherently non-reproducible due to their black-box implementations, frequently suffering from high latency, outright failures, and inconsistent returns. As a result, they pose a major obstacle to the scalable collection of high-quality, long-horizon research trajectories.
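To illustrate the kind of scaffolding these flaky APIs force on trajectory collection, here is a minimal, hedged sketch of a retry wrapper with exponential backoff. The function name, the injected `call_api` callable, and the backoff policy are all illustrative assumptions, not part of any particular system's implementation:

```python
import random
import time


def search_with_retries(query, call_api, max_retries=3, base_delay=1.0):
    """Call a flaky search backend, retrying on failure.

    `call_api` is any callable mapping a query string to results; it may
    raise on timeouts or transient errors. Retries use exponential
    backoff with jitter, and the last error is re-raised once the retry
    budget is exhausted.
    """
    for attempt in range(max_retries):
        try:
            return call_api(query)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the failure to the caller
            # Exponential backoff with jitter, scaled by base_delay.
            time.sleep(base_delay * (2 ** attempt + random.random()))
```

Even with such a wrapper, the underlying engine can still return different results for the same query on different days, which is exactly the reproducibility problem a fully offline corpus sidesteps.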

Orthogonally, from the perspective of the open-source ecosystem, the availability of such trajectories remains limited, and existing open-source systems still exhibit substantial performance gaps relative to frontier closed-source models. We summarize these gaps as follows:

| Work | Weights | Trajectories | Code | Environment |
| --- | --- | --- | --- | --- |
| Search-R1 [7] | | | | ✅ (Wikipedia) |
| Tongyi DeepResearch [5] | | | | ❌ (API-based) |
| MiroThinker [6] | | | | ❌ (API-based) |
| Ours | | | | |

Taken together, these limitations highlight a fundamental bottleneck in the study of agentic reasoning: how can we synthesize high-quality, long-horizon agentic reasoning trajectories in a low-cost and fully reproducible manner?

To address these challenges, we introduce Open-Researcher, a fully offline, low-cost, and reproducible pipeline for synthesizing long-horizon (100+ turns) deep research trajectories that involve iterative search, evidence aggregation, and multi-step reasoning. All resources are released to the community.
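To make the shape of such a pipeline concrete, here is a hedged sketch of a single trajectory rollout: an agent alternates between issuing search queries against a local, fixed corpus and producing reasoning turns, until it emits a final answer or hits a turn budget. The tool interface (`offline_search`), the `model` action format, and the data classes are illustrative assumptions, not the actual Open-Researcher implementation:

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str      # "assistant" (reasoning / tool call) or "tool" (search result)
    content: str


@dataclass
class Trajectory:
    question: str
    turns: list = field(default_factory=list)


def run_episode(question, model, offline_search, max_turns=100):
    """Roll out one long-horizon research trajectory.

    `model(history)` returns either ("search", query) or ("answer", text);
    `offline_search(query)` retrieves from a local, fixed corpus, so every
    rollout is deterministic, reproducible, and free of API costs.
    """
    traj = Trajectory(question)
    for _ in range(max_turns):
        action, payload = model(traj.turns)
        traj.turns.append(Turn("assistant", payload))
        if action == "answer":
            break  # the agent committed to a final answer
        traj.turns.append(Turn("tool", offline_search(payload)))
    return traj
```

Because the search backend is a fixed local corpus, re-running the same model on the same question replays the identical trajectory, which is what makes large-scale SFT data collection both cheap and reproducible.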