1. The Need for a Fast System in Terminal-Based SWE Data Collection

In terminal-based supervised fine-tuning (SFT) data collection for software engineering (SWE) tasks, system responsiveness is a crucial determinant of both data efficiency and data quality. For annotators who are not proficient with command-line interfaces, two fundamental issues create significant friction in the labeling process: slow typing and difficulty recalling commands.

First, the interaction cost of terminal input is inherently high. Unlike graphical interfaces, which offer affordances such as buttons, menus, and auto-completion, terminals rely entirely on textual command entry. Each operation requires the annotator to recall and type precise commands and parameters, with little tolerance for syntax errors. When limited terminal proficiency leads to frequent mistakes, the repeated cycles of editing, rerunning, and verifying become time-consuming and mentally exhausting, producing substantial inefficiency and fatigue throughout the labeling process.

Second, human aversion to repetitive manual typing further amplifies the problem. Typing long or complex commands is cognitively taxing and error-prone, especially for users without strong muscle memory or command-line experience. Combined with slow system feedback, this friction discourages annotators from engaging deeply with each task. In practice, many tend to skip difficult samples or submit minimal edits just to complete the assignment, resulting in lower data diversity and weaker supervision signals for downstream model training.

Together, these factors raise the competence threshold for participation in terminal-based annotation. Only annotators with sufficient technical proficiency and patience can maintain throughput and consistency, which limits the scalability of data collection and increases training and management costs. Consequently, a fast and responsive system is not merely a convenience but a prerequisite for obtaining high-quality SFT data in SWE settings.

2. Edit over Write: An LLM-in-the-Loop Workflow for Terminal-Based SWE Annotation

1) Problem and the Paradigm Shift

In terminal-based annotation for software engineering tasks, many annotators type slowly and do not remember command syntax well. Writing every command from scratch invites errors and repeated retries, and efficiency suffers because each correction demands fresh typing and renewed recall of flags and parameters.

We address this by shifting from a workflow where humans author the full annotation trajectory to a workflow where humans edit the model’s proposal. The model produces the next candidate command. The human makes a minimal correction only where the proposal is wrong. This keeps human effort focused on judgment rather than on transcription.
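
The loop below is a minimal Python sketch of this workflow under stated assumptions, not the system's actual implementation. The function propose_next_command is a hypothetical placeholder for whatever model call sits behind the system, and the pre-filled prompt relies on Python's standard readline module (available on Unix-like systems) to place the proposal directly in the input buffer, so accepting a correct command costs one keystroke and fixing a wrong one costs only the edit itself.

import readline
import subprocess

def propose_next_command(history):
    # Placeholder for the model call; any LLM API could sit behind this.
    # A real system would build a prompt from the session history.
    return 'grep -n "Error" /var/log/app.log | tail -n 50'

def edit_in_place(proposal):
    # Pre-type the proposal at the prompt so the human edits, not rewrites.
    readline.set_startup_hook(lambda: readline.insert_text(proposal))
    try:
        return input("edit> ")
    finally:
        readline.set_startup_hook(None)  # stop pre-filling later prompts

history = []  # (proposed, corrected, output) triples from the session
while True:
    proposal = propose_next_command(history)
    command = edit_in_place(proposal)  # the minimal human correction
    if command.strip() in ("", "exit", "quit"):
        break
    # shell=True because annotators type full shell pipelines, not argv lists
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    print(result.stdout, end="")
    history.append((proposal, command, result.stdout))

Each (proposal, correction, output) triple recorded by the loop is exactly the kind of corrected trajectory the annotation session is meant to collect.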

Example

Previously, to inspect the 50 most recent log lines containing "Timeout", the annotator had to compose the full command from scratch:

grep -n "Timeout" /var/log/app.log | tail -n 50

With the LLM in the loop, the model might instead propose:

grep -n "Error" /var/log/app.log | tail -n 50

The annotator edits only one token, changing "Error" to "Timeout"; no full rewrite is needed.
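
To make "only one token" concrete, the fragment below, again an illustrative sketch rather than part of the system, uses Python's standard difflib to compare the proposal with the correction from the example above:

import difflib

proposed  = 'grep -n "Error" /var/log/app.log | tail -n 50'
corrected = 'grep -n "Timeout" /var/log/app.log | tail -n 50'

# Token-level diff: everything matches except one 'replace' opcode.
p, c = proposed.split(), corrected.split()
matcher = difflib.SequenceMatcher(None, p, c)
for tag, i1, i2, j1, j2 in matcher.get_opcodes():
    if tag != "equal":
        print(tag, p[i1:i2], "->", c[j1:j2])  # replace ['"Error"'] -> ['"Timeout"']
print(f"token similarity: {matcher.ratio():.2f}")  # 0.88: 7 of 8 tokens kept

One could log this per-step edit distance as a cheap proxy for proposal quality and for how much human effort each sample actually required.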

https://youtu.be/izcwX2tMsnU