Four training runs fine-tuning Qwen3.5-397B-A17B on AMD MI355X GPUs using amdpilot kernel agent trajectories. v1 established the pipeline; v2 introduced 3-view data; v3 fixed the training recipe; v4 fixed a critical data pipeline bug (66% of data silently dropped) and introduced leak-free evaluation.
| Metric | v1 | v2 | v3 | v4 (latest) |
|---|---|---|---|---|
| Key change | Baseline | 3-view data | Recipe fix | Data pipeline fix |
| Effective train examples | ~100 | ~100 (66% dropped) | ~100 (66% dropped) | 270 (all 3 views working) |
| Train loss | 0.163 | 0.085 | 0.059 | 0.199 |
| Eval loss | n/a | n/a | 0.044 (leaked) | 0.055 (clean, held-out) |
| Eval integrity | none | none | 100% leaked | 0% leaked |
| LoRA rank / alpha | 16 / 32 | 16 / 32 | 32 / 64 | 32 / 64 |
| Epochs / Steps | 3 / 18 | 3 / 12 | 10 / 130 | 10 / 200 |
| Training time | 57 min | 1h 32m | 5h 10m | 7h 59m |
| wandb | disabled | v2 | v3 | v4 |
| HuggingFace | v1 | v2 | v3 | v4 |
We discovered that only 100 out of 296 training examples actually reached the model in v2 and v3. The other 196 were silently dropped by LlamaFactory's OpenAI converter due to broken role alternation:
- **Prefix+suffix view:** a user-role separator message sits between the prefix and the suffix, so two consecutive user messages violate the converter's strict alternation rule. 0/92 survived.
- **Recap view:** a user-role recap message triggers the same violation. 0/102 survived.

This means the entire 3-view data strategy (the supposed key improvement in v2) was never actually used: v2 and v3 both trained on only full-view trajectories truncated at 32K, losing the final solution in 86/100 cases.
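To make the failure mode concrete, here is a minimal reproduction of a strict user/assistant alternation check in the spirit of what the converter enforces. The function name and logic are illustrative assumptions, not LlamaFactory's actual API; the point is only that any second consecutive user turn fails the check.

```python
def passes_strict_alternation(messages):
    """Reject any conversation whose non-system roles do not strictly
    alternate user -> assistant -> user -> ... (leading system is skipped)."""
    roles = [m["role"] for m in messages if m["role"] != "system"]
    for i, role in enumerate(roles):
        expected = "user" if i % 2 == 0 else "assistant"
        if role != expected:
            return False
    return True

# A prefix+suffix view with a user-role separator between the two halves:
prefix_suffix = [
    {"role": "user", "content": "Optimize this kernel ..."},
    {"role": "assistant", "content": "Profiling the baseline ..."},
    {"role": "user", "content": "[separator between prefix and suffix]"},
    {"role": "user", "content": "Suffix context ..."},  # second consecutive user turn
    {"role": "assistant", "content": "Final solution ..."},
]
print(passes_strict_alternation(prefix_suffix))  # False -> example silently dropped
```

Under a strict check like this, the separator message alone is enough to discard the whole example, which matches the 0/92 and 0/102 survival counts above.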
The v4 fix is two-pronged:

- `ensure_valid_alternation()` trims trailing tool messages so the turn count stays even.
- `passes_converter_validation()` runs as a post-filter to catch remaining edge cases.