LlamaFactory LoRA SFT on Qwen3.5-397B-A17B

Four training runs fine-tuning Qwen3.5-397B-A17B on AMD MI355X GPUs using amdpilot kernel agent trajectories. v1 established the pipeline; v2 introduced 3-view data; v3 fixed the training recipe; v4 fixed a critical data pipeline bug (66% of data silently dropped) and introduced leak-free evaluation.


Quick Comparison: v1 through v4

| Metric | v1 | v2 | v3 | v4 (latest) |
|---|---|---|---|---|
| Key change | Baseline | 3-view data | Recipe fix | Data pipeline fix |
| Effective train examples | ~100 | ~100 (66% dropped) | ~100 (66% dropped) | 270 (all 3 views working) |
| Train loss | 0.163 | 0.085 | 0.059 | 0.199 |
| Eval loss | n/a | n/a | 0.044 (leaked) | 0.055 (clean, held-out) |
| Eval integrity | none | none | 100% leaked | 0% leaked |
| LoRA rank / alpha | 16 / 32 | 16 / 32 | 32 / 64 | 32 / 64 |
| Epochs / Steps | 3 / 18 | 3 / 12 | 10 / 130 | 10 / 200 |
| Training time | 57 min | 1h 32m | 5h 10m | 7h 59m |
| wandb | disabled | v2 | v3 | v4 |
| HuggingFace | v1 | v2 | v3 | v4 |

Critical Bug Found in v2/v3 Data Pipeline

We discovered that only 100 out of 296 training examples actually reached the model in v2 and v3. The other 196 were silently dropped by LlamaFactory's OpenAI converter due to broken role alternation:
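Converters that expect strict user/assistant alternation typically skip any conversation that violates it. The sketch below illustrates the failure mode with a hypothetical validator (not LlamaFactory's actual code): a trajectory with two consecutive same-role turns fails the check and is silently filtered out.

```python
def roles_alternate(messages):
    """Return True if messages strictly alternate user/assistant, starting with user."""
    expected = "user"
    for msg in messages:
        if msg["role"] != expected:
            return False
        expected = "assistant" if expected == "user" else "user"
    return True

examples = [
    # Well-formed: user -> assistant. Passes the check.
    [{"role": "user", "content": "q"}, {"role": "assistant", "content": "a"}],
    # Broken alternation: two consecutive user turns. Silently dropped.
    [{"role": "user", "content": "q1"}, {"role": "user", "content": "q2"},
     {"role": "assistant", "content": "a"}],
]
kept = [ex for ex in examples if roles_alternate(ex)]
# Only the first example survives; the second is filtered with no warning.
```

A filter like this, applied inside a converter with no logging, is exactly how two thirds of a dataset can vanish without any visible error.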

This means the entire 3-view data strategy (the supposed key improvement in v2) never actually reached the model: v2 and v3 both trained only on full-view trajectories truncated at 32K tokens, losing the final solution in 86 of 100 cases.

v4 Fix
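One way to repair trajectories with broken role alternation is to merge adjacent same-role turns before handing them to the converter. This is an illustrative sketch of that approach, not necessarily the exact v4 implementation:

```python
def merge_consecutive_roles(messages):
    """Merge adjacent messages that share a role so the sequence alternates.

    Consecutive same-role contents are joined with a blank line; the input
    list is not mutated (each message dict is shallow-copied).
    """
    merged = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"] += "\n\n" + msg["content"]
        else:
            merged.append(dict(msg))
    return merged

broken = [
    {"role": "user", "content": "q1"},
    {"role": "user", "content": "q2"},       # consecutive user turn
    {"role": "assistant", "content": "a"},
]
fixed = merge_consecutive_roles(broken)
# fixed now alternates user -> assistant and passes a strict converter.
```

Merging keeps all the content that would otherwise be dropped, at the cost of collapsing turn boundaries inside the merged message.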


Leak-Free Evaluation in v4
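With multiple views generated from each source trajectory, splitting at the example level leaks: different views of the same trajectory land on both sides. A common remedy, sketched below under assumed field names (`traj`, and a caller-supplied `key` function are illustrative, not the project's actual schema), is to split deterministically by trajectory ID so all views of a trajectory stay on one side:

```python
import hashlib

def split_holdout(examples, key, holdout_frac=0.1):
    """Deterministically route each example to train or eval by hashing its
    source-trajectory ID. All views derived from the same trajectory share a
    key, so they always land on the same side and no eval example shares a
    source with the training set."""
    train, held = [], []
    for ex in examples:
        digest = int(hashlib.sha256(key(ex).encode()).hexdigest(), 16)
        bucket = held if (digest % 1000) < holdout_frac * 1000 else train
        bucket.append(ex)
    return train, held

# Hypothetical dataset: 20 trajectories, 3 views each.
examples = [{"traj": f"t{i}", "view": v}
            for i in range(20) for v in ("full", "partial", "final")]
train, held = split_holdout(examples, key=lambda e: e["traj"], holdout_frac=0.2)
```

Hashing (rather than random sampling) makes the split reproducible across runs without storing a split file.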