https://jinjieni.notion.site/Diffusion-Language-Models-are-Super-Data-Learners-239d8f03a866800ab196e49928c019ac#241d8f03a86680ff9408e006f62cc675
0.5B model
0.6B model
Llama 3.2 1B model
1.5B model
7B model
Epochs vs. accuracy (acc.)
Training steps vs. success rate
Llama 3.2 1B model results are as follows:
Action head is mlp2, the same as in OpenVLA-OFT

Action head is bottleneck (4 blocks), the same as in the VOTE paper
