Dream: project spec + sample data → a continually improving system that can be deployed at scale
Reality:

PyTorch, JAX, TensorFlow, ONNX, Hugging Face, timm
DDP, Horovod
Sharded Data-Parallel
ZeRO-3 (Fully-sharded DataParallel)
Model-Parallel (supported in DeepSpeed, FairScale, PyTorch)
Tensor Parallel (Megatron-LM)
trainer = Trainer(strategy="ddp_sharded")
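
To make the difference between plain data parallelism and the sharded variants listed above more concrete, here is a rough sketch of per-GPU memory for model states under each ZeRO stage. It assumes mixed-precision Adam (2 bytes per parameter for fp16 weights, 2 for fp16 gradients, 12 for fp32 optimizer state) and ignores activations and buffers. The function name `zero_memory_per_gpu_gb` is hypothetical and not part of any library.

```python
def zero_memory_per_gpu_gb(num_params, num_gpus, stage):
    """Approximate model-state memory per GPU in GB (1 GB = 1e9 bytes).

    Assumes mixed-precision Adam; activations and buffers are not counted.
    stage 0 = plain data parallel (DDP), stages 1-3 = ZeRO stages.
    """
    weights = 2 * num_params   # fp16 parameters
    grads = 2 * num_params     # fp16 gradients
    optim = 12 * num_params    # fp32 master weights, momentum, variance

    if stage >= 1:             # ZeRO-1: shard optimizer state
        optim /= num_gpus
    if stage >= 2:             # ZeRO-2: also shard gradients
        grads /= num_gpus
    if stage >= 3:             # ZeRO-3: also shard parameters
        weights /= num_gpus

    return (weights + grads + optim) / 1e9


# Example: a 7.5B-parameter model on 64 GPUs.
for stage in range(4):
    print(stage, round(zero_memory_per_gpu_gb(7.5e9, 64, stage), 1))
```

Under these assumptions, stage 0 replicates everything on every GPU, while stage 3 divides all three components by the number of GPUs, which is why fully sharded training can fit models that plain DDP cannot.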

Speed-up methods: