Dream: project spec + sample data → a continually improving system that can be deployed at scale

Reality:

image.png

Frameworks & Distributed Training

PyTorch, JAX, TensorFlow, ONNX, Hugging Face, timm

DDP, Horovod

Sharded Data-Parallel

ZeRO-3 (Fully Sharded Data Parallel)

Model-Parallel (supported by DeepSpeed, FairScale, PyTorch)

Tensor Parallel (Megatron-LM)
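
A toy sketch of the data-parallel idea behind DDP and Horovod, in pure Python with no GPUs: each "worker" computes a gradient on its own slice of the batch, an all-reduce averages the gradients, and every replica applies the same update. The model (y_hat = w * x), the helper names, and the data are all hypothetical and chosen only for illustration; real frameworks do this over NCCL/MPI and overlap it with the backward pass.

```python
# Toy data-parallel SGD step. "Workers" are just list slices.

def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the model y_hat = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n

def all_reduce_mean(values):
    """Stand-in for an all-reduce: every worker ends up with the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

def data_parallel_step(w, xs, ys, num_workers, lr):
    """One SGD step where each worker sees an equal-sized slice of the batch."""
    shard = len(xs) // num_workers
    local_grads = [
        grad_mse(w, xs[i * shard:(i + 1) * shard], ys[i * shard:(i + 1) * shard])
        for i in range(num_workers)
    ]
    synced = all_reduce_mean(local_grads)
    # Every replica applies the same averaged gradient, so weights stay in sync.
    return [w - lr * g for g in synced]

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # generated by w = 2
replicas = data_parallel_step(0.0, xs, ys, num_workers=2, lr=0.01)
single = 0.0 - 0.01 * grad_mse(0.0, xs, ys)  # same step on one device
```

With equal-sized shards, the averaged gradient matches the full-batch gradient, so `replicas` all hold the same weight as `single`.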

trainer = Trainer(strategy="ddp_sharded")
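
A rough sketch of why sharding helps, as back-of-the-envelope arithmetic. It assumes the accounting used in the ZeRO paper for mixed-precision Adam: 2 bytes of fp16 parameters, 2 bytes of fp16 gradients, and 12 bytes of fp32 optimizer state per parameter. The function name `per_gpu_bytes` is hypothetical, and activations, buffers, and fragmentation are ignored, so treat the result as a lower bound rather than what any particular strategy will actually use.

```python
def per_gpu_bytes(num_params, num_gpus, stage):
    """Approximate model-state memory per GPU (bytes).

    stage 0: plain data parallel, everything replicated
    stage 1: optimizer state sharded
    stage 2: optimizer state + gradients sharded
    stage 3: optimizer state + gradients + parameters sharded
    """
    params = 2 * num_params   # fp16 weights (assumed)
    grads = 2 * num_params    # fp16 gradients (assumed)
    opt = 12 * num_params     # fp32 weights + Adam momentum + variance (assumed)
    if stage >= 1:
        opt /= num_gpus
    if stage >= 2:
        grads /= num_gpus
    if stage >= 3:
        params /= num_gpus
    return params + grads + opt

# Compare stages for a hypothetical 1B-parameter model on 8 GPUs.
for stage in range(4):
    gib = per_gpu_bytes(1_000_000_000, 8, stage) / 2**30
    print(f"stage {stage}: {gib:.1f} GiB per GPU")
```

Under these assumptions, plain data parallel costs 16 bytes per parameter on every GPU, while stage 3 divides all of that by the number of GPUs.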


Speed-up methods:

Compute