By Chaminda Bandara (wgcban.com)

Setting up Distributed Data Parallel

Based on the following resources:

Video series: https://youtu.be/-K3bZYHYHEA?si=6oJA65LybhwDYotp

GitHub: https://github.com/pytorch/examples/tree/main/distributed/ddp-tutorial-series

PyTorch Tutorial: https://pytorch.org/tutorials/beginner/ddp_series_theory.html

Informative lecture: https://youtu.be/TibQO_xv1zc?si=3daCXI9m5bhSC5X1

Data Parallel

import torch.nn as nn

model = MyModel()               # any nn.Module
model = nn.DataParallel(model)  # splits each input batch across all visible GPUs

Identical copies of the model, different sub-batches, synchronized updates
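
A usage sketch (assuming the wrapped model above accepts inputs of shape (N, 10); with, say, four GPUs, a batch of 32 is scattered into sub-batches of 8, run through the replicas, and the outputs are gathered back on the default device):

import torch

inputs = torch.randn(32, 10, device="cuda:0")  # full batch on the default device
outputs = model(inputs)  # scattered across GPUs along dim 0; outputs gathered on cuda:0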

Why DDP?

| DataParallel | DistributedDataParallel |
| --- | --- |
| More overhead: the model is replicated and destroyed at each forward pass | The model is replicated only once |
| Supports only single-node parallelism | Supports scaling to multiple machines |
| Slower: uses multithreading within a single process and runs into Global Interpreter Lock (GIL) contention | Faster (no GIL contention) because it uses multiprocessing |
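
A minimal sketch of the DDP setup these differences point to (assuming a single-node torchrun launch; the nn.Linear model and the train.py file name are placeholders):

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun spawns one process per GPU and sets RANK, LOCAL_RANK, and WORLD_SIZE
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).to(local_rank)      # placeholder model
    model = DDP(model, device_ids=[local_rank])  # replicated once per process

    # ... training loop: DDP all-reduces gradients across processes ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, for example: torchrun --standalone --nproc_per_node=4 train.py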

Multi-GPU Training with DDP

0. Introduction