By Chaminda Bandara (wgcban.com)

Setting up Distributed Data Parallel

Based on the following resources:

Video series: https://youtu.be/-K3bZYHYHEA?si=6oJA65LybhwDYotp

GitHub: https://github.com/pytorch/examples/tree/main/distributed/ddp-tutorial-series

PyTorch Tutorial: https://pytorch.org/tutorials/beginner/ddp_series_theory.html

Informative lecture: https://youtu.be/TibQO_xv1zc?si=3daCXI9m5bhSC5X1

Data Parallel

import torch.nn as nn

model = MyModel()               # any nn.Module
model = nn.DataParallel(model)  # splits each input batch across all visible GPUs

Identical copies of the model, different sub-batches, synchronized updates
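
A usage sketch (assuming the wrapped model above accepts inputs of shape (N, 10); with, say, four GPUs, a batch of 32 is scattered into sub-batches of 8, run through the replicas, and the outputs are gathered back on the default device):

import torch

inputs = torch.randn(32, 10, device="cuda:0")  # full batch on the default device
outputs = model(inputs)  # scattered across GPUs along dim 0; outputs gathered on cuda:0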

Why DDP?

| DataParallel | DistributedDataParallel |
| --- | --- |
| More overhead: the model is replicated and destroyed at each forward pass | The model is replicated only once |
| Supports only single-node parallelism | Supports scaling to multiple machines |
| Slower: uses multithreading within a single process and runs into Global Interpreter Lock (GIL) contention | Faster (no GIL contention) because it uses multiprocessing |
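
A minimal sketch of the DDP setup these differences point to (assuming a single-node torchrun launch; the nn.Linear model and the train.py file name are placeholders):

import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun spawns one process per GPU and sets RANK, LOCAL_RANK, and WORLD_SIZE
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(10, 1).to(local_rank)      # placeholder model
    model = DDP(model, device_ids=[local_rank])  # replicated once per process

    # ... training loop: DDP all-reduces gradients across processes ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

Launched with, for example: torchrun --standalone --nproc_per_node=4 train.py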

Multi-GPU Training with DDP

0. Introduction