3.1 Building models. Considerations include:
Choosing ML framework and model architecture
Modeling techniques given interpretability requirements
3.2 Training models. Considerations include:
Using distributed training to organize reliable pipelines
Data parallelism
Sync training
Async training
All-reduce sync training on TPUs (a synchronous data-parallel sketch follows this list)
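For example, synchronous data parallelism on GPUs can be expressed with tf.distribute.MirroredStrategy, which keeps one full model replica per device and combines gradients with an all-reduce every step (the TPU counterpart is tf.distribute.TPUStrategy, and asynchronous training is typically handled by a ParameterServerStrategy). This is only a minimal sketch; the layer sizes and random data are placeholders to keep it self-contained.

```python
import numpy as np
import tensorflow as tf

# Synchronous data parallelism: each GPU holds a full model replica and
# processes a different slice of every batch; gradients are all-reduced
# before the shared weights are updated. Falls back to CPU if no GPU is
# visible.
strategy = tf.distribute.MirroredStrategy()

# Anything that creates variables (model, optimizer) goes inside the scope.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")

# Placeholder in-memory data, only to make the sketch runnable end to end.
x = np.random.rand(1024, 20).astype("float32")
y = np.random.rand(1024, 1).astype("float32")
model.fit(x, y, batch_size=64, epochs=1)
```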
Model parallelism: the model itself is partitioned into parts (rather than the data, as in data parallelism), and each part is placed on its own GPU. Model parallelism overcomes the memory bottleneck of training a large model on a single GPU by splitting the model's layers across multiple GPUs.
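A minimal illustration of the idea, assuming two visible devices named "/GPU:0" and "/GPU:1": the layers of a Keras model are built under explicit tf.device scopes so each half of the network lives on a different GPU. In practice, model-parallel training usually relies on dedicated pipeline- or tensor-parallel libraries rather than manual placement like this.

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(784,))

# First half of the layers (and their weights) are placed on the first GPU.
with tf.device("/GPU:0"):
    x = tf.keras.layers.Dense(512, activation="relu")(inputs)
    x = tf.keras.layers.Dense(512, activation="relu")(x)

# Remaining layers go on the second GPU, so no single device has to hold
# all of the model's parameters.
with tf.device("/GPU:1"):
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    outputs = tf.keras.layers.Dense(10, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```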
Cloud TPUs - tf.distribute.Strategy is the TensorFlow API for distributing training across multiple GPUs, multiple machines, or TPUs.
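A sketch of the TPU path, assuming the code runs in an environment with a Cloud TPU attached (for example a Cloud TPU VM or a TPU-enabled notebook); the empty tpu="" argument and the tiny model are placeholders.

```python
import tensorflow as tf

# Locate and initialize the TPU system, then build a TPUStrategy so that
# variables and training steps are replicated across the TPU cores.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# Model and optimizer are created inside strategy.scope(); fit() then runs
# synchronous all-reduce training on the TPU cores.
with strategy.scope():
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu", input_shape=(20,)),
        tf.keras.layers.Dense(1),
    ])
    model.compile(optimizer="adam", loss="mse")
```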
