Large Model

# hidden_layer_dims = [5000, 5000, 5000, 5000, 5000, 5000, 5000]
# nx = 1000, ny = 1000
# loss_scale = 1.00003466337
# epochs = 2000

params = 160,036,000
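
The parameter count can be reproduced from the layer sizes. This is a minimal sketch, assuming a plain fully connected network in which every layer has a weight matrix and a bias vector; the notes do not spell out the architecture, and the helper name `count_mlp_params` is hypothetical.

```python
def count_mlp_params(nx, hidden_layer_dims, ny):
    """Count weights and biases of a fully connected network.

    Assumes every layer is dense with a bias term (not stated in the notes).
    """
    dims = [nx, *hidden_layer_dims, ny]
    # Each consecutive pair (fan_in, fan_out) contributes a
    # fan_in x fan_out weight matrix plus fan_out biases.
    return sum(fan_in * fan_out + fan_out
               for fan_in, fan_out in zip(dims, dims[1:]))


large = count_mlp_params(1000, [5000] * 7, 1000)
medium = count_mlp_params(1000, [500] * 7, 1000)
print(f"{large:,}")   # 160,036,000
print(f"{medium:,}")  # 2,504,500
```

Under this assumption the result matches the params value recorded for both the Large Model and the Medium Model configurations.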

Run Time By Operation

Max Memory Allocation

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/8339c129-cfd2-4200-8ced-784cc9f3b092/large-16-maxmemalloc.png

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/a0e2c34a-bba9-4523-9cf2-013203431ea0/large-32-maxmemalloc.png

Training Loss (per epoch)

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/965f5799-a2a4-4ab7-be5e-7b2cfa335158/large-16-epoch_tr_loss.png

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/03f8a97c-d888-44cb-a391-6d8c08e6a527/large-32-epoch_tr_loss.png

Loss Scaler

https://s3-us-west-2.amazonaws.com/secure.notion-static.com/9fdd5518-86ee-4d56-a8ca-b16fe6489f08/large-16-loss_scale.png


Medium Model

# hidden_layer_dims = [500, 500, 500, 500, 500, 500, 500]
# nx = 1000, ny = 1000
# loss_scale = 1.0003466337
# epochs = 2000

params = 2,504,500
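
A note on this model's loss_scale value: 1.0003466337 agrees with 2^(1/2000) to the digits shown, and epochs = 2000. That would be consistent with a multiplicative factor applied once per epoch so that the scale doubles over the run. The notes do not say how the value is used, so that reading is an assumption, and the helper name `doubling_steps` is hypothetical.

```python
import math


def doubling_steps(factor):
    """Number of multiplications by `factor` needed to double a value."""
    return math.log(2) / math.log(factor)


# Assumption: loss_scale is a per-epoch growth factor (not stated in the notes).
medium_factor = 1.0003466337
print(round(2 ** (1 / 2000), 10))            # 1.0003466337
print(round(doubling_steps(medium_factor)))  # 2000
```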