


Lookahead Optimizer: k steps forward, 1 step back

This repository contains code to run experiments that closely replicate those in the paper "Lookahead Optimizer: k steps forward, 1 step back" by Zhang et al., whose authors include Geoffrey Hinton.

The results of the project can be viewed interactively at:

https://app.wandb.ai/akashpalrecha/lookahead/reports?view=akashpalrecha%2FReport%202019-11-14T10%3A17%3A00.225Z



In my analysis, I've found that Lookahead almost always yields a higher training loss than SGD, AdamW, and similar optimizers, even though the paper suggests Lookahead's training loss should generally be the lowest. In my experiments, however, Lookahead's validation loss has consistently been the lowest and its accuracy the highest, which suggests the optimizer generalizes particularly well.

The optimizer has been tested on CIFAR-10, CIFAR-100, and Imagenette (a smaller subset of the ImageNet dataset created by Jeremy Howard, co-founder of fast.ai).

The experiments have been run primarily using the fastai library.

Lookahead Algorithm (Simplified):
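
The idea, per the paper: keep two sets of weights. An inner ("fast") optimizer such as SGD or AdamW takes k ordinary steps on the fast weights θ; the slow weights φ then move toward them, φ ← φ + α(θ − φ), and the fast weights are reset to φ. Below is a minimal sketch of this update rule as a wrapper around a standard PyTorch optimizer. The class name LookaheadSketch and its plumbing are illustrative only, not this repository's actual implementation (which runs through fastai); the model, loader, and criterion in the usage comments are likewise assumed to exist.

```python
import torch


class LookaheadSketch:
    """Minimal Lookahead wrapper: k fast steps, then one slow update."""

    def __init__(self, inner_optimizer, k=5, alpha=0.5):
        self.inner = inner_optimizer   # the "fast" optimizer (SGD, AdamW, ...)
        self.k = k                     # number of fast steps per slow update
        self.alpha = alpha             # slow-weights step size
        self.step_count = 0
        # The slow weights start as a copy of the model's parameters.
        self.slow_weights = [
            [p.clone().detach() for p in group["params"]]
            for group in self.inner.param_groups
        ]

    def zero_grad(self):
        self.inner.zero_grad()

    @torch.no_grad()
    def step(self):
        self.inner.step()              # one ordinary fast-weight update
        self.step_count += 1
        if self.step_count % self.k == 0:
            # Slow update: phi <- phi + alpha * (theta - phi),
            # then reset the fast weights theta to phi.
            for group, slow_group in zip(self.inner.param_groups,
                                         self.slow_weights):
                for p, slow in zip(group["params"], slow_group):
                    slow.add_(p - slow, alpha=self.alpha)
                    p.copy_(slow)


# Hypothetical usage with a plain PyTorch training loop:
# inner = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
# opt = LookaheadSketch(inner, k=5, alpha=0.5)
# for x, y in loader:
#     opt.zero_grad()
#     loss = criterion(model(x), y)
#     loss.backward()
#     opt.step()
```

Because the wrapper leaves the inner optimizer's param_groups untouched between slow updates, learning-rate schedules applied to the inner optimizer keep working unchanged.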


Conclusions from the experiments: