Why should we care about this paper?

Transfer learning is a promising approach in NLP and language understanding. The idea is to pretrain a model on huge amounts of data so that it learns general-purpose knowledge; this pretraining can be done in either a supervised or an unsupervised way. The model can then apply what it learned from the general task to specific downstream tasks. An example of this is GPT-2, which was pretrained on a large portion of the internet and then evaluated on specific tasks it was never explicitly trained for.
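To make the pretrain-then-transfer idea concrete, here is a minimal sketch (assuming the Hugging Face transformers library is installed; the checkpoint name and prompt are illustrative, not from the paper) of prompting a pretrained GPT-2 on a task it was never explicitly fine-tuned for:

```python
# Sketch only: assumes the Hugging Face `transformers` package is available.
from transformers import pipeline

# GPT-2 was pretrained with a plain language-modeling objective on web text;
# no task-specific fine-tuning happens here.
generator = pipeline("text-generation", model="gpt2")

# Prompt the pretrained model with a downstream-style task
# (a toy summarization-style prompt, purely for illustration).
prompt = "Article: The new library opens next week downtown.\nSummary:"
outputs = generator(prompt, max_new_tokens=20, num_return_sequences=1)
print(outputs[0]["generated_text"])
```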

This paper is both a survey of transfer learning for NLP and a proposal for a standardized way of applying it.

Big ideas:

Contributions:

a unified way to approach transfer learning (the Text-to-Text Transfer Transformer, aka T5; see the sketch below), a dataset (the Colossal Clean Crawled Corpus, C4), and pre-trained models
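A minimal sketch of the text-to-text idea, assuming the Hugging Face transformers library and its released T5 checkpoints (the checkpoint name "t5-small" and the example prompts are assumptions for illustration): every task is phrased as input text with a task prefix, and the answer comes back as output text.

```python
# Sketch only: assumes Hugging Face `transformers` with a released T5 checkpoint.
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Different tasks, same model, same text-in / text-out interface.
examples = [
    "translate English to German: The house is wonderful.",
    "summarize: The quick brown fox jumped over the lazy dog repeatedly.",
    "cola sentence: The books is on the table.",  # grammatical acceptability
]

for text in examples:
    input_ids = tokenizer(text, return_tensors="pt").input_ids
    output_ids = model.generate(input_ids, max_new_tokens=32)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```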

side comments:

Is it normal to add a technical contribution to a survey paper?

link to paper results:

https://paperswithcode.com/paper/exploring-the-limits-of-transfer-learning