Word2Vec: Efficient Estimation of Word Representations in Vector Space

GloVe: Global Vectors for Word Representation

Sequence to Sequence Learning with Neural Networks

Neural Machine Translation by Jointly Learning to Align and Translate

Attention Is All You Need

ELMo: Deep Contextualized Word Representations

ULMFiT: Universal Language Model Fine-tuning for Text Classification

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

GPT-1: Improving Language Understanding by Generative Pre-Training

GPT-2: Language Models are Unsupervised Multitask Learners

T5: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

GPT-3: Language Models are Few-Shot Learners

ChatGPT: Applications, Opportunities, and Threats

LLaMA: Open and Efficient Foundation Language Models