Yao Fu, firstname.lastname@example.org
University of Edinburgh
with Hao Peng and Tushar Khot
work done at Allen Institute for AI
Thanks to Junxian He @SJTU, Pan Lu @UCLA, and Ruibo Liu @Dartmouth for insightful initial discussions and suggestions.
Thanks to Raj Ammanabrolu @AI2, Peter Liu @Google Brain, Brendan Dolan-Gavitt @NYU, Denny Zhou @Google Brain, and Aman Madaan @CMU for discussions and suggestions after release, which greatly improved the comprehensiveness of this post.
Started writing on Thu Dec 08, 2022; released on Dec 11, 2022; last edited on May 16, 2023.
I am also working on a paper version of this article.
Other versions: [pdf] [Arxiv] [中文] [bib]
Discuss on twitter with the author
Recently, the field has been greatly impressed and inspired by OpenAI’s ChatGPT. It is undoubtedly clever, capable, and very fun to talk to. Its multi-faceted abilities go well beyond what many NLP researchers and practitioners expected, given their impression of the (not-that-strong) original GPT-3. The natural question is how ChatGPT got there, and where these fantastic abilities come from. In this post, we try to dissect the emergent abilities and trace them to their sources, hoping to give a comprehensive roadmap of how the GPT-3.5 model family, along with related large language models, evolved into its current form.
We hope this post can promote the transparency of large language models and serve as a roadmap for the community’s ongoing efforts to reproduce GPT-3.5.
Table of Contents