Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Stella Biderman, Leo Gao, Tali Bers, Thomas Wolf, Alexander M. Rush


Working Group: Modeling: Prompt Engineering


Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020). It has been hypothesized that this is a consequence of implicit multitask learning in language models' pretraining (Radford et al., 2019). Can zero-shot generalization instead be directly induced by explicit multitask learning? To test this question at scale, we develop a system for easily mapping any natural language task into a human-readable, prompted form. We convert a large set of supervised datasets, each with multiple prompts with diverse wording. These prompted datasets allow for benchmarking the ability of a model to perform completely unseen tasks. We finetune a pretrained encoder-decoder model (Raffel et al., 2020; Lester et al., 2021) on this multitask mixture covering a wide variety of tasks. The model attains strong zero-shot performance on several standard datasets, often outperforming models up to 16x its size. Further, our approach attains strong performance on a subset of tasks from the BIG-bench benchmark, outperforming models up to 6x its size. All prompts and trained models are available at

https://github.com/bigscience-workshop/promptsource and this https URL.
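The core mechanism the abstract describes, mapping a supervised example into several human-readable prompted forms, can be sketched in a few lines. This is an illustrative sketch only, not the promptsource API; the template strings, field names, and example below are hypothetical stand-ins for the paper's actual prompts:

```python
# A minimal sketch of prompted-form conversion: each template is a
# natural-language string with placeholders filled from a dataset example,
# yielding multiple differently worded (input, target) training pairs.

def apply_template(template: str, example: dict) -> str:
    """Fill a human-readable prompt template with fields from one example."""
    return template.format(**example)

# Hypothetical NLI-style example and two diversely worded templates.
example = {
    "premise": "A dog is running in the park.",
    "hypothesis": "An animal is outdoors.",
    "label": "yes",
}

templates = [
    '{premise} Question: does this imply that "{hypothesis}"? Answer:',
    'Suppose "{premise}" Can we conclude that "{hypothesis}"? Yes or no?',
]

# One supervised example becomes several prompted inputs sharing a target.
prompted_inputs = [apply_template(t, example) for t in templates]
target = example["label"]
```

Each prompted input paired with its target can then be fed to a text-to-text encoder-decoder model during multitask finetuning, so the model sees many tasks expressed in the same natural-language format.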
