This page gathers access and references to all the artefacts created during the one-year BigScience workshop. It provides access to and information on the pretrained models, checkpoints, and datasets, as well as (depending on the working group) papers, code tools, etc.
Model: 13B English decoder model
Data: Prompting dataset
Paper: Multitask Prompted Training Enables Zero-Shot Task Generalization (2021)
Paper: Masader: Metadata Sourcing for Arabic Text and Speech Data Resources (2021)
Data: BigScience Data Catalogue
Paper: Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP (2021)
Paper / Tool: LMdiff: A Visual Diff Tool to Compare Language Models