This model was created to test the Jean Zay infrastructure and to explore the difficulties and instabilities that can arise when scaling up a model.

Progress of this project was recorded here

The model is a 13B-parameter, decoder-only English language model.

The final checkpoints can be found here: [to be added]

The code is here: [to be added]