Data Leverage References

← Back to browse

Training Compute-Optimal Large Language Models

2022 inproceedings hoffmann2022chinchilla Not yet verified
Authors
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
Venue
NeurIPS 2022
Abstract
Shows that current LLMs are significantly undertrained. For compute-optimal training, model size and training tokens should scale equally. Introduces Chinchilla (70B params, 1.4T tokens) which outperforms larger models like Gopher (280B) trained on less data.