Training Compute-Optimal Large Language Models
Authors
Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, Tom Hennigan, Eric Noland, Katie Millican, George van den Driessche, Bogdan Damoc, Aurelia Guy, Simon Osindero, Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
Venue
NeurIPS 2022
Abstract
Shows that current large language models are significantly undertrained. For compute-optimal training, model size and the number of training tokens should be scaled in equal proportion. Introduces Chinchilla (70B parameters, 1.4T training tokens), which, on the same compute budget, outperforms much larger models such as Gopher (280B parameters) that were trained on far fewer tokens.
Tags
Links
https://arxiv.org/abs/2203.15556
BibTeX
Local Entry
@inproceedings{hoffmann2022chinchilla,
title = {Training Compute-Optimal Large Language Models},
author = {Jordan Hoffmann and Sebastian Borgeaud and Arthur Mensch and Elena Buchatskaya and Trevor Cai and Eliza Rutherford and Diego de Las Casas and Lisa Anne Hendricks and Johannes Welbl and Aidan Clark and Tom Hennigan and Eric Noland and Katie Millican and George van den Driessche and Bogdan Damoc and Aurelia Guy and Simon Osindero and Karen Simonyan and Erich Elsen and Jack W. Rae and Oriol Vinyals and Laurent Sifre},
year = {2022},
booktitle = {Advances in Neural Information Processing Systems 35 (NeurIPS 2022)},
url = {https://arxiv.org/abs/2203.15556},
abstract = {Shows that current large language models are significantly undertrained. For compute-optimal training, model size and the number of training tokens should be scaled in equal proportion. Introduces Chinchilla (70B parameters, 1.4T training tokens), which, on the same compute budget, outperforms much larger models such as Gopher (280B parameters) that were trained on far fewer tokens.}
}
From OpenAlex
@inproceedings{hoffmann2022chinchilla,
title = {Training Compute-Optimal Large Language Models},
author = {Jordan Hoffmann and Sebastian Borgeaud and Arthur Mensch and Elena Buchatskaya and Trevor Cai and Eliza Rutherford and Diego de Las Casas and Lisa Anne Hendricks and Johannes Welbl and Aidan Clark and Tom Hennigan and Eric Noland and Katie Millican and George van den Driessche and Bogdan Damoc and Aurelia Guy and Simon Osindero and Karen Simonyan and Erich Elsen and Jack W. Rae and Oriol Vinyals and Laurent Sifre},
year = {2022},
booktitle = {arXiv (Cornell University)},
doi = {10.48550/arxiv.2203.15556}
}
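Notes
A minimal back-of-the-envelope sketch (not from the paper's code) of what "scale model size and training tokens equally" implies in practice. It assumes the common C ≈ 6·N·D approximation for training FLOPs and a tokens-per-parameter ratio of about 20, which matches the Chinchilla numbers quoted in the abstract (1.4T / 70B ≈ 20); the function name and constants are illustrative, not the paper's fitted values.

def compute_optimal(c_flops, tokens_per_param=20.0):
    """Roughly size a compute-optimal model for a FLOP budget c_flops,
    assuming C ≈ 6 * N * D and D = tokens_per_param * N, so that N and D
    each scale as the square root of the compute budget."""
    n_params = (c_flops / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

if __name__ == "__main__":
    # Chinchilla's approximate budget: 6 * 70e9 params * 1.4e12 tokens ≈ 5.9e23 FLOPs
    budget = 6 * 70e9 * 1.4e12
    n, d = compute_optimal(budget)
    print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")  # ≈ 7.0e10 params, 1.4e12 tokens

Under these assumptions, doubling the compute budget grows both the parameter count and the token count by a factor of about √2, which is the "scale equally" claim in the abstract.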