Training Compute-Optimal Large Language Models
Authors
Venue
NeurIPS 2022
Abstract
Shows that current large language models are significantly undertrained. For compute-optimal training, model size and the number of training tokens should be scaled in equal proportion: every doubling of model size should be matched by a doubling of training tokens. Introduces Chinchilla (70B parameters, 1.4T tokens), trained with the same compute budget as Gopher (280B parameters), which it uniformly outperforms along with other larger models trained on fewer tokens.
Tags
Links
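Notes
A minimal sketch of the headline scaling rule, assuming the standard C ≈ 6·N·D FLOP approximation and the roughly 20-tokens-per-parameter ratio implied by equal scaling. The paper's three estimation approaches yield slightly different fitted exponents and constants, so treat this as an illustration rather than the paper's exact fitted law.

```python
import math

def compute_optimal_split(compute_flops: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal parameter/token split under equal scaling.

    Assumes C ~= 6 * N * D (FLOPs per token for forward + backward pass)
    and D ~= tokens_per_param * N; both are approximations, not the
    paper's exact fitted coefficients.
    """
    # C ~= 6 * N * (tokens_per_param * N)  =>  N ~= sqrt(C / (6 * tokens_per_param))
    n_params = math.sqrt(compute_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Gopher/Chinchilla-scale compute budget (~5.76e23 FLOPs) comes out to roughly
# 70B parameters and 1.4T tokens, matching Chinchilla's configuration.
n, d = compute_optimal_split(5.76e23)
print(f"params ~ {n / 1e9:.0f}B, tokens ~ {d / 1e12:.1f}T")
```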