Scaling Laws for Neural Language Models
Authors
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, Dario Amodei
Venue
arXiv preprint
Abstract
Establishes empirical power-law scaling of language model cross-entropy loss with model size, dataset size, and training compute, with some trends spanning more than seven orders of magnitude.
Tags