Tag: compute-optimal (1 references)
Scaling Laws for Neural Language Models
Establishes power-law scaling relationships between language model performance and model size, dataset size, and compute, spanning seven orders of magnitude.
Establishes power-law scaling relationships between language model performance and model size, dataset size, and compute, spanning seven orders of magnitude.