Shared References

The Pile: An 800GB Dataset of Diverse Text for Language Modeling

2021 article pile_paper
Authors
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
Venue
CoRR
arXiv