Data Leverage References

Tag: benchmark (10 references)

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation 2025 article

Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, David Fernandez-Llorca

The Leaderboard Illusion 2025 article

Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D'Souza, Sayash Kapoor, Ahmet Üstün, Sanmi Koyejo, Yuntian Deng, Shayne Longpre, Noah A. Smith, Beyza Ermis, Marzieh Fadaee, Sara Hooker

DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning 2022 article

Chengcheng Guo, Bo Zhao, Yanbing Bai

Comprehensive library and empirical study of coreset selection methods for deep learning, finding that random selection remains a strong baseline across many settings.

The Stack: A Permissively Licensed Source Code Dataset 2022 misc

Measuring Mathematical Problem Solving With the MATH Dataset 2021 article

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

Exploring Research Interest in Stack Overflow -- A Systematic Mapping Study and Quality Evaluation 2020 article

Sarah Meldrum, Sherlock A. Licorish, Bastin Tony Roy Savarimuthu

Face Recognition Vendor Test (FRVT) Part 3: Demographic Effects 2019 techreport

Patrick Grother, Mei Ngan, Kayee Hanaoka

GSM8K Hugging Face Dataset Card misc

Grade-School Math (GSM8K) Repository misc

Competition Math Dataset on Hugging Face misc