Tag: benchmark (10 references)

Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation 2025 article

Maria Eriksson, Erasmo Purificato, Arman Noroozian, Joao Vinagre, Guillaume Chaslot, Emilia Gomez, David Fernandez-Llorca

ml-methods benchmark

Source arXiv preprint arXiv:2502.06559

The Leaderboard Illusion 2025 article

Shivalika Singh, Yiyang Nan, Alex Wang, Daniel D'Souza, Sayash Kapoor, Ahmet Üstün, Sanmi Koyejo, Yuntian Deng, Shayne Longpre, Noah A. Smith, Beyza Ermis, Marzieh Fadaee, Sara Hooker

ml-methods benchmark

Source arXiv preprint arXiv:2504.20879

DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning 2022 article

Chengcheng Guo, Bo Zhao, Yanbing Bai

Comprehensive library and empirical study of coreset selection methods for deep learning, finding that random selection remains a strong baseline across many settings.

benchmark data-selection language-models ml-methods

Source DEXA

The Stack: A Permissively Licensed Source Code Dataset 2022 misc

BigCode Project

ml-methods benchmark copyright data-governance data-infrastructure legal-policy

Source Dataset documentation

Measuring Mathematical Problem Solving With the MATH Dataset 2021 article

Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, Jacob Steinhardt

ml-methods language-models benchmark

Exploring Research Interest in Stack Overflow – A Systematic Mapping Study and Quality Evaluation 2020 article

Sarah Meldrum, Sherlock A. Licorish, Bastin Tony Roy Savarimuthu