Can We Trust AI Benchmarks? An Interdisciplinary Review of Current Issues in AI Evaluation
Authors
Maria Eriksson, Erasmo Purificato, Arman Noroozian, João Vinagre, Guillaume Chaslot, Emilia Gómez, David Fernández-Llorca
Venue
arXiv preprint arXiv:2502.06559