Evaluation short description
AIME26 is the benchmark built from the 2026 American Invitational Mathematics Examination (AIME): competition-level math problems that require multi-step reasoning and have exact integer answers. Like the earlier AIME24 and AIME25 sets, it is widely used in the community as a standard reference benchmark for mathematical reasoning.
Evaluation metadata
Hi LightEval team,
Does LightEval currently support evaluating models on AIME26?
If not, is there a recommended way to add it as a custom task (a rough sketch of what I have in mind is below), or any plan to support it officially?
Thanks!
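
For context, here is roughly the custom-task file I was picturing, following the custom-task pattern from the LightEval docs. This is only a sketch on my side: the `Metrics.expr_gold_metric` name, the `your-org/aime26` dataset repo, and the `problem`/`answer` column names are all assumptions and would need checking against the current LightEval API and whatever dataset AIME26 ends up hosted on.

```python
# aime26_task.py -- hypothetical custom-task file, passed to LightEval
# via --custom-tasks. Field and metric names are assumptions based on
# the documented custom-task pattern and may differ by version.
from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc


def aime26_prompt(line: dict, task_name: str) -> Doc:
    # Map one dataset row to a Doc. The "problem"/"answer" column
    # names are assumptions about the (hypothetical) AIME26 dataset.
    return Doc(
        task_name=task_name,
        query=line["problem"],
        choices=[str(line["answer"])],
        gold_index=0,
    )


aime26 = LightevalTaskConfig(
    name="aime26",
    prompt_function=aime26_prompt,
    suite=["community"],
    hf_repo="your-org/aime26",  # placeholder dataset repo
    hf_subset="default",
    hf_avail_splits=["train"],
    evaluation_splits=["train"],
    # Exact-answer math metric; the exact name should be verified
    # against lighteval.metrics.metrics.Metrics.
    metric=[Metrics.expr_gold_metric],
    generation_size=32768,
    stop_sequence=None,
)

# LightEval discovers custom tasks through this module-level table.
TASKS_TABLE = [aime26]
```

And then run it with something like the following (the model-argument syntax differs across versions; older releases use `pretrained=` instead of `model_name=`):

```
lighteval accelerate \
    "model_name=your-org/your-model" \
    "community|aime26|0|0" \
    --custom-tasks aime26_task.py
```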