
[EVAL] Request support for AIME26 #1167

@JackLingjie

Description


Evaluation short description

AIME26 is the latest AIME-style math reasoning benchmark, commonly used to evaluate LLM performance on competition-level problems that require multi-step reasoning and exact integer answers.

It is widely used in the community as a standard reference benchmark for mathematical reasoning, alongside earlier AIME versions.

Evaluation metadata


Hi LightEval team,

Does LightEval currently support evaluating models on AIME26?

If not, is there a recommended way to add it as a custom task, or is there any plan to support it officially?
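
For reference, here is a minimal sketch of what a custom task definition could look like, based on LightEval's existing custom-tasks mechanism. Everything dataset-specific is an assumption: the `your-org/aime-2026` repo name, the `problem`/`answer` column names, and the metric choice are placeholders, and the exact `LightevalTaskConfig` fields may differ across LightEval versions.

```python
# custom_aime26_task.py -- a sketch, not an official task definition
from lighteval.tasks.lighteval_task import LightevalTaskConfig
from lighteval.tasks.requests import Doc
from lighteval.metrics.metrics import Metrics


def aime26_prompt(line, task_name: str = None):
    # Map one dataset row to a Doc; "problem" and "answer" are assumed
    # column names -- adjust them to the actual dataset schema.
    return Doc(
        task_name=task_name,
        query=f"Problem: {line['problem']}\nAnswer:",
        choices=[str(line["answer"])],
        gold_index=0,
    )


aime26 = LightevalTaskConfig(
    name="aime26",
    prompt_function=aime26_prompt,
    suite=["community"],
    hf_repo="your-org/aime-2026",  # hypothetical dataset repo
    hf_subset="default",
    hf_avail_splits=["train"],
    evaluation_splits=["train"],
    # Assumed metric choice; swap in whichever math metric your
    # installed version provides.
    metric=[Metrics.quasi_exact_match_math],
    generation_size=32768,
)

# LightEval discovers custom tasks through this module-level table.
TASKS_TABLE = [aime26]
```

It could then presumably be run with something like `lighteval accelerate ... "community|aime26|0|0" --custom-tasks custom_aime26_task.py`, though the exact CLI flags depend on the installed version.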

Thanks!
