
Epic: Add Benchmark validated Tag #1453

@sephmard

Use cases, pain points, and background:

NeMo Gym benchmarks currently have no standardized way to indicate whether a benchmark has been validated for production evaluation use. As benchmark coverage grows, users need a simple mechanism to distinguish production-ready benchmarks from experimental or in-progress integrations. Without this metadata, a benchmark's readiness is unclear in configs, documentation, and evaluation workflows.

Description:

Add a standardized "validated" tag/status for benchmarks.

The tag should indicate that a benchmark:

  • has passed internal validation checks
  • has acceptable runtime behavior
  • has established expected evaluation parity
  • is approved for production evaluation use
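A minimal sketch of how the four criteria above could be recorded per benchmark, assuming a Python dataclass. The class and field names (ValidationRecord, internal_checks_passed, and so on) are hypothetical, not an existing NeMo Gym API. The checks are stored as reviewer attestations rather than computed, since automated validation is out of scope for this epic.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ValidationRecord:
    """Hypothetical record of the checks a benchmark must pass
    before it may carry the validated tag."""

    internal_checks_passed: bool = False  # passed internal validation checks
    runtime_acceptable: bool = False      # acceptable runtime behavior
    parity_established: bool = False      # expected evaluation parity established
    production_approved: bool = False     # approved for production evaluation use

    @property
    def is_validated(self) -> bool:
        # The tag applies only when every criterion holds.
        return all(getattr(self, f.name) for f in fields(self))

    def missing(self) -> list[str]:
        """Return the names of criteria that are not yet satisfied."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Usage: a benchmark that has passed checks and runtime review, but not parity or approval.
record = ValidationRecord(internal_checks_passed=True, runtime_acceptable=True)
print(record.is_validated)  # False
print(record.missing())     # ['parity_established', 'production_approved']
```

Requiring all four fields keeps the tag all-or-nothing, and missing() gives reviewers a direct list of what still blocks a benchmark.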

The tag should be stored in benchmark metadata/configuration and surfaced in benchmark discovery and documentation flows where applicable (ref: #1434).
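One possible shape for the discovery and docs side, sketched in Python under the assumption that the benchmark registry can be reduced to a mapping from benchmark name to status. ValidationStatus, list_benchmarks, and render_status_table are hypothetical names, and the experimental and in_progress values are borrowed from the background wording rather than from an agreed schema. The benchmark names in the usage lines are placeholders.

```python
from enum import Enum


class ValidationStatus(str, Enum):
    # Hypothetical status values; only "validated" is named in this issue.
    VALIDATED = "validated"
    EXPERIMENTAL = "experimental"
    IN_PROGRESS = "in_progress"


def list_benchmarks(
    registry: dict[str, ValidationStatus], validated_only: bool = False
) -> list[str]:
    """Return benchmark names in sorted order, optionally only validated ones."""
    return sorted(
        name
        for name, status in registry.items()
        if not validated_only or status is ValidationStatus.VALIDATED
    )


def render_status_table(registry: dict[str, ValidationStatus]) -> str:
    """Render a Markdown table of benchmark statuses for the docs."""
    lines = ["| Benchmark | Status |", "| --- | --- |"]
    for name in sorted(registry):
        lines.append(f"| {name} | {registry[name].value} |")
    return "\n".join(lines)


# Usage with placeholder benchmark names.
registry = {
    "bench_b": ValidationStatus.EXPERIMENTAL,
    "bench_a": ValidationStatus.VALIDATED,
}
print(list_benchmarks(registry, validated_only=True))  # ['bench_a']
print(render_status_table(registry))
```

Driving both the discovery filter and the docs table from the same status value keeps them from drifting apart.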

Design:

Expected work includes:

  • defining the metadata schema for the benchmark validated status
  • adding support in benchmark configs/registry metadata
  • surfacing validation status in docs/discovery tooling
  • documenting usage expectations for the tag
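For the schema and config work, a sketch of how a benchmark's metadata block might be read, assuming a hypothetical validation_status key and the status values experimental, in_progress, and validated (the first two are taken from the background wording, not from an agreed schema). parse_validation_status is an illustrative helper, not an existing function.

```python
from collections.abc import Mapping
from enum import Enum
from typing import Any


class ValidationStatus(str, Enum):
    # Hypothetical status values for the benchmark metadata schema.
    VALIDATED = "validated"
    EXPERIMENTAL = "experimental"
    IN_PROGRESS = "in_progress"


def parse_validation_status(metadata: Mapping[str, Any]) -> ValidationStatus:
    """Read the hypothetical 'validation_status' key from benchmark metadata.

    A missing key defaults to EXPERIMENTAL so an unreviewed benchmark is
    never reported as production-ready; an unknown value is rejected.
    """
    raw = metadata.get("validation_status", ValidationStatus.EXPERIMENTAL.value)
    try:
        return ValidationStatus(raw)
    except ValueError:
        allowed = ", ".join(s.value for s in ValidationStatus)
        raise ValueError(
            f"unknown validation_status {raw!r}; expected one of: {allowed}"
        ) from None


# Usage: a config without the key falls back to experimental.
print(parse_validation_status({}).value)  # experimental
print(parse_validation_status({"validation_status": "validated"}).value)  # validated
```

Defaulting to experimental is a design assumption: it means marking a benchmark as production-validated is always an explicit opt-in in its config, and typos in the status fail loudly instead of silently hiding or promoting a benchmark.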

Potential areas of change:

Out of scope:

  • implementing automated validation workflows
  • retroactively validating all benchmarks
  • changing benchmark execution or scoring logic

Acceptance Criteria:

  • Benchmark metadata supports a validated tag/status
  • Validation status definitions are documented
  • Validated status is surfaced in benchmark discovery/docs where applicable
  • Benchmarks can be marked as production-validated using the new metadata
