Skip to content

Epic: Establish Self-Service Benchmark Onboarding Workflow for NeMo Gym #1452

@sephmard

Description

@sephmard

Use cases, pain points, and background

Adding a new benchmark to NeMo Gym currently requires significant internal knowledge around benchmark structure, verifier implementation, dataset formatting, runtime integration, and production validation. While benchmark onboarding documentation already exists: https://docs.nvidia.com/nemo/gym/main/environment-tutorials/adding-a-benchmark/. The current experience is not yet optimized for a fully self-service workflow for engineers and researchers. This results in a difficult self-service journey for users to add a benchmark. This epic standardizes and improves the benchmark onboarding workflow documentation to provide a clear, technical, end-to-end guide for contributors onboarding new benchmarks into NeMo Gym.

Description:

Update and expand the benchmark onboarding documentation to provide a concise, end-to-end self-service workflow for adding benchmarks to NeMo Gym.

The documentation should cover:

  • benchmark integration patterns
  • benchmark scaffolding/layout
  • dataset and verifier requirements
  • resource server integration
  • validation and parity expectations

The result should become the canonical onboarding reference for benchmark contributors.

The resulting documentation should function as the canonical onboarding reference for benchmark contributors.

Design:

Expected work includes:

  • restructuring onboarding docs into clear implementation phases
  • documenting canonical benchmark structure and config patterns
  • documenting native vs wrapped benchmark integration paths
  • clarifying verifier and reward profiling expectations
  • documenting runtime/CI integration expectations
  • adding benchmark templates/examples
  • adding troubleshooting guidance and onboarding checklists

Likely areas of change:

  • onboarding documentation
  • benchmark examples/templates
  • verifier/resource server examples
  • production rollout guidance

Out of scope:

This epic does not include:

  • implementing benchmark migrations
  • redesigning Gym architecture
  • changing benchmark semantics or scoring
  • automating benchmark onboarding

Acceptance Criteria:

  • Benchmark integration patterns are clearly documented
  • Dataset, verifier, and resource server expectations are documented
  • Validation and production-readiness requirements are documented
  • Example benchmark implementations/templates are referenced
  • Troubleshooting guidance and onboarding checklists are added
  • Documentation is sufficient for engineers/researchers to onboard benchmarks with minimal maintainer support

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions