Use cases, pain points, and background
Adding a new benchmark to NeMo Gym currently requires significant internal knowledge of benchmark structure, verifier implementation, dataset formatting, runtime integration, and production validation. Benchmark onboarding documentation already exists (https://docs.nvidia.com/nemo/gym/main/environment-tutorials/adding-a-benchmark/), but it is not yet optimized for a fully self-service workflow, so engineers and researchers find it difficult to add a benchmark on their own. This epic standardizes and improves the benchmark onboarding documentation to provide a clear, technical, end-to-end guide for contributors onboarding new benchmarks into NeMo Gym.
Description:
Update and expand the benchmark onboarding documentation to provide a concise, end-to-end self-service workflow for adding benchmarks to NeMo Gym.
The documentation should cover:
- benchmark integration patterns
- benchmark scaffolding/layout
- dataset and verifier requirements
- resource server integration
- validation and parity expectations
The result should become the canonical onboarding reference for benchmark contributors.
Design:
Expected work includes:
- restructuring onboarding docs into clear implementation phases
- documenting canonical benchmark structure and config patterns
- documenting native vs wrapped benchmark integration paths
- clarifying verifier and reward profiling expectations
- documenting runtime/CI integration expectations
- adding benchmark templates/examples
- adding troubleshooting guidance and onboarding checklists
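As an illustration of the kind of template the updated docs could include for the dataset and verifier items above, here is a minimal sketch. It is not NeMo Gym's actual API: the record fields (`prompt`, `expected_answer`), the exact-match verifier, and all function names are hypothetical and chosen only to show the shape of a dataset check, a verifier, and a simple reward profile.

```python
import json
from statistics import mean

# Hypothetical dataset schema: one JSON object per line with these fields.
REQUIRED_FIELDS = ("prompt", "expected_answer")


def validate_record(line: str) -> dict:
    """Parse one JSONL line and check that the assumed required fields exist."""
    record = json.loads(line)
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record


def verify(response: str, expected_answer: str) -> float:
    """Hypothetical exact-match verifier: 1.0 on a match, otherwise 0.0."""
    matches = response.strip().lower() == expected_answer.strip().lower()
    return 1.0 if matches else 0.0


def profile_rewards(rewards: list[float]) -> dict:
    """Summarize a non-empty list of rewards for a quick sanity check."""
    return {
        "count": len(rewards),
        "mean": mean(rewards),
        "min": min(rewards),
        "max": max(rewards),
    }


if __name__ == "__main__":
    # Two made-up records and two made-up model responses.
    lines = [
        '{"prompt": "2 + 2 = ?", "expected_answer": "4"}',
        '{"prompt": "Capital of France?", "expected_answer": "Paris"}',
    ]
    responses = ["4", "Lyon"]
    records = [validate_record(line) for line in lines]
    rewards = [
        verify(response, record["expected_answer"])
        for response, record in zip(responses, records)
    ]
    print(profile_rewards(rewards))
```

A real benchmark would replace the exact-match check with its own scoring logic. The point of the sketch is that a contributor can validate data and inspect the reward distribution locally before any runtime or CI integration.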
Likely areas of change:
- onboarding documentation
- benchmark examples/templates
- verifier/resource server examples
- production rollout guidance
Out of scope:
This epic does not include:
- implementing benchmark migrations
- redesigning Gym architecture
- changing benchmark semantics or scoring
- automating benchmark onboarding
Acceptance Criteria: