@Darktex Darktex commented Dec 5, 2025

This RFC defines how reward functions (rubrics) are structured, composed, configured, and integrated with training infrastructure in OpenEnv.

Key design decisions:

  • Minimal spec + strong SDK (PyTorch philosophy) - Spec stays flexible, SDK provides "gravity" toward best practices
  • Rubrics as callable classes with dataclass configs for type safety and Hydra compatibility
  • Scalar reward + metadata - observation.reward is the canonical value; metadata["reward_components"] exposes per-rubric scores for logging
  • Dynamic config via POST /config - Enables reward shaping schedules (e.g., decaying rubric weights over the course of training)
  • External services via MCP - LLM judges, databases, and APIs are accessed through a single, unified MCP interface
  • Clear ownership model - Environment owns rubric code/schema; training infra owns config values
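
To make the rubric and observation decisions above concrete, here is a minimal sketch of a callable rubric class with a dataclass config, plus a scalar reward accompanied by per-rubric components in metadata. Everything named here (`LengthPenaltyConfig`, `LengthPenaltyRubric`, the `Observation` stand-in, and `score`) is hypothetical and chosen for illustration; the RFC itself defines the actual interfaces.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class LengthPenaltyConfig:
    """Hypothetical rubric config; field names are illustrative only."""
    max_chars: int = 200
    penalty: float = -1.0


class LengthPenaltyRubric:
    """A rubric as a callable class: holds a typed config, returns a float."""

    def __init__(self, config: LengthPenaltyConfig) -> None:
        self.config = config

    def __call__(self, response: str) -> float:
        # Penalize responses longer than the configured limit.
        too_long = len(response) > self.config.max_chars
        return self.config.penalty if too_long else 0.0


@dataclass
class Observation:
    """Stand-in for an environment observation (assumed shape)."""
    reward: float
    metadata: Dict[str, Any] = field(default_factory=dict)


def score(response: str, rubrics: Dict[str, Callable[[str], float]]) -> Observation:
    # Evaluate each rubric, then expose both the scalar and the breakdown.
    components = {name: rubric(response) for name, rubric in rubrics.items()}
    return Observation(
        reward=sum(components.values()),
        metadata={"reward_components": components},
    )


obs = score("x" * 300, {"length": LengthPenaltyRubric(LengthPenaltyConfig())})
```

In this sketch the trainer would read `obs.reward` for optimization, while a logger could read `obs.metadata["reward_components"]` to plot each rubric separately.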

SDK helpers (optional but recommended):

  • RubricComposer - Within-environment rubric composition with normalization
  • RewardNormalizer - Cross-environment reward normalization for batched training
  • RewardLogger - W&B/TensorBoard integration for per-rubric curves
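
The two kinds of normalization mentioned above can be sketched as follows. This is not the SDK's `RubricComposer` or `RewardNormalizer`; `compose_rewards` and `RunningRewardStats` are hypothetical names, and weight normalization plus a running z-score are assumed here as one plausible reading of "normalization".

```python
from typing import Callable, Dict, Tuple

Rubric = Callable[[str], float]


def compose_rewards(
    rubrics: Dict[str, Tuple[float, Rubric]], response: str
) -> Tuple[float, Dict[str, float]]:
    """Weighted sum of rubric scores, with weights normalized to sum to 1."""
    total_weight = sum(weight for weight, _ in rubrics.values())
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    components = {name: fn(response) for name, (_, fn) in rubrics.items()}
    reward = sum(
        rubrics[name][0] / total_weight * value
        for name, value in components.items()
    )
    return reward, components


class RunningRewardStats:
    """Running mean and variance (Welford's algorithm) for one environment."""

    def __init__(self, eps: float = 1e-8) -> None:
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the running mean
        self.eps = eps

    def update(self, reward: float) -> None:
        self.count += 1
        delta = reward - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (reward - self.mean)

    def normalize(self, reward: float) -> float:
        # With fewer than two samples there is no spread to scale by.
        if self.count < 2:
            return 0.0
        std = (self.m2 / self.count) ** 0.5
        return (reward - self.mean) / (std + self.eps)


reward, parts = compose_rewards(
    {"correct": (3.0, lambda r: 1.0), "brief": (1.0, lambda r: 0.0)},
    "some response",
)
```

Keeping one `RunningRewardStats` instance per environment would put rewards from environments with different scales on a comparable footing before they are batched together.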

Test plan

  • Review RFC for completeness and clarity
  • Validate design against existing environments (coding_env, chat_env)
  • Gather feedback from potential users (training framework authors)
  • Prototype SDK helpers to validate API design

Design specification for reward functions (rubrics) in OpenEnv:
- Minimal spec + strong SDK philosophy (PyTorch-style)
- Rubric interface: callable classes with dataclass configs
- Observation format: scalar reward + metadata with components
- Dynamic config updates via POST /config for reward shaping schedules
- External services (LLM judges) via MCP
- SDK helpers: RubricComposer, RewardNormalizer, RewardLogger
- Multi-turn scenario support (Tau-bench style)
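
As an illustration of a reward shaping schedule driven through POST /config, the sketch below computes a linearly decaying rubric weight and builds a JSON body for it. The payload shape and the `format_bonus` rubric name are assumptions made for this example, not the schema defined by the RFC, and the HTTP call itself is left out.

```python
import json
from typing import Any, Dict


def linear_decay(step: int, total_steps: int, start: float = 1.0, end: float = 0.0) -> float:
    """Linearly interpolate from start to end, clamped to [0, total_steps]."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + (end - start) * frac


def config_payload(step: int, total_steps: int) -> Dict[str, Any]:
    # Hypothetical payload shape: one weight per named rubric.
    weight = linear_decay(step, total_steps)
    return {"rubrics": {"format_bonus": {"weight": weight}}}


# JSON body the training loop could send to POST /config at step 500 of 1000.
body = json.dumps(config_payload(500, 1000))
```

Under the ownership model described in the PR, the training infrastructure would compute and send values like these, while the environment owns the rubric code and the schema they must satisfy.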
@meta-cla meta-cla bot added the CLA Signed label Dec 5, 2025
@Darktex Darktex marked this pull request as draft December 5, 2025 21:39