
Conversation


@subho406 subho406 commented Jan 16, 2026

Reward shaping presets & improved training/eval metrics

This PR adds configurable reward shaping presets for Cogsguard and improves logging and metrics reporting across training and evaluation.

Reward shaping strategies

Three presets are provided, ordered from least to most shaped (a hedged sketch of how they compose appears after this list):

  • Objective
    Minimal shaping. Agents are rewarded only for holding aligned junctions:
    collective_stats["aligned.junction.held"] = 1 / max_steps
    Matches the current behavior.

  • Milestones
    Objective reward plus small, tightly capped milestone rewards to guide exploration:

    • Hearts gained
    • Aligner / scrambler gear gained
    • Optional, capped collective deposits

    Uses cumulative *.gained stats to avoid penalizing spending or gear swaps.

  • Credit
    Most shaped. Combines objective + milestones with per-agent, success-only action credit:

    • Rewards agents directly for successful junction scramble and align actions
    • Enables fast learning and clearer multi-agent credit assignment for the scramble → align sequence.
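
A minimal sketch of how the three presets might layer on top of one another. RewardPreset and the preset names come from this PR, but build_rewards, the stat keys other than aligned.junction.held, and every weight/cap magnitude below are illustrative assumptions, not the PR's actual values:

```python
from enum import Enum


class RewardPreset(str, Enum):
    OBJECTIVE = "objective"
    MILESTONES = "milestones"
    CREDIT = "credit"


def build_rewards(preset: RewardPreset, max_steps: int) -> dict:
    """Illustrative only: compose reward terms for a preset."""
    # Objective: collective reward for holding aligned junctions, normalized
    # so a junction held every step of the episode contributes ~1 total.
    rewards = {"collective": {"aligned.junction.held": 1.0 / max_steps}}

    if preset in (RewardPreset.MILESTONES, RewardPreset.CREDIT):
        # Milestones: small, tightly capped bonuses on cumulative *.gained
        # stats, so spending hearts or swapping gear is never penalized.
        # Keys and magnitudes below are made up for illustration.
        rewards["milestones"] = {
            "heart.gained": {"weight": 0.02, "cap": 0.2},
            "aligner_gear.gained": {"weight": 0.02, "cap": 0.1},
            "scrambler_gear.gained": {"weight": 0.02, "cap": 0.1},
            "collective.deposit": {"weight": 0.01, "cap": 0.1},  # optional, capped
        }

    if preset is RewardPreset.CREDIT:
        # Credit: per-agent, success-only bonuses for junction actions, to
        # sharpen credit assignment for the scramble -> align sequence.
        rewards["per_agent"] = {
            "junction.scramble.success": 0.05,
            "junction.align.success": 0.05,
        }
    return rewards
```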

Changes

  • Reward presets

    • Introduced RewardPreset options (objective, milestones, credit) for Cogsguard.
    • Wired through environment, curriculum, training, and evaluation setup.
    • Added progress metric defaults and zero-initialization for aligned junctions in
      recipes/experiment/cogsguard.py.
  • Configurable progress metrics

    • Progress output now displays a configurable metric instead of hard-coded heart counts.
    • The metric key is pulled from StatsReporterConfig (see the progress-metric sketch
      after this list).
    • Updated the Cogsguard recipe to use aligned.junction.held.avg as the progress metric.
    • Used consistently in rich/plain logs via:

      • metta/rl/training/progress_logger.py
      • new progress-metric accessors in metta/rl/training/stats_reporter.py
  • Logging & checkpoints

    • Fixed the W&B “step not monotonically increasing” error that could stall training
      logs after evals, by logging training metrics with step=agent_step
      (metta/rl/training/wandb_logger.py:57); see the W&B step sketch after this list.
    • Checkpoint URI logs now include metric/agent_step and metric/epoch, and log at
      step=agent_step (metta/rl/training/checkpointer.py:136).
    • Eval results use a monotonic step (max(agent_step, wandb_run.step)) instead of
      step=epoch (metta/sim/handle_results.py:179).
    • Eval now opens the append-only W&B run only when writing results (after rollouts)
      (metta/tools/eval.py:138).
    • Profiler trace-link logs no longer implicitly advance the W&B step
      (metta/rl/training/torch_profiler.py:130).
  • Simulator stats

    • The simulator now exposes per-collective stats (see the infos-forwarding sketch
      after this list).
    • Tracks junction alignment and scramble actions per actor.
    • Updated bindings, type hints, and stats forwarding to infos in:

      • packages/mettagrid/cpp/bindings/mettagrid_py.cpp
      • packages/mettagrid/cpp/include/mettagrid/handler/mutations/mutation.hpp
      • packages/mettagrid/python/src/mettagrid/envs/stats_tracker.py
      • packages/mettagrid/python/src/mettagrid/mettagrid_c.pyi
      • packages/mettagrid/python/src/mettagrid/types.py
  • Housekeeping

    • Ignore local VS Code workspaces via *.local.code-workspace in .gitignore.
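
A minimal sketch of the configurable progress metric, assuming StatsReporterConfig carries the metric key in a progress_metric field; that field name and the format_progress helper are hypothetical, while the real accessors live in metta/rl/training/stats_reporter.py:

```python
from dataclasses import dataclass


@dataclass
class StatsReporterConfig:
    # The Cogsguard recipe would set this to "aligned.junction.held.avg";
    # "progress_metric" as a field name is an assumption.
    progress_metric: str = "heart.count.avg"


def format_progress(stats: dict[str, float], cfg: StatsReporterConfig) -> str:
    # Default to 0.0 so early epochs still log a value before the stat
    # first appears (mirrors the zero-initialization added in the recipe).
    value = stats.get(cfg.progress_metric, 0.0)
    return f"{cfg.progress_metric}={value:.4f}"
```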
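A sketch of the monotonic-step pattern behind the logging fixes, using only the public wandb.log API; the two helper names are illustrative, and both assume wandb.init() has already been called:

```python
import wandb


def log_training_metrics(metrics: dict[str, float], agent_step: int) -> None:
    # Training metrics always log at step=agent_step, so they advance
    # monotonically regardless of what evals logged in between.
    wandb.log(metrics, step=agent_step)


def log_eval_results(results: dict[str, float], agent_step: int) -> None:
    # An eval can land after the run's step has already moved past
    # agent_step; taking the max keeps the step monotonic instead of
    # logging at step=epoch, which W&B would warn about and drop.
    step = max(agent_step, wandb.run.step)
    wandb.log(results, step=step)
```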
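Finally, a hypothetical sketch of what forwarding per-collective stats into infos could look like; the key layout and function name are assumptions, and the actual plumbing is in stats_tracker.py and the C++/pyi files listed above:

```python
def forward_collective_stats(
    infos: dict, collective_stats: dict[int, dict[str, float]]
) -> None:
    # Flatten per-collective stats into the env's infos dict, e.g.
    # infos["collective/0/aligned.junction.held"] = 0.75
    for collective_id, stats in collective_stats.items():
        for key, value in stats.items():
            infos[f"collective/{collective_id}/{key}"] = value
```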


@datadog-official

datadog-official bot commented Jan 20, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: 66b7a12

@subho406 subho406 changed the title from “Reward shaping for training policies in CogsGuard” to “Reward shaping for training policies in CogsGuard; Stats logging for Cogsguard; fix bug with steps not logged in wandb post evaluation run” on Jan 20, 2026
@subho406 subho406 marked this pull request as ready for review January 20, 2026 18:14
