
Conversation


@subho406 subho406 commented Jan 16, 2026

Reward shaping presets & improved training/eval metrics

This PR adds configurable reward shaping presets for Cogsguard and improves logging and metrics reporting across training and evaluation.

Reward shaping strategies

Three presets are provided, ordered from least to most shaped (a hedged sketch of how they compose appears after this list):

  • Objective
    Minimal shaping. Agents are rewarded only for holding aligned junctions:
    collective_stats["aligned.junction.held"] = 1 / max_steps
    Matches the current behavior.

  • Milestones
    Objective reward plus small, tightly capped milestone rewards to guide exploration:

    • Hearts gained
    • Aligner / scrambler gear gained
    • Optional, capped collective deposits

    Uses cumulative *.gained stats to avoid penalizing spending or gear swaps.

  • Credit
    Most shaped. Combines objective + milestones with per-agent, success-only action credit:

    • Rewards agents directly for successful junction scramble and align actions
    • Enables fast learning and clearer multi-agent credit assignment for the scramble → align sequence.
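
A minimal sketch of how the three presets might layer on top of one another. RewardPreset and the preset names come from this PR, but build_rewards, the stat keys other than aligned.junction.held, and every weight/cap magnitude below are illustrative assumptions, not the PR's actual values:

```python
from enum import Enum


class RewardPreset(str, Enum):
    OBJECTIVE = "objective"
    MILESTONES = "milestones"
    CREDIT = "credit"


def build_rewards(preset: RewardPreset, max_steps: int) -> dict:
    """Illustrative only: compose reward terms for a preset."""
    # Objective: collective reward for holding aligned junctions, normalized
    # so a junction held every step of the episode contributes ~1 total.
    rewards = {"collective": {"aligned.junction.held": 1.0 / max_steps}}

    if preset in (RewardPreset.MILESTONES, RewardPreset.CREDIT):
        # Milestones: small, tightly capped bonuses on cumulative *.gained
        # stats, so spending hearts or swapping gear is never penalized.
        # Keys and magnitudes below are made up for illustration.
        rewards["milestones"] = {
            "heart.gained": {"weight": 0.02, "cap": 0.2},
            "aligner_gear.gained": {"weight": 0.02, "cap": 0.1},
            "scrambler_gear.gained": {"weight": 0.02, "cap": 0.1},
            "collective.deposit": {"weight": 0.01, "cap": 0.1},  # optional, capped
        }

    if preset is RewardPreset.CREDIT:
        # Credit: per-agent, success-only bonuses for junction actions, to
        # sharpen credit assignment for the scramble -> align sequence.
        rewards["per_agent"] = {
            "junction.scramble.success": 0.05,
            "junction.align.success": 0.05,
        }
    return rewards
```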

Changes

  • Reward presets

    • Introduced RewardPreset options (objective, milestones, credit) for Cogsguard.
    • Wired through environment, curriculum, training, and evaluation setup.
    • Added progress metric defaults and zero-initialization for aligned junctions in
      recipes/experiment/cogsguard.py.
  • Configurable progress metrics

    • Progress output now displays a configurable metric instead of hard-coded heart counts.
    • The metric key is pulled from StatsReporterConfig (see the progress-metric sketch
      after this list).
    • Updated the Cogsguard recipe to use aligned.junction.held.avg as the progress metric.
    • Used consistently in rich/plain logs via:

      • metta/rl/training/progress_logger.py
      • new progress-metric accessors in metta/rl/training/stats_reporter.py
  • Logging & checkpoints

    • Fixed the W&B “step not monotonically increasing” error that could stall training
      logs after evals, by logging training metrics with step=agent_step
      (metta/rl/training/wandb_logger.py:57); see the W&B step sketch after this list.
    • Checkpoint URI logs now include metric/agent_step and metric/epoch, and log at
      step=agent_step (metta/rl/training/checkpointer.py:136).
    • Eval results use a monotonic step (max(agent_step, wandb_run.step)) instead of
      step=epoch (metta/sim/handle_results.py:179).
    • Eval now opens the append-only W&B run only when writing results (after rollouts)
      (metta/tools/eval.py:138).
    • Profiler trace-link logs no longer implicitly advance the W&B step
      (metta/rl/training/torch_profiler.py:130).
  • Simulator stats

    • The simulator now exposes per-collective stats (see the infos-forwarding sketch
      after this list).
    • Tracks junction alignment and scramble actions per actor.
    • Updated bindings, type hints, and stats forwarding to infos in:

      • packages/mettagrid/cpp/bindings/mettagrid_py.cpp
      • packages/mettagrid/cpp/include/mettagrid/handler/mutations/mutation.hpp
      • packages/mettagrid/python/src/mettagrid/envs/stats_tracker.py
      • packages/mettagrid/python/src/mettagrid/mettagrid_c.pyi
      • packages/mettagrid/python/src/mettagrid/types.py
  • Housekeeping

    • Ignore local VS Code workspaces via *.local.code-workspace in .gitignore.
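
A minimal sketch of the configurable progress metric, assuming StatsReporterConfig carries the metric key in a progress_metric field; that field name and the format_progress helper are hypothetical, while the real accessors live in metta/rl/training/stats_reporter.py:

```python
from dataclasses import dataclass


@dataclass
class StatsReporterConfig:
    # The Cogsguard recipe would set this to "aligned.junction.held.avg";
    # "progress_metric" as a field name is an assumption.
    progress_metric: str = "heart.count.avg"


def format_progress(stats: dict[str, float], cfg: StatsReporterConfig) -> str:
    # Default to 0.0 so early epochs still log a value before the stat
    # first appears (mirrors the zero-initialization added in the recipe).
    value = stats.get(cfg.progress_metric, 0.0)
    return f"{cfg.progress_metric}={value:.4f}"
```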
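A sketch of the monotonic-step pattern behind the logging fixes, using only the public wandb.log API; the two helper names are illustrative, and both assume wandb.init() has already been called:

```python
import wandb


def log_training_metrics(metrics: dict[str, float], agent_step: int) -> None:
    # Training metrics always log at step=agent_step, so they advance
    # monotonically regardless of what evals logged in between.
    wandb.log(metrics, step=agent_step)


def log_eval_results(results: dict[str, float], agent_step: int) -> None:
    # An eval can land after the run's step has already moved past
    # agent_step; taking the max keeps the step monotonic instead of
    # logging at step=epoch, which W&B would warn about and drop.
    step = max(agent_step, wandb.run.step)
    wandb.log(results, step=step)
```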
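Finally, a hypothetical sketch of what forwarding per-collective stats into infos could look like; the key layout and function name are assumptions, and the actual plumbing is in stats_tracker.py and the C++/pyi files listed above:

```python
def forward_collective_stats(
    infos: dict, collective_stats: dict[int, dict[str, float]]
) -> None:
    # Flatten per-collective stats into the env's infos dict, e.g.
    # infos["collective/0/aligned.junction.held"] = 0.75
    for collective_id, stats in collective_stats.items():
        for key, value in stats.items():
            infos[f"collective/{collective_id}/{key}"] = value
```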


@datadog-official

datadog-official bot commented Jan 20, 2026

✅ Tests

🎉 All green!

❄️ No new flaky tests detected
🧪 All tests passed

🔗 Commit SHA: 66b7a12

@subho406 subho406 changed the title from “Reward shaping for training policies in CogsGuard” to “Reward shaping for training policies in CogsGuard; Stats logging for Cogsguard; fix bug with steps not logged in wandb post evaluation run” on Jan 20, 2026
@subho406 subho406 marked this pull request as ready for review January 20, 2026 18:14
