Reward shaping for training policies in Cogsguard; stats logging for Cogsguard; fix bug with steps not logged in wandb after evaluation runs #4941
Reward shaping presets & improved training/eval metrics
This PR adds configurable reward shaping presets for Cogsguard and improves logging and metrics reporting across training and evaluation.
Reward shaping strategies
Three presets are provided, ordered from least to most shaped:
Objective
Minimal shaping. Agents are rewarded only for holding aligned junctions:
collective_stats["aligned.junction.held"] = 1 / max_stepsMatches the current behavior.
Milestones
Objective reward plus small, tightly capped milestone rewards to guide exploration:
Uses cumulative `*.gained` stats to avoid penalizing spending or gear swaps.
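A minimal sketch of how capped milestone shaping over cumulative `*.gained` stats could work; the stat names, caps, and rate below are hypothetical, only the use of cumulative `*.gained` counts comes from the PR.

```python
# Hypothetical sketch: capped milestone shaping over cumulative "*.gained"
# stats. Stat names, caps, and the 0.01 rate are illustrative.
MILESTONE_CAPS = {"ore.gained": 0.05, "gear.gained": 0.05}

def milestone_reward(cumulative_stats: dict[str, float],
                     paid_so_far: dict[str, float]) -> float:
    """Return the newly earned (delta) milestone reward for this step.

    Cumulative *.gained counts only ever increase, so spending resources or
    swapping gear never claws back reward; the per-stat cap keeps each
    milestone's lifetime payout small.
    """
    delta = 0.0
    for stat, cap in MILESTONE_CAPS.items():
        earned = min(cumulative_stats.get(stat, 0.0) * 0.01, cap)
        delta += earned - paid_so_far.get(stat, 0.0)
        paid_so_far[stat] = earned
    return delta
```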
Credit
Most shaped. Combines objective + milestones with per-agent, success-only action credit.
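A sketch of what per-agent, success-only action credit might look like; the action names and credit values are hypothetical, not taken from the PR.

```python
# Hypothetical sketch of per-agent, success-only action credit; the action
# names and values are illustrative.
ACTION_CREDIT = {"align_junction": 0.02, "scramble": 0.01}

def action_credit(actions: list[tuple[str, bool]]) -> float:
    """Sum credit for one agent, counting only actions that succeeded.

    Each entry is (action_name, succeeded); failed attempts earn nothing,
    so repeatedly attempting an action is not rewarded by itself.
    """
    return sum(ACTION_CREDIT.get(name, 0.0)
               for name, ok in actions if ok)
```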
Changes
Reward presets
- New `RewardPreset` options (`objective`, `milestones`, `credit`) for Cogsguard in `recipes/experiment/cogsguard.py`.
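For illustration, selecting a preset might look roughly like this; the enum shape and usage are assumptions, only `RewardPreset` and its three values come from the PR.

```python
# Assumed shape for illustration; only the RewardPreset name and its three
# values appear in the PR.
from enum import Enum

class RewardPreset(str, Enum):
    OBJECTIVE = "objective"    # minimal shaping: objective reward only
    MILESTONES = "milestones"  # objective + capped milestone bonuses
    CREDIT = "credit"          # milestones + per-agent success-only credit

# e.g., a recipe such as recipes/experiment/cogsguard.py could pick:
preset = RewardPreset.MILESTONES
```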
Configurable progress metrics
Progress output now displays a configurable metric instead of hard-coded heart counts.
- Metric key is pulled from `StatsReporterConfig` (see the sketch after this list).
- Updated the Cogsguard recipe to use `aligned.junction.held.avg` as the progress metric.
- Used consistently in rich/plain logs via:
  - `metta/rl/training/progress_logger.py`
  - `metta/rl/training/stats_reporter.py`
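A sketch of the configurable progress metric; the `progress_metric` field name and the helper function are assumptions, only `StatsReporterConfig` and the `aligned.junction.held.avg` key come from the PR.

```python
# Sketch only: the field name `progress_metric` and this helper are
# assumptions; StatsReporterConfig and the metric key are from the PR.
from dataclasses import dataclass

@dataclass
class StatsReporterConfig:
    progress_metric: str = "aligned.junction.held.avg"

def progress_line(stats: dict[str, float], cfg: StatsReporterConfig) -> str:
    # Display whichever metric the config names instead of a hard-coded
    # heart count, so rich and plain logs stay in sync.
    value = stats.get(cfg.progress_metric, 0.0)
    return f"{cfg.progress_metric}: {value:.3f}"
```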
Logging & checkpoints
- Fixed training steps not being logged to wandb after evaluation runs by logging training metrics with `step=agent_step` (`metta/rl/training/wandb_logger.py:57`).
- Checkpoint metrics log `metric/agent_step` / `metric/epoch` and log at `step=agent_step` (`metta/rl/training/checkpointer.py:136`).
- Evaluation results log at `max(agent_step, wandb_run.step)` instead of `step=epoch` (`metta/sim/handle_results.py:179`); see the sketch after this list.
- Related step handling updated in `metta/tools/eval.py:138` and `metta/rl/training/torch_profiler.py:130`.
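The clamping matters because wandb drops `log` calls whose `step` is lower than the run's current step; after an evaluation advances the counter, training metrics logged at a smaller `agent_step` would be silently discarded. A minimal sketch, assuming `wandb_run` is an active wandb run object:

```python
# Sketch of the step-monotonicity workaround: clamping upward with
# max(agent_step, wandb_run.step) keeps post-evaluation metrics from being
# dropped by wandb's monotonic-step requirement.
def log_with_monotonic_step(wandb_run, metrics: dict[str, float],
                            agent_step: int) -> None:
    step = max(agent_step, wandb_run.step)
    wandb_run.log(metrics, step=step)
```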
Simulator stats
Simulator now exposes per-collective stats.
Tracks junction alignment and scramble actions per actor.
Updated bindings, type hints, and stats forwarding to `infos` (see the sketch after this list) in:
- `packages/mettagrid/cpp/bindings/mettagrid_py.cpp`
- `packages/mettagrid/cpp/include/mettagrid/handler/mutations/mutation.hpp`
- `packages/mettagrid/python/src/mettagrid/envs/stats_tracker.py`
- `packages/mettagrid/python/src/mettagrid/mettagrid_c.pyi`
- `packages/mettagrid/python/src/mettagrid/types.py`
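A sketch of how per-collective and per-actor stats could be forwarded through the `infos` dict an env step returns; the key layout is hypothetical, only the idea of per-collective stats and per-actor scramble/alignment tracking comes from the PR.

```python
# Hypothetical layout for forwarding stats into the `infos` dict returned
# by an env step; key names are illustrative.
def build_infos(collective_stats: dict[str, float],
                per_agent_stats: dict[int, dict[str, float]]) -> dict:
    return {
        "collective": dict(collective_stats),   # e.g. junction alignment
        "agents": {aid: dict(s)                 # e.g. scramble actions per actor
                   for aid, s in per_agent_stats.items()},
    }
```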
Housekeeping
Added `*.local.code-workspace` to `.gitignore`.