RL custom reward structure #348

StephenHornish · 2025-12-23T06:01:08Z

Currently, PyTAG uses the TAG score directly as the reinforcement-learning reward signal. While this is sufficient for some environments, it makes it hard to define intermediate rewards, or reward functions that are non-monotonic with respect to the final game score, both of which are often necessary for long-horizon or phase-based games.

To address this, I extended the default AbstractState class with a new method, getReward(). By default, this method simply returns the game score, which keeps full backward compatibility and preserves the behavior of all existing RL environments. PyTAG now queries the state's getReward() rather than reading the score directly. Developers may optionally override this method to define a custom reward structure that differs from the final scoring mechanism.

In the case of Power Grid, this allows the score to remain consistent with the official game rules while enabling a separate reward signal (e.g., intermediate rewards during bureaucracy phases). As a result, reward design is decoupled from score computation, providing greater flexibility without breaking existing implementations.
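
To illustrate the pattern described above, here is a minimal Python sketch. It is not the code from this PR: the real getReward() lives on the game state class on the TAG side, and every name below (`AbstractState` as a Python class, `PhasedState`, `phase_bonus`, `step_reward`) is a hypothetical stand-in chosen only to show the default-plus-override idea.

```python
class AbstractState:
    """Hypothetical stand-in for the base game state (illustration only)."""

    def __init__(self) -> None:
        self.score = 0.0

    def get_score(self) -> float:
        # The score follows the game's official rules and is never altered
        # for training purposes.
        return self.score

    def get_reward(self) -> float:
        # Default behaviour: the reward is the score, so environments that
        # do not override this method behave exactly as before.
        return self.get_score()


class PhasedState(AbstractState):
    """Hypothetical state whose reward differs from its score."""

    def __init__(self) -> None:
        super().__init__()
        # Assumed intermediate reward granted during some game phase.
        self.phase_bonus = 0.0

    def get_reward(self) -> float:
        # Custom reward: score plus an intermediate bonus. The score itself
        # is left untouched.
        return self.get_score() + self.phase_bonus


def step_reward(state: AbstractState) -> float:
    # The RL wrapper asks the state for its reward instead of reading the
    # score directly.
    return state.get_reward()
```

With this shape, a state that does not override `get_reward` yields the score as its reward, while an overriding state can shape the reward without changing what `get_score` reports.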

StephenHornish added 2 commits December 21, 2025 18:10

Added Reward method to abstractGameClass and implemented it in PowerGrid

0ebfd55

Documentation on reward

5d874b5

hopshackle added the enhancement New feature or request label Jan 5, 2026
