
Conversation

@StephenHornish
Contributor

Currently, PyTAG uses the TAG score mechanism directly as the reinforcement-learning reward signal. While this is sufficient for some environments, it limits the ability to define intermediate rewards, or reward functions that are non-monotonic with respect to the final game score, both of which are often necessary for long-horizon or phase-based games.

To address this, I added a new method, getReward(), to the default AbstractState class. By default, this method simply returns the game score, ensuring full backward compatibility and preserving the behavior of all existing RL environments. With the introduction of getReward(), PyTAG now queries the state's reward function rather than using the score directly. Developers may optionally override this method to define a custom reward structure that differs from the final scoring mechanism.

In the case of Power Grid, this allows the score to remain consistent with the official game rules while enabling a separate reward signal (e.g., intermediate rewards during bureaucracy phases). As a result, reward design is cleanly decoupled from score computation, providing greater flexibility without breaking existing implementations.
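To illustrate the pattern, here is a minimal sketch of the decoupling described above. Only the AbstractState base class and the idea of a default getReward() come from the PR description; the field names, the PowerGridGameState subclass, and the 0.1 shaping weight are hypothetical and stand in for whatever the actual game implementation uses.

```java
// Sketch only: illustrates a default reward equal to the score, plus an
// optional per-game override. Names other than AbstractState/getReward()
// are placeholders, not the actual TAG/PyTAG code.
abstract class AbstractState {

    /** Official game score; computation is unchanged by this PR. */
    abstract double getGameScore(int playerId);

    /**
     * Reward queried by PyTAG. Defaults to the game score, so all existing
     * RL environments keep their current behavior.
     */
    double getReward(int playerId) {
        return getGameScore(playerId);
    }
}

// Hypothetical Power Grid state: adds an intermediate reward during the
// bureaucracy phase without touching the rule-accurate score.
class PowerGridGameState extends AbstractState {
    int[] citiesPowered = new int[6];    // per-player score, per official rules
    int[] incomeThisPhase = new int[6];  // money earned in the current bureaucracy phase

    @Override
    double getGameScore(int playerId) {
        return citiesPowered[playerId];
    }

    @Override
    double getReward(int playerId) {
        // Score stays consistent with the rules; the shaping term is RL-only.
        return getGameScore(playerId) + 0.1 * incomeThisPhase[playerId];
    }
}
```

Because the base class defaults to returning the score, games that never override getReward() observe no change in the signal PyTAG receives.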

@hopshackle added the enhancement (New feature or request) label on Jan 5, 2026
