
Note: refactoring is in progress.

BlendRL: A Framework for Merging Symbolic and Neural Policies (ICLR 2025)

Hikaru Shindo, Quentin Delfosse, Devendra Singh Dhami, Kristian Kersting

We propose a framework that jointly learns symbolic and neural policies for reinforcement learning.

Quickstart

Installation

Follow INSTALLATION.md to install dependencies.

Download the trained agents:

wget https://hessenbox.tu-darmstadt.de/dl/fiCNznPuWkALH8JaCJWHeeAV/models.zip
unzip models.zip
rm models.zip

Then you can run the play script:

python play_gui.py --env-name kangaroo --agent-path models/kangaroo_demo
python play_gui.py --env-name seaquest --agent-path models/seaquest_demo

Note that a checkpoint is required to run the play script.

You can run the training script:

python train_blenderl.py --env-name seaquest --joint-training --num-steps 128 --num-envs 5 --gamma 0.99
  • --joint-training: train neural and logic modules jointly
  • --num-steps: the number of steps for policy rollout
  • --num-envs: the number of parallel environments used for training
  • --gamma: the discount factor for future rewards
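To make the role of --gamma concrete: the discount factor weights future rewards, so the return of a rollout is r_0 + γ·r_1 + γ²·r_2 + …. The helper below is a minimal, illustrative sketch (it is not part of this repository):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute the discounted return G = sum_t gamma^t * r_t of a reward sequence."""
    g = 0.0
    # Iterate backwards so each step applies one factor of gamma.
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Three rewards of 1.0 yield 1 + 0.99 + 0.99^2 = 2.9701 (up to float rounding).
print(round(discounted_return([1.0, 1.0, 1.0], gamma=0.99), 4))
```

Smaller values of gamma make the agent more short-sighted; gamma close to 1 values long-term rewards almost as much as immediate ones.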

How to Use

The Logic

Inside in/envs/[env_name]/logic/[ruleset_name]/, you will find the logic rules that are used as a starting point for training. You can modify them or create new rule sets. The ruleset to use is specified via the rules hyperparameter.
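As a rough illustration, such rule sets are typically first-order clauses that map logic facts to action predicates. The rule below is hypothetical; the exact syntax in this repository may differ, so consult the bundled rule files as the authoritative reference:

```
% If the player is close by an enemy on its right, move left (illustrative only).
move_left(X) :- player(X), enemy(E), closeby(X, E), right_of(E, X).
```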


How to Set up New Environments

To add a new environment, create a folder inside in/envs/[new_env_name]/. There, you need to define a NudgeEnv class that wraps the original environment in order to perform:

  • logic state extraction: translates raw environment states into logic representations
  • valuation: each relation (like closeby) has a corresponding valuation function that maps the (logic) game state to the probability that the relation holds. Each valuation function is a simple Python function whose name must match the name of the corresponding relation.
  • action mapping: the action predicates predicted by the agent are mapped to the actual environment actions
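The three pieces above can be sketched as follows. This is a hypothetical outline, not the actual blendrl API: the class name NudgeEnv comes from the text, but the method names, signatures, and the closeby formula are illustrative assumptions:

```python
import numpy as np


class NudgeEnv:
    """Hypothetical wrapper around a raw environment for logic-based agents."""

    def __init__(self, env):
        self.env = env  # the original (e.g. Atari) environment

    def extract_logic_state(self, raw_state):
        # Logic state extraction: turn the raw observation into an
        # object-centric representation, e.g. one row per object
        # holding (object type, x, y, ...). Illustrative stub.
        ...

    def map_action(self, action_predicate):
        # Action mapping: translate an action predicate predicted by the
        # agent (e.g. "up") into the integer action index the underlying
        # environment expects. Illustrative stub.
        ...


# A valuation function: its name must match the relation it grounds
# ("closeby"), and it returns the probability that the relation is true.
def closeby(obj1, obj2, threshold=2.0):
    """Probability that two objects (arrays starting with x, y) are close."""
    dist = np.linalg.norm(obj1[:2] - obj2[:2])
    # Smoothly map distance to a probability in [0, 1]: near 1 when the
    # objects are closer than the threshold, near 0 when far apart.
    return float(1.0 / (1.0 + np.exp(dist - threshold)))
```

The soft (sigmoid-shaped) valuation keeps the relation differentiable, which is what allows the logic policy to be trained by gradient descent alongside the neural one.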

See the freeway environment for a reference implementation.