Model-based multi-task RL for partially observable environments

The Environment

The environment used for this project is the L-world: a set of two gridworlds with L-shaped 'walls' that run along the sides, as shown here:

The agent cannot enter the GREEN squares in either gridworld. The start state is the top-left cell and the end state is the bottom-right cell. Because the green squares sit in different places in the two gridworlds and the optimal path is entirely different in each, a single policy that ignores which gridworld is active isn't really workable.
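The setup described above can be sketched in a few lines of Python. This is only a toy illustration: the grid size (`SIZE = 8`), the exact wall shapes in `make_walls`, and the rule that a blocked move leaves the agent in place are all assumptions, since the README does not specify them. It shows a fixed action sequence that reaches the goal in one layout and gets stuck in the other.

```python
SIZE = 8  # hypothetical grid size; the README does not state it
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def make_walls(task):
    """Wall cells for task 0 or 1 (illustrative shapes, not the real layouts)."""
    if task == 0:
        # L along the top edge and down the right edge
        return ({(0, c) for c in range(2, SIZE)}
                | {(r, SIZE - 1) for r in range(SIZE - 2)})
    # Mirrored L: down the left edge and along the bottom edge
    return ({(r, 0) for r in range(2, SIZE)}
            | {(SIZE - 1, c) for c in range(SIZE - 2)})


def step(pos, action, walls):
    """Move one cell; stay in place if the target is a wall or off the grid
    (the stay-in-place rule is an assumption)."""
    dr, dc = MOVES[action]
    nxt = (pos[0] + dr, pos[1] + dc)
    inside = 0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE
    return nxt if inside and nxt not in walls else pos


def run_plan(plan, walls):
    """Execute a fixed action sequence from START and return the final cell."""
    pos = START
    for action in plan:
        pos = step(pos, action, walls)
    return pos


# A fixed plan that hugs the top and right edges of the grid
plan = ["right"] * (SIZE - 1) + ["down"] * (SIZE - 1)
print(run_plan(plan, make_walls(1)) == GOAL)  # plan reaches the goal in task 1
print(run_plan(plan, make_walls(0)) == GOAL)  # same plan is blocked in task 0
```

In this toy layout the same open-loop plan succeeds in task 1 and fails in task 0, which is why the agent has to work out which gridworld it is in before committing to a route.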

The agent uses the following architecture to identify the task (environment) and solve it. The agent can only observe a 5x5 square of states around it.
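A minimal sketch of the 5x5 partial observation, assuming walls are stored as a set of `(row, col)` cells. The cell codes `FREE` and `WALL`, the function name `observe`, and the choice to report cells outside the grid as walls are hypothetical and not taken from the project.

```python
FREE, WALL = 0, 1  # hypothetical cell codes


def observe(pos, walls, size, radius=2):
    """Return the (2*radius + 1) x (2*radius + 1) window centred on pos.

    Cells outside the grid are reported as WALL, which is an assumption.
    """
    r0, c0 = pos
    window = []
    for r in range(r0 - radius, r0 + radius + 1):
        row = []
        for c in range(c0 - radius, c0 + radius + 1):
            inside = 0 <= r < size and 0 <= c < size
            row.append(WALL if (not inside or (r, c) in walls) else FREE)
        window.append(row)
    return window


# Agent in the top-left corner of an 8x8 grid with one wall cell at (0, 2)
obs = observe((0, 0), walls={(0, 2)}, size=8)
for row in obs:
    print(row)
```

The agent's own cell is always at the centre of the window (`obs[2][2]`), so everything further than two cells away stays hidden until the agent moves.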

After running a fresh (randomly initialized) agent for about 200 episodes, the environment model has learnt the environment to a reasonable extent, and the agent can use the model to predict what the current task is:

Agent - YELLOW

Observed Cell - RED

Hidden Cell - GREY

Imagined Wall - PURPLE

Observed Wall - GREEN
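To make the idea of predicting the task from partial observations concrete, here is a hand-written stand-in. It is not the learned environment model from this project: it simply checks which known layouts are consistent with the cells seen so far, and treats the remaining walls of the chosen layout as "imagined". The names `infer_task` and `imagined_walls` and the tiny example layouts are hypothetical.

```python
def infer_task(observed, candidates):
    """Return indices of candidate layouts consistent with the observations.

    observed: dict mapping (row, col) -> True if a wall was seen there.
    candidates: list of wall-cell sets, one per task.
    """
    return [
        i for i, walls in enumerate(candidates)
        if all((cell in walls) == is_wall for cell, is_wall in observed.items())
    ]


def imagined_walls(observed, walls):
    """Walls predicted by a layout that have not been observed yet."""
    return walls - set(observed)


# Two tiny hypothetical layouts and a few observed cells
candidates = [{(0, 2), (0, 3)}, {(2, 0), (3, 0)}]
observed = {(0, 0): False, (0, 1): False, (0, 2): True}

print(infer_task({}, candidates))               # nothing seen yet: both remain
print(infer_task(observed, candidates))         # seeing the wall at (0, 2) leaves task 0
print(imagined_walls(observed, candidates[0]))  # unseen wall predicted for task 0
```

A learned model would output a prediction rather than a hard consistency check, but the roles are the same: observed walls come from the 5x5 window, imagined walls come from the model.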

Model vs Model-free performance:

The Y-axis is the average number of steps required to reach the bottom-right square (the goal).
