Skip to content

Latest commit

 

History

History
32 lines (18 loc) · 1.67 KB

File metadata and controls

32 lines (18 loc) · 1.67 KB

Model-based multi-task RL for Partially observable environments.

The Environment

The environment use for this project is the L-world. It's a set of 2 gridworlds with L-shaped 'walls' the run along the side as shown here:

alt text

The agent cannot enter the GREEN squares in each case. The start state is the top left cell and the end state is the bottom right cell. A simple policy independent of the fact that green squares change isn't really possible since the environment changes and the optimal path is entirely different in each case.

The agent uses the following architecture to optimally determine the task( environment ) and solve it ( the agent can only observe a 5x5 square of states around it ) alt text

After running a fresh( randomly initialized ) agent on about 200 episodes, we see that the environment model has learnt the environment to a reasonable extent and can use the model to predict what the current task is:

alt text

Agent - YELLOW

Observed Cell - RED

Hidden Cell - GREY

Imagined Wall - PURPLE

Observed Wall - GREEN

Model vs Model-free performance: alt text

The Y-axis is the average number of steps required to reach the bottom-right square( Goal )