The Grid Driving environment is a simple yet scalable domain for testing various planning and reinforcement learning algorithms.

The Grid Driving task involves "simplified" driving through traffic full of vehicles, from one point to another. At each timestep, the car can move up, move down, or stay in the same lane as it automatically moves forward. Completing the task gives a +10 reward, whereas failure yields 0 reward. The sparse nature of the goal and its variability make the environment a suitable entry point for initial experimentation on scalability as well as long-term planning.
Execute the following command to install the package:

```shell
pip install -e .
```
Create the environment this way:

```python
import gym
import gym_grid_driving

env = gym.make('GridDriving-v0')
```
Example:

```python
import numpy as np

state = env.reset()
for i in range(12):
    env.render()
    state, reward, done, info = env.step(np.random.choice(env.actions))
```
```python
import gym
import gym_grid_driving
from gym_grid_driving.envs.grid_driving import LaneSpec, MaskSpec, Point

lanes = [
    LaneSpec(2, [-1, -1]),
    LaneSpec(2, [-2, -1]),
    LaneSpec(3, [-3, -1]),
]
env = gym.make('GridDriving-v0', lanes=lanes, width=8,
               agent_speed_range=(-3, -1), finish_position=Point(0, 1), agent_pos_init=Point(6, 1),
               stochasticity=1.0, tensor_state=False, flicker_rate=0.5, mask=MaskSpec('follow', 2), random_seed=13)

actions = env.actions
env.render()
```
- `lanes` accepts a list of `LaneSpec(cars, speed_range)` governing how each lane should be, with `cars` being an integer and `speed_range=[min, max]`, where `min` and `max` are also integers
- `width` specifies the width of the simulator, as expected
- `agent_speed_range=[min, max]` is the agent's speed range, which affects the available actions
- `finish_position` - coordinate of the finish point
- `agent_pos_init` - coordinate of the agent's initial position
- `env.actions` is an enum containing all available actions, which change depending on `agent_speed_range`
- `env.action_space` is the OpenAI Gym action space that can be sampled, with the definitions given in `env.actions`
- `stochasticity` - degree of stochasticity, with `1.0` being fully stochastic and `0.0` being fully deterministic
- `observation_type` can be `'state'`, `'tensor'`, or `'vector'`: output the state as it is, as a 3D tensor `[channel, height, width]` with `channel=[cars, agent, finish_position, occupancy_trails]`, or as a vector
- `random_seed` makes the environment reproducible
- `flicker_rate` specifies how often the observation will not be available (blackout)
- `mask` defines the fog of war and accepts `MaskSpec(type, radius)`, with the type being `'follow'` or `'random'`. Type `'follow'` makes the visible area follow the agent, whereas `'random'` gives random visibility (randomized at every step)
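To make the `'tensor'` observation layout concrete, its shape follows directly from the lane configuration. This is a standalone sketch of that arithmetic only; it does not call into the package itself, and the values mirror the example configuration above:

```python
# Sketch: expected shape of the 'tensor' observation for a given config.
# The 3D tensor is [channel, height, width], where height is the number
# of lanes and channel covers [cars, agent, finish_position, occupancy_trails].
lanes_cars = [2, 2, 3]   # cars per lane, mirroring LaneSpec(2, ...), etc.
width = 8                # simulator width
channels = ['cars', 'agent', 'finish_position', 'occupancy_trails']

tensor_shape = (len(channels), len(lanes_cars), width)
print(tensor_shape)      # (4, 3, 8)
```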
Notes:
- To make the simulator deterministic, set `stochasticity=0.0` or `min=max` in the cars' `speed_range`
- The parking scenario is just a special case where `min=max=0` in the cars' `speed_range`
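Following the notes above, a parking-style, fully deterministic setup can be sketched by fixing every car's `speed_range` to zero. This is an illustrative configuration fragment (the lane counts and width are arbitrary), assuming the package is installed:

```python
import gym
import gym_grid_driving
from gym_grid_driving.envs.grid_driving import LaneSpec

# Parking scenario: min = max = 0 in every car's speed_range, so the
# cars never move; stochasticity=0.0 makes the simulator deterministic.
lanes = [
    LaneSpec(2, [0, 0]),
    LaneSpec(3, [0, 0]),
]
env = gym.make('GridDriving-v0', lanes=lanes, width=8,
               agent_speed_range=(-2, -1), stochasticity=0.0)
```

Other keyword arguments, such as `finish_position` and `agent_pos_init`, can be added exactly as in the configuration example above.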
```
========================================
F - - - - O - - - -
- - 2 1 - - - - - -
- - 4 3 - - - 5 - <
========================================
Start
========================================
F - - O - - - - - -
- - - - - 2 - - - 1
- - 3 5 - - - 4 - <
========================================
down
========================================
F - O - - - - - - -
- - 2 ~ ~ - 1 ~ ~ -
- 3 5 - - 4 ~ - < -
========================================
up
========================================
OF ~ - - - - - - - -
~ ~ - 1 ~ ~ - < - 2
3 5 - - 4 - - - - -
========================================
forward[-3]
========================================
F - - - - - - - - O
1 ~ ~ - < - 2 ~ ~ -
~ - 4 ~ - - - - 3 5
========================================
down
========================================
F - - - - - - O ~ -
- - - 2 ~ ~ - 1 ~ ~
~ ~ - < - - 3 5 ~ 4
========================================
down
========================================
F - - - - - O - - -
2 ~ ~ - 1 ~ ~ - - -
- - < 3 ~ 5 ~ - 4 -
========================================
up
========================================
F - - - O ~ - - - -
- 1# ~ ~ - - - 2 ~ ~
- - 3 5 ~ 4 ~ ~ - -
========================================
```
- `<`: Agent
- `F`: Finish point
- Integer: Car
- `#`: Crashed agent
- `-`: Empty road
- `~`: Occupancy trails