
Neural Network with heuristic vs Deep Q Network

Artificial Intelligence 7750 - Graduate final project

Relevant papers

Demo video

image

Summary

This project compares a Neural Network with a heuristic against Deep Q-Network (DQN) learning, each training an agent to incrementally improve at playing a Snake game.

Environment

Actions

The snake's 3 possible actions are represented with one-hot encoding, where 1 = take the action and 0 = do not take it.

[1,0,0] = forward (continues in current direction)

[0,1,0] = turn right

[0,0,1] = turn left

State

The state represents 11 conditions using one-hot encoding, with 1 = condition met and 0 = condition not met:

  • Whether danger (the snake colliding with its own body or the game window boundary) lies forward, right, and/or left of the snake.
  • Whether the snake's current direction is left, right, up, or down.
  • Whether the mice (food) is left, right, up, and/or down of the snake (two flags can be set when it is diagonal).
    [danger_forward, danger_right_turn, danger_left_turn,
     going_left, going_right, going_up, going_down,
     mice_left, mice_right, mice_up, mice_down]
    

Ex: state = [0, 0, 1, 0, 0, 0, 1, 0, 1, 1, 0] = danger to the left of the snake, snake moving downward, and the mice (food) to the right of and above the snake.
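As an illustration, the 11-element state could be assembled like this (the helper and its signature are hypothetical; the repo's own code may differ):

```python
def build_state(danger_forward, danger_right, danger_left,
                direction, mice_dx, mice_dy):
    """Assemble the 11-element one-hot state described above.

    `direction` is one of "left", "right", "up", "down"; `mice_dx`/`mice_dy`
    are signed pixel offsets from the snake's head to the mice (food),
    with y growing downward as in typical game coordinates.
    """
    return [
        int(danger_forward), int(danger_right), int(danger_left),
        int(direction == "left"), int(direction == "right"),
        int(direction == "up"), int(direction == "down"),
        int(mice_dx < 0),  # mice left of snake
        int(mice_dx > 0),  # mice right of snake
        int(mice_dy < 0),  # mice above snake
        int(mice_dy > 0),  # mice below snake
    ]
```

With danger to the left, the snake heading down, and the mice up-and-right, this reproduces the example vector above.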

Model

image

Neural Network with heuristic

Uses a heuristic function to determine the target action to take:

  1. decided_action = Keep only the action(s) where there is no danger
  2. decided_action = If the mice is in the direction the snake is already heading, return the "go forward" action
  3. decided_action = If the mice is in a direction the snake can turn toward, return that turn
  4. decided_action = If no previous condition matched (danger everywhere), return a random action
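The steps above can be sketched as follows. This is a hedged approximation: step 3 here falls back to any safe turn rather than computing which absolute direction each turn maps to, and all names are illustrative:

```python
import random

def heuristic_action(state):
    """Approximate the 4-step heuristic above. `state` follows the layout
    [danger_fwd, danger_right, danger_left,
     going_left, going_right, going_up, going_down,
     mice_left, mice_right, mice_up, mice_down].
    Returns a one-hot action [forward, turn_right, turn_left]."""
    safe = [i for i in range(3) if not state[i]]   # 1. actions with no danger
    if not safe:                                   # 4. danger everywhere: random
        choice = random.randrange(3)
    else:
        heading = state[3:7]   # one-hot current direction (L, R, U, D)
        mice = state[7:11]     # mice relative position flags (L, R, U, D)
        # 2. mice lies in the direction we're heading -> go forward if safe
        if 0 in safe and any(h and m for h, m in zip(heading, mice)):
            choice = 0
        else:
            # 3. otherwise prefer a safe turn; fall back to any safe action
            turns = [i for i in safe if i != 0]
            choice = turns[0] if turns else safe[0]
    action = [0, 0, 0]
    action[choice] = 1
    return action
```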

image

DQN

Reward & Penalty

  • eat_mice = +10
  • game_over = -10
  • idle_steps_after_long_time = -10 (the idle/useless-step limit is proportional to the snake's length × 100)
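The reward scheme above can be sketched as a single function. The zero reward for an ordinary step is an assumption; the idle-step cutoff of 100 × snake length comes from the text:

```python
def reward(ate_mice, game_over, idle_steps, snake_length):
    """Per-step reward for the DQN agent (sketch of the scheme above)."""
    if game_over:
        return -10
    if ate_mice:
        return +10
    if idle_steps > 100 * snake_length:   # wandering too long without eating
        return -10
    return 0                              # assumption: ordinary steps are neutral
```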

Q learning

Uses the Bellman equation to calculate new Q values: `Q_new(s, a) = r + γ · max_a' Q(s', a')`
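A minimal sketch of this update, assuming the simplified one-step target (no learning-rate term) and a terminal-state special case:

```python
def q_target(reward, q_next_max, gamma=0.9, done=False):
    """Bellman target for the taken action: r + gamma * max_a' Q(s', a').
    Terminal states use the reward alone, since there is no next state."""
    if done:
        return reward
    return reward + gamma * q_next_max
```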

Gamma & Epsilon

epsilon = 80 - m. Random exploration if randint(0, 200) < epsilon; otherwise exploitation.

gamma = 0.9. Results fare better when gamma (the discount factor) is set closer to 1, i.e., future rewards are valued almost as much as immediate rewards.
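The epsilon-greedy rule above can be sketched as follows, assuming `m` is a counter that grows during training (e.g. games played), so exploration decays over time:

```python
import random

def select_action(q_values, m):
    """Epsilon-greedy selection: explore while epsilon = 80 - m is large,
    otherwise exploit by taking the argmax of the model's Q values."""
    epsilon = 80 - m
    action = [0, 0, 0]
    if random.randint(0, 200) < epsilon:                  # explore
        action[random.randrange(3)] = 1
    else:                                                 # exploit
        action[q_values.index(max(q_values))] = 1
    return action
```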

Results comparison

Both experiments ran for 10 minutes.

Conclusion

As can be seen, the Neural Network with heuristic approach improves quickly, but its performance plateaus as time passes. The DQN improves more slowly at first, yet shows a clear and continuous increase in performance, with no sign of plateauing even after 10 minutes.

Neural Network with heuristic

image

image

DQN

image

image

References