Question on the convergence of DQN on Pong game

Hi,
Thank you for your tutorials on Medium and the example code.

I am new to reinforcement learning.

I tried to run the code in [DRL_15_16_17_DQN_Pong](https://github.com/jorditorresBCN/Deep-Reinforcement-Learning-Explained/blob/master/DRL_15_16_17_DQN_Pong.ipynb), but I could not get the training to converge.

I am trying to find the reason, and I wonder whether the problem comes from the reward mechanism of the game environment. While a rally is in progress, the reward is zero at almost every step. Only when a point is scored does the game return a reward of +1 or -1. Therefore, for most transitions, the loss is the MSE between the current predicted Q value and the discounted maximum Q value of the next state, with no reward term involved. From this I deduce that all predicted Q values will eventually become equal after many training iterations, which would explain the failure to converge.
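To make the loss I am describing concrete, here is a minimal sketch of the one-step target as I understand it. It is not the notebook's code: the helper names `td_target` and `squared_error`, the discount factor of 0.99, and the example Q values are all my own assumptions for illustration.

```python
# Minimal sketch of the one-step TD target (hypothetical helpers, not the notebook's code).
GAMMA = 0.99  # assumed discount factor


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Return r + gamma * max_a' Q(s', a'); only r if the transition is terminal."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def squared_error(predicted_q, target):
    """Per-sample squared error; averaging it over a batch gives the MSE loss."""
    return (predicted_q - target) ** 2


# Made-up Q values of the next state, one per action (illustration only).
next_q = [0.2, 0.5, 0.1]

# Typical step: reward is 0, so the target is just the discounted next-state value.
zero_reward_target = td_target(0.0, next_q, done=False)

# Rare step: a nonzero reward enters the target.
rewarded_target = td_target(1.0, next_q, done=False)

print(zero_reward_target, rewarded_target)
print(squared_error(0.5, zero_reward_target))
```

In the first case the target depends only on the network's own estimate for the next state, which is the situation I am worried about.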

Am I correct?

I would appreciate any help you can offer!
