Hi
Thank you for your tutorials on Medium and the example code.
I am new to reinforcement learning.
I tried to run the DRL_15_16_17_DQN_Pong code, but training did not converge.
I am trying to find the cause, and I wonder whether it comes from the reward mechanism of the game environment. While the game is ongoing, the reward is zero most of the time; only when an episode ends does the game return a reward of 1 or -1. For most transitions, then, the loss is the MSE between the current predicted Q value and the discounted Q value of the next state, with no reward term involved. From this I deduce that all predicted Q values will eventually become equal after many training iterations, which would explain the failure to converge.
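To make the question concrete, here is a minimal sketch of the target I am describing. It is not taken from the repository: the function name `td_target`, the discount `gamma=0.99`, and the example Q values are my own assumptions, and I am assuming the usual one-step DQN target with no bootstrapping on the final transition.

```python
def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step target: r + gamma * max_a' Q(s', a'); only r on a terminal step."""
    if done:
        # No next state to bootstrap from, so the target is the reward alone.
        return reward
    return reward + gamma * max(next_q_values)


def squared_error(predicted_q, target):
    """Per-sample squared error; the MSE loss averages this over a batch."""
    return (predicted_q - target) ** 2


# Hypothetical Q values for the next state, one per action.
next_q = [0.2, 0.5, -0.1]

# Mid-game transition: reward is 0, so the target is only the discounted next-state value.
mid_game_target = td_target(reward=0.0, next_q_values=next_q, done=False)

# Final transition: the +1 / -1 reward is the whole target.
final_target = td_target(reward=1.0, next_q_values=next_q, done=True)

mid_game_loss = squared_error(predicted_q=0.4, target=mid_game_target)
```

In this sketch the reward enters the target directly only on the final transition; on every other step the target depends on the next state's Q values alone, which is the situation my question is about.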
Am I correct?
I would appreciate any help you could offer!