Question on the convergence of DQN on Pong game #1

@HoiM

Hi,
Thank you for your tutorials on Medium and the example code.

I am new to reinforcement learning.

I tried to run the code in DRL_15_16_17_DQN_Pong, but I could not get training to converge.

I am trying to find the cause, and I wonder whether the problem comes from the reward mechanism of the game environment. While the game is in progress, the reward is zero most of the time. Only when an episode ends does the game return a reward of 1 or -1. As a result, for most transitions the loss is the MSE between the current predicted Q value and the discounted Q value of the next state, with no reward term involved. From this I deduce that all predicted Q values will eventually become equal after many training iterations, which would explain the failure to converge.
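A minimal sketch for checking this reasoning on a toy problem, not the tutorial's code. It assumes the standard one-step Q-learning target `r + gamma * max Q(s', a')` with no bootstrapping on the terminal step. It uses a hypothetical single-action chain of five states in which only the last transition gives a reward. The names `td_target` and `sweep_chain` are made up for illustration.

```python
import numpy as np

def td_target(reward, next_q_values, done, gamma=0.99):
    """One-step target: r + gamma * max_a' Q(s', a'); no bootstrap when done."""
    bootstrap = 0.0 if done else gamma * float(np.max(next_q_values))
    return reward + bootstrap

def sweep_chain(n_states=5, gamma=0.9, sweeps=50):
    """Toy chain s0 -> s1 -> ... -> terminal with reward +1 only on the last step."""
    q = np.zeros(n_states)  # one action per state, so Q is a vector
    for _ in range(sweeps):
        for s in range(n_states):
            last = s == n_states - 1
            reward = 1.0 if last else 0.0
            # The terminal step has no successor; its value is ignored via done=True.
            next_q = np.zeros(1) if last else q[s + 1 : s + 2]
            q[s] = td_target(reward, next_q, last, gamma)
    return q

q = sweep_chain()
print(q)
```

In this idealized tabular setting, the zero-reward targets still carry the final reward backwards through the bootstrap term. Each state ends up with a value scaled by `gamma` per step of distance from the reward, so the values are not all equal. Whether the same holds for the neural-network version on Pong depends on details that this sketch does not cover, such as the replay buffer, the target network, and the learning rate.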

Am I correct?

I would appreciate any help you could offer!
