I trained in the Alien environment. Compared with the source code, the only change I made was increasing `num_episodes` in the test phase to 50. The curve of returns during training is shown in the figure below. It is very volatile. Although I trained for only 500 epochs, I suspect it would not converge even at 1000 epochs. Is this behavior normal? Could you share the curves from your own training runs? Thank you very much.