Deep Q-Learning for Continuous Control

This project implements a Deep Q-Network (DQN) agent for a continuous control environment with a discrete action space. The goal was to learn a near-optimal policy directly from pixel-level observations while ensuring stability and convergence within a limited training budget.

Objective

  • Train a reinforcement learning agent to maximize long-term rewards.
  • Use DQN components to stabilize training and balance exploration and exploitation.
  • Evaluate performance across internal validation and official challenge submissions.

Methodology

  • Observations: Pixel-level environment frames.

  • DQN components (see the sketches after this list):

    • Replay buffer (100,000 samples).
    • Target network with soft updates.
    • ε-greedy exploration: ε annealed from 1.0 → 0.2 over 100,000 steps.
    • Delayed training start after 2,000 steps to ensure buffer diversity.
  • Hyperparameters:

    • Minibatch size: 256
    • Learning rate: 0.0001
    • Discount factor (γ): 0.99
  • Training: 1,000 episodes total.

  • Monitoring: Reward and loss plots used to track convergence.
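
As a rough illustration of the exploration setup described above, the following minimal Python sketch shows a linear ε-annealing schedule and the delayed training start. The constant and function names are illustrative assumptions, not taken from the repository's code.

```python
import random

# Constants taken from the description above; names are illustrative assumptions.
EPS_START, EPS_END = 1.0, 0.2
EPS_ANNEAL_STEPS = 100_000
LEARNING_STARTS = 2_000  # training is delayed until this many transitions are collected

def epsilon_at(step: int) -> float:
    """Linearly anneal ε from EPS_START to EPS_END over EPS_ANNEAL_STEPS steps."""
    frac = min(step / EPS_ANNEAL_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)

def select_action(q_values, step: int, num_actions: int) -> int:
    """ε-greedy action selection over a discrete action space."""
    if random.random() < epsilon_at(step):
        return random.randrange(num_actions)                          # explore
    return max(range(num_actions), key=lambda a: float(q_values[a]))  # exploit

def ready_to_train(step: int) -> bool:
    """Updates begin only after the delayed training start."""
    return step >= LEARNING_STARTS
```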

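To show how the replay buffer, target network, and soft updates could fit together with the listed hyperparameters, here is a minimal PyTorch sketch of one training step. The class and function names, the soft-update rate TAU, and the MSE loss are assumptions; the repository's actual implementation may differ.

```python
import random
from collections import deque

import torch
import torch.nn as nn

# Hyperparameters from the list above; TAU (soft-update rate) is an assumed value.
BATCH_SIZE, LR, GAMMA, TAU = 256, 1e-4, 0.99, 0.005

class ReplayBuffer:
    """Fixed-size FIFO buffer (100,000 transitions) sampled uniformly."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        # States are assumed to already be torch tensors (e.g. stacked pixel frames).
        return (torch.stack(states),
                torch.tensor(actions),
                torch.tensor(rewards, dtype=torch.float32),
                torch.stack(next_states),
                torch.tensor(dones, dtype=torch.float32))

def soft_update(online: nn.Module, target: nn.Module, tau: float = TAU):
    """Polyak-average the online weights into the target network."""
    for p_t, p_o in zip(target.parameters(), online.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p_o.data)

def train_step(online, target, optimizer, buffer: ReplayBuffer):
    """One DQN update: TD target from the target network, MSE loss, soft update."""
    states, actions, rewards, next_states, dones = buffer.sample(BATCH_SIZE)
    q = online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target(next_states).max(dim=1).values
        td_target = rewards + GAMMA * (1.0 - dones) * q_next
    loss = nn.functional.mse_loss(q, td_target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    soft_update(online, target)
    return loss.item()
```

In such a setup the optimizer would typically be torch.optim.Adam(online.parameters(), lr=LR), with train_step called once per environment step after the delayed training start.
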
Results

  • Agent performance improved rapidly in early episodes and stabilized at high reward levels (>0.9).

  • The loss curve followed expected DQN dynamics: initial instability, then structured updates with residual variance.

  • Final evaluation:

    • Internal test set: average return 0.9632 over 50 episodes.
    • Challenge server: 0.964 score.

Takeaway

The chosen hyperparameters and DQN design yielded a robust policy that converged quickly and matched the challenge benchmark.
