PPO Race Track - Reinforcement Learning 🏎️🧠

This project is a reference repository that demonstrates how to train a Proximal Policy Optimization (PPO) agent with Stable-Baselines3 in a 2D racing track environment (RaceTrackEnv).

🚀 Highlights and Performance Results

Even after a short training run (e.g., 20,000 steps), the model completes a lap of the track without crashing (laps=1).

Example Evaluation Result:

Episode 1/10 | reward=1708.89 | steps=280 | laps=1
Episode 2/10 | reward=5615.78 | steps=869 | laps=3

(Note: If several episodes report identical outputs, this is not a bug. In deterministic evaluation the PPO policy always picks the same action for the same observation. Since the environment and the starting position of the car are always exactly the same, the model repeats the same trajectory, and therefore the same reward and step count, in every such episode.)

🏗️ Project Architecture (Optimized for PPO)

Normalized Observations and Rewards (VecNormalize): Automatically normalizes sensor observations and rewards so that differently scaled values do not destabilize the gradients during learning.
EvalCallback (Evaluation System): Tests the model periodically during training. The model achieving the highest score is automatically saved as a .zip in the models/best_model/ directory.
Early Stopping & Logical Reward Philosophy: The car receives dense rewards for moving forward. To prevent "suicidal" behavior (intentionally crashing rapidly to avoid cumulative penalties), penalties for reversing are capped.
Stable Environment: Complex track geometries have been replaced with a simple oval track, allowing the model to quickly grasp driving logic.
Visual Tracking (RenderDuringTrainingCallback): Watch the car's actions live (policy mode) on the screen to observe its improvement during training without interrupting it.
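
To make the normalization idea above concrete, here is a minimal sketch of running-statistics normalization in plain Python. It is a simplified illustration of the concept, not the VecNormalize source code and not this repository's code: the class name RunningNormalizer, the clip value of 10.0, and the sample sensor readings are all assumptions made for the example.

```python
import math


class RunningNormalizer:
    """Tracks a running mean/variance (Welford's algorithm) and normalizes values.

    Hypothetical helper for illustration only.
    """

    def __init__(self, clip=10.0, eps=1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0      # running sum of squared deviations from the mean
        self.clip = clip   # assumed clipping range for normalized values
        self.eps = eps     # avoids division by zero when variance is 0

    def update(self, x):
        """Fold one new sample into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the samples seen so far."""
        return self.m2 / self.count if self.count > 0 else 1.0

    def normalize(self, x):
        """Center, scale, and clip a value using the running statistics."""
        z = (x - self.mean) / math.sqrt(self.variance + self.eps)
        return max(-self.clip, min(self.clip, z))


norm = RunningNormalizer()
for reading in [2.0, 4.0, 6.0, 8.0]:  # made-up distance-sensor readings
    norm.update(reading)

print(norm.normalize(5.0))     # a reading equal to the running mean maps to 0.0
print(norm.normalize(1000.0))  # an extreme outlier is clipped to the clip value
```

Because the statistics are learned during training, a normalizer like this generally has to be saved with the model and reused at evaluation time; otherwise the policy would see observations on a different scale than the one it was trained on.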

💻 Setup and Environment

First, install all the required libraries:

pip install -r requirements.txt

🏎️ Usage Guide

1. Start Training (Training Process)

The following command trains the model for 2,000,000 steps. It opens a window every 50,000 steps to show a live visual preview for 500 frames:

python src/train_sb3.py --timesteps 2000000 --render-training --render-freq 50000 --render-steps 500 --dynamics-mode normal
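
As a rough guide to what these flags imply, the sketch below parses the same flag names with argparse and works out how many previews a run would show. It is not the actual parser in src/train_sb3.py: the default values, the build_parser and preview_count helpers, and the assumption that one preview is triggered at every multiple of --render-freq are all hypothetical.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags used in the command above."""
    parser = argparse.ArgumentParser(description="Training CLI sketch")
    parser.add_argument("--timesteps", type=int, default=2_000_000)
    parser.add_argument("--render-training", action="store_true")
    parser.add_argument("--render-freq", type=int, default=50_000)
    parser.add_argument("--render-steps", type=int, default=500)
    parser.add_argument("--dynamics-mode", type=str, default="normal")
    return parser


def preview_count(args):
    """Number of previews, assuming one per full --render-freq interval."""
    if not args.render_training:
        return 0
    return args.timesteps // args.render_freq


args = build_parser().parse_args([
    "--timesteps", "2000000",
    "--render-training",
    "--render-freq", "50000",
    "--render-steps", "500",
    "--dynamics-mode", "normal",
])

previews = preview_count(args)
rendered_frames = previews * args.render_steps
print(previews, rendered_frames)  # 40 previews, 20000 rendered frames in total
```

Under these assumptions, the command above would open the preview window 40 times over the full run.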

(Tip: If you cancel the training midway using CTRL+C in the terminal, don't worry! As long as at least one evaluation has completed, EvalCallback has already saved the best-performing version so far as "best_model.zip".)
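
The "keep the best model" behavior described in this tip can be pictured with a small sketch. This is a conceptual illustration only, not the EvalCallback implementation: the BestModelTracker class, the snapshot labels, and the reward numbers are invented for the example.

```python
class BestModelTracker:
    """Hypothetical helper: remembers the snapshot with the highest mean reward."""

    def __init__(self):
        self.best_mean_reward = float("-inf")
        self.best_snapshot = None  # stays None until the first evaluation runs

    def report(self, episode_rewards, snapshot):
        """Record one evaluation round; return True if it set a new best."""
        mean_reward = sum(episode_rewards) / len(episode_rewards)
        if mean_reward > self.best_mean_reward:
            self.best_mean_reward = mean_reward
            self.best_snapshot = snapshot  # stands in for writing best_model.zip
            return True
        return False


tracker = BestModelTracker()
first = tracker.report([100.0, 120.0], "snapshot_a")   # first evaluation: new best
second = tracker.report([90.0, 95.0], "snapshot_b")    # worse: previous best kept
third = tracker.report([300.0, 310.0], "snapshot_c")   # better: best is replaced

print(tracker.best_snapshot, tracker.best_mean_reward)
```

The sketch also shows the one caveat: if training is interrupted before the first evaluation round, there is no best snapshot to fall back on yet.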

2. Evaluating the Trained Model

To load your best model and watch it navigate the track for 10 episodes, use the following commands:

Deterministic (Precise & Repeatable) Driving: The model always takes the most likely action of its learned policy at every step, so it drives the same way every time.

python src/evaluate.py --model-path ./models/best_model/best_model.zip --episodes 10 --dynamics-mode normal

Stochastic (Dynamic / Probabilistic) Driving: If you want the model to sample its actions from the policy's probability distribution, so that it tries different lines and makes more human-like deviations/mistakes, simply add the --stochastic flag:

python src/evaluate.py --model-path ./models/best_model/best_model.zip --episodes 10 --dynamics-mode normal --stochastic
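
The difference between the two modes can be sketched in a few lines. The example assumes a single continuous action (for instance steering) drawn from a Gaussian policy; the select_action function and the mean and standard deviation values are hypothetical and are not taken from src/evaluate.py. In Stable-Baselines3 itself, the corresponding switch is the deterministic argument of model.predict.

```python
import random


def select_action(mean, std, deterministic, rng):
    """Pick an action from a Gaussian policy output (hypothetical helper).

    deterministic=True returns the distribution's mean (its most likely value);
    deterministic=False draws a random sample around that mean.
    """
    if deterministic:
        return mean
    return rng.gauss(mean, std)


rng = random.Random(0)  # fixed seed so the sketch itself is reproducible

# Same observation three times, so the policy outputs the same mean/std each time.
deterministic_actions = [select_action(0.3, 0.2, True, rng) for _ in range(3)]
stochastic_actions = [select_action(0.3, 0.2, False, rng) for _ in range(3)]

print(deterministic_actions)  # the same value three times
print(stochastic_actions)     # three different values scattered around 0.3
```

This is also why deterministic episodes repeat exactly when the environment and start position are fixed, while stochastic episodes differ from run to run.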

3. Manual Driving (To test the environment)

You can drive the car using arrow keys to test the environment dynamics yourself:

python src/test_env.py
