Stable Baselines3

[Tool]

Presentation : Stable Baselines3 is a Python library offering reliable implementations of reinforcement learning algorithms. In the case of Transcendence, we are interested in the PPO (Proximal Policy Optimization) algorithm.


Setup

Stable Baselines3 requires PyTorch and Gymnasium.
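A minimal install sketch, assuming pip; the [extra] variant pulls in optional dependencies such as Atari support and TensorBoard, and PyTorch is installed automatically as a core dependency:

pip install "stable-baselines3[extra]"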

PPO parameters

  • policy : usually "MlpPolicy" for feature vectors, "CnnPolicy" for image observations
  • env : the Gymnasium environment (or vectorized env) to train on
  • learning_rate : step size of the optimizer (default 3e-4)
  • n_steps : number of steps collected per environment before each update (default 2048)
  • batch_size : minibatch size used when optimizing the surrogate loss (default 64)
  • n_epochs : number of optimization passes over each rollout (default 10)
  • gamma : discount factor for future rewards (typically 0.99)
  • gae_lambda : factor for the trade-off of bias vs. variance in Generalized Advantage Estimation (default 0.95)
  • clip_range : clipping parameter of the PPO surrogate objective (default 0.2)
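
As a sketch of how these map onto the constructor, with the documented defaults written out explicitly (the CartPole-v1 environment is only a placeholder):

import gymnasium as gym
from stable_baselines3 import PPO

env = gym.make("CartPole-v1")  # placeholder environment
model = PPO(
    "MlpPolicy",
    env,
    learning_rate=3e-4,   # optimizer step size
    n_steps=2048,         # rollout length per update
    batch_size=64,        # minibatch size
    n_epochs=10,          # optimization passes per rollout
    gamma=0.99,           # discount factor
    gae_lambda=0.95,      # GAE bias/variance trade-off
    clip_range=0.2,       # PPO clipping parameter
)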

Use cases

Initialize

import gymnasium as gym
from stable_baselines3 import PPO

# Atari Pong emits image observations; "MlpPolicy" suits feature vectors,
# while "CnnPolicy" is the usual choice for raw frames.
env = gym.make("Pong-v4")
model = PPO(
    "MlpPolicy",
    env,
    verbose=1,              # print training progress
    learning_rate=0.0003,
    n_steps=2048,           # rollout length per update
)
model.learn(total_timesteps=10000)
model.save("ppo_pong_agent")
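
Once saved, the agent can be reloaded and queried for actions; a minimal sketch, assuming the same environment as above:

model = PPO.load("ppo_pong_agent")
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    # deterministic=True picks the most likely action instead of sampling
    action, _states = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)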

Important

Ensure observations (the input data) are normalized, for example with the VecNormalize wrapper, to help the PPO algorithm converge faster.
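
A minimal sketch of the wrapper: VecNormalize needs a vectorized environment, and its running statistics must be saved alongside the model so they can be restored at inference time (the file name here is illustrative):

import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

vec_env = DummyVecEnv([lambda: gym.make("Pong-v4")])
vec_env = VecNormalize(vec_env, norm_obs=True, norm_reward=True)

model = PPO("MlpPolicy", vec_env, verbose=1)
model.learn(total_timesteps=10000)
vec_env.save("vec_normalize.pkl")  # running mean/std; restore with VecNormalize.load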

Do's & Don'ts

✅ Do
  • Normalize inputs: scale coordinates and velocities to [-1, 1].
  • Save checkpoints: periodically save models during long training sessions.
  • Use TensorBoard: monitor reward curves and loss values in real time.

❌ Don't
  • Too large updates: avoid high learning rates that cause the policy to collapse.
  • Hardcode environment: don't link game logic directly to the agent; keep the Gym interface decoupled.
  • Ignore seeds: don't forget to set manual seeds for reproducible AI experiments.
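
The checkpoint, TensorBoard, and seed recommendations above combine as follows; save frequency and paths are illustrative, and env is the environment created earlier:

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import CheckpointCallback

checkpoint_callback = CheckpointCallback(
    save_freq=10_000,              # save every 10k steps
    save_path="./checkpoints/",
    name_prefix="ppo_pong",
)
model = PPO(
    "MlpPolicy",
    env,
    seed=42,                       # reproducible runs
    tensorboard_log="./tb_logs/",  # inspect with: tensorboard --logdir ./tb_logs/
    verbose=1,
)
model.learn(total_timesteps=100_000, callback=checkpoint_callback)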

Resources

  • 📄 Official Docs (PPO implementation)
  • 📄 The 37 Implementation Details of Proximal Policy Optimization (blog post)
  • 🎥 An introduction to Policy Gradient methods (Arxiv Insights)

Legend: 📄 Doc, 📘 Book, 🎥 Video, 💻 GitHub, 📦 Package, 💡 Blog

๐Ÿ—๏ธ Architecture

๐ŸŒ Web Technologies

Backend

Frontend

๐Ÿ”ง Core Technologies

๐Ÿ” Security

โ›“๏ธ Blockchain

๐Ÿ› ๏ธ Dev Tools & Quality


๐Ÿ“ Page model

Clone this wiki locally