Stable Baselines3
Francois edited this page Feb 7, 2026 · 1 revision
Presentation: Stable Baselines3 is a Python library that provides implementations of reinforcement learning algorithms. For Transcendence, the algorithm of interest is PPO (Proximal Policy Optimization).
Stable Baselines3 requires PyTorch and Gymnasium.
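Main PPO parameters:
- policy: usually MlpPolicy for feature vectors
- env: the Gymnasium environment to train on
- learning_rate: step size used by the optimizer
- n_steps: number of steps collected per environment before each policy update
- batch_size: size of each minibatch used for a gradient step
- n_epochs: number of passes over the collected rollout at each update
- gamma: discount factor for future rewards (typically 0.99)
- gae_lambda: factor controlling the bias vs. variance trade-off in Generalized Advantage Estimation
- clip_range: clipping parameter that limits how far the policy ratio can move in a single update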
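To make the roles of these parameters more concrete, here is a minimal pure-Python sketch of where `gamma`, `gae_lambda`, `clip_range`, `n_steps`, `batch_size` and `n_epochs` enter a PPO update. It is an illustration under simplifying assumptions (a single rollout with no episode boundary, scalar inputs), not the library's implementation. The function names are hypothetical.

```python
def compute_gae(rewards, values, last_value, gamma=0.99, gae_lambda=0.95):
    """Generalized Advantage Estimation over one rollout.

    Simplification: the rollout is assumed to contain no episode end.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    next_value = last_value
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the step was than predicted.
        delta = rewards[t] + gamma * next_value - values[t]
        # gae_lambda controls how far TD errors are propagated backwards.
        gae = delta + gamma * gae_lambda * gae
        advantages[t] = gae
        next_value = values[t]
    return advantages


def clipped_surrogate(ratio, advantage, clip_range=0.2):
    """PPO clipped objective for a single sample (to be maximized).

    ratio is new_policy_prob / old_policy_prob for the action taken.
    """
    clipped_ratio = max(1.0 - clip_range, min(ratio, 1.0 + clip_range))
    return min(ratio * advantage, clipped_ratio * advantage)


def gradient_steps_per_update(n_steps, n_envs, batch_size, n_epochs):
    """Number of minibatch gradient steps performed after each rollout."""
    rollout_size = n_steps * n_envs
    minibatches = rollout_size // batch_size
    return minibatches * n_epochs


if __name__ == "__main__":
    print(compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0, gamma=0.5, gae_lambda=1.0))
    print(clipped_surrogate(1.5, 1.0))
    print(gradient_steps_per_update(n_steps=2048, n_envs=1, batch_size=64, n_epochs=10))
```

With `gae_lambda=1.0` and zero value estimates, the advantages reduce to plain discounted returns. With `gae_lambda=0.0`, they reduce to one-step TD errors, which is the low-variance, high-bias end of the trade-off.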
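```python
import gymnasium as gym
from stable_baselines3 import PPO

# Pong-v4 is an Atari environment: it needs the Atari extras (ale-py) and
# returns image observations, for which CnnPolicy is usually the better fit.
# MlpPolicy suits environments that expose a feature vector.
env = gym.make("Pong-v4")
model = PPO(
    "MlpPolicy",
    env,
    verbose=1,
    learning_rate=0.0003,
    n_steps=2048,
)
# Train for 10,000 environment steps, then save the agent to disk.
model.learn(total_timesteps=10000)
model.save("ppo_pong_agent")
```

Important: Normalize the observation space (the input data), for example with VecNormalize, to help the PPO algorithm converge faster.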
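As a rough illustration of what observation normalization does, the sketch below keeps a running mean and variance per feature and standardizes each observation with them. It is a conceptual, pure-Python stand-in and not the VecNormalize source. The class name `RunningNormalizer` and its default `clip` and `epsilon` values are assumptions chosen for the example.

```python
import math


class RunningNormalizer:
    """Per-feature running mean/variance, updated with Welford's algorithm."""

    def __init__(self, size, clip=10.0, epsilon=1e-8):
        self.count = 0
        self.mean = [0.0] * size
        self.m2 = [0.0] * size  # running sum of squared deviations
        self.clip = clip
        self.epsilon = epsilon

    def update(self, obs):
        """Fold one observation into the running statistics."""
        self.count += 1
        for i, x in enumerate(obs):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.count
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, obs):
        """Standardize an observation and clip extreme values."""
        out = []
        for i, x in enumerate(obs):
            var = self.m2[i] / self.count if self.count > 0 else 1.0
            z = (x - self.mean[i]) / math.sqrt(var + self.epsilon)
            out.append(max(-self.clip, min(z, self.clip)))
        return out


if __name__ == "__main__":
    norm = RunningNormalizer(size=2)
    for obs in ([0.0, 100.0], [2.0, 300.0]):
        norm.update(obs)
    # Both features end up on a comparable scale despite different units.
    print(norm.normalize([2.0, 300.0]))
```

Because the statistics are learned during training, they need to be stored and reused at inference time. Otherwise the agent sees inputs on a different scale than the one it was trained on.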
| ✅ Do | ❌ Don't |
|---|---|
| Normalize inputs: Scale coordinates and velocities to the range [-1, 1]. | Too large updates: Avoid high learning rates that cause the policy to collapse. |
| Save Checkpoints: Periodically save models during long training sessions. | Hardcode Environment: Don't link logic directly to the agent; keep the Gym interface decoupled. |
| Use TensorBoard: Monitor reward curves and loss values in real-time. | Ignore Seed: Don't forget to set manual seeds for reproducible AI experiments. |
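The "Normalize inputs" advice can be sketched as a fixed-range scaling applied when the observation is built. The field dimensions, the speed limit, and the observation layout below are hypothetical placeholders for illustration, not values taken from the Transcendence game.

```python
# Hypothetical game constants, to be replaced by the real ones.
FIELD_WIDTH = 800.0
FIELD_HEIGHT = 600.0
MAX_SPEED = 20.0


def scale_to_unit_range(value, low, high):
    """Linearly map value from [low, high] to [-1, 1], clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scaled = 2.0 * (value - low) / (high - low) - 1.0
    return max(-1.0, min(scaled, 1.0))


def build_observation(ball_x, ball_y, ball_vx, ball_vy, paddle_y):
    """Assemble a feature vector whose entries all lie in [-1, 1]."""
    return [
        scale_to_unit_range(ball_x, 0.0, FIELD_WIDTH),
        scale_to_unit_range(ball_y, 0.0, FIELD_HEIGHT),
        scale_to_unit_range(ball_vx, -MAX_SPEED, MAX_SPEED),
        scale_to_unit_range(ball_vy, -MAX_SPEED, MAX_SPEED),
        scale_to_unit_range(paddle_y, 0.0, FIELD_HEIGHT),
    ]


if __name__ == "__main__":
    # Ball at the center, moving straight down at full speed, paddle at the top.
    print(build_observation(400.0, 300.0, 0.0, 20.0, 0.0))
```

Fixed-range scaling like this is a reasonable choice when the bounds of each feature are known in advance, which is typically the case for positions and capped velocities in a Pong-like game.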
| Type | Resource | Notes |
|---|---|---|
| 📄 | Official Docs | PPO implementation |
| 📄 | The 37 Implementation Details of Proximal Policy Optimization | Blog post |
| 🎥 | An introduction to Policy Gradient methods | Arxiv Insights |
Legend: 📄 Doc, 📚 Book, 🎥 Video, 💻 GitHub, 📦 Package, 💡 Blog