Self-improving LLMs through Group Relative Policy Optimization (GRPO) and multi-language execution sandboxes.
This project implements an autonomous training loop that evolves Qwen2.5-Coder (or any compatible model) into a superior coding agent. It leverages GRPO—the same reinforcement learning logic behind DeepSeek-R1—to optimize model performance using real-world execution feedback across six programming languages, requiring zero human labels.
- GRPO Training: Efficient reinforcement learning without the overhead of a separate critic or value network.
- Autonomous Problem Generation: Uses any OpenAI-compatible API (Ollama, vLLM, OpenRouter) to generate unique coding challenges on the fly.
- Multi-Language Sandbox: Integrated execution for Python, Go, Node.js, C#, C++, and Rust with strict timeouts and dependency management.
- Hardware Efficient: Optimized for consumer hardware; fits 7B models in 4-bit (NF4) on a single 24GB GPU (RTX 3090/4090).
- Windows & Linux Ready: Cross-platform support for all language runtimes and execution environments.
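The "strict timeouts" in the sandbox feature can be illustrated with Python's standard `subprocess` module. This is a minimal sketch of the general technique, not the project's actual implementation; the function name and return shape are assumptions:

```python
import subprocess
import sys

def run_with_timeout(code: str, timeout: float = 5.0) -> dict:
    """Execute untrusted code in a child process, killing it on timeout."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return {"ok": proc.returncode == 0, "out": proc.stdout, "err": proc.stderr}
    except subprocess.TimeoutExpired:
        # The child is killed by subprocess.run before the exception propagates.
        return {"ok": False, "out": "", "err": f"killed after {timeout}s"}

run_with_timeout("print(2 + 2)")           # ok=True, out="4\n"
run_with_timeout("while True: pass", 1.0)  # ok=False, killed after 1.0s
```

A real sandbox would add further isolation (working-directory jails, resource limits), but the timeout mechanism is the part that keeps infinite loops from stalling the training loop.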
1. Synthesize: A "Teacher" LLM generates a structured coding problem with unit tests.
2. Rollout: The "Student" model generates multiple completion candidates (the "Group").
3. Execute: Every candidate is compiled and run against the unit tests in a secure sandbox.
4. Reward: Candidates are scored based on formatting, compilation success, and pass rate.
5. Optimize: GRPO computes advantages within the group to update the student policy.
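The Optimize step relies on GRPO's core trick: instead of a learned critic or value network, each completion's advantage is computed relative to the other completions in its own group. A minimal sketch of that normalization (standalone illustration, not the project's code):

```python
import statistics

def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: normalize each reward against its own group's
    mean and standard deviation, so no separate value network is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)  # population std over the group
    return [(r - mean) / (std + eps) for r in rewards]

# Example: a group of 4 completions scored by the sandbox.
# Completions above the group mean get a positive advantage, below get negative.
advs = group_relative_advantages([1.0, 0.3, 0.0, 0.3])
```

These per-candidate advantages then weight the policy-gradient update, clipped PPO-style with `clip_eps` and regularized toward the reference model by `kl_coef`.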
Ensure you have the necessary language runtimes (Go, Node, .NET, G++, Rust) installed.
```bash
# On Linux
bash system_deps.sh

# On Windows: ensure 'go', 'node', 'dotnet', 'g++', and 'cargo' are in your PATH.
```

```bash
git clone https://github.com/Akicou/rl-coding-agent
cd rl-coding-agent
pip install -r requirements.txt
cp .env.example .env  # Configure your OAI_BASE_URL and API keys
```

```bash
# Run a smoke test to verify model loading and generation
python scripts/smoke_test.py

# Start the infinite training loop
python scripts/train.py
```

Key parameters for tuning the training loop:
| Category | Parameter | Default | Description |
|---|---|---|---|
| Model | `model_name` | `Qwen/Qwen2.5-Coder-7B-Instruct` | Target policy & reference model |
| Generation | `group_size` | 4 | Completions per rollout group |
| Generation | `max_new_tokens` | 65536 | Max generation length |
| RL | `kl_coef` | 0.04 | Regularization vs. reference policy |
| RL | `clip_eps` | 0.2 | PPO-style clipping epsilon |
| Reward | `w_pass` | 1.0 | Weight for test pass rate |
| Reward | `w_compile` | 0.3 | Weight for compilation success |
| Loop | `batch_size` | 2 | Problems per micro-batch |
| Loop | `grad_accum` | 4 | Gradients accumulated per step |
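With the defaults above, a candidate's scalar reward might combine the weighted components like this. The exact formula (and how the formatting bonus enters) is an assumption for illustration, not taken from the project:

```python
def score_candidate(passed: int, total: int, compiled: bool,
                    w_pass: float = 1.0, w_compile: float = 0.3) -> float:
    """Illustrative reward: weighted sum of compilation success and
    unit-test pass rate. Weights mirror the table defaults."""
    pass_rate = passed / total if total else 0.0
    return w_compile * float(compiled) + w_pass * pass_rate

score_candidate(3, 4, True)   # compiles, 3/4 tests pass -> 0.3 + 0.75 = 1.05
score_candidate(0, 4, False)  # fails to compile -> 0.0
```

Weighting compilation separately from pass rate gives the policy a smoother gradient: a candidate that compiles but fails every test still outranks one that does not compile at all.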
| Language | Engine | Sandbox Detail |
|---|---|---|
| Python | 3.11+ | Auto-installs missing packages via pip |
| Go | 1.22+ | Isolated go.mod environment |
| Node.js | 20+ | Dynamic package.json with npm support |
| C# | .NET 8 | Ephemeral .csproj with NuGet resolution |
| C++ | G++ 17 | Direct compilation and execution |
| Rust | Stable | Full Cargo project isolation |
We welcome technical contributions. To add a new language runtime:

1. Subclass `LanguageExecutor` in `rl_agent/languages/`.
2. Implement `extract_deps()` and `execute()`.
3. Register it in the `LANGUAGE_REGISTRY`.
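As a rough sketch of what such a subclass might look like, here is a hypothetical executor for a new language (Ruby). The method signatures and return types are assumptions; check the actual `LanguageExecutor` base class before implementing:

```python
import subprocess

class RubyExecutor:  # would subclass rl_agent.languages.LanguageExecutor
    """Hypothetical skeleton; the real base-class interface may differ."""
    name = "ruby"

    def extract_deps(self, source: str) -> list[str]:
        # Parse `require` statements to find gems that may need installing.
        return [line.split()[-1].strip("'\"")
                for line in source.splitlines()
                if line.strip().startswith("require ")]

    def execute(self, source: str, timeout: float = 10.0) -> tuple[bool, str]:
        # Run the candidate in a subprocess with a strict timeout.
        try:
            proc = subprocess.run(["ruby", "-e", source],
                                  capture_output=True, text=True, timeout=timeout)
            return proc.returncode == 0, proc.stdout + proc.stderr
        except subprocess.TimeoutExpired:
            return False, "timeout"

# Step 3 would then register it, e.g.:
# LANGUAGE_REGISTRY["ruby"] = RubyExecutor()
```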
MIT © 2026 Akicou