This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
KernelBench RL is a reinforcement learning training framework for GPU kernel optimization. It fine-tunes LLMs (primarily Qwen-series) to write better CUDA/Triton kernels through multi-turn refinement using the Tinker distributed RL platform and KernelBench evaluation environment.
Key approach:
- Multi-turn RL with structured error feedback (Kevin-32B inspired)
- Retrieval-augmented prompting (RA-ICL) from 48K+ kernel corpus
- Thinking tokens (`<think>` blocks) for reasoning before code generation
- Progressive training stages (format → compile → correctness → speed)
Always use uv for Python operations. Never run Python directly. Use:
- `uv sync` to install dependencies
- `uv run python ...` to execute Python scripts
- `uv run pytest ...` to run tests
- `uv add <package>` to add dependencies
```bash
# Install dependencies
uv sync

# Training (using justfile)
just train <experiment_name>            # Default: Kevin + RA-ICL mode
just train-kevin <experiment_name>      # Kevin mode explicitly
just train-raicl <experiment_name>      # RA-ICL only (single-turn)
just train-config <config.yaml> <name>  # Custom config

# Manual training
uv run python -m kernel_rl.scripts.train_kernel_rl \
  --config kernel_rl/config/rl_kernelbench.yaml \
  log_path=./runs/<experiment_name>

# Monitoring
just watch-metrics <experiment_name>  # Live metric updates
just logs <experiment_name>           # Tail training logs
just tensorboard <experiment_name>    # Launch TensorBoard
just status                           # Check running jobs

# Run management
just resume <experiment_name>  # Resume from checkpoint
just stop <experiment_name>    # Stop training
just list                      # List all runs

# RAG index building
just build-rag-index         # Build full index
just build-rag-index-triton  # Triton kernels only
just build-rag-index-cuda    # CUDA kernels only

# Evaluation
uv run python -m kernel_rl.scripts.eval_kernel_rl \
  checkpoint_path=./runs/<experiment_name> \
  level=1 \
  output_path=./eval_results.json

# Linting
uv run black kernel_rl/
uv run isort kernel_rl/
uv run mypy kernel_rl/
```

```
kernel_rl/
├── envs/                             # RL environments wrapping KernelBench
│   ├── kernelbench_client.py         # KernelBench evaluation (compile, test, benchmark)
│   ├── kernelbench_env.py            # Single-turn RL environment
│   └── multiturn_kernelbench_env.py  # Kevin-mode multi-turn with error feedback
│
├── rag/                              # Retrieval-Augmented In-Context Learning
│   ├── corpus.py                     # KernelCorpus loader (KernelBook + Sakana datasets)
│   ├── retriever.py                  # FAISS index + sentence-transformers embeddings
│   └── prompt_builder.py             # RA-ICL prompt construction
│
├── training/                         # Core training logic
│   ├── loop.py                       # GRPO-style training loop (main entry)
│   ├── reward.py                     # Reward components: format, compile, correctness, speed, thinking
│   ├── models.py                     # Model config & Tinker client initialization
│   └── tensorboard_logger.py         # Training metrics visualization
│
├── scripts/                          # CLI entry points
│   ├── train_kernel_rl.py            # Training CLI (loads YAML + CLI overrides)
│   ├── eval_kernel_rl.py             # Evaluation CLI
│   └── build_rag_index.py            # RAG index builder
│
└── config/                           # YAML configurations
    ├── rl_kernelbench.yaml           # Default (Kevin + RA-ICL)
    ├── rl_kernelbench_kevin.yaml     # Kevin-mode specific
    └── rl_kernelbench_raicl.yaml     # RA-ICL only
```
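The retriever in `rag/retriever.py` pairs sentence-transformers embeddings with a FAISS index. A minimal sketch of the underlying nearest-neighbor lookup, using plain numpy in place of FAISS and a toy corpus (the function name and vectors here are illustrative, not the repo's actual API):

```python
import numpy as np

def retrieve_examples(query_vec: np.ndarray, corpus_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k corpus kernels most similar to the query.

    Stand-in for the FAISS inner-product search used by the real retriever;
    assumes all vectors are L2-normalized so dot product == cosine similarity.
    """
    scores = corpus_vecs @ query_vec          # similarity score per corpus entry
    return np.argsort(-scores)[:k].tolist()   # top-k, highest score first

# Toy corpus of four normalized 2-D embeddings.
corpus = np.array([[1.0, 0.0], [0.0, 1.0],
                   [0.6, 0.8], [0.8, 0.6]])
query = np.array([1.0, 0.0])
print(retrieve_examples(query, corpus, k=2))  # [0, 3]
```

The retrieved indices map back to corpus kernels that `prompt_builder.py` splices into the RA-ICL prompt.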
Models produce outputs in this format:
<think>
- Optimization strategy
- Key implementation details
</think>
<KERNEL>
```cuda
// kernel code
```
</KERNEL>
Each problem allows multiple refinement turns with error feedback:
- Turn 1: Problem + RA-ICL examples → Kernel_1 → Evaluation
- Turn 2: Problem + Previous thinking + Error feedback → Kernel_2 → ...
- Continue until success or max_turns reached
- Discounted returns: R_t = s_t + γs_{t+1} + γ²s_{t+2} + ...
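The discounted-return formula above can be sketched as follows (γ and the per-turn scores are illustrative; the actual computation lives in `training/loop.py`):

```python
def discounted_returns(scores: list[float], gamma: float) -> list[float]:
    """R_t = s_t + γ·s_{t+1} + γ²·s_{t+2} + ..., computed right-to-left."""
    returns = [0.0] * len(scores)
    running = 0.0
    for t in reversed(range(len(scores))):
        running = scores[t] + gamma * running
        returns[t] = running
    return returns

# Three turns: failed compile (0.0), correct (1.0), correct + fast (2.0).
print(discounted_returns([0.0, 1.0, 2.0], gamma=0.5))  # [1.0, 2.0, 2.0]
```

Later turns fold back into earlier returns, so an early kernel that sets up a successful refinement still earns credit.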
Weights are configurable per training stage:
- `format_reward_weight`: Valid `<KERNEL>` block extraction
- `compile_reward_weight`: Successful compilation
- `correctness_reward_weight`: Test passage (supports partial credit)
- `speed_reward_weight`: Speedup over baseline (enable `measure_performance`)
- `thinking_reward_weight`: Using `<think>` blocks
Uses chz framework merging:
- YAML config file (base defaults)
- CLI arguments with dot-notation overrides (e.g., `dataset_builder.batch_size=4`)
- Environment variables (for secrets like `TINKER_API_KEY`)
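The dot-notation override mechanism can be sketched with plain dicts (this is an illustration of the merge behavior, not chz's actual API; the config keys are the ones from the examples above):

```python
def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Set a nested key like 'dataset_builder.batch_size' on a config dict."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})  # descend, creating levels as needed
    node[leaf] = value
    return config

# YAML base defaults (illustrative values).
cfg = {"dataset_builder": {"batch_size": 8}, "log_path": "./runs/default"}
# CLI-style override: dataset_builder.batch_size=4
apply_override(cfg, "dataset_builder.batch_size", 4)
print(cfg["dataset_builder"]["batch_size"])  # 4
```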
| Variable | Required | Purpose |
|---|---|---|
| `TINKER_API_KEY` | Yes | Tinker distributed training API key |
| `KERNELBENCH_ROOT` | No | Path to KernelBench repo (auto-detected) |
Training runs output to ./runs/<experiment_name>/:
- `logs.log`: Training logs
- `metrics.jsonl`: Per-batch metrics (JSON lines)
- `checkpoints.jsonl`: Checkpoint paths for resume
- `tensorboard/`: TensorBoard event files
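`metrics.jsonl` is plain JSON lines, so it can be inspected without TensorBoard; a minimal reader sketch (the field names shown are illustrative, not the repo's actual schema):

```python
import io
import json

def read_metrics(stream) -> list[dict]:
    """Parse JSON-lines metrics, one record per training batch."""
    return [json.loads(line) for line in stream if line.strip()]

# Simulated metrics.jsonl contents (field names are illustrative).
fake = io.StringIO('{"batch": 0, "reward": 0.1}\n{"batch": 1, "reward": 0.4}\n')
records = read_metrics(fake)
print(len(records), records[-1]["reward"])  # 2 0.4
```

The same pattern works on the real file via `open("./runs/<experiment_name>/metrics.jsonl")`.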