This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
KernelBench RL is a reinforcement learning training framework for GPU kernel optimization. It fine-tunes LLMs (primarily Qwen-series) to write better CUDA/Triton kernels through multi-turn refinement using the Tinker distributed RL platform and KernelBench evaluation environment.
Key approach:
- Multi-turn RL with structured error feedback (Kevin-32B inspired)
- Retrieval-augmented prompting (RA-ICL) from 48K+ kernel corpus
- Thinking tokens (`<think>` blocks) for reasoning before code generation
- Progressive training stages (format → compile → correctness → speed)
Always use uv for Python operations. Never run Python directly. Use:
- `uv sync` to install dependencies
- `uv run python ...` to execute Python scripts
- `uv run pytest ...` to run tests
- `uv add <package>` to add dependencies
```bash
# Install dependencies
uv sync

# Training (using justfile)
just train <experiment_name>            # Default: Kevin + RA-ICL mode
just train-kevin <experiment_name>      # Kevin mode explicitly
just train-raicl <experiment_name>      # RA-ICL only (single-turn)
just train-config <config.yaml> <name>  # Custom config

# Manual training
uv run python -m kernel_rl.scripts.train_kernel_rl \
  --config kernel_rl/config/rl_kernelbench.yaml \
  log_path=./runs/<experiment_name>

# Monitoring
just watch-metrics <experiment_name>  # Live metric updates
just logs <experiment_name>           # Tail training logs
just tensorboard <experiment_name>    # Launch TensorBoard
just status                           # Check running jobs

# Run management
just resume <experiment_name>  # Resume from checkpoint
just stop <experiment_name>    # Stop training
just list                      # List all runs

# RAG index building
just build-rag-index         # Build full index
just build-rag-index-triton  # Triton kernels only
just build-rag-index-cuda    # CUDA kernels only

# Evaluation
uv run python -m kernel_rl.scripts.eval_kernel_rl \
  checkpoint_path=./runs/<experiment_name> \
  level=1 \
  output_path=./eval_results.json

# Linting
uv run black kernel_rl/
uv run isort kernel_rl/
uv run mypy kernel_rl/
```

```
kernel_rl/
├── envs/                             # RL environments wrapping KernelBench
│   ├── kernelbench_client.py         # KernelBench evaluation (compile, test, benchmark)
│   ├── kernelbench_env.py            # Single-turn RL environment
│   └── multiturn_kernelbench_env.py  # Kevin-mode multi-turn with error feedback
│
├── rag/                              # Retrieval-Augmented In-Context Learning
│   ├── corpus.py                     # KernelCorpus loader (KernelBook + Sakana datasets)
│   ├── retriever.py                  # FAISS index + sentence-transformers embeddings
│   └── prompt_builder.py             # RA-ICL prompt construction
│
├── training/                         # Core training logic
│   ├── loop.py                       # GRPO-style training loop (main entry)
│   ├── reward.py                     # Reward components: format, compile, correctness, speed, thinking
│   ├── models.py                     # Model config & Tinker client initialization
│   └── tensorboard_logger.py         # Training metrics visualization
│
├── scripts/                          # CLI entry points
│   ├── train_kernel_rl.py            # Training CLI (loads YAML + CLI overrides)
│   ├── eval_kernel_rl.py             # Evaluation CLI
│   └── build_rag_index.py            # RAG index builder
│
└── config/                           # YAML configurations
    ├── rl_kernelbench.yaml           # Default (Kevin + RA-ICL)
    ├── rl_kernelbench_kevin.yaml     # Kevin-mode specific
    └── rl_kernelbench_raicl.yaml     # RA-ICL only
```
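The retriever in `rag/retriever.py` pairs sentence-transformers embeddings with a FAISS index. A minimal sketch of the underlying nearest-neighbor lookup, using plain numpy in place of FAISS and a toy corpus (the function name and vectors here are illustrative, not the repo's actual API):

```python
import numpy as np

def retrieve_examples(query_vec: np.ndarray, corpus_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k corpus kernels most similar to the query.

    Stand-in for the FAISS inner-product search used by the real retriever;
    assumes all vectors are L2-normalized so dot product == cosine similarity.
    """
    scores = corpus_vecs @ query_vec          # similarity score per corpus entry
    return np.argsort(-scores)[:k].tolist()   # top-k, highest score first

# Toy corpus of four normalized 2-D embeddings.
corpus = np.array([[1.0, 0.0], [0.0, 1.0],
                   [0.6, 0.8], [0.8, 0.6]])
query = np.array([1.0, 0.0])
print(retrieve_examples(query, corpus, k=2))  # [0, 3]
```

The retrieved indices map back to corpus kernels that `prompt_builder.py` splices into the RA-ICL prompt.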
Models produce outputs in this format:
<think>
- Optimization strategy
- Key implementation details
</think>
<KERNEL>
```cuda
// kernel code
```
</KERNEL>
Each problem allows multiple refinement turns with error feedback:
- Turn 1: Problem + RA-ICL examples → Kernel_1 → Evaluation
- Turn 2: Problem + Previous thinking + Error feedback → Kernel_2 → ...
- Continue until success or max_turns reached
- Discounted returns: R_t = s_t + γs_{t+1} + γ²s_{t+2} + ...
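The discounted-return formula above can be sketched as follows (γ and the per-turn scores are illustrative; the actual computation lives in `training/loop.py`):

```python
def discounted_returns(scores: list[float], gamma: float) -> list[float]:
    """R_t = s_t + γ·s_{t+1} + γ²·s_{t+2} + ..., computed right-to-left."""
    returns = [0.0] * len(scores)
    running = 0.0
    for t in reversed(range(len(scores))):
        running = scores[t] + gamma * running
        returns[t] = running
    return returns

# Three turns: failed compile (0.0), correct (1.0), correct + fast (2.0).
print(discounted_returns([0.0, 1.0, 2.0], gamma=0.5))  # [1.0, 2.0, 2.0]
```

Later turns fold back into earlier returns, so an early kernel that sets up a successful refinement still earns credit.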
Weights are configurable per training stage:
- `format_reward_weight`: Valid `<KERNEL>` block extraction
- `compile_reward_weight`: Successful compilation
- `correctness_reward_weight`: Test passage (supports partial credit)
- `speed_reward_weight`: Speedup over baseline (enable `measure_performance`)
- `thinking_reward_weight`: Using `<think>` blocks
Uses chz framework merging:
- YAML config file (base defaults)
- CLI arguments with dot-notation overrides (e.g., `dataset_builder.batch_size=4`)
- Environment variables (for secrets like `TINKER_API_KEY`)
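The dot-notation override mechanism can be sketched with plain dicts (this is an illustration of the merge behavior, not chz's actual API; the config keys are the ones from the examples above):

```python
def apply_override(config: dict, dotted_key: str, value) -> dict:
    """Set a nested key like 'dataset_builder.batch_size' on a config dict."""
    node = config
    *parents, leaf = dotted_key.split(".")
    for part in parents:
        node = node.setdefault(part, {})  # descend, creating levels as needed
    node[leaf] = value
    return config

# YAML base defaults (illustrative values).
cfg = {"dataset_builder": {"batch_size": 8}, "log_path": "./runs/default"}
# CLI-style override: dataset_builder.batch_size=4
apply_override(cfg, "dataset_builder.batch_size", 4)
print(cfg["dataset_builder"]["batch_size"])  # 4
```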
| Variable | Required | Purpose |
|---|---|---|
| `TINKER_API_KEY` | Yes | Tinker distributed training API key |
| `KERNELBENCH_ROOT` | No | Path to KernelBench repo (auto-detected) |
Training runs output to ./runs/<experiment_name>/:
- `logs.log`: Training logs
- `metrics.jsonl`: Per-batch metrics (JSON lines)
- `checkpoints.jsonl`: Checkpoint paths for resume
- `tensorboard/`: TensorBoard event files
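`metrics.jsonl` is plain JSON lines, so it can be inspected without TensorBoard; a minimal reader sketch (the field names shown are illustrative, not the repo's actual schema):

```python
import io
import json

def read_metrics(stream) -> list[dict]:
    """Parse JSON-lines metrics, one record per training batch."""
    return [json.loads(line) for line in stream if line.strip()]

# Simulated metrics.jsonl contents (field names are illustrative).
fake = io.StringIO('{"batch": 0, "reward": 0.1}\n{"batch": 1, "reward": 0.4}\n')
records = read_metrics(fake)
print(len(records), records[-1]["reward"])  # 2 0.4
```

The same pattern works on the real file via `open("./runs/<experiment_name>/metrics.jsonl")`.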