Commit 548ab1b: Docs update
1 parent 5207fbf
43 files changed: 1085 additions & 108 deletions

README.md

Lines changed: 65 additions & 1 deletion
@@ -2,7 +2,7 @@

**Run LLM-powered agents in a REPL loop, benchmark them, and compare results.**

- RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.07503) (RLM) paper by Zhang, Kraska & Khattab. Instead of stuffing your entire document into the LLM's context window, RLM stores it as a Python variable and lets the LLM write code to analyze it — chunk by chunk, iteration by iteration. This is dramatically more token-efficient for large inputs.
+ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.07503) (RLM) approach from the 2025 paper release. Instead of stuffing your entire document into the LLM's context window, RLM stores it as a Python variable and lets the LLM write code to analyze it — chunk by chunk, iteration by iteration. This is dramatically more token-efficient for large inputs.

RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.

@@ -14,6 +14,13 @@ uv tool install "rlm-code[tui,llm-all]"

This installs `rlm-code` as a globally available command with its own isolated environment. You get the TUI and all LLM provider clients (OpenAI, Anthropic, Gemini).

Requirements:

- Python 3.11+
- `uv` (recommended) or `pip`
- one model route (BYOK API key or local server like Ollama)
- one secure execution backend (Docker recommended; Monty optional)

Don't have uv? Install it first:
@@ -115,6 +122,27 @@ After at least two benchmark runs, export a compare report:

Walk through the last run one step at a time — see what code the LLM wrote, what output it got, and what it did next.

### 7. Use RLM Code as a coding agent (local/BYOK/ACP)

RLM Code can also be used as a coding-agent harness in the TUI:

```text
/harness tools
/harness run "fix failing tests and add regression test" steps=8 mcp=on
```

ACP is supported too:

```text
/connect acp
/harness run "implement feature X with tests" steps=8 mcp=on
```

Notes:

- In Local/BYOK connection modes, chat prompts that look like coding tasks can auto-route to the harness.
- In ACP mode, auto-routing is intentionally off; use `/harness run ...` explicitly.
## How the RLM Loop Works

Traditional LLM usage: paste your document into the prompt, ask a question, hope the model doesn't lose details in the middle.

@@ -129,6 +157,21 @@ RLM approach:

This means the LLM can handle documents much larger than its context window, because it reads them in chunks through code rather than all at once through the prompt.
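The chunked-reading loop described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the actual rlm-code API; `llm_query` and `answer_over_chunks` are hypothetical stand-ins for a real sub-model call and the code the model would write:

```python
# Sketch of the RLM idea: the document lives in a variable, and code
# reads it chunk by chunk instead of pasting it all into one prompt.

def llm_query(prompt: str) -> str:
    # Stub: a real implementation would call an LLM here.
    return f"summary({len(prompt)} chars)"

def answer_over_chunks(document: str, question: str, chunk_size: int = 1000) -> str:
    # Map step: ask the sub-model about each chunk separately.
    partials = []
    for start in range(0, len(document), chunk_size):
        chunk = document[start:start + chunk_size]
        partials.append(llm_query(f"{question}\n\n{chunk}"))
    # Reduce step: combine per-chunk findings into one final answer.
    return llm_query(f"{question}\n\nNotes:\n" + "\n".join(partials))

doc = "x" * 3500  # stand-in for a document far larger than any context window
print(answer_over_chunks(doc, "What is this about?"))
```

Each individual call sees only one chunk plus the question, which is what keeps the per-call token footprint small.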

## What This Is (and Is Not)

RLM Code is:

- a research playground for recursive/model-assisted coding workflows
- a benchmarking and replay tool for reproducible experiments

RLM Code is not:

- a no-config consumer chat app
- guaranteed cheap (recursive runs can be expensive)
- safe to run with unrestricted execution settings

Use secure backend defaults (`/sandbox profile secure`) for normal use.

## Key Commands

| Command | What it does |
@@ -141,11 +184,32 @@ This means the LLM can handle documents much larger than its context window, bec

| `/rlm bench preset=<name>` | Run a benchmark preset |
| `/rlm bench list` | List available benchmarks |
| `/rlm bench compare` | Compare latest benchmark run with previous run |
| `/rlm abort [run_id\|all]` | Cancel active run(s) cooperatively |
| `/harness run "<task>"` | Run tool-using coding harness loop |
| `/rlm replay` | Step through the last run |
| `/rlm chat "<question>"` | Ask the LLM a question about your project |
| `/help` | Show all available commands |

## Cost and Safety Guardrails

Start bounded:

```text
/rlm run "small scoped task" steps=4 timeout=30 budget=60
```

For benchmarks, start with small limits:

```text
/rlm bench preset=dspy_quick limit=1
```

If a run is getting out of hand:

```text
/rlm abort all
```
## What You Can Do With It

- **Analyze large documents**: Feed in a 500-page PDF and ask questions — the LLM reads it in chunks via code

docs/core/comparison.md

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@

The paradigm comparison module enables side-by-side empirical comparison of different RLM approaches on the same task. It directly addresses the debate around whether RLM provides real benefits over simpler approaches by measuring token usage, cost, execution time, and accuracy.

For concept-level guidance on when to use each execution style, see [Execution Patterns](execution-patterns.md).

---

## Overview

docs/core/environments.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ class RLMEnvironment(Protocol):

## PureRLMEnvironment

- The paper-compliant RLM environment implementing exact semantics from "Recursive Language Models" (Zhang, Kraska, Khattab, 2025).
+ The paper-compliant RLM environment implementing exact semantics from "Recursive Language Models" (2025).

```python
from rlm_code.rlm.pure_rlm_environment import PureRLMEnvironment, PureRLMConfig
```

docs/core/execution-patterns.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@

# Execution Patterns

This page describes the three execution patterns available in RLM Code and how to use each one intentionally.

It focuses on behavior and configuration, without opinionated claims.

---

## Why This Matters

RLM Code can operate in multiple modes:

1. **Recursive symbolic context processing** (pure RLM native path)
2. **Tool-delegation coding loop** (harness path)
3. **Direct model response** (single-call baseline)

These modes solve different problems. Comparing them is most useful when you run each one deliberately.

---

## Pattern 1: Recursive Symbolic Context Processing

In this pattern, context is loaded into the REPL as variables and the model works by writing code that:

- Inspects variables programmatically
- Calls `llm_query()` or `llm_query_batched()` from inside code
- Composes intermediate results in variables
- Terminates with `FINAL(...)` or `FINAL_VAR(...)`
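The four steps above can be pictured with a toy sketch of the kind of code the model writes in this pattern. Here `context`, `llm_query_batched`, and `FINAL` are stubs standing in for what the real REPL environment provides:

```python
# Toy sketch of model-authored REPL code in pattern 1.
# In the real environment these names are injected; here they are stubs.

context = ["section one ...", "section two ...", "section three ..."]

def llm_query_batched(prompts):
    # Stub: a real implementation fans prompts out to sub-model calls.
    return [f"answer for prompt {i}" for i, _ in enumerate(prompts)]

def FINAL(answer):
    # Stub: the real FINAL terminates the run with `answer` as the result.
    return answer

# 1. Inspect the context programmatically.
sizes = [len(part) for part in context]

# 2. Query sub-models over pieces, composing results in variables.
findings = llm_query_batched([f"Summarize: {part}" for part in context])

# 3. Terminate with a final answer built from intermediate variables.
result = FINAL("; ".join(findings))
print(result)
```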
### Typical Use Cases

- Large-context analysis
- Programmatic decomposition/map-reduce style reasoning
- Experiments where token efficiency and context handling strategy are primary variables

### Recommended Settings

```bash
/sandbox profile secure
/sandbox backend docker
/sandbox strict on
/sandbox output-mode metadata
```

Then run:

```bash
/rlm run "Analyze this context with programmatic decomposition" env=pure_rlm framework=native
```

Notes:

- `strict on` disables runner-level `delegate` actions in pure mode, so recursion stays inside REPL code.
- `output-mode metadata` keeps per-step output compact and stable for long runs.

---

## Pattern 2: Tool-Delegation Coding Loop (Harness)

In this pattern, the model chooses tools (`read`, `grep`, `edit`, `bash`, MCP tools, etc.) step by step.
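The step-by-step loop can be pictured as follows. Everything here is illustrative — the stub policy, the tool table, and the function names are not the actual harness internals:

```python
# Generic shape of a tool-delegation loop (illustrative only).

def fake_model(observations):
    # Stub policy: read a file once, then declare the task finished.
    # A real model chooses the next tool based on the observations so far.
    if not observations:
        return ("read", "README.md")
    return ("finish", "done")

TOOLS = {"read": lambda arg: f"<contents of {arg}>"}

def run_harness(max_steps=8):
    observations = []
    for _ in range(max_steps):
        tool, arg = fake_model(observations)
        if tool == "finish":
            return arg
        # Execute the chosen tool and feed the result back to the model.
        observations.append(TOOLS[tool](arg))
    return "step budget exhausted"

print(run_harness())
```

The `steps=N` argument to `/harness run` plays the role of `max_steps` here: a hard bound on loop iterations.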
### Typical Use Cases

- Repository editing and test-fix loops
- Local/BYOK coding assistant workflows
- MCP-augmented automation

### Commands

```bash
/harness tools
/harness doctor
/harness run "fix failing tests and explain changes" steps=8 mcp=on
```

If a connected model is in local/BYOK mode, TUI chat can auto-route coding prompts to the harness.

To disable auto-routing for controlled experiments:

```bash
export RLM_TUI_HARNESS_AUTO=0
rlm-code
```

---

## Pattern 3: Direct Model Baseline

This is a simple one-shot baseline without recursive REPL execution or tool-loop orchestration.

Use it for sanity checks and benchmark comparison baselines.

---

## Controlled Comparison Workflow

Run the same benchmark suite with each mode:

```bash
/rlm bench preset=paradigm_comparison mode=native
/rlm bench preset=paradigm_comparison mode=harness
/rlm bench preset=paradigm_comparison mode=direct-llm
```

Then compare:

```bash
/rlm bench compare candidate=latest baseline=previous
/rlm bench report candidate=latest baseline=previous format=markdown
```

---

## Mode Selection Checklist

Use **recursive symbolic context processing** when:

- You need code-driven context understanding over large or structured inputs.
- You want recursion written inside code (`llm_query` in loops/functions).
- You want strict experimental control over context exposure.

Use **harness** when:

- Your primary goal is practical coding velocity in a repository.
- You want tool-first workflows (file ops, shell, MCP tools).

Use **direct-llm** when:

- You need a minimal baseline for comparison.

---

## Common Questions

### "Is this just another coding agent?"

RLM Code includes both:

- A **recursive symbolic mode** (`/rlm ... env=pure_rlm framework=native`)
- A **tool-delegation harness mode** (`/harness ...`, or benchmark `mode=harness`)

Because both exist in one product, comparisons should be done with explicit mode selection.

### "If context is hidden, how does the model know what to do?"

The model sees metadata (type/length/preview) and can inspect data via code in the REPL, then query sub-models with `llm_query()` / `llm_query_batched()`.
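One way to picture that metadata view, with illustrative field names rather than the exact rlm-code schema:

```python
# Sketch of a metadata view over hidden context: the model sees a compact
# summary (type, length, short preview), not the full data.
# Field names here are illustrative, not the actual rlm-code schema.

def metadata_view(value, preview_chars=40):
    text = value if isinstance(value, str) else repr(value)
    return {
        "type": type(value).__name__,
        "length": len(text),
        "preview": text[:preview_chars],
    }

doc = "Recursive Language Models store context as a variable. " * 50
print(metadata_view(doc))
```

From a summary like this, the model decides what code to write next — slicing, searching, or fanning sub-queries out over the variable it cannot see directly.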
### "How does the run know when it is done?"
149+
150+
Pure recursive runs terminate through `FINAL(...)` or `FINAL_VAR(...)` semantics in REPL flow.
151+
Runner-level completion can also occur from explicit final actions depending on mode.
152+
153+
### "Will recursive sub-calls increase cost?"
154+
155+
Potentially yes. Recursive strategies can reduce prompt bloat but may increase total call count.
156+
This is why RLM Code provides side-by-side benchmark modes (`native`, `harness`, `direct-llm`) for measured tradeoff analysis.
157+
158+
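A back-of-envelope illustration of that tradeoff, with entirely hypothetical token counts:

```python
# Hypothetical numbers: one big call vs. many small recursive calls
# over the same 200k-token document.

doc_tokens = 200_000
chunk_tokens = 10_000
overhead_per_call = 500   # instructions, scaffolding, etc.

# Single call: the whole document plus overhead in one prompt.
single_call = doc_tokens + overhead_per_call            # 200_500

# Recursive: 20 map calls over chunks, plus one reduce call.
recursive_calls = doc_tokens // chunk_tokens            # 20
recursive_total = (recursive_calls * (chunk_tokens + overhead_per_call)
                   + overhead_per_call)                 # 210_500

print(single_call, recursive_calls, recursive_total)
```

Total prompt tokens can come out slightly higher in the recursive case, but each individual call shrinks from 200k tokens to roughly 10k — which is the point when the document does not fit in one window. Actual costs depend on the model and run; measure with the benchmark modes rather than extrapolating from this sketch.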
### "Does this hurt caching?"
159+
160+
Caching behavior depends on provider/runtime and prompt evolution.
161+
Use repeated benchmark runs and compare usage/cost metrics in reports instead of assuming one universal caching outcome.
162+
163+
### "Why enforce strict mode in some experiments?"
164+
165+
`/sandbox strict on` disables runner-level delegate actions in pure mode, which helps isolate code-level recursion behavior for cleaner experiments.

docs/core/index.md

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@

# Core Engine

- The Core Engine is the heart of RLM Code. It implements the **Recursive Language Model** paradigm from the research paper by Zhang, Kraska, and Khattab (2025), providing a complete runtime for context-as-variable reasoning, iterative code execution, reward-driven optimization, and multi-paradigm orchestration.
+ The Core Engine is the heart of RLM Code. It implements the **Recursive Language Model** paradigm from the 2025 research paper, providing a complete runtime for context-as-variable reasoning, iterative code execution, reward-driven optimization, and multi-paradigm orchestration.

---

@@ -39,6 +39,7 @@ The Core Engine is composed of several tightly integrated subsystems:

|---|---|---|
| [Runner](runner.md) | `rlm_code.rlm.runner` | Multi-paradigm orchestrator with trajectory persistence |
| [Environments](environments.md) | `rlm_code.rlm.environments`, `rlm_code.rlm.pure_rlm_environment` | Execution environments with reward profiles |
| [Execution Patterns](execution-patterns.md) | `rlm_code.rlm.runner`, `rlm_code.harness.runner` | How to run pure recursive mode vs harness vs direct baseline |
| [Event System](events.md) | `rlm_code.rlm.events` | Pub-sub event bus for observability and UI |
| [Termination](termination.md) | `rlm_code.rlm.termination` | FINAL/FINAL_VAR termination patterns |
| [Memory Compaction](memory-compaction.md) | `rlm_code.rlm.memory_compaction` | Context window management via summarization |

docs/getting-started/cli.md

Lines changed: 2 additions & 1 deletion
@@ -109,6 +109,7 @@ Shell shortcuts:

| `/rlm frameworks` | Adapter readiness table |
| `/rlm viz [run_id\|latest]` | Trajectory tree visualization |
| `/rlm status [run_id]` | Run status |
| `/rlm abort [run_id\|all]` | Cooperative cancel for active runs |
| `/rlm replay <run_id>` | Replay stored trajectory |
| `/rlm doctor [env=...] [--json]` | Environment diagnostics |
| `/rlm chat <message> ...` | Persistent RLM chat sessions |

@@ -121,7 +122,7 @@ Shell shortcuts:

| Command | Description |
|---|---|
| `/harness tools [mcp=on\|off]` | List harness tools (local + optional MCP) |
- | `/harness doctor` | OpenCode-parity style tool coverage report |
+ | `/harness doctor` | Harness tool coverage report |
| `/harness run <task> [steps=N] [mcp=on\|off] [tools=name[,name2]]` | Run tool-driven coding loop |

### Sandbox

docs/getting-started/index.md

Lines changed: 15 additions & 1 deletion
@@ -6,7 +6,7 @@ Welcome to **RLM Code**, the Research Playground and Evaluation OS for Recursive

## 🧪 What is RLM Code?

- RLM Code implements the **Recursive Language Model** paradigm from the research paper *"Recursive Language Models"* (Zhang, Kraska, Khattab, 2025). It extends the paper's concepts with:
+ RLM Code implements the **Recursive Language Model** paradigm from the 2025 *"Recursive Language Models"* paper. It extends the paper's concepts with:

- 🧠 **Context-as-variable**: Context is stored as a REPL variable rather than in the token window, enabling unbounded output and token-efficient processing
- 🔁 **Deep recursion**: Support for recursion depth > 1, exceeding the paper's original limitation

@@ -16,12 +16,26 @@ RLM Code implements the **Recursive Language Model** paradigm from the research

---

## 🎯 Problem Focus

RLM Code is optimized for research workflows where:

- Context is too large to fit comfortably in one prompt.
- You need programmatic inspection and decomposition instead of full-context prompt injection.
- You want to compare recursive symbolic execution against harness-style and direct baselines under the same benchmark suite.

For detailed mode behavior and neutral tradeoff guidance, see [Execution Patterns](../core/execution-patterns.md).

---

## 📚 Where to Go Next

| Guide | Description |
|-------|-------------|
| [🧭 Start Here (Simple)](start-here.md) | Plain-language onboarding: what this is, what to install, and safe first run |
| [📦 Installation](installation.md) | System requirements, package installation, optional dependencies, and verification |
| [⚡ Quick Start](quickstart.md) | Launch the TUI, connect a model, run your first benchmark, explore the Research tab |
| [🧑‍🔬 Researcher Onboarding](researcher-onboarding.md) | Researcher-first workflows and complete command handbook |
| [💻 CLI Reference](cli.md) | Complete reference for the entry point and all 50+ slash commands |
| [⚙️ Configuration](configuration.md) | Full `rlm_config.yaml` schema, environment variables, and ConfigManager API |
