Commit 548ab1b: Docs update
1 parent 5207fbf
43 files changed: 1085 additions & 108 deletions

README.md

Lines changed: 65 additions & 1 deletion
@@ -2,7 +2,7 @@

**Run LLM-powered agents in a REPL loop, benchmark them, and compare results.**

- RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.07503) (RLM) paper by Zhang, Kraska & Khattab. Instead of stuffing your entire document into the LLM's context window, RLM stores it as a Python variable and lets the LLM write code to analyze it — chunk by chunk, iteration by iteration. This is dramatically more token-efficient for large inputs.
+ RLM Code implements the [Recursive Language Models](https://arxiv.org/abs/2502.07503) (RLM) approach from the 2025 paper release. Instead of stuffing your entire document into the LLM's context window, RLM stores it as a Python variable and lets the LLM write code to analyze it — chunk by chunk, iteration by iteration. This is dramatically more token-efficient for large inputs.

RLM Code wraps this algorithm in an interactive terminal UI with built-in benchmarks, trajectory replay, and observability.

@@ -14,6 +14,13 @@ uv tool install "rlm-code[tui,llm-all]"

This installs `rlm-code` as a globally available command with its own isolated environment. You get the TUI and all LLM provider clients (OpenAI, Anthropic, Gemini).

Requirements:

- Python 3.11+
- `uv` (recommended) or `pip`
- one model route (BYOK API key or local server like Ollama)
- one secure execution backend (Docker recommended; Monty optional)

Don't have uv? Install it first:
@@ -115,6 +122,27 @@ After at least two benchmark runs, export a compare report:

Walk through the last run one step at a time — see what code the LLM wrote, what output it got, and what it did next.

### 7. Use RLM Code as a coding agent (local/BYOK/ACP)

RLM Code can also be used as a coding-agent harness in the TUI:

```text
/harness tools
/harness run "fix failing tests and add regression test" steps=8 mcp=on
```

ACP is supported too:

```text
/connect acp
/harness run "implement feature X with tests" steps=8 mcp=on
```

Notes:

- In Local/BYOK connection modes, chat prompts that look like coding tasks can auto-route to the harness.
- In ACP mode, auto-routing is intentionally off; use `/harness run ...` explicitly.
## How the RLM Loop Works

Traditional LLM usage: paste your document into the prompt, ask a question, hope the model doesn't lose details in the middle.

@@ -129,6 +157,21 @@ RLM approach:

This means the LLM can handle documents much larger than its context window, because it reads them in chunks through code rather than all at once through the prompt.
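The chunked-reading loop described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not the actual rlm-code API; `llm_query` and `answer_over_chunks` are hypothetical stand-ins for a real sub-model call and the code the model would write:

```python
# Sketch of the RLM idea: the document lives in a variable, and code
# reads it chunk by chunk instead of pasting it all into one prompt.

def llm_query(prompt: str) -> str:
    # Stub: a real implementation would call an LLM here.
    return f"summary({len(prompt)} chars)"

def answer_over_chunks(document: str, question: str, chunk_size: int = 1000) -> str:
    # Map step: ask the sub-model about each chunk separately.
    partials = []
    for start in range(0, len(document), chunk_size):
        chunk = document[start:start + chunk_size]
        partials.append(llm_query(f"{question}\n\n{chunk}"))
    # Reduce step: combine per-chunk findings into one final answer.
    return llm_query(f"{question}\n\nNotes:\n" + "\n".join(partials))

doc = "x" * 3500  # stand-in for a document far larger than any context window
print(answer_over_chunks(doc, "What is this about?"))
```

Each individual call sees only one chunk plus the question, which is what keeps the per-call token footprint small.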

## What This Is (and Is Not)

RLM Code is:

- a research playground for recursive/model-assisted coding workflows
- a benchmarking and replay tool for reproducible experiments

RLM Code is not:

- a no-config consumer chat app
- guaranteed cheap (recursive runs can be expensive)
- safe to run with unrestricted execution settings

Use secure backend defaults (`/sandbox profile secure`) for normal use.

## Key Commands

| Command | What it does |
@@ -141,11 +184,32 @@ This means the LLM can handle documents much larger than its context window, bec

| `/rlm bench preset=<name>` | Run a benchmark preset |
| `/rlm bench list` | List available benchmarks |
| `/rlm bench compare` | Compare latest benchmark run with previous run |
| `/rlm abort [run_id\|all]` | Cancel active run(s) cooperatively |
| `/harness run "<task>"` | Run tool-using coding harness loop |
| `/rlm replay` | Step through the last run |
| `/rlm chat "<question>"` | Ask the LLM a question about your project |
| `/help` | Show all available commands |

## Cost and Safety Guardrails

Start bounded:

```text
/rlm run "small scoped task" steps=4 timeout=30 budget=60
```

For benchmarks, start with small limits:

```text
/rlm bench preset=dspy_quick limit=1
```

If a run is getting out of hand:

```text
/rlm abort all
```
## What You Can Do With It

- **Analyze large documents**: Feed in a 500-page PDF and ask questions — the LLM reads it in chunks via code

docs/core/comparison.md

Lines changed: 2 additions & 0 deletions
@@ -5,6 +5,8 @@

The paradigm comparison module enables side-by-side empirical comparison of different RLM approaches on the same task. It directly addresses the debate around whether RLM provides real benefits over simpler approaches by measuring token usage, cost, execution time, and accuracy.

For concept-level guidance on when to use each execution style, see [Execution Patterns](execution-patterns.md).

---

## Overview

docs/core/environments.md

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ class RLMEnvironment(Protocol):

## PureRLMEnvironment

- The paper-compliant RLM environment implementing exact semantics from "Recursive Language Models" (Zhang, Kraska, Khattab, 2025).
+ The paper-compliant RLM environment implementing exact semantics from "Recursive Language Models" (2025).

```python
from rlm_code.rlm.pure_rlm_environment import PureRLMEnvironment, PureRLMConfig
```

docs/core/execution-patterns.md

Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@

# Execution Patterns

This page describes the three execution patterns available in RLM Code and how to use each one intentionally.

It focuses on behavior and configuration, without opinionated claims.

---

## Why This Matters

RLM Code can operate in multiple modes:

1. **Recursive symbolic context processing** (pure RLM native path)
2. **Tool-delegation coding loop** (harness path)
3. **Direct model response** (single-call baseline)

These modes solve different problems. Comparing them is most useful when you run each one deliberately.

---

## Pattern 1: Recursive Symbolic Context Processing

In this pattern, context is loaded into the REPL as variables and the model works by writing code that:

- Inspects variables programmatically
- Calls `llm_query()` or `llm_query_batched()` from inside code
- Composes intermediate results in variables
- Terminates with `FINAL(...)` or `FINAL_VAR(...)`
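The four steps above can be pictured with a toy sketch of the kind of code the model writes in this pattern. Here `context`, `llm_query_batched`, and `FINAL` are stubs standing in for what the real REPL environment provides:

```python
# Toy sketch of model-authored REPL code in pattern 1.
# In the real environment these names are injected; here they are stubs.

context = ["section one ...", "section two ...", "section three ..."]

def llm_query_batched(prompts):
    # Stub: a real implementation fans prompts out to sub-model calls.
    return [f"answer for prompt {i}" for i, _ in enumerate(prompts)]

def FINAL(answer):
    # Stub: the real FINAL terminates the run with `answer` as the result.
    return answer

# 1. Inspect the context programmatically.
sizes = [len(part) for part in context]

# 2. Query sub-models over pieces, composing results in variables.
findings = llm_query_batched([f"Summarize: {part}" for part in context])

# 3. Terminate with a final answer built from intermediate variables.
result = FINAL("; ".join(findings))
print(result)
```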
### Typical Use Cases

- Large-context analysis
- Programmatic decomposition/map-reduce style reasoning
- Experiments where token efficiency and context handling strategy are primary variables

### Recommended Settings

```bash
/sandbox profile secure
/sandbox backend docker
/sandbox strict on
/sandbox output-mode metadata
```

Then run:

```bash
/rlm run "Analyze this context with programmatic decomposition" env=pure_rlm framework=native
```

Notes:

- `strict on` disables runner-level `delegate` actions in pure mode, so recursion stays inside REPL code.
- `output-mode metadata` keeps per-step output compact and stable for long runs.

---

## Pattern 2: Tool-Delegation Coding Loop (Harness)

In this pattern, the model chooses tools (`read`, `grep`, `edit`, `bash`, MCP tools, etc.) step by step.
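The step-by-step loop can be pictured as follows. Everything here is illustrative — the stub policy, the tool table, and the function names are not the actual harness internals:

```python
# Generic shape of a tool-delegation loop (illustrative only).

def fake_model(observations):
    # Stub policy: read a file once, then declare the task finished.
    # A real model chooses the next tool based on the observations so far.
    if not observations:
        return ("read", "README.md")
    return ("finish", "done")

TOOLS = {"read": lambda arg: f"<contents of {arg}>"}

def run_harness(max_steps=8):
    observations = []
    for _ in range(max_steps):
        tool, arg = fake_model(observations)
        if tool == "finish":
            return arg
        # Execute the chosen tool and feed the result back to the model.
        observations.append(TOOLS[tool](arg))
    return "step budget exhausted"

print(run_harness())
```

The `steps=N` argument to `/harness run` plays the role of `max_steps` here: a hard bound on loop iterations.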
### Typical Use Cases

- Repository editing and test-fix loops
- Local/BYOK coding assistant workflows
- MCP-augmented automation

### Commands

```bash
/harness tools
/harness doctor
/harness run "fix failing tests and explain changes" steps=8 mcp=on
```

If a connected model is in local/BYOK mode, TUI chat can auto-route coding prompts to the harness.

To disable auto-routing for controlled experiments:

```bash
export RLM_TUI_HARNESS_AUTO=0
rlm-code
```

---

## Pattern 3: Direct Model Baseline

This is a simple one-shot baseline without recursive REPL execution or tool-loop orchestration.

Use it for sanity checks and benchmark comparison baselines.

---

## Controlled Comparison Workflow

Run the same benchmark suite with each mode:

```bash
/rlm bench preset=paradigm_comparison mode=native
/rlm bench preset=paradigm_comparison mode=harness
/rlm bench preset=paradigm_comparison mode=direct-llm
```

Then compare:

```bash
/rlm bench compare candidate=latest baseline=previous
/rlm bench report candidate=latest baseline=previous format=markdown
```

---

## Mode Selection Checklist

Use **recursive symbolic context processing** when:

- You need code-driven context understanding over large or structured inputs.
- You want recursion written inside code (`llm_query` in loops/functions).
- You want strict experimental control over context exposure.

Use **harness** when:

- Your primary goal is practical coding velocity in a repository.
- You want tool-first workflows (file ops, shell, MCP tools).

Use **direct-llm** when:

- You need a minimal baseline for comparison.

---

## Common Questions

### "Is this just another coding agent?"

RLM Code includes both:

- A **recursive symbolic mode** (`/rlm ... env=pure_rlm framework=native`)
- A **tool-delegation harness mode** (`/harness ...`, or benchmark `mode=harness`)

Because both exist in one product, comparisons should be done with explicit mode selection.

### "If context is hidden, how does the model know what to do?"

The model sees metadata (type/length/preview) and can inspect data via code in the REPL, then query sub-models with `llm_query()` / `llm_query_batched()`.
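One way to picture that metadata view, with illustrative field names rather than the exact rlm-code schema:

```python
# Sketch of a metadata view over hidden context: the model sees a compact
# summary (type, length, short preview), not the full data.
# Field names here are illustrative, not the actual rlm-code schema.

def metadata_view(value, preview_chars=40):
    text = value if isinstance(value, str) else repr(value)
    return {
        "type": type(value).__name__,
        "length": len(text),
        "preview": text[:preview_chars],
    }

doc = "Recursive Language Models store context as a variable. " * 50
print(metadata_view(doc))
```

From a summary like this, the model decides what code to write next — slicing, searching, or fanning sub-queries out over the variable it cannot see directly.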
### "How does the run know when it is done?"
149+
150+
Pure recursive runs terminate through `FINAL(...)` or `FINAL_VAR(...)` semantics in REPL flow.
151+
Runner-level completion can also occur from explicit final actions depending on mode.
152+
153+
### "Will recursive sub-calls increase cost?"
154+
155+
Potentially yes. Recursive strategies can reduce prompt bloat but may increase total call count.
156+
This is why RLM Code provides side-by-side benchmark modes (`native`, `harness`, `direct-llm`) for measured tradeoff analysis.
157+
158+
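A back-of-envelope illustration of that tradeoff, with entirely hypothetical token counts:

```python
# Hypothetical numbers: one big call vs. many small recursive calls
# over the same 200k-token document.

doc_tokens = 200_000
chunk_tokens = 10_000
overhead_per_call = 500   # instructions, scaffolding, etc.

# Single call: the whole document plus overhead in one prompt.
single_call = doc_tokens + overhead_per_call            # 200_500

# Recursive: 20 map calls over chunks, plus one reduce call.
recursive_calls = doc_tokens // chunk_tokens            # 20
recursive_total = (recursive_calls * (chunk_tokens + overhead_per_call)
                   + overhead_per_call)                 # 210_500

print(single_call, recursive_calls, recursive_total)
```

Total prompt tokens can come out slightly higher in the recursive case, but each individual call shrinks from 200k tokens to roughly 10k — which is the point when the document does not fit in one window. Actual costs depend on the model and run; measure with the benchmark modes rather than extrapolating from this sketch.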
### "Does this hurt caching?"
159+
160+
Caching behavior depends on provider/runtime and prompt evolution.
161+
Use repeated benchmark runs and compare usage/cost metrics in reports instead of assuming one universal caching outcome.
162+
163+
### "Why enforce strict mode in some experiments?"
164+
165+
`/sandbox strict on` disables runner-level delegate actions in pure mode, which helps isolate code-level recursion behavior for cleaner experiments.

docs/core/index.md

Lines changed: 2 additions & 1 deletion
@@ -1,6 +1,6 @@

# Core Engine

- The Core Engine is the heart of RLM Code. It implements the **Recursive Language Model** paradigm from the research paper by Zhang, Kraska, and Khattab (2025), providing a complete runtime for context-as-variable reasoning, iterative code execution, reward-driven optimization, and multi-paradigm orchestration.
+ The Core Engine is the heart of RLM Code. It implements the **Recursive Language Model** paradigm from the 2025 research paper, providing a complete runtime for context-as-variable reasoning, iterative code execution, reward-driven optimization, and multi-paradigm orchestration.

---

@@ -39,6 +39,7 @@ The Core Engine is composed of several tightly integrated subsystems:

|---|---|---|
| [Runner](runner.md) | `rlm_code.rlm.runner` | Multi-paradigm orchestrator with trajectory persistence |
| [Environments](environments.md) | `rlm_code.rlm.environments`, `rlm_code.rlm.pure_rlm_environment` | Execution environments with reward profiles |
| [Execution Patterns](execution-patterns.md) | `rlm_code.rlm.runner`, `rlm_code.harness.runner` | How to run pure recursive mode vs harness vs direct baseline |
| [Event System](events.md) | `rlm_code.rlm.events` | Pub-sub event bus for observability and UI |
| [Termination](termination.md) | `rlm_code.rlm.termination` | FINAL/FINAL_VAR termination patterns |
| [Memory Compaction](memory-compaction.md) | `rlm_code.rlm.memory_compaction` | Context window management via summarization |

docs/getting-started/cli.md

Lines changed: 2 additions & 1 deletion
@@ -109,6 +109,7 @@ Shell shortcuts:

| `/rlm frameworks` | Adapter readiness table |
| `/rlm viz [run_id\|latest]` | Trajectory tree visualization |
| `/rlm status [run_id]` | Run status |
| `/rlm abort [run_id\|all]` | Cooperative cancel for active runs |
| `/rlm replay <run_id>` | Replay stored trajectory |
| `/rlm doctor [env=...] [--json]` | Environment diagnostics |
| `/rlm chat <message> ...` | Persistent RLM chat sessions |

@@ -121,7 +122,7 @@ Shell shortcuts:

| Command | Description |
|---|---|
| `/harness tools [mcp=on\|off]` | List harness tools (local + optional MCP) |
- | `/harness doctor` | OpenCode-parity style tool coverage report |
+ | `/harness doctor` | Harness tool coverage report |
| `/harness run <task> [steps=N] [mcp=on\|off] [tools=name[,name2]]` | Run tool-driven coding loop |

### Sandbox

docs/getting-started/index.md

Lines changed: 15 additions & 1 deletion
@@ -6,7 +6,7 @@ Welcome to **RLM Code**, the Research Playground and Evaluation OS for Recursive

## 🧪 What is RLM Code?

- RLM Code implements the **Recursive Language Model** paradigm from the research paper *"Recursive Language Models"* (Zhang, Kraska, Khattab, 2025). It extends the paper's concepts with:
+ RLM Code implements the **Recursive Language Model** paradigm from the 2025 *"Recursive Language Models"* paper. It extends the paper's concepts with:

- 🧠 **Context-as-variable**: Context is stored as a REPL variable rather than in the token window, enabling unbounded output and token-efficient processing
- 🔁 **Deep recursion**: Support for recursion depth > 1, exceeding the paper's original limitation

@@ -16,12 +16,26 @@ RLM Code implements the **Recursive Language Model** paradigm from the research

---

## 🎯 Problem Focus

RLM Code is optimized for research workflows where:

- Context is too large to fit comfortably in one prompt.
- You need programmatic inspection and decomposition instead of full-context prompt injection.
- You want to compare recursive symbolic execution against harness-style and direct baselines under the same benchmark suite.

For detailed mode behavior and neutral tradeoff guidance, see [Execution Patterns](../core/execution-patterns.md).

---

## 📚 Where to Go Next

| Guide | Description |
|-------|-------------|
| [🧭 Start Here (Simple)](start-here.md) | Plain-language onboarding: what this is, what to install, and safe first run |
| [📦 Installation](installation.md) | System requirements, package installation, optional dependencies, and verification |
| [⚡ Quick Start](quickstart.md) | Launch the TUI, connect a model, run your first benchmark, explore the Research tab |
| [🧑‍🔬 Researcher Onboarding](researcher-onboarding.md) | Researcher-first workflows and complete command handbook |
| [💻 CLI Reference](cli.md) | Complete reference for the entry point and all 50+ slash commands |
| [⚙️ Configuration](configuration.md) | Full `rlm_config.yaml` schema, environment variables, and ConfigManager API |
