# Execution Patterns

This page describes the three execution patterns available in RLM Code and how to use each one intentionally.

It focuses on behavior and configuration, without opinionated claims.

---

## Why This Matters

RLM Code can operate in multiple modes:

1. **Recursive symbolic context processing** (pure RLM native path)
2. **Tool-delegation coding loop** (harness path)
3. **Direct model response** (single-call baseline)

These modes solve different problems. Comparing them is most useful when you run each one deliberately.

---

## Pattern 1: Recursive Symbolic Context Processing

In this pattern, context is loaded into the REPL as variables, and the model works by writing code that:

- Inspects variables programmatically
- Calls `llm_query()` or `llm_query_batched()` from inside code
- Composes intermediate results in variables
- Terminates with `FINAL(...)` or `FINAL_VAR(...)`

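As a rough illustration of this loop, the sketch below stubs `llm_query` with a local function so it runs standalone; the real REPL-provided call (and its exact signature) belongs to RLM Code, so treat the chunk size and prompts as placeholders.

```python
# Illustrative sketch only: `llm_query` is stubbed so the example runs
# standalone. In a real pure-RLM run, the REPL provides this function and
# it dispatches the prompt to a sub-model.
def llm_query(prompt: str) -> str:
    return f"summary({len(prompt)} chars)"  # stand-in for a sub-model answer

def chunk(text: str, size: int) -> list[str]:
    return [text[i:i + size] for i in range(0, len(text), size)]

# Context lives in a REPL variable, not in the prompt itself.
context = "log line\n" * 400

# Map: one sub-query per chunk; reduce: combine the partial answers.
partials = [llm_query(f"Summarize: {c}") for c in chunk(context, 500)]
answer = llm_query("Combine these summaries: " + "; ".join(partials))

# A real run would finish with FINAL(answer) or FINAL_VAR("answer").
print(answer)
```

If `llm_query_batched()` accepts a list of prompts, the map step above would collapse into a single batched call.
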
### Typical Use Cases

- Large-context analysis
- Programmatic decomposition / map-reduce style reasoning
- Experiments where token efficiency and context-handling strategy are the primary variables

### Recommended Settings

```bash
/sandbox profile secure
/sandbox backend docker
/sandbox strict on
/sandbox output-mode metadata
```

Then run:

```bash
/rlm run "Analyze this context with programmatic decomposition" env=pure_rlm framework=native
```

Notes:

- `strict on` disables runner-level `delegate` actions in pure mode, so recursion stays inside REPL code.
- `output-mode metadata` keeps per-step output compact and stable for long runs.

---

## Pattern 2: Tool-Delegation Coding Loop (Harness)

In this pattern, the model chooses tools (`read`, `grep`, `edit`, `bash`, MCP tools, etc.) step by step.

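Conceptually, the harness loop alternates between the model choosing a tool call and the harness executing it and feeding the observation back. The sketch below is a toy version of that dispatch loop, with tool behavior and the model's choices hard-coded so it runs standalone; it is not RLM Code's actual implementation.

```python
# Toy tool-delegation loop. Tool names mirror the ones above; the model's
# per-step choices are scripted here instead of coming from a real model.
from typing import Callable

TOOLS: dict[str, Callable[[str], str]] = {
    "read": lambda arg: f"<contents of {arg}>",
    "grep": lambda arg: f"<matches for {arg}>",
    "edit": lambda arg: f"<applied edit: {arg}>",
}

# Stand-in for the model proposing one tool call per step.
scripted_steps = [
    ("read", "tests/test_app.py"),
    ("grep", "AssertionError"),
    ("edit", "fix off-by-one in loop bound"),
]

transcript = []
for step, (tool, arg) in enumerate(scripted_steps, start=1):
    observation = TOOLS[tool](arg)                # execute the chosen tool
    transcript.append((step, tool, observation))  # becomes context for the next step

print(transcript[-1])
```
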
### Typical Use Cases

- Repository editing and test-fix loops
- Local/BYOK coding assistant workflows
- MCP-augmented automation

### Commands

```bash
/harness tools
/harness doctor
/harness run "fix failing tests and explain changes" steps=8 mcp=on
```

If the connected model is in local/BYOK mode, TUI chat can auto-route coding prompts to the harness.

To disable auto-routing for controlled experiments:

```bash
export RLM_TUI_HARNESS_AUTO=0
rlm-code
```

---

## Pattern 3: Direct Model Baseline

This is a simple one-shot baseline, with no recursive REPL execution and no tool-loop orchestration.

Use it for sanity checks and as the baseline in benchmark comparisons.

---

## Controlled Comparison Workflow

Run the same benchmark suite with each mode:

```bash
/rlm bench preset=paradigm_comparison mode=native
/rlm bench preset=paradigm_comparison mode=harness
/rlm bench preset=paradigm_comparison mode=direct-llm
```

Then compare:

```bash
/rlm bench compare candidate=latest baseline=previous
/rlm bench report candidate=latest baseline=previous format=markdown
```

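The report fields are not specified on this page; as a hypothetical illustration, a candidate-vs-baseline comparison reduces to per-metric deltas along these lines (metric names and numbers are invented):

```python
# Hypothetical metric dicts; the real report fields may differ.
baseline = {"tokens": 120_000, "cost_usd": 0.84, "pass_rate": 0.62}
candidate = {"tokens": 95_000, "cost_usd": 0.71, "pass_rate": 0.66}

# Absolute and relative change per metric.
delta = {k: candidate[k] - baseline[k] for k in baseline}
pct = {k: 100 * delta[k] / baseline[k] for k in baseline}

for k in baseline:
    print(f"{k}: {delta[k]:+g} ({pct[k]:+.1f}%)")
```
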

---

## Mode Selection Checklist

Use **recursive symbolic context processing** when:

- You need code-driven context understanding over large or structured inputs.
- You want recursion written inside code (`llm_query` in loops/functions).
- You want strict experimental control over context exposure.

Use **harness** when:

- Your primary goal is practical coding velocity in a repository.
- You want tool-first workflows (file ops, shell, MCP tools).

Use **direct-llm** when:

- You need a minimal baseline for comparison.

---

## Common Questions

### "Is this just another coding agent?"

RLM Code includes both:

- A **recursive symbolic mode** (`/rlm ... env=pure_rlm framework=native`)
- A **tool-delegation harness mode** (`/harness ...`, or benchmark `mode=harness`)

Because both exist in one product, comparisons should be made with explicit mode selection.

### "If context is hidden, how does the model know what to do?"

The model sees metadata (type/length/preview), can inspect the data via code in the REPL, and can then query sub-models with `llm_query()` / `llm_query_batched()`.

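A minimal sketch of what such a metadata view could look like (the `describe` helper and its fields are hypothetical, not RLM Code's actual schema):

```python
def describe(name: str, value) -> dict:
    # Hypothetical metadata view: the model sees this summary
    # instead of the raw value.
    text = value if isinstance(value, str) else repr(value)
    return {
        "name": name,
        "type": type(value).__name__,
        "length": len(text),
        "preview": text[:40] + ("…" if len(text) > 40 else ""),
    }

context = "line 1\n" * 500
print(describe("context", context))
```
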
### "How does the run know when it is done?"

Pure recursive runs terminate through `FINAL(...)` or `FINAL_VAR(...)` semantics in the REPL flow.
Runner-level completion can also occur through explicit final actions, depending on mode.

### "Will recursive sub-calls increase cost?"

Potentially, yes. Recursive strategies can reduce prompt bloat but may increase total call count.
This is why RLM Code provides side-by-side benchmark modes (`native`, `harness`, `direct-llm`) for measured tradeoff analysis.

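As a back-of-envelope illustration of that tradeoff, with entirely invented numbers: compare a loop that re-sends the full context on every step against a recursive run that sends each chunk to a sub-model once and then combines short summaries.

```python
# Assumed workload: 100k-token context, 5 reasoning steps, 20 chunks of
# 5k tokens, 200 tokens of prompt overhead per call. All numbers invented.
context_tokens, steps = 100_000, 5
chunks, per_chunk, overhead = 20, 5_000, 200

# Flat loop: the full context rides along in every step's prompt.
flat_calls = steps
flat_tokens = steps * (context_tokens + overhead)

# Recursive: one sub-call per chunk, plus one combine call over summaries.
recursive_calls = chunks + 1
recursive_tokens = chunks * (per_chunk + overhead) + 2_000

print(flat_calls, flat_tokens)            # fewer calls, more prompt tokens
print(recursive_calls, recursive_tokens)  # more calls, fewer prompt tokens
```

Under these assumptions the recursive run makes 21 calls instead of 5 but sends far fewer prompt tokens, which is exactly the tradeoff the benchmark modes are meant to measure rather than assume.
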
### "Does this hurt caching?"

Caching behavior depends on the provider/runtime and on how prompts evolve across steps.
Use repeated benchmark runs and compare usage/cost metrics in the reports rather than assuming one universal caching outcome.

### "Why enforce strict mode in some experiments?"

`/sandbox strict on` disables runner-level delegate actions in pure mode, which helps isolate code-level recursion behavior for cleaner experiments.