config: adopt lean-edit shape for stock defaults + hashline opt-in onboarding#314
Conversation
…boarding
Stock now uses the bench-grounded lean-edit defaults
(bench/swe/README.md). The shape is a four-line summary:
* web_fetch + vault_tools default to OFF -- operators turn them
back on via the existing onboarding questions when they actually
use the vault or want the agent doing outbound HTTP. The bench's
45+ live-driven cells called vault_search zero times and only
invoked web_fetch on tasks where shell_exec curl was the natural
alternative anyway.
* Six zero-prompt-cost behavioral toggles default to ON:
orient_task_aware, checkpoints_enabled (+ keep_last=32),
stats_collection_enabled, prompt_cache.ttl "1h" (was 5m),
auto_router.enabled, self_improving_skills.distiller.enabled.
Each was identified by the first-turn ablation (results/ablation.md)
as costing zero tools-array bytes -- pure behavior. The distiller
inclusion is the one users will notice most: every task-of-five-
or-more-tool-calls now triggers a post-turn skill-distillation
pass, and drafts land under workspace/skills/.pending/ awaiting
`pasclaw skills approve <id>`.
* vector_search_enabled stays ON (memory_search is a popular tool
and the underlying FTS5 index is cheap when there's nothing to
index).
* Sandbox / promptware / cron / heartbeat defaults are unchanged.
New TConfig field: HashlineEnabled (default True)
=================================================
Gates fs_edit_hashline + fs_grep tool registrations. The bench found
that Haiku-class models reach for fs_edit_hashline on tasks fs_write
would handle, fail the anchor/payload format, and burn turns
recovering -- the 3x turn-count gap between lean-edit and max-build
on fix01-snippet for Haiku was driven entirely by fs_edit_hashline
mis-authoring. Sonnet and Opus use the tool correctly when they
choose it.
The onboarding flow gets a new PromptHashline question (placed right
after PromptWebFetch in the loop-shaping section) so operators on
small-model deployments can opt out and persist the choice in
config.json. The CLI --no-hashline flag still works as a per-run
override; both layers compose (UseHashline = not --no-hashline AND
Cfg.HashlineEnabled).
Files changed
=============
src/pkg/config/PasClaw.Config.pas TConfig.Create defaults,
HashlineEnabled field, FromJSON/
ToJSON for hashline_enabled
src/pkg/config/PasClaw.Config.Profile.pas
Profile_Stock body updated to
match new defaults; doc comment
at top reflects the lean-edit
shape + max-build now described
as "+web/vault/skills" extras
src/cmd/PasClaw.Cmd.Agent.pas NewBuiltinRegistry call sites
AND-fold Cfg.HashlineEnabled
with --no-hashline
src/cmd/PasClaw.Cmd.Serve.pas same fold
src/cmd/PasClaw.Cmd.TUI.pas same fold
src/cmd/PasClaw.Cmd.Onboard.pas New PromptHashline procedure,
inserted in the onboarding
question order between
PromptWebFetch and
PromptPromptware
src/tests/config_profile_tests.pas TestStockMatchesDefaults
gains a HashlineEnabled
assertion
Compat
======
low-token, security, max-build, baseline, all-on are unchanged in
their explicit overrides. max-build explicitly turns web_fetch +
vault_tools back ON, so operators selecting --profile max-build see
the same behavior as before. baseline / security explicitly turn
them OFF; same as before.
Operators upgrading PasClaw and expecting web_fetch / vault out-of-
box will see them disabled on first run -- the onboarding flow asks
about each, so re-running `pasclaw onboard` restores the previous
defaults in a couple of keystrokes. Documented in the new stock
profile description.
Tests
=====
make test-config-profile passes. The TestStockMatchesDefaults check
holds (TConfig.Create and Profile_Stock changed together). New
assertion for HashlineEnabled covers the new field's roundtrip.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 47b8edd001
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| VaultToolsEnabled := False; { off by default per the bench-grounded "stock = lean-edit shape" verdict (bench/swe/README.md). Vault entries are never called across the bench's 45+ cells -- the model has them as training data. Onboarding asks (default Y for operators who DO use the vault). } | ||
| WebFetchEnabled := False; { off by default for the same reason as VaultToolsEnabled. Also drops memory_fetch (RegisterMemoryFetchTool is gated on EnableWebFetch in NewBuiltinRegistry -- see comment there). Onboarding asks. } |
There was a problem hiding this comment.
Persist vault and web-fetch opt-ins
When these defaults move to False, TConfig.ToJSON still uses the old default-on round-trip logic and only serializes vault_tools_enabled / web_fetch_enabled when they are false. In onboarding or a settings edit, answering yes sets these fields true, but SaveConfig writes no keys; the next LoadConfig starts from these new false defaults and disables vault_search / web_fetch again. Please update the serializer to emit the true values for these now-default-off toggles.
Useful? React with 👍 / 👎.
| StatsCollectionEnabled := True; { on by default -- zero prompt cost, useful for diagnosing turn-count regressions. Onboarding can flip off for privacy-conscious operators. } | ||
| CheckpointsEnabled := True; { on by default -- zero prompt cost, prevents lost work on multi-edit sessions. } | ||
| CheckpointsKeepLast := 32; { keep last 32 atomic edit checkpoints. } | ||
| PromptwareEnabled := True; { on by default -- substring scan, effectively free. } | ||
| OrientTaskAware := False; { opt-in; whole-file MEMORY injection is the contract. } | ||
| OrientTaskAware := True; { on by default -- MEMORY task-aware injection. Zero prompt cost when MEMORY.md is absent; saves tokens when present. } |
There was a problem hiding this comment.
Persist opt-outs for default-on behavior toggles
After making stats, checkpoints, and task-aware orient default to true here, TConfig.ToJSON still only writes stats_collection_enabled, checkpoints_enabled, and orient_task_aware when they are true. If onboarding defaults to N or an operator edits any of them to false, the key is omitted and the next load silently re-enables session stats, checkpoint snapshots, and task-aware MEMORY, so privacy/storage/behavior opt-outs do not stick.
Useful? React with 👍 / 👎.
| SelfImprovingSkills.AutoApprove := False; | ||
| SetLength(SelfImprovingSkills.GuardDeny, 0); | ||
| SelfImprovingSkills.Distiller.Enabled := False; | ||
| SelfImprovingSkills.Distiller.Enabled := True; |
There was a problem hiding this comment.
Persist disabling the default-on distiller
With the distiller defaulting to true here, the existing self_improving_skills serializer no longer writes anything when onboarding skips self-improving skills and sets all four sub-flags false. A user pressing Enter at the [y/N] prompt gets a config without self_improving_skills, and on the next load this default re-enables the post-turn distiller, causing extra model calls and staged skill drafts despite the opt-out.
Useful? React with 👍 / 👎.
| end | ||
| else | ||
| begin | ||
| Cfg.HashlineEnabled := False; |
There was a problem hiding this comment.
Honor hashline opt-out in gateway paths
This persisted opt-out only reaches agent, serve, and tui; I checked src/cmd/PasClaw.Cmd.Gateway.pas and src/pkg/agent/PasClaw.Agent.pas, and they still call RegisterFSTools with not Args.NoHashline / FUseHashline without composing with Cfg.HashlineEnabled. Users who answer n here but use the gateway/web UI or embedded agent still get fs_edit_hashline and fs_grep, contrary to the promised persistent equivalent of --no-hashline.
Useful? React with 👍 / 👎.
…meout Two refinements to the lean-edit-as-stock commit on this branch: HashlineEnabled flips to FALSE by default ========================================= The previous commit on this branch left HashlineEnabled := True so the surgical-patch toolchain (fs_edit_hashline + fs_grep) stayed advertised in the default registry. Reviewer feedback: "stock should match lean-edit" -- and lean-edit's bench-grounded recommendation INCLUDES dropping the hashline tools for the cleanest small-model behavior (Haiku-class models mis-author the hashline anchor/payload format and burn turns -- bench/swe/README.md). TConfig.Create now sets HashlineEnabled := False. The Profile_Stock JSON body mirrors with "hashline_enabled":false. PromptHashline's doc comment is updated -- the question still defaults Y on the prompt so operators on Opus / Sonnet / GPT-4-tier deployments restore the toolchain in one keystroke (those models use it correctly when present). CLI --no-hashline still works as a per-run override. The combined gate stays (not --no-hashline) AND Cfg.HashlineEnabled. OpenAI provider HTTP read timeout 120s -> 600s ============================================== Discovered during bench runs but it's a Pascal code change, so it belongs in this PR alongside the other Pascal changes (reviewer note). Discovered while a slow Claude subagent driving the long-creative fixture was taking ~130s to publish its first reply to the localhost stub -- PasClaw would time out the read at 120s and abort before any tool fired, even though the response was seconds away from landing. 600s is a ceiling, not a wait. The read returns as soon as the body arrives, so the happy path (10-60s responses on a real provider) is unaffected. Slow-think scenarios (high-effort models emitting multi-KB tool-call bodies, or reasoning model with long adaptive-thinking phases) survive the read. Build + test ============ make clean && make passes. make test-config-profile passes. `pasclaw profile show stock` shows hashline_enabled:false as the default; the onboarding flow still defaults Y on the prompt so a two-keystroke onboard run restores the tool surface for big-model operators.
The 600s timeout fix was originally bundled here as a "bench infrastructure prereq" -- but it's a Pascal code change that belongs in the lean-edit defaults PR (reviewer note: the bench PR should be bench code only). The lean-edit-as-stock-defaults branch (PR #314) now carries the timeout bump alongside its other Pascal-side changes. Running the bench against a checkout without that fix means slow subagent drivers may hit the original 120s read timeout, but that's a known issue documented in bench/swe/README.md -- the bench harness can still produce useful data on mock and proxy modes, just not on some live-driven cells with subagents authoring large tool-call bodies.
Slow-thinking subagents driving the bench can take 2-3 minutes to emit a reasoning preamble + a multi-KB tool-call body before they publish the response to the localhost stub. 120s was tight and broke bench cells (driver took ~130s; PasClaw timed out the read before the response landed, even though the body was seconds away). 600s is a ceiling, not a wait -- the read returns as soon as the body arrives, so the happy path is unaffected. Carrying this on the bench branch too so a reviewer can run bench/swe end-to-end without needing the lean-edit-defaults PR merged first. Same change as PR #314's timeout bump -- if both PRs land in either order, the second is a no-op for this line.
… fix) Three reviewer comments on PR #314 caught the same root cause across nine different toggles: TConfig.ToJSON's emission gates were keyed to the OLD defaults. When a default flips, the asymmetric "emit only when value differs from old default" rule causes the OPT-OUT to silently revert on the next LoadConfig. Example: VaultToolsEnabled's default moved True -> False. The old gate was `if not VaultToolsEnabled then emit`. After the flip, an operator answering Y to the onboarding prompt sets the field True -- but `not True` is False, so the key is skipped on save. Next load defaults to False. The Y was silently un-Yed. Same shape on: vault_tools_enabled (True -> False; opt-IN must persist) web_fetch_enabled (True -> False; opt-IN must persist) stats_collection_enabled (False -> True; opt-OUT must persist) checkpoints_enabled (False -> True; opt-OUT must persist) checkpoints_keep_last (0 -> 32; custom values must persist) orient_task_aware (False -> True; opt-OUT must persist) hashline_enabled (True -> False; opt-IN must persist) prompt_cache.ttl ('5m' -> '1h'; custom values must persist) auto_router.enabled (False -> True; opt-OUT must persist) self_improving_skills .distiller.enabled (False -> True; opt-OUT must persist) Fixes ===== For boolean toggles, the emission gate flips direction to match the new default (emit on the non-default branch). For prompt_cache.ttl, the gate now fires on any non-default value (TTL != '1h' and TTL != ''). For auto_router, the gate fires unconditionally since the default-True + previous "emit on Enabled" combination would have suppressed every opt-out; the subobject is small (four fields) so the extra always-present block is cheap. For self_improving_skills, the gate adds `not Distiller.Enabled` so opt-outs trigger emission of the whole subobject (which already writes every sub-flag with PutBool, so once the outer object emits, all four sub-values are persisted). Test ==== New TestOptOutsPersistAcrossRoundTrip in tests/config_profile_tests flips every default toggled in PR #314 to its OPPOSITE value, calls SaveConfig + LoadConfig, and asserts every value survived. Without the ToJSON fixes in this commit, every assertion in the test fails. With them, all pass. The opt-IN path (default values) is already covered by TestStockMatchesDefaults. make test-config-profile passes.
… BEYOND lean-edit Reviewer push-back on PR #314: "are you sure that this is the settings from lean-edit? didn't lean-edit just have less tools?" Right -- strict lean-edit (per bench/swe's profile definition) kept fs_edit_hashline + fs_grep. lean-edit's four-tool drop was web_fetch + memory_fetch + vault_search + vault_get; the hashline tools stayed. The combination the bench called "the most-stripped that still solved 4/4 fixtures" was lean-edit + --no-hashline CLI flag, not lean-edit alone. PR #314 makes HashlineEnabled default False, which is strictly MORE than lean-edit. The justification is the bench's small-model finding (Haiku takes 3x more turns on max-build than lean-edit because fs_edit_hashline lures it into a format it can't author). The choice stands -- bench data is unambiguous; the onboarding PromptHashline question restores hashline in one keystroke for Opus / Sonnet operators who use the tool correctly. This commit updates the Profile_Stock description and the file's top-of-unit doc comment to be HONEST about the scope: - "Adopted the bench-grounded lean-edit shape" (was: just "lean-edit shape") - Adds an explicit "Additionally drops fs_edit_hashline + fs_grep -- this part goes BEYOND strict lean-edit" sentence - Mentions the onboarding PromptHashline restore path No code change -- description strings + doc comment only. make + make test-config-profile both pass unchanged.
The previous commit added the strict-lean-edit-vs-PR-#314 honesty note (the hashline-off drop goes beyond lean-edit), citing only the Haiku finding. Reviewer push-back: "did you figure out in the bench that fs_grep is not better than just letting shell_exec?" -- yes, fixture 07-cross-file-grep was specifically designed to make fs_grep shine and the stripped variant (no fs_grep) won by 1 turn: variant turns trajectory stock (has fs_grep) 3 fs_grep -> fs_write -> done lean-edit (has fs_grep) 3 same lean-edit + --no-hashline 2 shell_exec "grep -rln 'OldRoutine(' src/ > result.txt" The stripped variant won because shell_exec collapses find + filter + write into one tool call. fs_grep returns hashline-formatted output that requires a separate fs_write step. So the HashlineEnabled := False default is justified by TWO bench findings, not just the Haiku one: (1) Smaller models mis-author the hashline format (3x turn penalty) (2) Every model class loses turns on cross-file grep when fs_grep is present vs when shell_exec + redirect is the only option Profile_Stock description and the file's top-of-unit doc comment now reflect both findings. No code change.
…es fs_edit_hashline only (review fix) Reviewer push-back on PR #314: "windows doesn't have grep, how does that affect things?" + "our fs_grep has a special speed enhancement, how does that affect things?" Both excellent points that change the math significantly: 1. fs_grep has SIX ripgrep-inspired optimisations layered on top of the naive scan: Tier 1: defer file-hash computation until first match Tier 2: skip well-known VCS / build / deps dirs (.git, node_modules, target, build, dist, vendor, .venv, __pycache__, .gradle, .next) -- cuts walk by ~10x Tier 3: binary file detection (NUL-byte in first 1024 bytes) Tier 4: file-size cap (default 10 MiB) Tier 5: walk bytes, not lines -- 3-5x faster per-file Tier 6: Boyer-Moore-Horspool substring search -- 3-10x fewer byte reads On a real codebase fs_grep is 10-50x faster than shell_exec grep. The bench's "shell_exec wins by 1 turn" finding measured the one-shot UX advantage of `>` redirect on a tiny 8-file fixture -- it does NOT generalise. 2. Windows ships no grep -- only findstr (different syntax) and PowerShell Select-String. Dropping fs_grep there would force every Windows agent run that needs cross-file text search to negotiate findstr quirks, OR (more likely) wedge entirely. Fix === Split the previously-bundled gate. fs_grep registers UNCONDITIONALLY in PasClaw.Tools.FS.RegisterFSTools -- moved out from under the `if UseHashline then` block. UseHashline now controls only: (a) fs_read's default output format (hashline-prefixed vs raw bytes) (b) fs_edit_hashline tool registration The platform-conditional default is preserved -- HashlineEnabled defaults False on Linux / macOS (bench-driven, smaller models mis-author the hashline format) and True on Windows (no platform- specific reason to drop, follows historical behavior). PromptHashline copy updated to be about fs_edit_hashline only and mentions that fs_grep is always on. Profile_Stock description + doc comment updated to reflect the split. HashlineEnabled field doc-comment now explicitly says "gates fs_edit_hashline ONLY". Cumulative PR #314 stock shape ============================== The new defaults are: Linux / macOS Windows ------------------ ------------------ web_fetch OFF OFF vault_* OFF OFF memory_fetch OFF OFF orient_task_aware ON ON checkpoints ON ON stats ON ON prompt_cache.ttl 1h 1h auto_router ON ON distiller ON ON fs_grep ON ON (was off in earlier PR #314 commits) fs_edit_hashline OFF ON (platform-conditional) fs_read format plain hashline (follows fs_edit_hashline gate) Tests ===== make clean && make passes. make test-config-profile passes. TestOptOutsPersistAcrossRoundTrip from the earlier commit still holds (the round-trip semantics didn't change; just the meaning of the HashlineEnabled flag narrowed).
…ditional logic
Reverts the {$IFDEF MSWINDOWS}-True / {$ELSE}-False split in
TConfig.Create. fs_edit_hashline now defaults ON for all
platforms.
Rationale: the original PR-#314 case for default-False on
Linux / macOS leaned on the bench's Haiku finding (smaller
models mis-author the hashline anchor/payload format). That
finding is real, but small-model deployments are not the
majority case pasclaw is tuned for; the operators who hit it
already opt out via PromptHashline or --no-hashline. Forcing
Opus / Sonnet / GPT-4 operators on Linux / macOS to flip
PromptHashline Y on every onboarding to restore the surgical-
patch toolchain they want by default created more friction
than the small-model protection bought.
Default True everywhere:
- TConfig.Create unconditionally sets HashlineEnabled := True
- PromptHashline defaults Y (Enter-through keeps it on)
- Profile_Stock description + top-of-unit doc comment updated
- HashlineEnabled field doc comment updated
fs_grep stays unconditional (PR #314 split-gate from prior
commit -- ripgrep-inspired optimisations win on real codebases,
and Windows has no shell grep).
Tests
=====
TestOptOutsPersistAcrossRoundTrip: now flips HashlineEnabled
to False (opt-OUT) and asserts the False persists -- previously
flipped True (opt-IN) to assert opt-in persistence. The
round-trip semantics didn't change, just which direction is
the non-default.
make clean && make passes. make test-config-profile passes.
Self-review of the previous commit: I flipped TConfig.Create's default from False to True everywhere, but missed the matching SaveConfig emit-rule. The save path was "emit only when True" (opt-in for the old PR-#314-Linux default-False world). With the default back to True, that's inverted -- now emit only when False (opt-out). Without this fix, TestOptOutsPersistAcrossRoundTrip failed: - test sets Saved.HashlineEnabled := False - SaveConfig sees False, the OLD code skips emit - LoadConfig reads no hashline_enabled key, keeps default True - assertion "not Loaded.HashlineEnabled" fails Same fix on the load side isn't needed -- LoadConfig already reads with HashlineEnabled as its own default, so missing key preserves whatever TConfig.Create set. make clean && make passes. make test-config-profile now passes end-to-end.
Summary
Adopts the bench-grounded
lean-editshape for PasClaw's stock (out-of-box) defaults, and adds a hashline opt-in question topasclaw onboard. Driven by the empirical findings in PR #313'sbench/swe/README.md.Pure config / profile / onboarding changes — no benchmark code here.
What changes for a fresh install
web_fetch_enabledweb_fetchonly on tasks whereshell_exec curlwas the natural alternative; never on a public URL it couldn't reach.vault_tools_enabledorient_task_awareMEMORY.mdexists.checkpoints_enabledstats_collection_enabledprompt_cache.ttl1hauto_router.enabledself_improving_skills.distiller.enabledworkspace/skills/.pending/.vector_search_enabledstays on (memory_search remains available). All six new-default behaviors were identified bybench/swe/results/ablation.mdas costing zero tools-array bytes — they're pure-behavior changes.New
HashlineEnabledconfig fieldGates
fs_edit_hashline+fs_greptool registrations. The bench found Haiku-class models reach forfs_edit_hashlineon tasksfs_writewould handle, fail the anchor/payload format, and burn turns recovering — Haiku showed a 3× turn-count penalty on max-build vs lean-edit onfixture/01-snippet-windowdriven entirely by hashline mis-authoring. Sonnet and Opus use the tool correctly.true(matches current behavior)PromptHashline, placed right afterPromptWebFetch)--no-hashlinestill works as a per-run overrideUseHashline = (not --no-hashline) and Cfg.HashlineEnabledProfile compatibility
baseline,security,low-token,all-on— unchanged in their explicit overridesmax-build— explicitly turnsweb_fetch_enabled+vault_tools_enabledback ON, so--profile max-buildproduces the same final behavior as beforestock— body updated to reflect the new TConfig defaultsUpgrade impact
Operators upgrading PasClaw and expecting
web_fetch/vault_toolsout-of-box will see them disabled on first run.pasclaw onboardrestores them in two keystrokes (the existingPromptWebFetch/ vault questions ask, default Y). Documented in the new stock profile description.Test plan
make test-config-profile— TestStockMatchesDefaults still passes (TConfig.Create and Profile_Stock changed together)pasclaw config showafterhashline_enabled: falsein config.json — confirmed the field round-tripspasclaw profile show stock— new shape printedmake clean && make) — 0 errorspasclaw onboardto confirm newPromptHashlinestep renders correctlyBench reference
Full empirical data in PR #313's
bench/swe/README.md. Cross-model verdict from there:Generated by Claude Code