config: default ToolOutputCap to 24576 (C1) — 6× smaller request bodies by FMXExpress · Pull Request #414 · FMXExpress/PasClaw

FMXExpress · 2026-07-02T06:41:55Z

First of the five queued harness-effectiveness PRs (C1 → B1 → B2 → C2+C3 → D1), each verified with before/after numbers from bench/agentloop.

⚠️ Also carries #413's bench commit. #413 merged into its stacked base branch (claude/progress-ledger) instead of main — GitHub only retargets a PR when its base branch is deleted — so the bench never actually landed on main. It's cherry-picked here verbatim as the first commit.

Why

An uncapped tool result rides verbatim in history for the rest of the turn: one fat read_file/grep_files inflates every subsequent request — the top context-growth driver behind the 125 KB request bodies seen on real runs. Oversized histories are also what push Gemini-class models into the malformed-call behaviour in the first place (#406 made recovery work; this attacks the trigger). ToolOutputCap existed but defaulted to 0.

What

Default 0 → DefaultToolOutputCap (24576 ≈ 6K tokens). Normal reads pass verbatim; a runaway result becomes head + tail + a tool_output_get handle (full text always recoverable). Every surface already registers the recovery tool when the cap is on.
ToJSON emits tool_output_cap when ≠ default — including an explicit 0 opt-out, which must round-trip or the next LoadConfig would silently re-enable the cap.
Profiles: stock mirrors the new default; baseline keeps its explicit 0 (the "everything off" A/B floor); low-token 8192 / max-build 16384 stay as explicit choices.
Onboarding: Enter keeps the default, y customises the byte count, n disables.

Harness-verified (the point of this series)

New fat-read bench scenario: a 100 KB read_file at iteration 1, then further iterations — measuring the max request body the loop builds:

	`fatread.max_request_body_bytes`
before (main binary)	129,256
after (this PR)	21,502

6× smaller, and the scenario asserts the capped result carries the tool_output_get retrieval handle. All prior scenarios still green (20 assertions total).

Also

Fixes the stale test-session-stats assertion that has been failing on main ("default is False" — StatsCollectionEnabled flipped to True in config: adopt lean-edit shape for stock defaults + hashline opt-in onboarding #314; the round-trip test now exercises the explicit-off direction).
Known follow-up (pre-existing, out of scope): the TUI's inner loop config doesn't wire ToolOutputCap.

🤖 Generated with Claude Code

https://claude.ai/code/session_01LGQKz579j1ZnDRwmr6h1V6

Generated by Claude Code

Codifies the relay-worker methodology that found the MALFORMED_FUNCTION_CALL death spiral, the workspace/CWD split, and the goal-drift failure: the bench spawns a real gateway against a throwaway PASCLAW_HOME, connects as the relay worker, plays a scripted "model", and asserts on what the LOOP did -- the system prompts it built, what landed on disk, how big the request bodies grew. Fully reproducible, zero API tokens, every envelope inspectable. Pure FPC, no external runtime: TProcess spawns the built binary, the worker thread holds the SSE poll with the same TIdHTTP + custom-TStream frame parser shape as PasClaw.Cmd.Relay, and PasClaw.JSON + PostJSON/GetJSONURL cover the wire. The shipping binary is what gets exercised. Three scenarios (16 assertions): * build-site -- ledger folds into iteration 2 (goal + written file + checklist), iteration 1 pristine (prefix-cache), deliverable exists on disk. * malformed-recovery -- a Gemini-shaped malformed empty turn is retried with the corrective nudge naming a REGISTERED tool, and the turn still delivers. * resume-after-cap -- the 25-iteration notice carries the resume ledger (do-NOT-redo, file names, read counts); a "continue" turn anchors its ledger goal to the original task, not to the word "continue". Metrics print either way -- provider calls to deliverable and max request-body bytes across 25 iterations (24 KB baseline: the number the future context-diet work gets measured against). Harness changes land with before/after numbers, not vibes. `make bench-agentloop` (compiles + runs; needs the built binary; deliberately not part of `make test` since it binds a port). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LGQKz579j1ZnDRwmr6h1V6

An uncapped tool result rides verbatim in history for the rest of the turn: one fat read_file / grep_files inflated EVERY subsequent request -- the top context-growth driver behind the 125 KB request bodies seen on real runs (slower turns, higher cost, and outsized histories are what push Gemini-class models into the malformed-call behaviour in the first place). ToolOutputCap existed but defaulted to 0. * TConfig default 0 -> DefaultToolOutputCap (24576 ~ 6K tokens): generous enough that normal reads pass verbatim, tight enough that a runaway result can't dominate the turn. tool_output_get always recovers the full text, and every surface already registers it when the cap is on (the gates key off Cfg.ToolOutputCap > 0). * ToJSON now emits tool_output_cap when it differs from the default -- including an explicit 0 opt-out, which must round-trip or the next LoadConfig would silently re-enable the cap. * stock profile mirrors the new default; baseline keeps its explicit 0 (the "everything off" A/B floor); low-token 8192 / max-build 16384 stay as their own explicit choices. * Onboarding: Enter keeps the default, y customises, n disables. Harness-verified (bench/agentloop, new fat-read scenario: a 100 KB read at iteration 1, then further iterations): before (main): fatread.max_request_body_bytes = 129256 after (this): fatread.max_request_body_bytes = 21502 (6x) plus asserts the capped result carries the tool_output_get handle. Also: fixes the stale test-session-stats assertion that has been failing on main ("default is False" -- StatsCollectionEnabled defaulted to True in #314's six-free-toggles change; the round-trip test now exercises the explicit-off direction). Includes #413's bench commit: that PR merged into its stacked base branch (claude/progress-ledger) instead of main -- GitHub only retargets when the base branch is deleted -- so the bench never actually landed on main; carried here verbatim. Known follow-up: the TUI's inner loop config doesn't wire ToolOutputCap (pre-existing; its loop caps at 6 iterations and has its own condensers). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LGQKz579j1ZnDRwmr6h1V6

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 84dc17b9f4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-07-02T06:46:08Z

  RenderMarkdown       := True;  { on by default for terminal surfaces; cmd/serve flips off }
  SubagentsEnabled     := True;  { on by default -- a built-in general-purpose subagent makes `spawn` available with no config. }
-  ToolOutputCap        := 0;     { off by default; operators opt in. See TConfig.ToolOutputCap. }
+  ToolOutputCap        := DefaultToolOutputCap; { on by default -- an uncapped tool result rides in history for the whole turn and was the top context-growth driver (bench/agentloop: 100 KB read -> >100 KB request bodies). tool_output_get recovers the full text; operators opt out with tool_output_cap: 0. }


Thread the default cap into TUI loops

With this default flipped on, fresh TUI sessions still do not apply the cap: both TTUI.StartTurn and the line-based HandleUserInput build TToolLoopConfig from Default(TToolLoopConfig) and never assign ToolOutputCap, so RunToolLoop sees 0 and sends oversized tool output verbatim. I checked those TUI loop-construction paths, while Cmd.TUI now registers tool_output_get from the config, so the TUI advertises the recovery tool but the new default protection is not actually enforced on that chat surface.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-07-02T06:46:08Z

+    for i := 1 to GetEnvironmentVariableCount do
+      GW.Environment.Add(GetEnvironmentString(i));
+    GW.Environment.Add('PASCLAW_HOME=' + GHomeDir);


Override PASCLAW_HOME instead of appending duplicates

When the caller already has PASCLAW_HOME exported, this copies that entry into the child environment and then appends a second PASCLAW_HOME; duplicate environment names are not a reliable override, and typical getenv resolution will see the earlier value. In that environment the spawned gateway can ignore the temp relay config and run the benchmark against the user's live PasClaw home/provider/workspace, causing false failures or unintended writes; remove/replace existing PASCLAW_HOME/PASCLAW_CONFIG entries before Execute.

Useful? React with 👍 / 👎.

Two #414 review findings: * The TUI's two loop-config builders never set ToolOutputCap, so with the default flipped on, TUI turns still sent oversized tool output verbatim -- while Cmd.TUI dutifully registered tool_output_get. Both builders now take the cap from RouteConfig, the session- lifetime config snapshot the auto-router already uses. * The bench copied the caller's environment and then APPENDED its PASCLAW_HOME -- duplicate names are not a reliable override, so a caller with PASCLAW_HOME exported could have the spawned gateway run against their LIVE home instead of the throwaway bench home. The copy now skips PASCLAW_HOME / PASCLAW_CONFIG / NO_COLOR before the bench's values are added. Verified: `PASCLAW_HOME=/tmp/decoy make bench-agentloop` passes and the decoy is never created. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01LGQKz579j1ZnDRwmr6h1V6

claude added 2 commits July 2, 2026 06:39

chatgpt-codex-connector Bot reviewed Jul 2, 2026

View reviewed changes

FMXExpress merged commit 1306c02 into main Jul 2, 2026

FMXExpress mentioned this pull request Jul 2, 2026

agent effectiveness: project facts, edit mini-diff, read ranges + dedup, find_files (+ real-task bench) #415

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

config: default ToolOutputCap to 24576 (C1) — 6× smaller request bodies#414

config: default ToolOutputCap to 24576 (C1) — 6× smaller request bodies#414
FMXExpress merged 3 commits into
mainfrom
claude/tool-output-cap-default

FMXExpress commented Jul 2, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

FMXExpress commented Jul 2, 2026

Why

What

Harness-verified (the point of this series)

Also

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jul 2, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants