perf: universal tool-output ceiling with spill + async post-edit diagnostics by kevincodex1 · Pull Request #518 · Gitlawb/zero

kevincodex1 · 2026-07-05T13:14:54Z

Summary

Two structural fixes from the perf/token audit vs opencode — the biggest
recurring token leak and the biggest recurring latency tax in the loop.

Universal token-based output ceiling (`1825ea4`)

Every tool result is now capped at the registry boundary — 16k estimated
tokens by default, ZERO_TOOL_OUTPUT_CEILING_TOKENS-tunable, 0 disables —
unless the tool manages its own deliberate budget. Truncation keeps head+tail
and spills the full (already redaction-scrubbed) output to the existing spill
dir, re-readable via grep/read_file, so nothing is lost — only deferred.

This closes the three paths a single call could flood the context window
through (and then re-bill on every following turn):

web_fetch: up to 256KiB default / 2MiB max of non-HTML body, previously unbudgeted
skill: full SKILL.md returned uncapped
MCP-served tools: no budget at all

Self-budgeting tools opt out via an unexported marker interface (an MCP
server can never exempt itself): bash, exec_command (model-raisable),
read_file, read_minified_file, grep, glob, list_directory. Browser
snapshots deliberately stay under the net.
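
The opt-out mechanism relies on a general Go property: an interface with an unexported method can only be satisfied by types that declare that method in the same package. A minimal single-package sketch follows; all type and method names here are hypothetical, and the real marker lives in `internal/tools`.

```go
package main

import "fmt"

// Tool is a hypothetical stand-in for whatever the registry runs.
type Tool interface {
	Name() string
}

// selfBudgeting is the marker. Its only method is unexported, so a type
// declared in another package cannot add this method to its method set.
type selfBudgeting interface {
	managesOwnOutputBudget()
}

// grepTool manages its own budget and opts out of the ceiling.
type grepTool struct{}

func (grepTool) Name() string            { return "grep" }
func (grepTool) managesOwnOutputBudget() {}

// externalTool models an MCP-served tool. An exported look-alike method
// does not satisfy the marker, because the method names differ.
type externalTool struct{ name string }

func (t externalTool) Name() string          { return t.name }
func (externalTool) ManagesOwnOutputBudget() {}

// exemptFromCeiling reports whether the registry should skip the cap.
func exemptFromCeiling(t Tool) bool {
	_, ok := t.(selfBudgeting)
	return ok
}

func main() {
	fmt.Println(exemptFromCeiling(grepTool{}))                  // true
	fmt.Println(exemptFromCeiling(externalTool{name: "mcp.x"})) // false
}
```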

bash itself is tightened to match the unified token model: emit budget
96KiB → 32KiB per stream (~24k → ~8k tokens worst case per stream), while
capture retention stays at 96KiB head+tail — so the new sectioned bash spill
covers 6× more of the output than the transcript shows, and the spill path
also lands in meta["spill_path"].
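
The figures above are consistent with an estimate of roughly 4 bytes per token, which is an assumption here rather than something the description states. The 6× figure also works out if "96KiB head+tail" is read as 96KiB retained for the head plus 96KiB for the tail; that reading is likewise an assumption.

```go
package main

import "fmt"

const (
	kib           = 1024
	bytesPerToken = 4 // assumed estimate, not confirmed by the source
)

// worstCaseTokens converts a per-stream byte budget to estimated tokens.
func worstCaseTokens(budgetBytes int) int { return budgetBytes / bytesPerToken }

func main() {
	fmt.Println(worstCaseTokens(96 * kib)) // 24576, the "~24k" old emit budget
	fmt.Println(worstCaseTokens(32 * kib)) // 8192, the "~8k" new emit budget

	// Assumed reading: 96KiB of head plus 96KiB of tail are retained.
	retained := 2 * 96 * kib
	fmt.Println(retained / (32 * kib)) // 6
}
```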

Async post-edit LSP diagnostics (`b227599`)

Inline diagnostics blocked every edit_file/write_file result on the
language server: a ≥300ms quiet-period debounce that reset on each publish,
seconds on the first edit per language, capped at 10s — paid on every edit
of every session.

Collection is now asynchronous: a file is enqueued the moment a mutating
tool finishes, a single worker checks it while the rest of the batch (and
any self-correct verification) runs, and the loop drains completed results
just before building the NEXT provider request, appending errors as a
user-role nudge. The model always issues another request after an edit turn
— it has to read its tool results — so it still sees introduced errors at
the same decision point as before, with no tool call ever stalling on the
language server.
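
A minimal sketch of the enqueue/worker/drain shape described above. The type and method names are hypothetical and the real collector in `internal/agent/async_diagnostics.go` may be structured differently; the `pending` wait group exists only so the demo can wait deterministically.

```go
package main

import (
	"fmt"
	"sync"
)

// checkFunc returns error-severity diagnostics for a file (empty if clean).
type checkFunc func(path string) []string

type collector struct {
	check   checkFunc
	queue   chan string
	pending sync.WaitGroup // demo helper: lets main wait for the worker

	mu      sync.Mutex
	results map[string][]string
}

func newCollector(check checkFunc) *collector {
	c := &collector{
		check:   check,
		queue:   make(chan string, 64), // enqueue blocks only if 64 files are waiting
		results: map[string][]string{},
	}
	go c.worker() // a single worker, as in the description
	return c
}

// enqueue is called the moment a mutating tool finishes.
func (c *collector) enqueue(path string) {
	c.pending.Add(1)
	c.queue <- path
}

func (c *collector) worker() {
	for path := range c.queue {
		diags := c.check(path)
		c.mu.Lock()
		if len(diags) > 0 {
			c.results[path] = diags // a newer result replaces an older one
		} else {
			delete(c.results, path) // a clean re-check clears a stale error
		}
		c.mu.Unlock()
		c.pending.Done()
	}
}

// drain hands over completed results without waiting; the loop would call
// it just before building the next provider request.
func (c *collector) drain() map[string][]string {
	c.mu.Lock()
	defer c.mu.Unlock()
	out := c.results
	c.results = map[string][]string{}
	return out
}

func main() {
	c := newCollector(func(path string) []string {
		if path == "broken.go" {
			return []string{"broken.go:3: undefined: x"}
		}
		return nil
	})
	c.enqueue("ok.go")
	c.enqueue("broken.go")
	c.pending.Wait()
	fmt.Println(c.drain())      // map[broken.go:[broken.go:3: undefined: x]]
	fmt.Println(len(c.drain())) // 0
}
```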

Consistency: a file re-edited before its check runs is checked once in its
final state (the checker re-reads the file); re-edited after, it re-queues
and the newer result wins. A drain that finds the worker still busy gives up
after 3s and delivers on a later turn — results stay accurate because any
further edit re-enqueues the file. The tools-side inline mechanism
(tools.RunOptions.Diagnostics) is kept for direct API callers; the loop
just no longer wires it.

Checklist

- The linked issue already has the issue-approved label.
  No linked issue: core-team change, same waiver as #506 (feat: agent quality, caching, retry, and tooling upgrades).
- go build ./..., go vet ./..., and go test ./... pass locally.
  Build and vet clean. Full suite passes except the 6 pre-existing internal/cli failures from the local opengateway provider profile ("openai provider opengateway requires official baseURL"); byte-identical failure set on main, unrelated to this change.
- gofmt clean.
- Tests added/updated for the change (and run under -race where relevant).
  Registry-level ceiling tests (cap + spill content, self-budgeting exemption, env override, small-output passthrough), bash spill tests, async-collector unit tests (deferred delivery on slow checks, re-edit latest-wins, nil no-op), and an end-to-end Run test proving the diagnostics nudge lands in the follow-up request and never leaks into the tool result. internal/tools and internal/agent (all new concurrency lives there) pass under -race.
- UI changes include screenshots or a short recording where possible.
  N/A: no UI changes; TUI/CLI diffs are comment-only.

Summary by CodeRabbit

New Features
- Added a universal per-call tool output ceiling (env-configurable); when exceeded, output is truncated and the full redacted transcript is saved with spill hints for later review.
- File diagnostics now run in the background after edits and are delivered as a “nudge” message on the next request (including the final turn).
Bug Fixes
- Repeated edits refresh diagnostics to avoid stale results.
- Late diagnostics are no longer dropped during run finalization.
- Spill files are further protected by an additional secrets redaction pass.

github-actions · 2026-07-05T13:19:21Z

Zero automated PR review

Verdict: No blockers found

Blockers

None found.

Validation

[pass] Diff hygiene: git diff --check
[pass] Tests: go test ./...
[pass] Build: go run ./cmd/zero-release build
[pass] Smoke build: go run ./cmd/zero-release smoke

Scope

Head: 07af8c451d9c
Changed files (13): internal/agent/async_diagnostics.go, internal/agent/async_diagnostics_test.go, internal/agent/loop.go, internal/agent/types.go, internal/cli/exec.go, internal/tools/bash.go, internal/tools/bash_budget_test.go, internal/tools/output_ceiling.go, internal/tools/output_ceiling_test.go, internal/tools/registry.go, internal/tools/spill.go, internal/tools/spill_test.go, and 1 more

This deterministic review checks validation status and basic diff hygiene. A human reviewer still owns product judgment and design quality.

coderabbitai · 2026-07-05T13:22:19Z

Walkthrough

The agent now queues changed files for background diagnostics and injects the resulting nudge on later turns instead of running file diagnostics inline. Tool output handling also adds a universal ceiling, self-budgeting exemptions, bash spill files with preserved sections, and spill-file secret redaction.

Changes

Async post-edit diagnostics

| Layer / File(s) | Summary |
|---|---|
| Async diagnostics flow: `internal/agent/async_diagnostics.go`, `internal/agent/loop.go`, `internal/agent/types.go`, `internal/cli/exec.go`, `internal/tui/model.go` | Background diagnostics are queued from mutating tool results, drained at turn boundaries and before final answers, and no longer forwarded inline into tool execution; related comments are updated to describe the new flow. |
| Async diagnostics tests: `internal/agent/async_diagnostics_test.go` | Unit and end-to-end tests cover nil collectors, queue draining, slow checks, replacement of stale results, and delivery across normal, max-turn, and late finalization flows. |

Tool output ceiling and spill handling

| Layer / File(s) | Summary |
|---|---|
| Bash capture spill handling: `internal/tools/bash.go`, `internal/tools/bash_budget_test.go` | Bash output budgeting now retains a larger capture buffer, spills truncated stdout/stderr to disk, and tests validate the spill file contents and capture-gap markers. |
| Spill redaction hardening: `internal/tools/spill.go`, `internal/tools/spill_test.go` | Spill files now receive an extra pattern-based secret redaction pass before being written, and tests confirm secrets are removed from the saved payload. |
| Universal output ceiling: `internal/tools/output_ceiling.go`, `internal/tools/registry.go`, `internal/tools/output_ceiling_test.go` | Registry result handling now applies a configurable output ceiling after secret scrubbing, skips self-budgeting tools, and tests cover truncation, exemption, env overrides, and spill-file behavior. |

Estimated code review effort: 4 (Complex) | ~60 minutes

Possibly related PRs

Gitlawb/zero#146: Also changes internal/agent/loop.go tool execution control and request escalation logic.
Gitlawb/zero#175: Also updates internal/agent/loop.go changed-files batching and post-edit turn handling.
Gitlawb/zero#506: Touches the same diagnostics wiring and Options.FileDiagnostics flow in the agent loop.

Suggested reviewers: Vasanthdev2004, gnanam1990, jatmn

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the two main changes: a universal tool-output ceiling with spilling and async post-edit diagnostics. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal/tools/bash_budget_test.go (1)
119-146: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add spill-path assertions to TestBudgetBashCaptureReportsTrueTotal. budgetBashCapture also writes meta["spill_path"] on truncation, so this test should exercise the spill file like TestBudgetBashOutputTruncatesHeadAndTail does.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/tools/bash_budget_test.go` around lines 119 - 146,
`TestBudgetBashCaptureReportsTrueTotal` currently verifies truncation metadata
but does not assert the spill-file path behavior that `budgetBashCapture` sets
when output is oversized. Update this test to also check `meta["spill_path"]`
and validate the spill contents the same way
`TestBudgetBashOutputTruncatesHeadAndTail` does, using `budgetBashCapture`,
`meta`, and the spill-path handling to confirm the full retained output is
written to disk on truncation.

🧹 Nitpick comments (1)

internal/tools/output_ceiling_test.go (1)
123-130: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Cover the read_minified_file exemption.

output_ceiling.go exempts readMinifiedFileTool at Line 41, but this table does not assert it. Add the read-minified tool here so removing that opt-out fails the exemption-list test.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/tools/output_ceiling_test.go` around lines 123 - 130, The
exemption-list test is missing the read-minified-file opt-out, so it won’t fail
if that special case is removed. Update the exempt tool table in the output
ceiling test to include the read-minified-file tool alongside the existing Tool
entries, using the same constructor/pattern as the other exemptions so the test
covers the readMinifiedFileTool exemption in output_ceiling.go.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/agent/loop.go`:
- Around line 168-177: The post-edit diagnostics are only drained at the start
of the next loop iteration in loop, so when maxTurns is reached the finalization
path in finalAnswerAfterMaxTurns can miss newly queued LSP errors. Move or
duplicate the postEditDiagnostics.drain(ctx) handling so diagnostics are
appended before the max-turn final answer request is made, ensuring the final
provider call sees the latest diagnostics.

In `@internal/tools/bash_budget_test.go`:
- Line 33: The spill temp-dir override in the budget test is Unix-only because
the code under test uses os.TempDir()/os.CreateTemp, so Windows may still spill
into the real temp directory. Update the test setup around the existing t.Setenv
call to override TMPDIR, TMP, and TEMP together, or switch the spill path setup
in internal/tools/spill.go to accept an injected root so the test can control it
cross-platform.

In `@internal/tools/bash.go`:
- Around line 407-419: The spill file written by spillBashStreams currently
concatenates the retained head and tail from boundedBuffer.retained() without
indicating that the middle was dropped. Update spillBashStreams so the saved
bash spill content includes a clear omission marker between the retained
sections, and make sure the same marker is used wherever the spill content is
assembled for outText/errText so readers can tell the capture was truncated.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: afc0b4df-726c-4dba-9b5b-5d1b2993cb3d

📥 Commits

Reviewing files that changed from the base of the PR and between 30bff28 and b227599.

📒 Files selected for processing (11)

internal/agent/async_diagnostics.go
internal/agent/async_diagnostics_test.go
internal/agent/loop.go
internal/agent/types.go
internal/cli/exec.go
internal/tools/bash.go
internal/tools/bash_budget_test.go
internal/tools/output_ceiling.go
internal/tools/output_ceiling_test.go
internal/tools/registry.go
internal/tui/model.go

Vasanthdev2004 · 2026-07-05T13:55:29Z

@kevincodex1 can you fix the CodeRabbit changes?

Cap every tool result at the registry boundary (16k estimated tokens, ZERO_TOOL_OUTPUT_CEILING_TOKENS-tunable, 0 disables) unless the tool manages its own deliberate budget. Head+tail truncation, full output spilled to the existing redacted spill dir and re-readable via grep/read_file, so nothing is lost, only deferred.

This closes the three unbounded paths a single call could flood the context window through (and then re-bill every following turn):

- web_fetch: up to 256KiB default / 2MiB max of non-HTML body
- skill: full SKILL.md returned uncapped
- MCP-served tools: no budget at all

Self-budgeting tools opt out via an unexported marker interface (an MCP server can never exempt itself): bash, exec_command (model-raisable), read_file, read_minified_file, grep, glob, list_directory. Browser snapshots deliberately stay under the net.

bash itself is tightened to match the unified token model: emit budget 96KiB -> 32KiB per stream (~24k -> ~8k tokens worst case per stream), while capture retention stays at 96KiB head+tail so the new spill file covers 6x more of the output than the transcript shows.

Inline post-edit diagnostics blocked every edit_file/write_file result on the language server: a >=300ms quiet-period debounce that reset on each publish, seconds on the first edit per language, capped at 10s. This was the largest recurring latency tax in an edit-heavy session.

Diagnostics are now collected in the background: a file is enqueued the moment a mutating tool finishes, a single worker checks it while the rest of the batch executes, and the loop drains completed results just before building the NEXT provider request, appending errors as a user nudge. The model always issues another request after an edit turn (to read its tool results), so it still sees introduced errors at the same decision point; no tool call ever stalls on the language server.

A file re-edited before its check runs is checked once in its final state (the checker re-reads the file); re-edited after, it is re-queued and the newer result wins. A drain that finds the worker still busy gives up after 3s and delivers on a later turn; results stay accurate because any further edit re-enqueues the file. The tools-side inline mechanism (tools.RunOptions.Diagnostics) is kept for direct API callers; the loop just no longer wires it.

…, test hardening

- Drain background diagnostics before the max-turns final-answer request so an error introduced by the LAST turn's edit is reported in the summary instead of silently dropped (+ regression test).
- spillBashStreams marks the head/tail junction with a capture-gap marker when boundedBuffer dropped the middle of a stream, so a spilled log never reads as contiguous when it is not; junction position is asserted in the test.
- Tests override TMP/TEMP alongside TMPDIR (os.TempDir reads TMP/TEMP on Windows) via a shared setTestTempDir helper.
- TestBudgetBashCaptureReportsTrueTotal now asserts spill_path meta and spill file content; exemption-list test gains read_minified_file.

coderabbitai

🧹 Nitpick comments (1)

internal/agent/async_diagnostics_test.go (1)
75-96: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Loose timing assertion, but functionally sufficient.

The elapsed > time.Second check only catches gross blocking regressions, not the intended asyncDiagnosticsDrainTimeout-bound behavior. A tighter bound (e.g., a few multiples of the 20ms timeout) would catch subtler regressions, but not essential given the deferred-delivery loop below also validates correctness.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/agent/async_diagnostics_test.go` around lines 75 - 96, The timing
assertion in async diagnostics draining is too loose and should be tightened to
verify the intended asyncDiagnosticsDrainTimeout behavior. In
async_diagnostics_test.go, update the drain duration check in the async
diagnostics test that uses diagnostics.drain and release so it fails on a
bounded multiple of the shortened timeout rather than waiting up to a full
second, while keeping the deferred-delivery loop that validates the eventual
nudge from drain.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7df9fbb7-7b4d-4c1d-98df-7d3073686992

📥 Commits

Reviewing files that changed from the base of the PR and between b227599 and 4e74b65.

📒 Files selected for processing (11)

internal/agent/async_diagnostics.go
internal/agent/async_diagnostics_test.go
internal/agent/loop.go
internal/agent/types.go
internal/cli/exec.go
internal/tools/bash.go
internal/tools/bash_budget_test.go
internal/tools/output_ceiling.go
internal/tools/output_ceiling_test.go
internal/tools/registry.go
internal/tui/model.go

✅ Files skipped from review due to trivial changes (3)

internal/tui/model.go
internal/agent/types.go
internal/cli/exec.go

🚧 Files skipped from review as they are similar to previous changes (6)

internal/tools/registry.go
internal/tools/output_ceiling.go
internal/agent/async_diagnostics.go
internal/tools/output_ceiling_test.go
internal/tools/bash.go
internal/agent/loop.go

filepath.IsAbs("/ws") is false on Windows (no drive letter), so the collector test's workspace-absolute assertion failed on the windows smoke job. Root the fake workspace at t.TempDir() — absolute on every platform — and assert against it.

Vasanthdev2004

Dug into this hard — it's a strong perf change and the output-ceiling half is genuinely clean, but three things to fix before it lands, one of them a real secret leak.

[blocker — security] The bash spill bypasses the high-confidence secret scrub. In budgetBashCapture, spillBashStreams(out, ..., errStr, ...) (bash.go:408) gets the RAW retained streams and spills them before formatBashOutput runs secrets.Redact (bash.go:353-354) on the transcript. spillTruncatedOutput only re-applies RedactString — the configured-key scrub — not secrets.Redact's pattern matcher (AWS keys, GitHub PATs, Slack/OpenAI keys, JWTs). So a credential that bash redacts from the transcript sits in cleartext in the spill, and the notice explicitly tells the model to grep/read_file it. That's exactly the leak secrets.Redact exists to prevent, reopened through the new spill path. Fix: run secrets.Redact on the streams before spilling (in spillBashStreams, or have spillTruncatedOutput apply it).

[major — correctness] The final edit's diagnostics can be silently dropped — a regression from the inline path. The drain only runs at the top of the loop (loop.go:172, asyncDiagnosticsDrainTimeout = 3s) and in the max-turns path (705). The natural no-tool-call completion at loop.go:494 returns the final answer with no drain. So on a cold/large-repo LSP with a single-edit task: the model edits, the check is enqueued and allowed up to 10s (fileDiagnosticsTimeout), the next turn's top-of-loop drain gives up after 3s, and if the model completes that turn the introduced error is never surfaced. The inline path always blocked until diagnostics were ready, so it always showed them before the final decision — this is the exact scenario the feature targets. Simplest fix: drain at the completion boundary before returning the final answer, and if it turns up errors, inject the nudge and continue one more turn instead of finishing.

[blocker — CI] Windows smoke is red. I ran it here (I'm on Windows): TestAsyncDiagnosticsCollectsAndDrainsOnce fails because it uses "/ws" as the workspace root, which isn't Windows-absolute — filepath.IsAbs("\ws\broken.go") is false on Windows, so the assertion trips. It's a test-portability issue, not a production bug (absPath is correct for a real absolute root), but it's red CI. Use t.TempDir() for the root instead of "/ws".

The rest holds up — I verified the ceiling directly: MCP/external tools genuinely can't exempt themselves (the marker is an unexported method and they're in a different package, so Go won't let them satisfy it), redaction runs before the ceiling spill, truncation is rune-safe and keeps the failure tail, only Output is capped so structured Meta survives, and the async concurrency itself is race/leak/deadlock-free with correct re-edit-latest-wins. That part's solid.

Fix the three and I'll take another pass.

…ostics

Review follow-ups (Vasanthdev2004):

- Spill files now pass the pattern-based secret scanner (AWS keys, GitHub/Slack/OpenAI tokens, PEM blocks, JWTs) in addition to the configured-key scrub. A bash spill runs before formatBashOutput's scan, so without this a spilled file held in cleartext exactly the credentials the transcript redacts. Applied centrally in spillTruncatedOutput, covering bash, exec_command, and ceiling spills.
- Finalization diagnostics gate: a no-tool-call final answer now drains pending post-edit diagnostics with the full inline-era budget (10s, vs the 3s per-turn wait) before the run is accepted; introduced errors append the nudge and give the model one more turn instead of being silently dropped when no later turn exists. The max-turns summary path uses the same finalization budget. Free for runs that never edited: an idle collector returns immediately.

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

internal/agent/loop.go (1)
623-631: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Avoid duplicate LSP nudges when SelfCorrect is enabled

SelfCorrect.AfterEdit and FileDiagnostics both surface error-severity diagnostics for the same changed files, so enabling both on the same run repeats the same LSP error to the model. Deduplicate one path or disable FileDiagnostics while SelfCorrect is active.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/agent/loop.go` around lines 623 - 631, The current edit path in
loop.go enqueues FileDiagnostics for every successful mutating tool even when
SelfCorrect is also active, causing duplicate error-severity LSP nudges for the
same changed files. Update the post-edit flow around toolResult.Status OK and
changedFilesThisBatch so that only one source of diagnostics is used when
SelfCorrect.AfterEdit is enabled—either skip postEditDiagnostics.enqueue for
those files or gate FileDiagnostics off in that mode. Make the change in the
logic that handles changedFilesThisBatch and postEditDiagnostics so the same
file set is not reported twice.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1ef5bfc0-b123-4e4e-811a-5c03c98378ca

📥 Commits

Reviewing files that changed from the base of the PR and between fc89f2a and 07af8c4.

📒 Files selected for processing (5)

internal/agent/async_diagnostics.go
internal/agent/async_diagnostics_test.go
internal/agent/loop.go
internal/tools/spill.go
internal/tools/spill_test.go

🚧 Files skipped from review as they are similar to previous changes (1)

internal/agent/async_diagnostics.go

Vasanthdev2004 · 2026-07-05T15:19:18Z

@kevincodex1 doing

Vasanthdev2004

Re-reviewed the fixes — all three are addressed, and I checked each against the code rather than the commit messages. Approving.

The bash spill leak is closed. spillTruncatedOutput now runs both RedactString and secrets.Redact before writing (spill.go:115-116), and since every spill — bash and the universal ceiling — routes through it, the high-confidence patterns (AWS keys, GitHub PATs, JWTs) are scrubbed from the file the model reads back, not just from the transcript. That was the important one.

The finalization gate is right. Before returning the final answer, the completion path calls drainFinal (loop.go:500), which waits the full inline-era budget once rather than the 3s per-turn timeout; if diagnostics surface an error it appends the nudge and continues one more turn instead of finalizing past it, and only an empty/idle result finalizes. The max-turns path gets the same treatment (718). So the final edit's diagnostics can't be silently dropped anymore — the model still sees an introduced error before it's done, which was the whole point.
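
The gate described here can be sketched as a loop that refuses to finalize while a final drain still reports errors. This models only no-tool-call answers; `runLoop`, the history format, and the function-typed parameters are hypothetical, and only the `drainFinal` name comes from the review.

```go
package main

import (
	"fmt"
	"strings"
)

// runLoop asks the model for an answer each turn. Before accepting an
// answer as final it calls drainFinal; if diagnostics report errors, they
// are appended as a user-role nudge and the model gets one more turn.
func runLoop(model func(history []string) string, drainFinal func() []string, maxTurns int) []string {
	var history []string
	for turn := 0; turn < maxTurns; turn++ {
		answer := model(history)
		history = append(history, "assistant: "+answer)
		if errs := drainFinal(); len(errs) > 0 {
			history = append(history, "user: diagnostics: "+strings.Join(errs, "; "))
			continue // do not finalize past an introduced error
		}
		return history // empty or idle result: the answer is accepted
	}
	return history
}

func main() {
	calls := 0
	drainFinal := func() []string {
		calls++
		if calls == 1 {
			return []string{"broken.go:3: undefined: x"} // the late diagnostic
		}
		return nil
	}
	model := func(history []string) string {
		if len(history) == 0 {
			return "done"
		}
		return "fixed"
	}
	for _, line := range runLoop(model, drainFinal, 5) {
		fmt.Println(line)
	}
	// Prints:
	// assistant: done
	// user: diagnostics: broken.go:3: undefined: x
	// assistant: fixed
}
```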

Windows is green. The async test uses a real workspace root now; I re-ran the agent + tools suites here on Windows and they pass, and CI is green across all nine checks.

Everything I flagged is handled, and the parts I'd already verified clean — the ceiling boundary, MCP tools genuinely not being able to self-exempt, and the async concurrency itself (race/leak/deadlock-free, latest-wins) — still hold. Good turnaround; this is a solid perf change now. Good to merge.

coderabbitai Bot requested changes Jul 5, 2026

kevincodex1 added 3 commits July 5, 2026 22:14

kevincodex1 force-pushed the feat/perf-token-wave1 branch from b227599 to 4e74b65 Compare July 5, 2026 14:18

coderabbitai Bot reviewed Jul 5, 2026

coderabbitai Bot approved these changes Jul 5, 2026

Vasanthdev2004 requested changes Jul 5, 2026

coderabbitai Bot reviewed Jul 5, 2026

kevincodex1 requested a review from Vasanthdev2004 July 5, 2026 15:18

Vasanthdev2004 approved these changes Jul 5, 2026

kevincodex1 merged commit 95ccd5b into main Jul 5, 2026
9 checks passed

github-actions Bot mentioned this pull request Jul 5, 2026

chore(main): release 0.2.0 #370

Open

kevincodex1 merged 5 commits into main from feat/perf-token-wave1