perf: universal tool-output ceiling with spill + async post-edit diagnostics#518
Conversation
Zero automated PR reviewVerdict: No blockers found Blockers
Validation
ScopeHead: This deterministic review checks validation status and basic diff hygiene. A human reviewer still owns product judgment and design quality. |
WalkthroughThe agent now queues changed files for background diagnostics and injects the resulting nudge on later turns instead of running file diagnostics inline. Tool output handling also adds a universal ceiling, self-budgeting exemptions, bash spill files with preserved sections, and spill-file secret redaction. ChangesAsync post-edit diagnostics
Tool output ceiling and spill handling
Estimated code review effort: 4 (Complex) | ~60 minutes Possibly related PRs
Suggested reviewers: 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/tools/bash_budget_test.go (1)
119-146: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winAdd spill-path assertions to
TestBudgetBashCaptureReportsTrueTotal.budgetBashCapturealso writesmeta["spill_path"]on truncation, so this test should exercise the spill file likeTestBudgetBashOutputTruncatesHeadAndTaildoes.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/tools/bash_budget_test.go` around lines 119 - 146, `TestBudgetBashCaptureReportsTrueTotal` currently verifies truncation metadata but does not assert the spill-file path behavior that `budgetBashCapture` sets when output is oversized. Update this test to also check `meta["spill_path"]` and validate the spill contents the same way `TestBudgetBashOutputTruncatesHeadAndTail` does, using `budgetBashCapture`, `meta`, and the spill-path handling to confirm the full retained output is written to disk on truncation.
🧹 Nitpick comments (1)
internal/tools/output_ceiling_test.go (1)
123-130: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winCover the
read_minified_fileexemption.
output_ceiling.goexemptsreadMinifiedFileToolat Line 41, but this table does not assert it. Add the read-minified tool here so removing that opt-out fails the exemption-list test.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/tools/output_ceiling_test.go` around lines 123 - 130, The exemption-list test is missing the read-minified-file opt-out, so it won’t fail if that special case is removed. Update the exempt tool table in the output ceiling test to include the read-minified-file tool alongside the existing Tool entries, using the same constructor/pattern as the other exemptions so the test covers the readMinifiedFileTool exemption in output_ceiling.go.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/agent/loop.go`:
- Around line 168-177: The post-edit diagnostics are only drained at the start
of the next loop iteration in loop, so when maxTurns is reached the finalization
path in finalAnswerAfterMaxTurns can miss newly queued LSP errors. Move or
duplicate the postEditDiagnostics.drain(ctx) handling so diagnostics are
appended before the max-turn final answer request is made, ensuring the final
provider call sees the latest diagnostics.
In `@internal/tools/bash_budget_test.go`:
- Line 33: The spill temp-dir override in the budget test is Unix-only because
the code under test uses os.TempDir()/os.CreateTemp, so Windows may still spill
into the real temp directory. Update the test setup around the existing t.Setenv
call to override TMPDIR, TMP, and TEMP together, or switch the spill path setup
in internal/tools/spill.go to accept an injected root so the test can control it
cross-platform.
In `@internal/tools/bash.go`:
- Around line 407-419: The spill file written by spillBashStreams currently
concatenates the retained head and tail from boundedBuffer.retained() without
indicating that the middle was dropped. Update spillBashStreams so the saved
bash spill content includes a clear omission marker between the retained
sections, and make sure the same marker is used wherever the spill content is
assembled for outText/errText so readers can tell the capture was truncated.
---
Outside diff comments:
In `@internal/tools/bash_budget_test.go`:
- Around line 119-146: `TestBudgetBashCaptureReportsTrueTotal` currently
verifies truncation metadata but does not assert the spill-file path behavior
that `budgetBashCapture` sets when output is oversized. Update this test to also
check `meta["spill_path"]` and validate the spill contents the same way
`TestBudgetBashOutputTruncatesHeadAndTail` does, using `budgetBashCapture`,
`meta`, and the spill-path handling to confirm the full retained output is
written to disk on truncation.
---
Nitpick comments:
In `@internal/tools/output_ceiling_test.go`:
- Around line 123-130: The exemption-list test is missing the read-minified-file
opt-out, so it won’t fail if that special case is removed. Update the exempt
tool table in the output ceiling test to include the read-minified-file tool
alongside the existing Tool entries, using the same constructor/pattern as the
other exemptions so the test covers the readMinifiedFileTool exemption in
output_ceiling.go.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: afc0b4df-726c-4dba-9b5b-5d1b2993cb3d
📒 Files selected for processing (11)
internal/agent/async_diagnostics.gointernal/agent/async_diagnostics_test.gointernal/agent/loop.gointernal/agent/types.gointernal/cli/exec.gointernal/tools/bash.gointernal/tools/bash_budget_test.gointernal/tools/output_ceiling.gointernal/tools/output_ceiling_test.gointernal/tools/registry.gointernal/tui/model.go
|
@kevincodex1 can u able to fix coderabbit changes ? |
Cap every tool result at the registry boundary (16k estimated tokens, ZERO_TOOL_OUTPUT_CEILING_TOKENS-tunable, 0 disables) unless the tool manages its own deliberate budget. Head+tail truncation, full output spilled to the existing redacted spill dir and re-readable via grep/read_file, so nothing is lost — only deferred. This closes the three unbounded paths a single call could flood the context window through (and then re-bill every following turn): - web_fetch: up to 256KiB default / 2MiB max of non-HTML body - skill: full SKILL.md returned uncapped - MCP-served tools: no budget at all Self-budgeting tools opt out via an unexported marker interface (an MCP server can never exempt itself): bash, exec_command (model-raisable), read_file, read_minified_file, grep, glob, list_directory. Browser snapshots deliberately stay under the net. bash itself is tightened to match the unified token model: emit budget 96KiB -> 32KiB per stream (~24k -> ~8k tokens worst case per stream), while capture retention stays at 96KiB head+tail so the new spill file covers 6x more of the output than the transcript shows.
Inline post-edit diagnostics blocked every edit_file/write_file result on the language server: a >=300ms quiet-period debounce that reset on each publish, seconds on the first edit per language, capped at 10s — the largest recurring latency tax in an edit-heavy session. Diagnostics are now collected in the background: a file is enqueued the moment a mutating tool finishes, a single worker checks it while the rest of the batch executes, and the loop drains completed results just before building the NEXT provider request, appending errors as a user nudge. The model always issues another request after an edit turn (to read its tool results), so it still sees introduced errors at the same decision point — no tool call ever stalls on the language server. A file re-edited before its check runs is checked once in its final state (the checker re-reads the file); re-edited after, it is re-queued and the newer result wins. A drain that finds the worker still busy gives up after 3s and delivers on a later turn — results stay accurate because any further edit re-enqueues the file. The tools-side inline mechanism (tools.RunOptions.Diagnostics) is kept for direct API callers; the loop just no longer wires it.
…, test hardening - Drain background diagnostics before the max-turns final-answer request so an error introduced by the LAST turn's edit is reported in the summary instead of silently dropped (+ regression test). - spillBashStreams marks the head/tail junction with a capture-gap marker when boundedBuffer dropped the middle of a stream, so a spilled log never reads as contiguous when it is not; junction position is asserted in the test. - Tests override TMP/TEMP alongside TMPDIR (os.TempDir reads TMP/TEMP on Windows) via a shared setTestTempDir helper. - TestBudgetBashCaptureReportsTrueTotal now asserts spill_path meta and spill file content; exemption-list test gains read_minified_file.
b227599 to
4e74b65
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
internal/agent/async_diagnostics_test.go (1)
75-96: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueLoose timing assertion, but functionally sufficient.
The
elapsed > time.Secondcheck only catches gross blocking regressions, not the intendedasyncDiagnosticsDrainTimeout-bound behavior. A tighter bound (e.g., a few multiples of the 20ms timeout) would catch subtler regressions, but not essential given the deferred-delivery loop below also validates correctness.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/agent/async_diagnostics_test.go` around lines 75 - 96, The timing assertion in async diagnostics draining is too loose and should be tightened to verify the intended asyncDiagnosticsDrainTimeout behavior. In async_diagnostics_test.go, update the drain duration check in the async diagnostics test that uses diagnostics.drain and release so it fails on a bounded multiple of the shortened timeout rather than waiting up to a full second, while keeping the deferred-delivery loop that validates the eventual nudge from drain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@internal/agent/async_diagnostics_test.go`:
- Around line 75-96: The timing assertion in async diagnostics draining is too
loose and should be tightened to verify the intended
asyncDiagnosticsDrainTimeout behavior. In async_diagnostics_test.go, update the
drain duration check in the async diagnostics test that uses diagnostics.drain
and release so it fails on a bounded multiple of the shortened timeout rather
than waiting up to a full second, while keeping the deferred-delivery loop that
validates the eventual nudge from drain.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 7df9fbb7-7b4d-4c1d-98df-7d3073686992
📒 Files selected for processing (11)
internal/agent/async_diagnostics.gointernal/agent/async_diagnostics_test.gointernal/agent/loop.gointernal/agent/types.gointernal/cli/exec.gointernal/tools/bash.gointernal/tools/bash_budget_test.gointernal/tools/output_ceiling.gointernal/tools/output_ceiling_test.gointernal/tools/registry.gointernal/tui/model.go
✅ Files skipped from review due to trivial changes (3)
- internal/tui/model.go
- internal/agent/types.go
- internal/cli/exec.go
🚧 Files skipped from review as they are similar to previous changes (6)
- internal/tools/registry.go
- internal/tools/output_ceiling.go
- internal/agent/async_diagnostics.go
- internal/tools/output_ceiling_test.go
- internal/tools/bash.go
- internal/agent/loop.go
filepath.IsAbs("/ws") is false on Windows (no drive letter), so the
collector test's workspace-absolute assertion failed on the windows
smoke job. Root the fake workspace at t.TempDir() — absolute on every
platform — and assert against it.
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Dug into this hard — it's a strong perf change and the output-ceiling half is genuinely clean, but three things to fix before it lands, one of them a real secret leak.
[blocker — security] The bash spill bypasses the high-confidence secret scrub. In budgetBashCapture, spillBashStreams(out, ..., errStr, ...) (bash.go:408) gets the RAW retained streams and spills them before formatBashOutput runs secrets.Redact (bash.go:353-354) on the transcript. spillTruncatedOutput only re-applies RedactString — the configured-key scrub — not secrets.Redact's pattern matcher (AWS keys, GitHub PATs, Slack/OpenAI keys, JWTs). So a credential that bash redacts from the transcript sits in cleartext in the spill, and the notice explicitly tells the model to grep/read_file it. That's exactly the leak secrets.Redact exists to prevent, reopened through the new spill path. Fix: run secrets.Redact on the streams before spilling (in spillBashStreams, or have spillTruncatedOutput apply it).
[major — correctness] The final edit's diagnostics can be silently dropped — a regression from the inline path. The drain only runs at the top of the loop (loop.go:172, asyncDiagnosticsDrainTimeout = 3s) and in the max-turns path (705). The natural no-tool-call completion at loop.go:494 returns the final answer with no drain. So on a cold/large-repo LSP with a single-edit task: the model edits, the check is enqueued and allowed up to 10s (fileDiagnosticsTimeout), the next turn's top-of-loop drain gives up after 3s, and if the model completes that turn the introduced error is never surfaced. The inline path always blocked until diagnostics were ready, so it always showed them before the final decision — this is the exact scenario the feature targets. Simplest fix: drain at the completion boundary before returning the final answer, and if it turns up errors, inject the nudge and continue one more turn instead of finishing.
[blocker — CI] Windows smoke is red. I ran it here (I'm on Windows): TestAsyncDiagnosticsCollectsAndDrainsOnce fails because it uses "/ws" as the workspace root, which isn't Windows-absolute — filepath.IsAbs("\ws\broken.go") is false on Windows, so the assertion trips. It's a test-portability issue, not a production bug (absPath is correct for a real absolute root), but it's red CI. Use t.TempDir() for the root instead of "/ws".
The rest holds up — I verified the ceiling directly: MCP/external tools genuinely can't exempt themselves (the marker is an unexported method and they're in a different package, so Go won't let them satisfy it), redaction runs before the ceiling spill, truncation is rune-safe and keeps the failure tail, only Output is capped so structured Meta survives, and the async concurrency itself is race/leak/deadlock-free with correct re-edit-latest-wins. That part's solid.
Fix the three and I'll take another pass.
…ostics Review follow-ups (Vasanthdev2004): - Spill files now pass the pattern-based secret scanner (AWS keys, GitHub/Slack/OpenAI tokens, PEM blocks, JWTs) in addition to the configured-key scrub. A bash spill runs before formatBashOutput's scan, so without this a spilled file held in cleartext exactly the credentials the transcript redacts. Applied centrally in spillTruncatedOutput, covering bash, exec_command, and ceiling spills. - Finalization diagnostics gate: a no-tool-call final answer now drains pending post-edit diagnostics with the full inline-era budget (10s, vs the 3s per-turn wait) before the run is accepted; introduced errors append the nudge and give the model one more turn instead of being silently dropped when no later turn exists. The max-turns summary path uses the same finalization budget. Free for runs that never edited — an idle collector returns immediately.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/agent/loop.go (1)
623-631: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick winAvoid duplicate LSP nudges when SelfCorrect is enabled
SelfCorrect.AfterEditandFileDiagnosticsboth surface error-severity diagnostics for the same changed files, so enabling both on the same run repeats the same LSP error to the model. Deduplicate one path or disableFileDiagnosticswhileSelfCorrectis active.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@internal/agent/loop.go` around lines 623 - 631, The current edit path in loop.go enqueues FileDiagnostics for every successful mutating tool even when SelfCorrect is also active, causing duplicate error-severity LSP nudges for the same changed files. Update the post-edit flow around toolResult.Status OK and changedFilesThisBatch so that only one source of diagnostics is used when SelfCorrect.AfterEdit is enabled—either skip postEditDiagnostics.enqueue for those files or gate FileDiagnostics off in that mode. Make the change in the logic that handles changedFilesThisBatch and postEditDiagnostics so the same file set is not reported twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@internal/agent/loop.go`:
- Around line 623-631: The current edit path in loop.go enqueues FileDiagnostics
for every successful mutating tool even when SelfCorrect is also active, causing
duplicate error-severity LSP nudges for the same changed files. Update the
post-edit flow around toolResult.Status OK and changedFilesThisBatch so that
only one source of diagnostics is used when SelfCorrect.AfterEdit is
enabled—either skip postEditDiagnostics.enqueue for those files or gate
FileDiagnostics off in that mode. Make the change in the logic that handles
changedFilesThisBatch and postEditDiagnostics so the same file set is not
reported twice.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1ef5bfc0-b123-4e4e-811a-5c03c98378ca
📒 Files selected for processing (5)
internal/agent/async_diagnostics.gointernal/agent/async_diagnostics_test.gointernal/agent/loop.gointernal/tools/spill.gointernal/tools/spill_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/agent/async_diagnostics.go
|
@kevincodex1 doing |
Vasanthdev2004
left a comment
There was a problem hiding this comment.
Re-reviewed the fixes — all three are addressed, and I checked each against the code rather than the commit messages. Approving.
The bash spill leak is closed. spillTruncatedOutput now runs both RedactString and secrets.Redact before writing (spill.go:115-116), and since every spill — bash and the universal ceiling — routes through it, the high-confidence patterns (AWS keys, GitHub PATs, JWTs) are scrubbed from the file the model reads back, not just from the transcript. That was the important one.
The finalization gate is right. Before returning the final answer, the completion path calls drainFinal (loop.go:500), which waits the full inline-era budget once rather than the 3s per-turn timeout; if diagnostics surface an error it appends the nudge and continues one more turn instead of finalizing past it, and only an empty/idle result finalizes. The max-turns path gets the same treatment (718). So the final edit's diagnostics can't be silently dropped anymore — the model still sees an introduced error before it's done, which was the whole point.
Windows is green. The async test uses a real workspace root now; I re-ran the agent + tools suites here on Windows and they pass, and CI is green across all nine checks.
Everything I flagged is handled, and the parts I'd already verified clean — the ceiling boundary, MCP tools genuinely not being able to self-exempt, and the async concurrency itself (race/leak/deadlock-free, latest-wins) — still hold. Good turnaround; this is a solid perf change now. Good to merge.
Summary
Two structural fixes from the perf/token audit vs opencode — the biggest
recurring token leak and the biggest recurring latency tax in the loop.
Universal token-based output ceiling (1825ea4)
Every tool result is now capped at the registry boundary — 16k estimated
tokens by default,
ZERO_TOOL_OUTPUT_CEILING_TOKENS-tunable,0disables —unless the tool manages its own deliberate budget. Truncation keeps head+tail
and spills the full (already redaction-scrubbed) output to the existing spill
dir, re-readable via grep/read_file, so nothing is lost — only deferred.
This closes the three paths a single call could flood the context window
through (and then re-bill on every following turn):
web_fetch: up to 256KiB default / 2MiB max of non-HTML body, previously unbudgetedskill: full SKILL.md returned uncappedSelf-budgeting tools opt out via an unexported marker interface (an MCP
server can never exempt itself):
bash,exec_command(model-raisable),read_file,read_minified_file,grep,glob,list_directory. Browsersnapshots deliberately stay under the net.
bash itself is tightened to match the unified token model: emit budget
96KiB → 32KiB per stream (~24k → ~8k tokens worst case per stream), while
capture retention stays at 96KiB head+tail — so the new sectioned bash spill
covers 6× more of the output than the transcript shows, and the spill path
also lands in
meta["spill_path"].Async post-edit LSP diagnostics (b227599)
Inline diagnostics blocked every
edit_file/write_fileresult on thelanguage server: a ≥300ms quiet-period debounce that reset on each publish,
seconds on the first edit per language, capped at 10s — paid on every edit
of every session.
Collection is now asynchronous: a file is enqueued the moment a mutating
tool finishes, a single worker checks it while the rest of the batch (and
any self-correct verification) runs, and the loop drains completed results
just before building the NEXT provider request, appending errors as a
user-role nudge. The model always issues another request after an edit turn
— it has to read its tool results — so it still sees introduced errors at
the same decision point as before, with no tool call ever stalling on the
language server.
Consistency: a file re-edited before its check runs is checked once in its
final state (the checker re-reads the file); re-edited after, it re-queues
and the newer result wins. A drain that finds the worker still busy gives up
after 3s and delivers on a later turn — results stay accurate because any
further edit re-enqueues the file. The tools-side inline mechanism
(
tools.RunOptions.Diagnostics) is kept for direct API callers; the loopjust no longer wires it.
Checklist
issue-approvedlabel.No linked issue — core-team change, same waiver as feat: agent quality, caching, retry, and tooling upgrades #506.
go build ./...,go vet ./..., andgo test ./...pass locally.Build and vet clean. Full suite passes except the 6 pre-existing
internal/clifailures from the localopengatewayprovider profile("openai provider opengateway requires official baseURL") —
byte-identical failure set on
main, unrelated to this change.gofmtclean.-racewhere relevant).Registry-level ceiling tests (cap + spill content, self-budgeting
exemption, env override, small-output passthrough), bash spill tests,
async-collector unit tests (deferred delivery on slow checks, re-edit
latest-wins, nil no-op), and an end-to-end
Runtest proving thediagnostics nudge lands in the follow-up request and never leaks into
the tool result.
internal/toolsandinternal/agent(all newconcurrency lives there) pass under
-race.N/A — no UI changes; TUI/CLI diffs are comment-only.
Summary by CodeRabbit