Skip to content

perf: universal tool-output ceiling with spill + async post-edit diagnostics#518

Merged
kevincodex1 merged 5 commits into
mainfrom
feat/perf-token-wave1
Jul 5, 2026
Merged

perf: universal tool-output ceiling with spill + async post-edit diagnostics#518
kevincodex1 merged 5 commits into
mainfrom
feat/perf-token-wave1

Conversation

@kevincodex1

@kevincodex1 kevincodex1 commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

Summary

Two structural fixes from the perf/token audit vs opencode — the biggest
recurring token leak and the biggest recurring latency tax in the loop.

Universal token-based output ceiling (1825ea4)

Every tool result is now capped at the registry boundary — 16k estimated
tokens by default, ZERO_TOOL_OUTPUT_CEILING_TOKENS-tunable, 0 disables —
unless the tool manages its own deliberate budget. Truncation keeps head+tail
and spills the full (already redaction-scrubbed) output to the existing spill
dir, re-readable via grep/read_file, so nothing is lost — only deferred.

This closes the three paths a single call could flood the context window
through (and then re-bill on every following turn):

  • web_fetch: up to 256KiB default / 2MiB max of non-HTML body, previously unbudgeted
  • skill: full SKILL.md returned uncapped
  • MCP-served tools: no budget at all

Self-budgeting tools opt out via an unexported marker interface (an MCP
server can never exempt itself): bash, exec_command (model-raisable),
read_file, read_minified_file, grep, glob, list_directory. Browser
snapshots deliberately stay under the net.

bash itself is tightened to match the unified token model: emit budget
96KiB → 32KiB per stream (~24k → ~8k tokens worst case per stream), while
capture retention stays at 96KiB head+tail — so the new sectioned bash spill
covers 6× more of the output than the transcript shows, and the spill path
also lands in meta["spill_path"].

Async post-edit LSP diagnostics (b227599)

Inline diagnostics blocked every edit_file/write_file result on the
language server: a ≥300ms quiet-period debounce that reset on each publish,
seconds on the first edit per language, capped at 10s — paid on every edit
of every session.

Collection is now asynchronous: a file is enqueued the moment a mutating
tool finishes, a single worker checks it while the rest of the batch (and
any self-correct verification) runs, and the loop drains completed results
just before building the NEXT provider request, appending errors as a
user-role nudge. The model always issues another request after an edit turn
— it has to read its tool results — so it still sees introduced errors at
the same decision point as before, with no tool call ever stalling on the
language server.

Consistency: a file re-edited before its check runs is checked once in its
final state (the checker re-reads the file); re-edited after, it re-queues
and the newer result wins. A drain that finds the worker still busy gives up
after 3s and delivers on a later turn — results stay accurate because any
further edit re-enqueues the file. The tools-side inline mechanism
(tools.RunOptions.Diagnostics) is kept for direct API callers; the loop
just no longer wires it.

Checklist

  • The linked issue already has the issue-approved label.
    No linked issue — core-team change, same waiver as feat: agent quality, caching, retry, and tooling upgrades #506.
  • go build ./..., go vet ./..., and go test ./... pass locally.
    Build and vet clean. Full suite passes except the 6 pre-existing
    internal/cli failures from the local opengateway provider profile
    ("openai provider opengateway requires official baseURL") —
    byte-identical failure set on main, unrelated to this change.
  • gofmt clean.
  • Tests added/updated for the change (and run under -race where relevant).
    Registry-level ceiling tests (cap + spill content, self-budgeting
    exemption, env override, small-output passthrough), bash spill tests,
    async-collector unit tests (deferred delivery on slow checks, re-edit
    latest-wins, nil no-op), and an end-to-end Run test proving the
    diagnostics nudge lands in the follow-up request and never leaks into
    the tool result. internal/tools and internal/agent (all new
    concurrency lives there) pass under -race.
  • UI changes include screenshots or a short recording where possible.
    N/A — no UI changes; TUI/CLI diffs are comment-only.

Summary by CodeRabbit

  • New Features
    • Added a universal per-call tool output ceiling (env-configurable); when exceeded, output is truncated and the full redacted transcript is saved with spill hints for later review.
    • File diagnostics now run in the background after edits and are delivered as a “nudge” message on the next request (including the final turn).
  • Bug Fixes
    • Repeated edits refresh diagnostics to avoid stale results.
    • Late diagnostics are no longer dropped during run finalization.
    • Spill files are further protected by an additional secrets redaction pass.

@github-actions

github-actions Bot commented Jul 5, 2026

Copy link
Copy Markdown
Contributor

Zero automated PR review

Verdict: No blockers found

Blockers

  • None found.

Validation

  • [pass] Diff hygiene: git diff --check
  • [pass] Tests: go test ./...
  • [pass] Build: go run ./cmd/zero-release build
  • [pass] Smoke build: go run ./cmd/zero-release smoke

Scope

Head: 07af8c451d9c
Changed files (13): internal/agent/async_diagnostics.go, internal/agent/async_diagnostics_test.go, internal/agent/loop.go, internal/agent/types.go, internal/cli/exec.go, internal/tools/bash.go, internal/tools/bash_budget_test.go, internal/tools/output_ceiling.go, internal/tools/output_ceiling_test.go, internal/tools/registry.go, internal/tools/spill.go, internal/tools/spill_test.go, and 1 more

This deterministic review checks validation status and basic diff hygiene. A human reviewer still owns product judgment and design quality.

@coderabbitai

coderabbitai Bot commented Jul 5, 2026

Copy link
Copy Markdown

Review Change Stack

Walkthrough

The agent now queues changed files for background diagnostics and injects the resulting nudge on later turns instead of running file diagnostics inline. Tool output handling also adds a universal ceiling, self-budgeting exemptions, bash spill files with preserved sections, and spill-file secret redaction.

Changes

Async post-edit diagnostics

Layer / File(s) Summary
Async diagnostics flow
internal/agent/async_diagnostics.go, internal/agent/loop.go, internal/agent/types.go, internal/cli/exec.go, internal/tui/model.go
Background diagnostics are queued from mutating tool results, drained at turn boundaries and before final answers, and no longer forwarded inline into tool execution; related comments are updated to describe the new flow.
Async diagnostics tests
internal/agent/async_diagnostics_test.go
Unit and end-to-end tests cover nil collectors, queue draining, slow checks, replacement of stale results, and delivery across normal, max-turn, and late finalization flows.

Tool output ceiling and spill handling

Layer / File(s) Summary
Bash capture spill handling
internal/tools/bash.go, internal/tools/bash_budget_test.go
Bash output budgeting now retains a larger capture buffer, spills truncated stdout/stderr to disk, and tests validate the spill file contents and capture-gap markers.
Spill redaction hardening
internal/tools/spill.go, internal/tools/spill_test.go
Spill files now receive an extra pattern-based secret redaction pass before being written, and tests confirm secrets are removed from the saved payload.
Universal output ceiling
internal/tools/output_ceiling.go, internal/tools/registry.go, internal/tools/output_ceiling_test.go
Registry result handling now applies a configurable output ceiling after secret scrubbing, skips self-budgeting tools, and tests cover truncation, exemption, env overrides, and spill-file behavior.

Estimated code review effort: 4 (Complex) | ~60 minutes

Possibly related PRs

  • Gitlawb/zero#146: Also changes internal/agent/loop.go tool execution control and request escalation logic.
  • Gitlawb/zero#175: Also updates internal/agent/loop.go changed-files batching and post-edit turn handling.
  • Gitlawb/zero#506: Touches the same diagnostics wiring and Options.FileDiagnostics flow in the agent loop.

Suggested reviewers: Vasanthdev2004, gnanam1990, jatmn

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the two main changes: a universal tool-output ceiling with spilling and async post-edit diagnostics.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/perf-token-wave1

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/tools/bash_budget_test.go (1)

119-146: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add spill-path assertions to TestBudgetBashCaptureReportsTrueTotal. budgetBashCapture also writes meta["spill_path"] on truncation, so this test should exercise the spill file like TestBudgetBashOutputTruncatesHeadAndTail does.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/tools/bash_budget_test.go` around lines 119 - 146,
`TestBudgetBashCaptureReportsTrueTotal` currently verifies truncation metadata
but does not assert the spill-file path behavior that `budgetBashCapture` sets
when output is oversized. Update this test to also check `meta["spill_path"]`
and validate the spill contents the same way
`TestBudgetBashOutputTruncatesHeadAndTail` does, using `budgetBashCapture`,
`meta`, and the spill-path handling to confirm the full retained output is
written to disk on truncation.
🧹 Nitpick comments (1)
internal/tools/output_ceiling_test.go (1)

123-130: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Cover the read_minified_file exemption.

output_ceiling.go exempts readMinifiedFileTool at Line 41, but this table does not assert it. Add the read-minified tool here so removing that opt-out fails the exemption-list test.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/tools/output_ceiling_test.go` around lines 123 - 130, The
exemption-list test is missing the read-minified-file opt-out, so it won’t fail
if that special case is removed. Update the exempt tool table in the output
ceiling test to include the read-minified-file tool alongside the existing Tool
entries, using the same constructor/pattern as the other exemptions so the test
covers the readMinifiedFileTool exemption in output_ceiling.go.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/agent/loop.go`:
- Around line 168-177: The post-edit diagnostics are only drained at the start
of the next loop iteration in loop, so when maxTurns is reached the finalization
path in finalAnswerAfterMaxTurns can miss newly queued LSP errors. Move or
duplicate the postEditDiagnostics.drain(ctx) handling so diagnostics are
appended before the max-turn final answer request is made, ensuring the final
provider call sees the latest diagnostics.

In `@internal/tools/bash_budget_test.go`:
- Line 33: The spill temp-dir override in the budget test is Unix-only because
the code under test uses os.TempDir()/os.CreateTemp, so Windows may still spill
into the real temp directory. Update the test setup around the existing t.Setenv
call to override TMPDIR, TMP, and TEMP together, or switch the spill path setup
in internal/tools/spill.go to accept an injected root so the test can control it
cross-platform.

In `@internal/tools/bash.go`:
- Around line 407-419: The spill file written by spillBashStreams currently
concatenates the retained head and tail from boundedBuffer.retained() without
indicating that the middle was dropped. Update spillBashStreams so the saved
bash spill content includes a clear omission marker between the retained
sections, and make sure the same marker is used wherever the spill content is
assembled for outText/errText so readers can tell the capture was truncated.

---

Outside diff comments:
In `@internal/tools/bash_budget_test.go`:
- Around line 119-146: `TestBudgetBashCaptureReportsTrueTotal` currently
verifies truncation metadata but does not assert the spill-file path behavior
that `budgetBashCapture` sets when output is oversized. Update this test to also
check `meta["spill_path"]` and validate the spill contents the same way
`TestBudgetBashOutputTruncatesHeadAndTail` does, using `budgetBashCapture`,
`meta`, and the spill-path handling to confirm the full retained output is
written to disk on truncation.

---

Nitpick comments:
In `@internal/tools/output_ceiling_test.go`:
- Around line 123-130: The exemption-list test is missing the read-minified-file
opt-out, so it won’t fail if that special case is removed. Update the exempt
tool table in the output ceiling test to include the read-minified-file tool
alongside the existing Tool entries, using the same constructor/pattern as the
other exemptions so the test covers the readMinifiedFileTool exemption in
output_ceiling.go.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: afc0b4df-726c-4dba-9b5b-5d1b2993cb3d

📥 Commits

Reviewing files that changed from the base of the PR and between 30bff28 and b227599.

📒 Files selected for processing (11)
  • internal/agent/async_diagnostics.go
  • internal/agent/async_diagnostics_test.go
  • internal/agent/loop.go
  • internal/agent/types.go
  • internal/cli/exec.go
  • internal/tools/bash.go
  • internal/tools/bash_budget_test.go
  • internal/tools/output_ceiling.go
  • internal/tools/output_ceiling_test.go
  • internal/tools/registry.go
  • internal/tui/model.go

Comment thread internal/agent/loop.go
Comment thread internal/tools/bash_budget_test.go Outdated
Comment thread internal/tools/bash.go
@Vasanthdev2004

Copy link
Copy Markdown
Collaborator

@kevincodex1 can u able to fix coderabbit changes ?

Cap every tool result at the registry boundary (16k estimated tokens,
ZERO_TOOL_OUTPUT_CEILING_TOKENS-tunable, 0 disables) unless the tool
manages its own deliberate budget. Head+tail truncation, full output
spilled to the existing redacted spill dir and re-readable via
grep/read_file, so nothing is lost — only deferred.

This closes the three unbounded paths a single call could flood the
context window through (and then re-bill every following turn):
- web_fetch: up to 256KiB default / 2MiB max of non-HTML body
- skill: full SKILL.md returned uncapped
- MCP-served tools: no budget at all

Self-budgeting tools opt out via an unexported marker interface (an MCP
server can never exempt itself): bash, exec_command (model-raisable),
read_file, read_minified_file, grep, glob, list_directory. Browser
snapshots deliberately stay under the net.

bash itself is tightened to match the unified token model: emit budget
96KiB -> 32KiB per stream (~24k -> ~8k tokens worst case per stream),
while capture retention stays at 96KiB head+tail so the new spill file
covers 6x more of the output than the transcript shows.
Inline post-edit diagnostics blocked every edit_file/write_file result on
the language server: a >=300ms quiet-period debounce that reset on each
publish, seconds on the first edit per language, capped at 10s — the
largest recurring latency tax in an edit-heavy session.

Diagnostics are now collected in the background: a file is enqueued the
moment a mutating tool finishes, a single worker checks it while the rest
of the batch executes, and the loop drains completed results just before
building the NEXT provider request, appending errors as a user nudge.
The model always issues another request after an edit turn (to read its
tool results), so it still sees introduced errors at the same decision
point — no tool call ever stalls on the language server.

A file re-edited before its check runs is checked once in its final
state (the checker re-reads the file); re-edited after, it is re-queued
and the newer result wins. A drain that finds the worker still busy
gives up after 3s and delivers on a later turn — results stay accurate
because any further edit re-enqueues the file.

The tools-side inline mechanism (tools.RunOptions.Diagnostics) is kept
for direct API callers; the loop just no longer wires it.
…, test hardening

- Drain background diagnostics before the max-turns final-answer request
  so an error introduced by the LAST turn's edit is reported in the
  summary instead of silently dropped (+ regression test).
- spillBashStreams marks the head/tail junction with a capture-gap
  marker when boundedBuffer dropped the middle of a stream, so a spilled
  log never reads as contiguous when it is not; junction position is
  asserted in the test.
- Tests override TMP/TEMP alongside TMPDIR (os.TempDir reads TMP/TEMP on
  Windows) via a shared setTestTempDir helper.
- TestBudgetBashCaptureReportsTrueTotal now asserts spill_path meta and
  spill file content; exemption-list test gains read_minified_file.
@kevincodex1 kevincodex1 force-pushed the feat/perf-token-wave1 branch from b227599 to 4e74b65 Compare July 5, 2026 14:18

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/agent/async_diagnostics_test.go (1)

75-96: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Loose timing assertion, but functionally sufficient.

The elapsed > time.Second check only catches gross blocking regressions, not the intended asyncDiagnosticsDrainTimeout-bound behavior. A tighter bound (e.g., a few multiples of the 20ms timeout) would catch subtler regressions, but not essential given the deferred-delivery loop below also validates correctness.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/agent/async_diagnostics_test.go` around lines 75 - 96, The timing
assertion in async diagnostics draining is too loose and should be tightened to
verify the intended asyncDiagnosticsDrainTimeout behavior. In
async_diagnostics_test.go, update the drain duration check in the async
diagnostics test that uses diagnostics.drain and release so it fails on a
bounded multiple of the shortened timeout rather than waiting up to a full
second, while keeping the deferred-delivery loop that validates the eventual
nudge from drain.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/agent/async_diagnostics_test.go`:
- Around line 75-96: The timing assertion in async diagnostics draining is too
loose and should be tightened to verify the intended
asyncDiagnosticsDrainTimeout behavior. In async_diagnostics_test.go, update the
drain duration check in the async diagnostics test that uses diagnostics.drain
and release so it fails on a bounded multiple of the shortened timeout rather
than waiting up to a full second, while keeping the deferred-delivery loop that
validates the eventual nudge from drain.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 7df9fbb7-7b4d-4c1d-98df-7d3073686992

📥 Commits

Reviewing files that changed from the base of the PR and between b227599 and 4e74b65.

📒 Files selected for processing (11)
  • internal/agent/async_diagnostics.go
  • internal/agent/async_diagnostics_test.go
  • internal/agent/loop.go
  • internal/agent/types.go
  • internal/cli/exec.go
  • internal/tools/bash.go
  • internal/tools/bash_budget_test.go
  • internal/tools/output_ceiling.go
  • internal/tools/output_ceiling_test.go
  • internal/tools/registry.go
  • internal/tui/model.go
✅ Files skipped from review due to trivial changes (3)
  • internal/tui/model.go
  • internal/agent/types.go
  • internal/cli/exec.go
🚧 Files skipped from review as they are similar to previous changes (6)
  • internal/tools/registry.go
  • internal/tools/output_ceiling.go
  • internal/agent/async_diagnostics.go
  • internal/tools/output_ceiling_test.go
  • internal/tools/bash.go
  • internal/agent/loop.go

filepath.IsAbs("/ws") is false on Windows (no drive letter), so the
collector test's workspace-absolute assertion failed on the windows
smoke job. Root the fake workspace at t.TempDir() — absolute on every
platform — and assert against it.

@Vasanthdev2004 Vasanthdev2004 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dug into this hard — it's a strong perf change and the output-ceiling half is genuinely clean, but three things to fix before it lands, one of them a real secret leak.

[blocker — security] The bash spill bypasses the high-confidence secret scrub. In budgetBashCapture, spillBashStreams(out, ..., errStr, ...) (bash.go:408) gets the RAW retained streams and spills them before formatBashOutput runs secrets.Redact (bash.go:353-354) on the transcript. spillTruncatedOutput only re-applies RedactString — the configured-key scrub — not secrets.Redact's pattern matcher (AWS keys, GitHub PATs, Slack/OpenAI keys, JWTs). So a credential that bash redacts from the transcript sits in cleartext in the spill, and the notice explicitly tells the model to grep/read_file it. That's exactly the leak secrets.Redact exists to prevent, reopened through the new spill path. Fix: run secrets.Redact on the streams before spilling (in spillBashStreams, or have spillTruncatedOutput apply it).

[major — correctness] The final edit's diagnostics can be silently dropped — a regression from the inline path. The drain only runs at the top of the loop (loop.go:172, asyncDiagnosticsDrainTimeout = 3s) and in the max-turns path (705). The natural no-tool-call completion at loop.go:494 returns the final answer with no drain. So on a cold/large-repo LSP with a single-edit task: the model edits, the check is enqueued and allowed up to 10s (fileDiagnosticsTimeout), the next turn's top-of-loop drain gives up after 3s, and if the model completes that turn the introduced error is never surfaced. The inline path always blocked until diagnostics were ready, so it always showed them before the final decision — this is the exact scenario the feature targets. Simplest fix: drain at the completion boundary before returning the final answer, and if it turns up errors, inject the nudge and continue one more turn instead of finishing.

[blocker — CI] Windows smoke is red. I ran it here (I'm on Windows): TestAsyncDiagnosticsCollectsAndDrainsOnce fails because it uses "/ws" as the workspace root, which isn't Windows-absolute — filepath.IsAbs("\ws\broken.go") is false on Windows, so the assertion trips. It's a test-portability issue, not a production bug (absPath is correct for a real absolute root), but it's red CI. Use t.TempDir() for the root instead of "/ws".

The rest holds up — I verified the ceiling directly: MCP/external tools genuinely can't exempt themselves (the marker is an unexported method and they're in a different package, so Go won't let them satisfy it), redaction runs before the ceiling spill, truncation is rune-safe and keeps the failure tail, only Output is capped so structured Meta survives, and the async concurrency itself is race/leak/deadlock-free with correct re-edit-latest-wins. That part's solid.

Fix the three and I'll take another pass.

…ostics

Review follow-ups (Vasanthdev2004):

- Spill files now pass the pattern-based secret scanner (AWS keys,
  GitHub/Slack/OpenAI tokens, PEM blocks, JWTs) in addition to the
  configured-key scrub. A bash spill runs before formatBashOutput's
  scan, so without this a spilled file held in cleartext exactly the
  credentials the transcript redacts. Applied centrally in
  spillTruncatedOutput, covering bash, exec_command, and ceiling spills.

- Finalization diagnostics gate: a no-tool-call final answer now drains
  pending post-edit diagnostics with the full inline-era budget (10s,
  vs the 3s per-turn wait) before the run is accepted; introduced
  errors append the nudge and give the model one more turn instead of
  being silently dropped when no later turn exists. The max-turns
  summary path uses the same finalization budget. Free for runs that
  never edited — an idle collector returns immediately.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
internal/agent/loop.go (1)

623-631: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Avoid duplicate LSP nudges when SelfCorrect is enabled

SelfCorrect.AfterEdit and FileDiagnostics both surface error-severity diagnostics for the same changed files, so enabling both on the same run repeats the same LSP error to the model. Deduplicate one path or disable FileDiagnostics while SelfCorrect is active.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/agent/loop.go` around lines 623 - 631, The current edit path in
loop.go enqueues FileDiagnostics for every successful mutating tool even when
SelfCorrect is also active, causing duplicate error-severity LSP nudges for the
same changed files. Update the post-edit flow around toolResult.Status OK and
changedFilesThisBatch so that only one source of diagnostics is used when
SelfCorrect.AfterEdit is enabled—either skip postEditDiagnostics.enqueue for
those files or gate FileDiagnostics off in that mode. Make the change in the
logic that handles changedFilesThisBatch and postEditDiagnostics so the same
file set is not reported twice.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@internal/agent/loop.go`:
- Around line 623-631: The current edit path in loop.go enqueues FileDiagnostics
for every successful mutating tool even when SelfCorrect is also active, causing
duplicate error-severity LSP nudges for the same changed files. Update the
post-edit flow around toolResult.Status OK and changedFilesThisBatch so that
only one source of diagnostics is used when SelfCorrect.AfterEdit is
enabled—either skip postEditDiagnostics.enqueue for those files or gate
FileDiagnostics off in that mode. Make the change in the logic that handles
changedFilesThisBatch and postEditDiagnostics so the same file set is not
reported twice.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 1ef5bfc0-b123-4e4e-811a-5c03c98378ca

📥 Commits

Reviewing files that changed from the base of the PR and between fc89f2a and 07af8c4.

📒 Files selected for processing (5)
  • internal/agent/async_diagnostics.go
  • internal/agent/async_diagnostics_test.go
  • internal/agent/loop.go
  • internal/tools/spill.go
  • internal/tools/spill_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/agent/async_diagnostics.go

@Vasanthdev2004

Copy link
Copy Markdown
Collaborator

@kevincodex1 doing

@Vasanthdev2004 Vasanthdev2004 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed the fixes — all three are addressed, and I checked each against the code rather than the commit messages. Approving.

The bash spill leak is closed. spillTruncatedOutput now runs both RedactString and secrets.Redact before writing (spill.go:115-116), and since every spill — bash and the universal ceiling — routes through it, the high-confidence patterns (AWS keys, GitHub PATs, JWTs) are scrubbed from the file the model reads back, not just from the transcript. That was the important one.

The finalization gate is right. Before returning the final answer, the completion path calls drainFinal (loop.go:500), which waits the full inline-era budget once rather than the 3s per-turn timeout; if diagnostics surface an error it appends the nudge and continues one more turn instead of finalizing past it, and only an empty/idle result finalizes. The max-turns path gets the same treatment (718). So the final edit's diagnostics can't be silently dropped anymore — the model still sees an introduced error before it's done, which was the whole point.

Windows is green. The async test uses a real workspace root now; I re-ran the agent + tools suites here on Windows and they pass, and CI is green across all nine checks.

Everything I flagged is handled, and the parts I'd already verified clean — the ceiling boundary, MCP tools genuinely not being able to self-exempt, and the async concurrency itself (race/leak/deadlock-free, latest-wins) — still hold. Good turnaround; this is a solid perf change now. Good to merge.

@kevincodex1 kevincodex1 merged commit 95ccd5b into main Jul 5, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants