fix(tui): count unreconciled subagents against the cap by cyq1017 · Pull Request #2505 · Hmbown/CodeWhale

cyq1017 · 2026-06-01T13:36:26Z

Problem

Sub-agent task handles can finish before their status is reconciled out of Running. That lets the parent refill the cap early during fanout bursts, which is one of the remaining pressure points in #2211 (bug(tui): sub-agent fanout plus hidden worktrees can saturate the TUI during release work).

Change

Keep Running sub-agents with task handles counted until status reconciliation catches up.
Continue ignoring malformed/running records that never received a task handle.
Add focused coverage for the reconciled, finished-handle, and missing-handle cases.

Verification

cargo test -p codewhale-tui running_count --all-features --locked -- --nocapture
cargo check -p codewhale-tui --all-features --locked
cargo fmt --all -- --check
git diff --check origin/main..HEAD

Refs #2211

Greptile Summary

This PR tightens the sub-agent concurrency cap by removing the !handle.is_finished() exemption from running_count(). Previously, once a task's JoinHandle indicated completion, the agent was immediately excluded from the cap even if its Running status had not yet been reconciled to a terminal state — allowing fanout bursts to overfill the pool (#2211). Now an agent with Running status counts against the cap until update_from_result (or update_failed) atomically sets a terminal status and clears task_handle.

mod.rs: running_count() now counts any Running agent with a non-None task_handle, regardless of whether the task has already finished; the slot is released only when status reconciliation runs.
tests.rs: The renamed test test_running_count_counts_running_agents_until_status_reconciles flips its assertion from 0 to 1 for a finished-but-unreconciled handle, and two new companion tests cover the reconciled and no-handle cases.
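
The difference between the old and new predicate can be sketched with a simplified model. This is an illustration only, not the code from mod.rs: `Agent`, `Status`, `running_count_old`, `running_count_new`, and `finished_unreconciled_agent` are hypothetical names, and `std::thread::JoinHandle` stands in for the tokio task handle so the sketch needs no external crates.

```rust
use std::thread::{self, JoinHandle};

// Hypothetical, simplified stand-ins for the real types in mod.rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Running,
    Completed,
}

struct Agent {
    status: Status,
    // Assumption: std::thread::JoinHandle models the tokio task handle.
    task_handle: Option<JoinHandle<()>>,
}

// Old predicate: a finished handle frees the cap slot immediately,
// even though the status still says Running.
fn running_count_old(agents: &[Agent]) -> usize {
    agents
        .iter()
        .filter(|a| a.status == Status::Running)
        .filter(|a| a.task_handle.as_ref().is_some_and(|h| !h.is_finished()))
        .count()
}

// New predicate: any Running agent that has a handle keeps its slot
// until reconciliation sets a terminal status and clears the handle.
fn running_count_new(agents: &[Agent]) -> usize {
    agents
        .iter()
        .filter(|a| a.status == Status::Running && a.task_handle.is_some())
        .count()
}

// Build a Running agent whose task has already finished but whose
// status has not been reconciled yet.
fn finished_unreconciled_agent() -> Agent {
    let handle = thread::spawn(|| {});
    while !handle.is_finished() {
        thread::yield_now();
    }
    Agent {
        status: Status::Running,
        task_handle: Some(handle),
    }
}
```

In this model the finished-but-unreconciled agent is invisible to the old count and visible to the new one, while reconciled agents and Running records without a handle are ignored by both.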

Confidence Score: 4/5

Safe to merge; the race-condition fix is correct for all normal execution paths and the one edge case (panic inside the subagent task) is low probability given the codebase's error-handling style.

The logic change closes the fanout race window correctly: cap slots are now held until update_from_result runs, which atomically updates both the status and clears task_handle. The only gap is that spawn_supervised catches panics without calling update_from_result, so a panic inside run_subagent_task before that final write would leave an agent permanently occupying a slot with no automatic recovery path within the session. No unwrap/expect/panic! calls were found in the file, making this scenario unlikely in practice, but the supervision wrapper creates a silent path where the invariant can break.

The spawn_supervised call site in mod.rs — specifically whether the panic handler should also call update_failed — warrants a second look alongside this change.

Important Files Changed

| Filename | Overview |
| --- | --- |
| crates/tui/src/tools/subagent/mod.rs | Core logic change: removes `!handle.is_finished()` from `running_count()` so that agents whose task has ended but whose status hasn't been reconciled yet still occupy a cap slot. The normal happy-path is correct; a panic edge case (task panics before `update_from_result` runs) leaves the slot permanently occupied. |
| crates/tui/src/tools/subagent/tests.rs | Test renamed and assertion inverted to match new behavior; existing no-handle and live-handle tests are unmodified and still pass. Minor: `test_running_count_counts_only_agents_with_live_task_handles` name now understates the new semantics (finished handles also count), but the assertion itself is still correct. |

Sequence Diagram

sequenceDiagram
    participant P as Parent Agent
    participant M as SubAgentManager
    participant T as run_subagent_task (tokio)

    P->>M: "spawn_background() — running_count() < max"
    M->>M: "agent.status = Running, task_handle = Some(handle)"
    M-->>P: Ok(snapshot)

    note over T: Task executes…

    T->>T: "task finishes (handle.is_finished() = true)"

    note over M,T: Race window — OLD code excluded finished handles here,<br/>freeing the cap slot before reconciliation

    T->>M: write lock → update_from_result()
    M->>M: "agent.status = Completed/Cancelled/Failed"
    M->>M: "agent.task_handle = None"

    note over M: running_count() now excludes agent (status ≠ Running)

    P->>M: "spawn_background() — running_count() < max ✓"
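
The sequence in the diagram can be walked through with a small cap-admission model. This is a sketch under stated assumptions: `Manager`, its fields, and `task_finished` are hypothetical; `spawn_background`, `running_count`, and `update_from_result` borrow the names used in this PR but not their real signatures; and a `std::thread` handle again stands in for the tokio task.

```rust
use std::thread::{self, JoinHandle};

#[derive(PartialEq)]
enum Status {
    Running,
    Completed,
}

struct Agent {
    status: Status,
    task_handle: Option<JoinHandle<()>>,
}

// Hypothetical manager holding the cap and the agent records.
struct Manager {
    max: usize,
    agents: Vec<Agent>,
}

impl Manager {
    // New semantics: Running plus any handle (finished or not) holds a slot.
    fn running_count(&self) -> usize {
        self.agents
            .iter()
            .filter(|a| a.status == Status::Running && a.task_handle.is_some())
            .count()
    }

    // Admit a new agent only while below the cap; returns its index.
    fn spawn_background(&mut self) -> Result<usize, &'static str> {
        if self.running_count() >= self.max {
            return Err("sub-agent cap reached");
        }
        let handle = thread::spawn(|| {});
        self.agents.push(Agent {
            status: Status::Running,
            task_handle: Some(handle),
        });
        Ok(self.agents.len() - 1)
    }

    // Reconciliation: terminal status and cleared handle change together.
    fn update_from_result(&mut self, id: usize) {
        let agent = &mut self.agents[id];
        agent.status = Status::Completed;
        agent.task_handle = None;
    }

    // True once the underlying task has ended, reconciled or not.
    fn task_finished(&self, id: usize) -> bool {
        self.agents[id]
            .task_handle
            .as_ref()
            .is_some_and(|h| h.is_finished())
    }
}
```

With `max = 1`, a second `spawn_background()` is refused while the first agent sits in the race window (task finished, status still Running) and is admitted only after `update_from_result` has run.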

Comments Outside Diff (2)

crates/tui/src/tools/subagent/mod.rs, line 1392-1396 (link)

Panic before update_from_result permanently occupies a cap slot

spawn_supervised wraps the future in catch_unwind, so if run_subagent_task panics before reaching the update_from_result / update_failed call at the end, the JoinHandle resolves to () but neither the status nor task_handle is updated. With the old !handle.is_finished() guard, a finished handle (including a panicked one) was excluded from running_count(), which freed the slot automatically. Under the new logic, the agent stays in Running with a Some(finished handle) and is counted indefinitely, blocking that cap slot until the user manually cancels the agent or the application restarts. The cleanup() path never evicts Running agents, so there is no automatic recovery within a session. This scenario requires a panic inside run_subagent_task (none found today), but spawn_supervised is precisely the place where that path exists silently.
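
One possible shape for closing this gap is to reconcile on every exit path of the supervised task, including a caught panic. The sketch below is not the project's spawn_supervised: `spawn_supervised_reconciling`, `counts_against_cap`, `Agent`, and `Status` are hypothetical, and a `std::thread` with `std::panic::catch_unwind` stands in for the tokio task and its supervision wrapper.

```rust
use std::panic::{self, AssertUnwindSafe};
use std::sync::{Arc, Mutex};
use std::thread::{self, JoinHandle};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Status {
    Running,
    Completed,
    Failed,
}

struct Agent {
    status: Status,
    task_handle: Option<JoinHandle<()>>,
}

// Hypothetical supervisor: run `work`, then always reconcile the agent,
// whether `work` returned normally or panicked.
fn spawn_supervised_reconciling<F>(agent: &Arc<Mutex<Agent>>, work: F)
where
    F: FnOnce() + Send + 'static,
{
    // Hold the lock while spawning so the task cannot reconcile
    // before the handle has been stored on the agent.
    let mut guard = agent.lock().unwrap();
    let shared = Arc::clone(agent);
    let handle = thread::spawn(move || {
        let outcome = panic::catch_unwind(AssertUnwindSafe(work));
        let mut a = shared.lock().unwrap();
        // Terminal status and cleared handle are written under one lock.
        a.status = if outcome.is_ok() {
            Status::Completed
        } else {
            Status::Failed
        };
        a.task_handle = None;
    });
    guard.status = Status::Running;
    guard.task_handle = Some(handle);
}

// Same predicate as the new running_count(), for a single agent.
fn counts_against_cap(agent: &Mutex<Agent>) -> bool {
    let a = agent.lock().unwrap();
    a.status == Status::Running && a.task_handle.is_some()
}
```

Under this assumption a panicking task ends up Failed with no handle, so it stops counting against the cap without manual cancellation. Whether the real panic handler should call update_failed is a separate design question for the maintainers.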
crates/tui/src/tools/subagent/tests.rs, line 888 (link)

The test name test_running_count_counts_only_agents_with_live_task_handles now slightly misrepresents the semantics: after this PR, finished-but-unreconciled handles are also counted. Renaming it avoids confusion for the next reader.

gemini-code-assist

Code Review

This pull request modifies the subagent manager to keep counting recently finished task handles until their terminal status has reconciled, preventing premature cap refills during fanout bursts. Feedback on these changes suggests simplifying the task handle check in mod.rs to directly return agent.task_handle.is_some(), and replacing the busy-yield loop in the test file with a more robust synchronization mechanism, such as a oneshot channel, to avoid potential flakiness under heavy CI load.

gemini-code-assist · 2026-06-01T13:37:21Z

+                if agent.task_handle.is_none() {
                    return false;
-                };
-                // Exclude agents whose task has finished (status will be updated to Completed shortly)
-                !handle.is_finished()
+                }
+                // Keep recently finished handles counted until the terminal
+                // status update has reconciled. Otherwise a fanout burst can
+                // refill the cap before the UI/state catches up (#2211).
+                true


The check for agent.task_handle.is_none() followed by returning false and then returning true can be simplified to directly returning agent.task_handle.is_some(). This is more idiomatic and concise.

                // Keep recently finished handles counted until the terminal
                // status update has reconciled. Otherwise a fanout burst can
                // refill the cap before the UI/state catches up (#2211).
                agent.task_handle.is_some()

gemini-code-assist · 2026-06-01T13:37:21Z

+    let finished_handle = tokio::spawn(async {});
+    while !finished_handle.is_finished() {
+        tokio::task::yield_now().await;
    }
+    agent.task_handle = Some(finished_handle);


Using a busy-yield loop (while !finished_handle.is_finished() { tokio::task::yield_now().await; }) to wait for a spawned task to finish can be CPU-intensive and potentially flaky under heavy CI load. A more robust approach is to synchronize using a oneshot channel to signal completion before checking is_finished().

    let (tx, rx) = tokio::sync::oneshot::channel();
    let finished_handle = tokio::spawn(async move {
        let _ = tx.send(());
    });
    let _ = rx.await;
    while !finished_handle.is_finished() {
        tokio::task::yield_now().await;
    }
    agent.task_handle = Some(finished_handle);
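
A related way to reduce flakiness is to bound the wait, so a stuck task fails the test instead of hanging it. The sketch below models that with the standard library only: `wait_until_finished` and `spawn_signalling_task` are hypothetical helpers, and `std::thread` plus `std::sync::mpsc` stand in for the tokio task and oneshot channel. In this model the signal is sent from inside the task, so it can arrive before the task has fully exited, which is why polling `is_finished()` is still needed after the receive.

```rust
use std::sync::mpsc;
use std::thread::{self, JoinHandle};
use std::time::{Duration, Instant};

// Poll `is_finished()` with a deadline so a stuck task makes the
// caller fail fast instead of spinning forever.
fn wait_until_finished(handle: &JoinHandle<()>, timeout: Duration) -> bool {
    let deadline = Instant::now() + timeout;
    while !handle.is_finished() {
        if Instant::now() >= deadline {
            return false;
        }
        thread::yield_now();
    }
    true
}

// Spawn a task that signals over a channel just before it returns.
fn spawn_signalling_task() -> (JoinHandle<()>, mpsc::Receiver<()>) {
    let (tx, rx) = mpsc::channel();
    let handle = thread::spawn(move || {
        // The receiver may already be gone; ignore the send error.
        let _ = tx.send(());
    });
    (handle, rx)
}
```

A test would receive the signal first, then assert that `wait_until_finished` returns `true` within a generous timeout before storing the handle on the agent.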

Hmbown · 2026-06-01T14:09:09Z

Thanks @cyq1017. I harvested the narrow cap-reconciliation fix into the v0.8.50 triage PR as #2504 commit bc34cd13e (fix(tui): hold subagent cap until status reconciles).

Local verification on the harvest branch:

cargo test -p codewhale-tui --all-features --locked test_running_count_counts_running_agents_until_status_reconciles
cargo test -p codewhale-tui --all-features --locked subagent
cargo fmt --all -- --check

I am treating this as a partial #2211 fix only: it covers the finished-handle/status-reconciliation cap race, but the broader queue/status-panel/worktree-pressure criteria remain open.

fix(tui): hold subagent cap until status reconciles

5f01dda

gemini-code-assist Bot reviewed Jun 1, 2026

Hmbown mentioned this pull request Jun 1, 2026

[codex] v0.8.50 triage harvest #2504

Merged

Hmbown mentioned this pull request Jun 1, 2026

bug(tui): sub-agent fanout plus hidden worktrees can saturate the TUI during release work #2211

Open

cyq1017 wants to merge 1 commit into Hmbown:main from cyq1017:codex/2211-subagent-cap
