Skip to content

fix(ai): add parallel tool close guards#11965

Open
delta575 wants to merge 5 commits intovercel:mainfrom
delta575:fix/parallel-tool-close-guard
Open

fix(ai): add parallel tool close guards#11965
delta575 wants to merge 5 commits intovercel:mainfrom
delta575:fix/parallel-tool-close-guard

Conversation

@delta575
Copy link

@delta575 delta575 commented Jan 22, 2026

Background

When multiple tools execute in parallel and complete nearly simultaneously, their .finally() blocks all call attemptClose(). Without a guard, the first completion closes the stream and subsequent completions throw "Controller is already closed" or "stream is not in a state that permits enqueue" errors.

Summary

Add a closed flag to prevent race conditions in attemptClose():

  • Guard attemptClose() against re-entry when multiple tools complete simultaneously
  • Guard async enqueue() calls to prevent enqueueing after stream closure
  • Set flag before doing work to prevent TOCTOU race

Manual Verification

Tested with @convex-dev/agent and Gemini 2.5 Flash using 5+ parallel tool calls. Without this fix, intermittent "stream is not in a state that permits enqueue" errors occurred. With this fix, streaming completes successfully.

Checklist

  • Tests have been added / updated (for bug fixes / features)
  • Documentation has been added / updated (for bug fixes / features)
  • A patch changeset for relevant packages has been added (for bug fixes / features - run pnpm changeset in the project root)
  • I have reviewed this pull request (self-review)

Related Issues

Split from #11907 per review feedback to separate the close guard fix from the ID generation change.

delta575 and others added 2 commits January 22, 2026 14:23
…xecution

When multiple tools complete simultaneously, their finally() blocks all call
attemptClose(). Without a guard, the first completion closes the stream and
subsequent completions throw "Controller is already closed" errors.

Changes:
- Add `closed` flag to track stream state
- Guard attemptClose() against re-entry when already closed
- Guard all async enqueue calls (onPreliminaryToolResult, then, catch) to
  prevent enqueueing after stream is closed
- Add tests for parallel tool completion scenarios

This prevents "stream is not in a state that permits enqueue" errors when
tools complete in rapid succession or when the stream is closed externally
(e.g., via AbortSignal).

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Adds a third complementary test pattern for the stream close guard:
- Uses deferred promises for precise control over tool completion timing
- Verifies stream doesn't close prematurely while results are pending
- Uses Promise.race with timeout to verify blocking behavior
- Follows existing codebase patterns (no mocking, integration-style verification)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@delta575 delta575 changed the title Fix/parallel tool close guard fix(ai): add parallel tool close guards Jan 22, 2026
@lgrammel lgrammel self-assigned this Jan 23, 2026
@lgrammel
Copy link
Collaborator

lgrammel commented Jan 23, 2026

The test does not fail without the fix, thus it does not reproduce the reported issue. I have reverted the fix to the production code and the test still passes.

It also uses real-time clock changes (delay) which must be avoided in unit tests, and could be simplified with DelayedPromise.

Can you simplify the test, avoid any delay/wait operations, and make sure it fails without the fix?

- Replace delay operations with DelayedPromise for precise control
- Add test that reproduces the issue: constant generateId causes premature
  stream closure, and without the guard, subsequent enqueues throw
  "Controller is already closed"
- Remove real-time delays in favor of promise-based coordination

The key test simulates framework behavior where generateId returns a constant
for message grouping. This causes:
1. All tools to share one tracking ID in the Set
2. First tool completion empties the Set and closes the stream
3. Other tools' .then() callbacks try to enqueue → ERROR without guard

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@delta575 delta575 force-pushed the fix/parallel-tool-close-guard branch from b100924 to a4fe7aa Compare January 23, 2026 13:29
@delta575
Copy link
Author

The test does not fail without the fix, thus it does not reproduce the reported issue. I have reverted the fix to the production code and the test still passes.

It also uses real-time clock changes (delay) which must be avoided in unit tests, and could be simplified with DelayedPromise.

Can you simplify the test, avoid any delay/wait operations, and make sure it fails without the fix?

Simplified the tests with DelayedPromise and removed all delay() calls.

On the determinism issue: true race conditions are hard to reproduce in unit tests because JS is single-threaded - microtasks run sequentially even when promises resolve "simultaneously". The .finally() callbacks always execute one at a time, so the guard never gets exercised under normal conditions.

The new test triggers premature closure by using a constant generateId (which is what @convex-dev/agent does for message grouping - intentional on their side for thread message IDs, not a bug):

  1. All tools add same ID to Set → only 1 entry
  2. First tool completes → deletes ID → Set empty → stream closes
  3. Other tools try to enqueue → throws without guard

Removing the if (!closed) guards produces:
TypeError: Invalid state: Controller is already closed

Vitest reports this as unhandled rejection and exits with code 1.

The generateId discussion is addressed in #11966 - it's being used for two purposes (message IDs vs tool execution tracking), and frameworks legitimately override it for message grouping.

I've been using Claude to help debug and write these fixes.

Comment on lines +1188 to +1190
// Simulate framework behavior: generateId returns same ID for message grouping
// This causes outstandingToolResults Set to only track one entry
generateId: () => 'constant-id',
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not the framework behavior. by default different ids are generated. if generateId returns the same id several times this is a misconfiguration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once this is changed to generateId: mockId() the test passes without the fix.

is this a misconfiguration and not an ai sdk bug?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the expectation is that generateId() generates unique ids. this might need to be clarified in the documentation / jsdoc

@lgrammel lgrammel removed their assignment Feb 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants