Fix trio compatibility with owner task pattern for cancel scopes #364

Draft
mdworsky wants to merge 3 commits into main from michaeldworsky/trio-compatibility

@mdworsky
Collaborator

Summary

  • Fix cancel scope errors when using the SDK with trio by implementing the "owner task pattern"
  • Ensure inner task groups are properly managed by a single task, satisfying trio's strict cancel scope ownership requirements
  • Add 13 new tests covering the owner task pattern with both asyncio and trio backends

Problem

The SDK was manually calling __aenter__ and __aexit__ on anyio task groups, which violates trio's requirement that cancel scopes must be exited by the same task that entered them. This caused errors like:

RuntimeError: Attempted to exit cancel scope in a different task than it was entered in

Solution

Implement the owner task pattern: a dedicated task owns the inner task group for its entire lifetime using proper async with semantics. The outer code communicates with this owner task via events:

  • start() spawns the owner task and waits for _owner_started_event
  • close() sets _owner_stop_event, signaling the owner to cancel and exit cleanly

This ensures the inner task group (which does the actual message reading) is always entered and exited by the same task.
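
The owner task pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: it assumes Python 3.11+ and uses `asyncio.TaskGroup` and `asyncio.Event` as stand-ins for the anyio task group and events, and the class `OwnedReader` and its `_read_messages` placeholder are hypothetical names. Only `_owner_started_event` and `_owner_stop_event` come from the description.

```python
import asyncio
from typing import Optional


class OwnedReader:
    """Hypothetical stand-in for the SDK object that owns a reader task group."""

    def __init__(self) -> None:
        self._owner_started_event = asyncio.Event()
        self._owner_stop_event = asyncio.Event()
        self._owner: Optional[asyncio.Task] = None
        # Recorded only so the example can show which task entered/exited.
        self.entered_in: Optional[asyncio.Task] = None
        self.exited_in: Optional[asyncio.Task] = None

    async def _read_messages(self) -> None:
        # Placeholder for the real message-reading loop.
        while True:
            await asyncio.sleep(0.01)

    async def _run_owner(self) -> None:
        # The owner task enters AND exits the task group via `async with`.
        self.entered_in = asyncio.current_task()
        try:
            async with asyncio.TaskGroup() as tg:
                reader = tg.create_task(self._read_messages())
                self._owner_started_event.set()
                await self._owner_stop_event.wait()
                reader.cancel()  # let the group drain and exit cleanly
        finally:
            self.exited_in = asyncio.current_task()

    async def start(self) -> None:
        self._owner = asyncio.create_task(self._run_owner())
        await self._owner_started_event.wait()

    async def close(self) -> None:
        # Safe from any task: it only signals the owner and waits for it.
        self._owner_stop_event.set()
        if self._owner is not None:
            await self._owner


async def demo() -> bool:
    r = OwnedReader()
    await r.start()
    await asyncio.create_task(r.close())  # close() runs in a different task
    return r.entered_in is r.exited_in and r.entered_in is r._owner
```

Running `asyncio.run(demo())` exercises the case where `close()` is called from a task other than the one that called `start()`; the task group is still entered and exited inside the owner task.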

Test plan

  • All 130 tests pass (117 existing + 13 new)
  • New tests verify owner task lifecycle with asyncio backend
  • New tests verify owner task lifecycle with trio backend
  • Concurrent operations work correctly with both backends

🤖 Generated with Claude Code

🏠 Remote-Dev: homespace
@mdworsky mdworsky marked this pull request as draft November 24, 2025 16:22
drillan added a commit to drillan/claude-agent-sdk-python that referenced this pull request Feb 24, 2026
Apply the owner task pattern from anthropics#364 to ensure inner task
groups are properly managed by a single task. This prevents the
RuntimeError raised when cancel scopes are exited from a different
task or out of LIFO order.

- Query: dedicated _task_group_owner manages inner task group
- SubprocessCLITransport: dedicated _stderr_owner_task for stderr reader
- close() signals owner via anyio.Event instead of manual __aexit__

Fixes anthropics#454, related to anthropics#378.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
drillan added a commit to drillan/claude-agent-sdk-python that referenced this pull request Mar 5, 2026
drillan added a commit to drillan/claude-agent-sdk-python that referenced this pull request Mar 11, 2026
drillan added a commit to drillan/claude-agent-sdk-python that referenced this pull request Mar 17, 2026
drillan added a commit to drillan/claude-agent-sdk-python that referenced this pull request Mar 18, 2026
drillan added a commit to drillan/claude-agent-sdk-python that referenced this pull request Mar 23, 2026
drillan added a commit to drillan/claude-agent-sdk-python that referenced this pull request Mar 24, 2026
qing-ant added a commit that referenced this pull request Mar 26, 2026
…cleanup (#454) (#746)

## Problem

When users break out of the `async for` loop over `query()`, Python may
finalize the async generator in a different task than the one that
created the task group. This causes `close()` to call
`TaskGroup.__aexit__()` from a different task than `start()` called
`__aenter__()`, triggering:

```
RuntimeError: Attempted to exit cancel scope in a different task than it was entered in
```

Fixes #454.

## Root cause

The `Query` class was using anyio's `TaskGroup` with manual
`__aenter__`/`__aexit__` calls. anyio's cancel scopes have task affinity
— they must be exited by the same async task that entered them. During
async generator finalization, Python can schedule the generator's
cleanup in a different task, violating this invariant.

## Why PR #364 doesn't fix it

PR #364 introduces an "owner task pattern" that wraps the inner task
group in a dedicated owner task. However, it still creates an **outer**
task group (`_outer_tg`) using the same manual `__aenter__`/`__aexit__`
pattern, so the cross-task error just moves one level up. The tests in
that PR call `start()` and `close()` from the same task, so they don't
reproduce the actual failure scenario.

## Solution

Replace anyio `TaskGroup` with `asyncio.create_task()` for background
task management. `asyncio.create_task()` has no cancel scope, so
`close()` can cancel tasks from any task context without triggering the
RuntimeError.

Changes:
- **`query.py`**: Replace `_tg` (anyio TaskGroup) with `_read_task`
(asyncio Task) and `_child_tasks` (set of asyncio Tasks). Add
`spawn_task()` method as the replacement for `_tg.start_soon()`.
- **`client.py` / `_internal/client.py`**: Update callers to use
`spawn_task()` instead of `_tg.start_soon()`.
- **`test_query.py`**: Add tests that reproduce the cross-task cleanup
scenario.
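
A rough sketch of the shape these changes describe, under stated assumptions: the class name `BackgroundTasks` and the `start()` signature are hypothetical, and only `_read_task`, `_child_tasks`, and `spawn_task()` are names taken from the list above. The real `Query` class may track and cancel its tasks differently.

```python
import asyncio
from typing import Any, Coroutine, Optional, Set


class BackgroundTasks:
    """Hypothetical container; mirrors the _read_task/_child_tasks idea."""

    def __init__(self) -> None:
        self._read_task: Optional[asyncio.Task] = None
        self._child_tasks: Set[asyncio.Task] = set()

    def start(self, reader: Coroutine[Any, Any, None]) -> None:
        # A plain asyncio task: no task group, hence no cancel scope to exit.
        self._read_task = asyncio.create_task(reader)

    def spawn_task(self, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._child_tasks.add(task)  # keep a strong reference until done
        task.add_done_callback(self._child_tasks.discard)
        return task

    async def close(self) -> None:
        # Task.cancel() may be called from any task, so close() has no
        # affinity to the task that called start().
        tasks = list(self._child_tasks)
        if self._read_task is not None:
            tasks.append(self._read_task)
        for task in tasks:
            task.cancel()
        await asyncio.gather(*tasks, return_exceptions=True)


async def _forever() -> None:
    await asyncio.sleep(3600)


async def demo() -> bool:
    bg = BackgroundTasks()
    bg.start(_forever())
    child = bg.spawn_task(_forever())
    await asyncio.sleep(0)  # let both tasks begin running
    await asyncio.create_task(bg.close())  # cleanup from a different task
    return bg._read_task.cancelled() and child.cancelled()
```

Here `demo()` starts the tasks in one task and closes them from another, which is the cross-task cleanup scenario the new tests are described as covering.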

## Test plan

- All 356 existing tests pass
- New test `test_close_from_different_task_does_not_raise` verifies
cross-task cleanup works
- New test `test_close_from_same_task_still_works` verifies normal
cleanup still works
- Linting (ruff) and type checking (mypy) pass
qing-ant added a commit that referenced this pull request Apr 24, 2026
## Problem

PR #746 (v0.1.51+) replaced `anyio.TaskGroup` with
`asyncio.create_task()` in `Query` to fix #378 (100% CPU spin in
`_deliver_cancellation`) and #454 (cross-task cancel-scope
`RuntimeError`). However, `asyncio.get_running_loop()` raises
`RuntimeError: no running event loop` under trio, breaking
`ClaudeSDKClient.connect()` for trio users since v0.1.51:

```python
import trio
from claude_agent_sdk import ClaudeSDKClient


async def main():
    async with ClaudeSDKClient() as c:  # RuntimeError: no running event loop
        ...


trio.run(main)
```

## Approach

**sniffio dispatch.** Adds `_internal/_task_compat.py` with a
`TaskHandle` abstraction and `spawn_detached(coro)`:

- **asyncio** → `loop.create_task()` wrapped in `_AsyncioTaskHandle`.
Behaviorally identical to PR #746 — `cancel()`, `done()`,
`add_done_callback()`, and `wait()` are thin pass-throughs to
`asyncio.Task`. **#378/#454 stay fixed.**
- **trio** → `trio.lowlevel.spawn_system_task` with a per-task
`CancelScope` wrapped in `_TrioTaskHandle`. `CancelScope.cancel()` is
sync and has no task affinity, so `close()` from any task is safe (the
#454 invariant holds for trio too).
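
To make the handle abstraction concrete, here is a minimal sketch of the asyncio side only. It is an illustration under assumptions, not the contents of `_internal/_task_compat.py`: the method signatures, the public class name `AsyncioTaskHandle`, and the behavior of `wait()` on a cancelled task are guesses, and the sniffio-based backend selection and the trio branch are left out so the example stays stdlib-only.

```python
import asyncio
from typing import Any, Callable, Coroutine, List, Protocol, Tuple


class TaskHandle(Protocol):
    """Assumed interface; the method names come from the description above."""

    def cancel(self) -> None: ...
    def done(self) -> bool: ...
    def add_done_callback(self, fn: Callable[[], None]) -> None: ...
    async def wait(self) -> None: ...


class AsyncioTaskHandle:
    """Thin pass-through to asyncio.Task (illustrative only)."""

    def __init__(self, task: "asyncio.Task[Any]") -> None:
        self._task = task

    def cancel(self) -> None:
        self._task.cancel()

    def done(self) -> bool:
        return self._task.done()

    def add_done_callback(self, fn: Callable[[], None]) -> None:
        self._task.add_done_callback(lambda _task: fn())

    async def wait(self) -> None:
        # Assumption: wait() returns quietly if the task was cancelled and
        # re-raises any other exception the task finished with.
        await asyncio.wait({self._task})
        if not self._task.cancelled():
            exc = self._task.exception()
            if exc is not None:
                raise exc


def spawn_detached(coro: Coroutine[Any, Any, Any]) -> TaskHandle:
    # The PR dispatches on the running backend via sniffio; this sketch
    # shows only the asyncio branch.
    loop = asyncio.get_running_loop()
    return AsyncioTaskHandle(loop.create_task(coro))


async def _cancel(handle: TaskHandle) -> None:
    handle.cancel()


async def demo() -> Tuple[bool, List[str]]:
    fired: List[str] = []
    handle = spawn_detached(asyncio.sleep(3600))
    handle.add_done_callback(lambda: fired.append("done"))
    await asyncio.create_task(_cancel(handle))  # cancel from another task
    await handle.wait()
    return handle.done(), fired
```

Calling code that only sees `TaskHandle` does not need to import `asyncio`, which is consistent with the statement that `query.py` no longer does.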

`Query.start/spawn_task/close` and `_spawn_control_request_handler` now
use `TaskHandle`. `query.py` no longer imports `asyncio`; the two
cancellation-exception sites use `anyio.get_cancelled_exc_class()`.

The full anyio-TaskGroup restructure was previously attempted in #364
and proved tricky; this change keeps the asyncio path untouched to
minimize regression risk.

## Why `trio.lowlevel.spawn_system_task`?

trio has no `create_task()` equivalent by design (structured
concurrency). `spawn_system_task` is the documented escape hatch for
detached tasks. Each spawned coro is wrapped in `try/except
BaseException` so a failure can never propagate as `TrioInternalError`;
the exception is stored on the handle and re-raised by `wait()`.
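
The capture-and-re-raise idea can be illustrated independently of trio. The sketch below is hypothetical (`CapturingHandle` is not a name from the PR) and is driven by asyncio purely so it runs with the standard library; in the trio version the wrapper would be the function handed to `trio.lowlevel.spawn_system_task`, and the event would be a `trio.Event`.

```python
import asyncio
from typing import Any, Coroutine, Optional


class CapturingHandle:
    """Hypothetical handle that stores a task's failure for wait()."""

    def __init__(self) -> None:
        self._finished = asyncio.Event()
        self._exc: Optional[BaseException] = None

    async def run(self, coro: Coroutine[Any, Any, None]) -> None:
        # Body of the detached task. Every exception is captured so that
        # nothing propagates out of the task itself.
        try:
            await coro
        except BaseException as exc:
            self._exc = exc
        finally:
            self._finished.set()

    async def wait(self) -> None:
        # The stored exception surfaces here, in the waiter's context.
        await self._finished.wait()
        if self._exc is not None:
            raise self._exc


async def _fails() -> None:
    raise ValueError("boom")


async def demo() -> str:
    handle = CapturingHandle()
    task = asyncio.create_task(handle.run(_fails()))
    await task  # finishes normally: the error did not escape the wrapper
    try:
        await handle.wait()
    except ValueError as exc:
        return f"wait() re-raised: {exc}"
    return "no error"
```

The design choice this highlights is that a failure in a detached task is reported to whoever calls `wait()` rather than to the runtime that hosts the task.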

## Out of scope (follow-ups)

`_internal/session_resume.py`, `_internal/transcript_mirror_batcher.py`,
and `_internal/sessions.py` also have direct `asyncio` usage. These are
opt-in features gated behind `options.session_store` and were never
trio-compatible — not regressions from #746. Tracked separately.

## Testing

- New `tests/test_task_compat.py` — 9 unit tests, both backends
(spawn/wait, cancel, done-callback, exception propagation, cross-task
cancel)
- New `TestQueryTrioBackend` (3 tests) — `start`/`close`/`spawn_task`
under `anyio.run(..., backend="trio")`
- New `TestClaudeSDKClientTrioBackend::test_client_connect_under_trio` —
the repro above as a unit test
- Existing `TestQueryCrossTaskCleanup` (#454 guard) and
`TestControlCancelRequest` (#751 guard) still pass
- 748 passed, 3 skipped; ruff + mypy clean
- Manual e2e: real query under `trio.run()` against live CLI returns
`AssistantMessage` + `ResultMessage(success)`

## Deps

Adds `sniffio>=1.0.0` to runtime deps (already a transitive dep of
`anyio>=4.0.0`; just made explicit).