fix(hf): fire _cancel_event on STREAM_TIMEOUT for direct avalue/astream paths

## Problem

When `send_to_queue` hits a `STREAM_TIMEOUT`, it puts a `TimeoutError` on the queue and returns early. For the HuggingFace backend, this leaves the `model.generate()` worker thread running until generation completes naturally, because the `_cancel_event` stopping criterion is never set.

The consumer path determines whether the leak occurs:

- **`stream_with_chunking`** (the high-level chunking orchestrator): calls `await mot.cancel_generation(error=exc)` when it catches the `TimeoutError`, which fires `_cancel_hook` → the thread stops. ✅
- **`avalue()` / `astream()`** (direct access): the `TimeoutError` is raised without triggering `cancel_generation()`. The HF worker thread continues generating into an orphaned `AsyncTextIteratorStreamer` until it finishes naturally. ❌

This wastes GPU/CPU and holds the thread for the remainder of the generation.
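The leak pattern can be reproduced without any model. The following is a minimal sketch, not the backend code: `fake_generate` is a hypothetical stand-in for `model.generate()` that checks a `threading.Event` once per step (as a stopping criterion would), and `run_with_timeout` is a hypothetical consumer that gives up after a timeout. The step count, sleep interval, and timeout are arbitrary illustration values.

```python
import threading
import time


def fake_generate(cancel_event: threading.Event, steps: int, produced: list) -> None:
    """Hypothetical stand-in for model.generate(): one 'token' per step."""
    for i in range(steps):
        # Plays the role of the _cancel_event stopping criterion.
        if cancel_event.is_set():
            return
        produced.append(i)
        time.sleep(0.01)


def run_with_timeout(fire_cancel_on_timeout: bool,
                     steps: int = 50,
                     timeout: float = 0.05) -> int:
    """Return how many 'tokens' the worker produced in total."""
    cancel_event = threading.Event()
    produced: list = []
    worker = threading.Thread(
        target=fake_generate, args=(cancel_event, steps, produced)
    )
    worker.start()

    # The consumer stops waiting after `timeout` seconds.
    worker.join(timeout)
    if worker.is_alive() and fire_cancel_on_timeout:
        cancel_event.set()  # what the direct avalue()/astream() paths skip

    # Wait for the worker so the total work done can be measured.
    worker.join()
    return len(produced)
```

When the event is not set on timeout, the worker runs all `steps` iterations even though nobody is consuming its output. When it is set, the worker exits at its next check.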

## Root cause

`send_to_queue` has no reference to the `ModelOutputThunk` or its `_cancel_hook`, so it cannot trigger cancellation. The hook is wired in the backend (`output._cancel_hook = _cancel_event.set`) but is only reachable via `mot.cancel_generation()`.
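The reachability problem can be illustrated with a small sketch. `ThunkSketch` below is a hypothetical, heavily simplified stand-in for `ModelOutputThunk`; only `_cancel_hook`, `cancel_generation()`, and `avalue()` are names from this issue, and their bodies here are assumptions made for illustration rather than the real implementation.

```python
import asyncio
import threading
from typing import Callable, Optional


class ThunkSketch:
    """Hypothetical stand-in for ModelOutputThunk (illustration only)."""

    def __init__(self) -> None:
        self._cancel_hook: Optional[Callable[[], None]] = None
        self._queue: asyncio.Queue = asyncio.Queue()

    async def cancel_generation(self, error: Optional[BaseException] = None) -> None:
        # The only path in this sketch that reaches the hook.
        if self._cancel_hook is not None:
            self._cancel_hook()

    async def avalue(self) -> str:
        item = await self._queue.get()
        if isinstance(item, Exception):
            raise item  # the error propagates, but the hook is never touched
        return item


async def demo_wiring() -> tuple:
    cancel_event = threading.Event()
    mot = ThunkSketch()
    mot._cancel_hook = cancel_event.set  # wiring as described above

    # Simulate the producer having put a TimeoutError on the queue.
    await mot._queue.put(TimeoutError("stream timed out"))
    try:
        await mot.avalue()
    except TimeoutError:
        pass
    set_after_avalue = cancel_event.is_set()

    await mot.cancel_generation()
    set_after_cancel = cancel_event.is_set()
    return set_after_avalue, set_after_cancel
```

In this sketch the event stays unset after `avalue()` raises and becomes set only once `cancel_generation()` is called explicitly.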

## Impact

This is not a correctness bug for the consumer: the `TimeoutError` propagates correctly. However, on direct `avalue()`/`astream()` calls, the worker thread and GPU computation leak for the remainder of the generation after the timeout.

## Possible approaches

1. Thread a cancel callback into `send_to_queue` so it can fire on timeout.
2. Ensure all timeout-raising paths call `cancel_generation()` before returning to the consumer.
3. Route the `avalue()`/`astream()` paths through `stream_with_chunking` so the existing mitigation covers them.
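As a rough illustration of approach 1, the sketch below passes an optional callback into a simplified producer loop. `send_to_queue_sketch`, its signature, the `on_timeout` parameter, and the `None` end-of-stream sentinel are all assumptions made for this example; the real `send_to_queue` may be structured differently.

```python
import asyncio
import threading
from typing import AsyncIterator, Callable, Optional


async def send_to_queue_sketch(
    stream: AsyncIterator[str],
    queue: asyncio.Queue,
    timeout: float,
    on_timeout: Optional[Callable[[], None]] = None,  # hypothetical parameter
) -> None:
    """Forward chunks to `queue`; on an inter-chunk timeout, fire the callback."""
    it = stream.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(it.__anext__(), timeout)
        except StopAsyncIteration:
            await queue.put(None)  # assumed end-of-stream sentinel
            return
        except asyncio.TimeoutError:
            if on_timeout is not None:
                on_timeout()  # e.g. _cancel_event.set
            await queue.put(TimeoutError("stream timed out"))
            return
        await queue.put(chunk)


async def slow_stream() -> AsyncIterator[str]:
    """Hypothetical stream that stalls after its first chunk."""
    yield "a"
    await asyncio.sleep(10)
    yield "b"


async def demo() -> tuple:
    cancel_event = threading.Event()
    queue: asyncio.Queue = asyncio.Queue()
    await send_to_queue_sketch(slow_stream(), queue, 0.05, cancel_event.set)
    items = []
    while not queue.empty():
        items.append(queue.get_nowait())
    return cancel_event.is_set(), items
```

Keeping the callback optional means backends without a cancellable worker would not need to change. The trade-off against approach 2 is that cancellation is triggered by the producer rather than by the consumer that observes the error.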

## Related

Identified during review of #1236 (inter-chunk stream timeout). The `aclose()` cleanup path in `send_to_queue` was also ineffective for this reason (fixed in #1236).

fix(hf): fire _cancel_event on STREAM_TIMEOUT for direct avalue/astream paths #1242
