fix(hf): fire _cancel_event on STREAM_TIMEOUT for direct avalue/astream paths #1242

@planetf1

Problem

When send_to_queue hits a STREAM_TIMEOUT, it puts a TimeoutError on the queue and returns early. For the HuggingFace backend, this leaves the model.generate() worker thread running until it completes naturally, because the _cancel_event stopping criterion is never fired.

The consumer path determines whether the leak occurs:

  • stream_with_chunking (the high-level chunking orchestrator): calls await mot.cancel_generation(error=exc) when it catches the TimeoutError, which fires _cancel_hook → the thread stops. ✅
  • avalue() / astream() (direct access): the TimeoutError is raised without triggering cancel_generation(). The HF worker thread continues generating into an orphaned AsyncTextIteratorStreamer until it finishes naturally. ❌

This wastes GPU/CPU time and keeps the worker thread occupied for the remainder of the generation.
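The difference between the two consumer paths can be illustrated with a small stand-alone sketch. Nothing here is project code: fake_generate and run_with_timeout are hypothetical stand-ins, with a plain threading.Event playing the role of _cancel_event and a sleep loop playing the role of model.generate().

```python
import threading
import time


def fake_generate(cancel_event: threading.Event, steps: int, produced: list) -> None:
    """Hypothetical stand-in for model.generate(): polls a stop flag on every step."""
    for i in range(steps):
        if cancel_event.is_set():  # plays the role of the stopping criterion
            return
        produced.append(i)
        time.sleep(0.01)


def run_with_timeout(fire_cancel: bool, timeout: float = 0.05) -> bool:
    """Return True if the worker is still running shortly after the consumer timed out."""
    cancel_event = threading.Event()
    produced: list = []
    worker = threading.Thread(
        target=fake_generate, args=(cancel_event, 100, produced), daemon=True
    )
    worker.start()
    worker.join(timeout)  # the consumer gives up here (the "STREAM_TIMEOUT")
    if fire_cancel:
        cancel_event.set()  # what the stream_with_chunking path effectively does
    worker.join(0.1)  # short grace period before checking for a leak
    leaked = worker.is_alive()
    cancel_event.set()  # always clean up so the sketch itself does not leak
    worker.join()
    return leaked


print("leak without cancel:", run_with_timeout(fire_cancel=False))
print("leak with cancel:", run_with_timeout(fire_cancel=True))
```

In this sketch the worker only stops early when something sets the event, which mirrors why the direct avalue()/astream() paths leave the thread running.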

Root cause

send_to_queue has no reference to the ModelOutputThunk or its _cancel_hook, so it cannot trigger cancellation. The hook is wired in the backend (output._cancel_hook = _cancel_event.set) but is only reachable via mot.cancel_generation().
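A minimal sketch of the wiring described above, assuming cancel_generation() does little more than invoke the hook. ThunkSketch is a hypothetical stand-in for ModelOutputThunk that models only the cancel path; the real class may differ in every other respect.

```python
import asyncio
import threading
from typing import Callable, Optional


class ThunkSketch:
    """Hypothetical stand-in for ModelOutputThunk; only the cancel path is modeled."""

    def __init__(self) -> None:
        self._cancel_hook: Optional[Callable[[], None]] = None
        self.error: Optional[BaseException] = None

    async def cancel_generation(self, error: Optional[BaseException] = None) -> None:
        # Assumption: record the error, then fire the backend-supplied hook if present.
        self.error = error
        if self._cancel_hook is not None:
            self._cancel_hook()


async def demo() -> bool:
    _cancel_event = threading.Event()
    output = ThunkSketch()
    # The backend wires the hook exactly as quoted in the issue text.
    output._cancel_hook = _cancel_event.set
    # Only a caller holding the thunk can reach the event.
    await output.cancel_generation(error=TimeoutError("STREAM_TIMEOUT"))
    return _cancel_event.is_set()


print(asyncio.run(demo()))
```

A function that receives only the stream and the queue has no path to _cancel_event in this arrangement, which is the gap described above.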

Impact

This is not a correctness bug for the consumer: the TimeoutError propagates correctly. The cost is a resource leak: on direct avalue()/astream() calls, the worker thread and its GPU computation keep running for the remainder of the generation after the timeout.

Possible approaches

  1. Thread a cancel callback into send_to_queue so it can fire on timeout.
  2. Ensure all timeout-raising paths call cancel_generation() before returning to the consumer.
  3. Route the avalue()/astream() paths through stream_with_chunking so the existing mitigation covers them.
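Approach 1 might look roughly like the following sketch. It is not the project's send_to_queue: the function name send_to_queue_sketch, its signature, the on_timeout parameter, and the use of None as an end-of-stream sentinel are all assumptions made for illustration.

```python
import asyncio
import threading
from typing import AsyncIterator, Callable, Optional


async def send_to_queue_sketch(
    stream: AsyncIterator[str],
    queue: asyncio.Queue,
    stream_timeout: float,
    on_timeout: Optional[Callable[[], None]] = None,  # hypothetical cancel callback
) -> None:
    """Forward chunks to the queue; on an inter-chunk timeout, fire the callback first."""
    it = stream.__aiter__()
    while True:
        try:
            chunk = await asyncio.wait_for(it.__anext__(), timeout=stream_timeout)
        except StopAsyncIteration:
            await queue.put(None)  # assumed end-of-stream sentinel
            return
        except asyncio.TimeoutError:
            if on_timeout is not None:
                on_timeout()  # e.g. _cancel_event.set, so the worker thread stops
            await queue.put(TimeoutError("stream timed out"))
            return
        await queue.put(chunk)


async def slow_stream() -> AsyncIterator[str]:
    """Yields one chunk, then stalls long enough to trip the timeout."""
    yield "hello"
    await asyncio.sleep(10)
    yield "never delivered"


async def demo() -> tuple:
    cancel_event = threading.Event()
    queue: asyncio.Queue = asyncio.Queue()
    await send_to_queue_sketch(slow_stream(), queue, 0.05, on_timeout=cancel_event.set)
    first = await queue.get()
    second = await queue.get()
    return first, isinstance(second, TimeoutError), cancel_event.is_set()


print(asyncio.run(demo()))
```

Passing a plain callable keeps the queue helper unaware of the thunk, at the cost of one more parameter for every backend to supply. Approaches 2 and 3 would instead leave the helper unchanged and move the responsibility to the consumer side.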

Related

Identified during review of #1236 (inter-chunk stream timeout). The aclose() cleanup path in send_to_queue was also ineffective for this reason (fixed in #1236).

Metadata

Labels: bug
