Reliability: missing resource bounds on streaming (liveness hang, download guard, token cap)

Three related places where mellea has no bound on a resource. Prior reports: #650 (streaming hang confirmed in the wild), #591 (missing token cap caused flaky tests). Related: #1164 (structured cancellation for Ollama).

## Streams can hang forever

`send_to_queue` in `mellea/helpers/async_helpers.py` feeds all backend streaming responses into an `asyncio.Queue`:

```python
async for item in aresponse:   # no timeout between chunks
    await aqueue.put(item)
```

If the backend stalls (the server stops sending without closing the connection), this loop blocks forever. The sentinel never reaches the queue, so every downstream consumer blocks with it (`core/base.py:623`, `stdlib/streaming.py:294,361`, `stdlib/sampling/base.py:336`). All four call sites share the same root cause. This has been seen in practice (#650).
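One possible shape for a fix is a per-chunk timeout around the iterator's `__anext__()` call. The sketch below is illustrative only: `send_to_queue_with_timeout`, `chunk_timeout`, and the `None` sentinel are hypothetical names, and forwarding the timeout exception through the queue is an assumption about how consumers could be told of the stall, not a description of mellea's current behaviour.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any

# Hypothetical sentinel; the value mellea actually uses is not assumed here.
_SENTINEL = None


async def send_to_queue_with_timeout(
    aresponse: AsyncIterator[Any],
    aqueue: asyncio.Queue,
    chunk_timeout: float = 60.0,  # hypothetical default, in seconds
) -> None:
    """Forward stream items to the queue, giving up if a chunk is too slow."""
    iterator = aresponse.__aiter__()
    try:
        while True:
            try:
                # Bound the wait for each chunk, not for the whole stream.
                item = await asyncio.wait_for(iterator.__anext__(), chunk_timeout)
            except StopAsyncIteration:
                break  # The stream ended normally.
            await aqueue.put(item)
    except asyncio.TimeoutError as exc:
        # Assumption: consumers treat an exception item as a failed stream.
        await aqueue.put(exc)
    finally:
        # The sentinel is always enqueued, so consumers never wait forever.
        await aqueue.put(_SENTINEL)
```

Bounding each chunk rather than the whole response keeps long but healthy generations alive while still detecting a silent stall.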

## Unbounded download in retriever fixture

`mellea/formatters/granite/retrievers/util.py` downloads numbered parquet files until it receives a 404, with no guard on the number of files fetched. The corpus allowlist (4 names, hardcoded repo) limits exposure, and the largest corpus currently has 20 parts, but there is nothing to catch a malformed or unexpectedly grown dataset.
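A count guard could look roughly like the sketch below. It does not reproduce the code in `util.py`: `download_parts`, `fetch_part`, and `MAX_PARTS` are hypothetical names, the cap of 64 is an arbitrary value chosen to sit above the 20 parts mentioned above, and zero-based part numbering is assumed for illustration.

```python
from collections.abc import Callable
from typing import Optional

# Hypothetical cap, comfortably above the largest known corpus (20 parts).
MAX_PARTS = 64


def download_parts(
    fetch_part: Callable[[int], Optional[bytes]],
    max_parts: int = MAX_PARTS,
) -> list[bytes]:
    """Fetch numbered parts until one is missing, or fail at the cap.

    `fetch_part(i)` is a stand-in for the real download call: it returns
    the bytes of part `i`, or None when the server answers 404.
    """
    parts: list[bytes] = []
    for index in range(max_parts):
        data = fetch_part(index)
        if data is None:
            return parts  # Normal termination: first missing part.
        parts.append(data)
    # The cap was reached without ever seeing a missing part.
    raise RuntimeError(
        f"fetched {max_parts} parts without reaching the end of the dataset"
    )
```

Raising at the cap, rather than silently truncating, makes an unexpectedly grown dataset visible to whoever maintains the allowlist.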

## MAX_NEW_TOKENS backend defaults vary widely

Backend defaults for `MAX_NEW_TOKENS` vary widely, and some are very low: vLLM defaults to 16 tokens, which silently truncates most real responses (#591). Callers need to set `ModelOption.MAX_NEW_TOKENS` explicitly, but nothing in the library makes that expectation clear.
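One low-cost way to surface the expectation would be a warning when a request carries no explicit cap. The following is a sketch under stated assumptions: `warn_if_uncapped` and `MAX_NEW_TOKENS_KEY` are hypothetical, and the key string is a placeholder standing in for `ModelOption.MAX_NEW_TOKENS`, whose actual value is not assumed here.

```python
import warnings
from typing import Any, Optional

# Placeholder standing in for ModelOption.MAX_NEW_TOKENS (value not assumed).
MAX_NEW_TOKENS_KEY = "max_new_tokens"


def warn_if_uncapped(
    model_options: Optional[dict[str, Any]],
    backend_name: str,
) -> bool:
    """Warn when no token cap is set; return True if a warning was issued."""
    if model_options and model_options.get(MAX_NEW_TOKENS_KEY) is not None:
        return False  # The caller set an explicit cap.
    warnings.warn(
        f"No explicit token cap set for backend '{backend_name}'; "
        "the backend default may truncate responses.",
        UserWarning,
        stacklevel=2,
    )
    return True
```

A warning keeps existing callers working while making the silent truncation described in #591 easier to diagnose.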

## What is deliberately unbounded (not a bug)

`_chunk_queue` and `_event_queue` in `stdlib/streaming.py` are unbounded by design — the inline comment notes that consumption is opt-in. Not proposing a change there.

Reliability: missing resource bounds on streaming (liveness hang, download guard, token cap) #1235
