bug(plugins): generation_post_call doesn't fire for pre-computed MOTs from _generate_from_context

## Summary

`GENERATION_POST_CALL` fires from inside [`ModelOutputThunk.astream`](https://github.com/generative-computing/mellea/blob/main/mellea/core/base.py#L721-L738), so any backend whose `_generate_from_context` returns a fully-computed MOT skips the hook entirely.

`avalue()` short-circuits at [`base.py:583`](https://github.com/generative-computing/mellea/blob/main/mellea/core/base.py#L583) when `_computed=True` and never enters `astream`, so the post-call hook is silent for pre-computed MOTs even though the value is available and post-processing is done.
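
The short-circuit can be illustrated with a toy model. This is a minimal sketch, not mellea's code: `ToyThunk` and the `fired` list are hypothetical stand-ins, and only the `astream` / `avalue` / `_computed` names mirror the ones discussed above.

```python
import asyncio

fired = []  # records every simulated post-call hook invocation


class ToyThunk:
    """Hypothetical stand-in for ModelOutputThunk; not mellea's real class."""

    def __init__(self, value=None):
        self.value = value
        self._computed = value is not None

    async def astream(self):
        # Simulated streaming path: this is the only place the hook fires.
        self.value = "streamed"
        self._computed = True
        fired.append("post_call")
        return self.value

    async def avalue(self):
        # Short-circuit: a computed thunk returns without entering astream.
        if self._computed:
            return self.value
        return await self.astream()


async def demo():
    lazy = await ToyThunk().avalue()       # goes through astream
    after_lazy = list(fired)
    eager = await ToyThunk("hi").avalue()  # pre-computed, skips astream
    after_eager = list(fired)
    return lazy, after_lazy, eager, after_eager


result = asyncio.run(demo())
print(result)
```

The second call returns its value but adds nothing to `fired`, which is the behaviour this issue describes for pre-computed MOTs.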

## Repro

```python
import asyncio
from mellea.backends.dummy import DummyBackend
from mellea.plugins import hook, register
from mellea.stdlib.components import CBlock
from mellea.stdlib.context import SimpleContext

observed = []

@hook("generation_post_call")
async def recorder(payload, ctx):
    observed.append(payload.generation_id)

register(recorder)

async def main():
    backend = DummyBackend(responses=["hi"])
    mot, _ = await backend.generate_from_context(CBlock("test"), SimpleContext())
    await mot.avalue()

asyncio.run(main())
print("post_call fired:", observed)  # → [] (expected: one generation_id)
```

`generation_pre_call` fires; `generation_post_call` does not.

## Impact

- Any plugin subscribing to `generation_post_call` misses calls served by pre-computing backends. Concrete cases:
  - **BackendTracingPlugin** ([`mellea/telemetry/tracing_plugins.py`](https://github.com/generative-computing/mellea/blob/main/mellea/telemetry/tracing_plugins.py), introduced in #1181) starts a span in `pre_call` and never closes it. The leaked span persists in `_in_flight_spans` and stays attached to the OTel context, so subsequent calls inherit a stale parent.
  - **Metrics plugins** (token, latency, error, cost) rely on `generation_post_call` to record their data. Pre-computed paths emit zero metrics.
- Live trigger today: `DummyBackend` (smoke tests, internal tooling).
- Forward-looking: any cache layer or test backend that returns pre-computed MOTs.
- Pre-existing since the hook system landed in #582; surfaced during PR #1181 review by @planetf1.
- **Scope note:** see [comment below](https://github.com/generative-computing/mellea/issues/1229#issuecomment-4664217989) — the pre-computed-MOT case is one shape of a broader "pre_call fires, post_call doesn't" family that includes un-awaited `await_result=False` MOTs and abandoned sampling candidates.
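
The span leak can be sketched with a toy tracer. This is an illustrative model only: `ToyTracingPlugin`, its methods, and `generate` are hypothetical; the `_in_flight_spans` name is the only thing borrowed from the plugin described above, and no real OTel calls are made.

```python
import asyncio


class ToyTracingPlugin:
    """Hypothetical tracer that opens a span in pre_call and closes it in post_call."""

    def __init__(self):
        self._in_flight_spans = {}  # generation_id -> open span
        self.closed = []            # spans that were closed properly

    async def pre_call(self, generation_id):
        self._in_flight_spans[generation_id] = f"span-{generation_id}"

    async def post_call(self, generation_id):
        span = self._in_flight_spans.pop(generation_id, None)
        if span is not None:
            self.closed.append(span)


async def generate(plugin, generation_id, precomputed):
    """Hypothetical call path: pre_call always fires, post_call only when streaming."""
    await plugin.pre_call(generation_id)
    if not precomputed:
        await plugin.post_call(generation_id)
    # Pre-computed results return here, so nothing ever closes the span.


async def demo():
    plugin = ToyTracingPlugin()
    await generate(plugin, "a", precomputed=False)
    await generate(plugin, "b", precomputed=True)
    return plugin


plugin = asyncio.run(demo())
print("closed:", plugin.closed)
print("leaked:", plugin._in_flight_spans)
```

Only the streamed call closes its span; the pre-computed call leaves an entry behind in `_in_flight_spans`.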

## Scope distinction

The `generate_from_raw` (batch) path is not affected. PR #1181 added the full `generation_batch_*` hook surface with backends firing those hooks themselves at the end of the method, which sidesteps this trap by not depending on `astream` to drive post-call. The chat path's firing site, unchanged since the hook system landed in #582, retains the issue.

## Suggested fix

One viable shape: fire `generation_post_call` from `Backend.generate_from_context` ([`mellea/core/backend.py:117`](https://github.com/generative-computing/mellea/blob/main/mellea/core/backend.py#L117)) when `mot.is_computed()`, mirroring the payload construction in `astream`. Sketch:

```python
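# Tail of Backend.generate_from_context. Assumes `generation_id`, `mot`, and
# `new_ctx` are already bound, and that `datetime`, `has_plugins`, `invoke_hook`,
# and `HookType` are importable in this module.
mot._generation_id = generation_id

# Pre-computed MOTs never enter astream, so fire the post-call hook here instead.
if mot.is_computed() and has_plugins(HookType.GENERATION_POST_CALL):
    from ..plugins.hooks.generation import GenerationPostCallPayload
    glog = mot._generate_log
    prompt = glog.prompt if glog and glog.prompt else ""
    # Fall back to 0.0 when the MOT carries no start timestamp.
    latency_ms = (
        (datetime.datetime.now() - mot._start).total_seconds() * 1000
        if mot._start else 0.0
    )
    await invoke_hook(
        HookType.GENERATION_POST_CALL,
        GenerationPostCallPayload(
            prompt=prompt,
            model_output=mot,
            latency_ms=latency_ms,
            generation_id=generation_id,
        ),
    )

return mot, new_ctx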
```

Other approaches worth considering before settling on this one:

- Move the firing site into `avalue` itself, so any path leading to a computed value triggers the hook regardless of who pre-computed it.
- Push responsibility onto backend authors via an explicit contract (firing site stays in `astream`, pre-computing backends document their own post-call invocation).
- Extract a shared helper that both `astream` and the new firing site call, to prevent drift.

Pick whichever shape best fits the broader hook-system contract.
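
For comparison, the shared-helper alternative combined with an `avalue` firing site might look roughly like the toy below. Everything here is hypothetical (`ToyThunk`, `fire_post_call_once`, the `_post_call_fired` flag); the fire-once guard in particular is an assumption added so that two firing sites cannot double-fire, not something taken from the mellea codebase.

```python
import asyncio

calls = []  # values seen by the simulated post-call hook


async def fire_post_call_once(thunk):
    """Hypothetical shared helper: fires the hook at most once per thunk."""
    if thunk._post_call_fired:
        return
    thunk._post_call_fired = True
    calls.append(thunk.value)


class ToyThunk:
    """Hypothetical stand-in for ModelOutputThunk; not mellea's real class."""

    def __init__(self, value=None):
        self.value = value
        self._computed = value is not None
        self._post_call_fired = False

    async def astream(self):
        self.value = "streamed"
        self._computed = True
        await fire_post_call_once(self)  # existing firing site
        return self.value

    async def avalue(self):
        if not self._computed:
            await self.astream()
        await fire_post_call_once(self)  # new site; no-op if already fired
        return self.value


async def demo():
    eager = ToyThunk("hi")
    await eager.avalue()
    await eager.avalue()  # repeated awaits must not fire again
    lazy = ToyThunk()
    await lazy.avalue()
    return calls


asyncio.run(demo())
print(calls)
```

Both the pre-computed and the streamed thunk trigger the hook exactly once, at the cost of one extra piece of per-thunk state.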

## Acceptance

- [ ] Repro above prints a non-empty `observed` list.
- [ ] Test added covering the pre-computed path through `_generate_from_context`.
- [ ] (Once PR #1181 merges) `BackendTracingPlugin._in_flight_spans` is empty after a `DummyBackend` call (regression guard for the span leak).
- [ ] Update `GenerationPostCallPayload` docstring at `mellea/plugins/hooks/generation.py:38-42` — the current "fires before `generate_from_context` returns" claim is inaccurate; rewrite to match the new firing semantics.

