Summary
GENERATION_POST_CALL fires from inside ModelOutputThunk.astream, so any backend whose _generate_from_context returns a fully-computed MOT skips the hook entirely.
avalue() short-circuits at base.py:583 when _computed=True and never enters astream, so the post-call hook is silent for pre-computed MOTs even though the value is available and post-processing is done.
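The control flow described above can be modelled without mellea at all. The following is a minimal sketch, assuming a hypothetical ToyThunk class whose attribute names only loosely mirror ModelOutputThunk; it is not the real implementation, and the hook is reduced to a boolean flag.

```python
import asyncio


class ToyThunk:
    """Hypothetical stand-in for ModelOutputThunk; not mellea's real class."""

    def __init__(self, value=None):
        self._value = value
        self._computed = value is not None
        self.post_call_fired = False

    async def astream(self):
        # Simulate the backend finishing generation.
        if self._value is None:
            self._value = "streamed"
        self._computed = True
        # The post-call hook lives here, so only callers that reach astream see it.
        self.post_call_fired = True
        return self._value

    async def avalue(self):
        # Short-circuit: a pre-computed thunk never enters astream.
        if self._computed:
            return self._value
        return await self.astream()


async def demo():
    lazy = ToyThunk()
    eager = ToyThunk(value="hi")
    await lazy.avalue()
    await eager.avalue()
    return lazy.post_call_fired, eager.post_call_fired


lazy_fired, eager_fired = asyncio.run(demo())
print(lazy_fired, eager_fired)  # → True False
```

Both thunks return a value from avalue(), but only the lazily computed one ever passes through the firing site.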
Repro
    import asyncio

    from mellea.backends.dummy import DummyBackend
    from mellea.plugins import HookType, hook, register
    from mellea.stdlib.components import CBlock
    from mellea.stdlib.context import SimpleContext

    observed = []


    @hook("generation_post_call")
    async def recorder(payload, ctx):
        observed.append(payload.generation_id)


    register(recorder)


    async def main():
        backend = DummyBackend(responses=["hi"])
        mot, _ = await backend.generate_from_context(CBlock("test"), SimpleContext())
        await mot.avalue()


    asyncio.run(main())
    print("post_call fired:", observed)  # → []
generation_pre_call fires; generation_post_call does not.
Scope distinction
The generate_from_raw (batch) path is not affected. PR #1181 added the full generation_batch_* hook surface with backends firing those hooks themselves at the end of the method, which sidesteps this trap by not depending on astream to drive post-call. The chat path's firing site, unchanged since the hook system landed in #582, retains the issue.
Suggested fix
One viable shape: fire generation_post_call from Backend.generate_from_context (mellea/core/backend.py:117) when mot.is_computed(), mirroring the payload construction in astream. Sketch:
    mot._generation_id = generation_id
    # Pre-computed MOTs never enter astream, so fire the post-call hook here.
    if mot.is_computed() and has_plugins(HookType.GENERATION_POST_CALL):
        from ..plugins.hooks.generation import GenerationPostCallPayload

        glog = mot._generate_log
        prompt = glog.prompt if glog and glog.prompt else ""
        latency_ms = (
            (datetime.datetime.now() - mot._start).total_seconds() * 1000
            if mot._start else 0.0
        )
        await invoke_hook(
            HookType.GENERATION_POST_CALL,
            GenerationPostCallPayload(
                prompt=prompt,
                model_output=mot,
                latency_ms=latency_ms,
                generation_id=generation_id,
            ),
        )
    return mot, new_ctx
Other approaches worth considering before settling on this one:
- Move the firing site into avalue itself, so any path leading to a computed value triggers the hook regardless of who pre-computed it.
- Push responsibility onto backend authors via an explicit contract (firing site stays in astream, pre-computing backends document their own post-call invocation).
- Extract a shared helper that both astream and the new firing site call, to prevent drift.
Pick whichever shape best fits the broader hook-system contract.
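To make the shared-helper option concrete, here is a minimal sketch under stated assumptions: ToyThunk, fire_post_call_once, and the module-level generate_from_context are hypothetical names invented for illustration, and the hook is reduced to appending an id to a list. The point it illustrates is that two firing sites need a once-only guard so a thunk cannot trigger the hook twice.

```python
import asyncio

fired = []  # records generation ids seen by the (toy) post-call hook


class ToyThunk:
    """Hypothetical stand-in for ModelOutputThunk; names are illustrative only."""

    def __init__(self, generation_id, value=None):
        self.generation_id = generation_id
        self._value = value
        self._computed = value is not None
        self._post_call_done = False

    async def fire_post_call_once(self):
        # Shared helper: both firing sites call this, so the hook logic lives
        # in one place and a thunk can never trigger the hook twice.
        if self._post_call_done:
            return
        self._post_call_done = True
        fired.append(self.generation_id)

    async def astream(self):
        self._value = self._value or "streamed"
        self._computed = True
        await self.fire_post_call_once()
        return self._value

    async def avalue(self):
        if self._computed:
            return self._value
        return await self.astream()


async def generate_from_context(thunk):
    # Second firing site: covers thunks the backend returned already computed.
    if thunk._computed:
        await thunk.fire_post_call_once()
    return thunk


async def demo():
    eager = await generate_from_context(ToyThunk("eager", value="hi"))
    lazy = await generate_from_context(ToyThunk("lazy"))
    await eager.avalue()
    await lazy.avalue()
    await lazy.astream()  # a repeat call must not fire the hook again


asyncio.run(demo())
print(fired)  # → ['eager', 'lazy']
```

Whether the guard belongs on the thunk or in the hook dispatcher is a design choice this sketch does not settle.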
Impact
generation_post_call misses calls served by pre-computing backends. Concrete cases:
- BackendTracingPlugin (mellea/telemetry/tracing_plugins.py, introduced in #1181, "refactor(telemetry)!: move backend tracing onto plugin/hook pattern") starts a span in pre_call and never closes it. The leaked span persists in _in_flight_spans, the OTel context retains it, and subsequent calls inherit a stale parent.
- Plugins that rely on generation_post_call to record their data. Pre-computed paths emit zero metrics.
- DummyBackend (smoke tests, internal tooling).
- await_result=False MOTs and abandoned sampling candidates.
Acceptance
- The repro above yields a non-empty observed list.
- [...] _generate_from_context.
- BackendTracingPlugin._in_flight_spans is empty after a DummyBackend call (regression guard for the span leak).
- GenerationPostCallPayload docstring at mellea/plugins/hooks/generation.py:38-42: the current "fires before generate_from_context returns" claim is inaccurate; rewrite to match the new firing semantics.
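The span-leak regression guard can be illustrated with a toy model. This is only a sketch: ToyTracingPlugin, run_call, and the in_flight dict are hypothetical and merely imitate the pre_call/post_call pairing described in this issue; a real test would exercise BackendTracingPlugin and DummyBackend directly.

```python
class ToyTracingPlugin:
    """Hypothetical tracer keyed by generation id; not the real BackendTracingPlugin."""

    def __init__(self):
        self.in_flight = {}  # generation_id -> span name, opened in pre_call
        self.finished = []   # spans closed by post_call

    def pre_call(self, generation_id):
        self.in_flight[generation_id] = f"span-{generation_id}"

    def post_call(self, generation_id):
        # pop() with a default tolerates a post_call without a matching pre_call.
        span = self.in_flight.pop(generation_id, None)
        if span is not None:
            self.finished.append(span)


def run_call(plugin, generation_id, post_call_fires):
    """Simulate one generation; post_call_fires=False models the current bug."""
    plugin.pre_call(generation_id)
    if post_call_fires:
        plugin.post_call(generation_id)


buggy = ToyTracingPlugin()
run_call(buggy, "g1", post_call_fires=False)
leaked = dict(buggy.in_flight)

fixed = ToyTracingPlugin()
run_call(fixed, "g1", post_call_fires=True)

# The regression guard: nothing may remain in flight once the call returns.
assert fixed.in_flight == {}
print(leaked, fixed.finished)  # → {'g1': 'span-g1'} ['span-g1']
```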