bug(plugins): generation_post_call doesn't fire for pre-computed MOTs from _generate_from_context #1229

@ajbozarth

Summary

GENERATION_POST_CALL fires from inside ModelOutputThunk.astream, so any backend whose _generate_from_context returns a fully-computed MOT skips the hook entirely.

avalue() short-circuits at base.py:583 when _computed=True and never enters astream, so the post-call hook never fires for pre-computed MOTs, even though the value is available and post-processing is done.
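
The control flow described above can be reproduced in a toy model. The sketch below is not mellea's code: `ToyThunk`, `POST_CALL_HOOKS`, and `demo` are hypothetical stand-ins that only mimic the shape of the short-circuit, assuming the hook is fired solely at the end of `astream`.

```python
import asyncio

# Hypothetical stand-in for a plugin registry; not mellea's API.
POST_CALL_HOOKS = []


class ToyThunk:
    """Toy model of a lazily computed output; all names are illustrative."""

    def __init__(self, value=None):
        self._value = value
        # A thunk constructed with a value counts as pre-computed.
        self._computed = value is not None

    async def astream(self):
        # Simulate producing the value, then fire the post-call hooks.
        self._value = "streamed"
        self._computed = True
        for hook in POST_CALL_HOOKS:
            await hook(self)
        return self._value

    async def avalue(self):
        # Short-circuit: a computed thunk never enters astream,
        # so the hook loop above is never reached.
        if self._computed:
            return self._value
        return await self.astream()


async def demo():
    fired = []

    async def recorder(thunk):
        fired.append(thunk._value)

    POST_CALL_HOOKS.append(recorder)
    try:
        await ToyThunk().avalue()      # lazy path: goes through astream
        await ToyThunk("hi").avalue()  # pre-computed path: skips astream
    finally:
        POST_CALL_HOOKS.remove(recorder)
    return fired


print(asyncio.run(demo()))  # → ['streamed']
```

Only the lazy thunk reaches the recorder; the pre-computed one returns its value without firing anything, which is the same symptom the repro below shows against DummyBackend.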

Repro

import asyncio
from mellea.backends.dummy import DummyBackend
from mellea.plugins import HookType, hook, register
from mellea.stdlib.components import CBlock
from mellea.stdlib.context import SimpleContext

observed = []

@hook("generation_post_call")
async def recorder(payload, ctx):
    observed.append(payload.generation_id)

register(recorder)

async def main():
    backend = DummyBackend(responses=["hi"])
    mot, _ = await backend.generate_from_context(CBlock("test"), SimpleContext())
    await mot.avalue()

asyncio.run(main())
print("post_call fired:", observed)  # → []

generation_pre_call fires; generation_post_call does not.

Impact

Scope distinction

The generate_from_raw (batch) path is not affected. PR #1181 added the full generation_batch_* hook surface with backends firing those hooks themselves at the end of the method, which sidesteps this trap by not depending on astream to drive post-call. The chat path's firing site, unchanged since the hook system landed in #582, retains the issue.

Suggested fix

One viable shape: fire generation_post_call from Backend.generate_from_context (mellea/core/backend.py:117) when mot.is_computed(), mirroring the payload construction in astream. Sketch:

mot._generation_id = generation_id

if mot.is_computed() and has_plugins(HookType.GENERATION_POST_CALL):
    from ..plugins.hooks.generation import GenerationPostCallPayload
    glog = mot._generate_log
    prompt = glog.prompt if glog and glog.prompt else ""
    latency_ms = (
        (datetime.datetime.now() - mot._start).total_seconds() * 1000
        if mot._start else 0.0
    )
    await invoke_hook(
        HookType.GENERATION_POST_CALL,
        GenerationPostCallPayload(
            prompt=prompt,
            model_output=mot,
            latency_ms=latency_ms,
            generation_id=generation_id,
        ),
    )

return mot, new_ctx

Other approaches worth considering before settling on this one:

  • Move the firing site into avalue itself, so any path leading to a computed value triggers the hook regardless of who pre-computed it.
  • Push responsibility onto backend authors via an explicit contract (firing site stays in astream, pre-computing backends document their own post-call invocation).
  • Extract a shared helper that both astream and the new firing site call, to prevent drift.

Pick whichever shape best fits the broader hook-system contract.
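
To make the first and third alternatives above more concrete, here is a minimal sketch on a toy model, assuming a per-output flag guards against double firing. `ToyThunk`, `fire_post_call`, `_post_call_fired`, and `POST_CALL_HOOKS` are hypothetical names, not part of mellea; the real payload construction and hook dispatch would differ.

```python
import asyncio

# Hypothetical stand-in for a plugin registry; not mellea's API.
POST_CALL_HOOKS = []


async def fire_post_call(thunk):
    """Shared helper (hypothetical): fire post-call hooks at most once per thunk."""
    if thunk._post_call_fired:
        return
    thunk._post_call_fired = True
    for hook in list(POST_CALL_HOOKS):
        await hook(thunk)


class ToyThunk:
    """Toy model of a lazily computed output; all names are illustrative."""

    def __init__(self, value=None):
        self._value = value
        self._computed = value is not None
        self._post_call_fired = False

    async def astream(self):
        if not self._computed:
            self._value = "streamed"
            self._computed = True
        # Same helper as avalue, so the two firing sites cannot drift.
        await fire_post_call(self)
        return self._value

    async def avalue(self):
        if self._computed:
            # Covers outputs that were pre-computed by a backend.
            await fire_post_call(self)
            return self._value
        return await self.astream()


async def demo():
    fired = []

    async def recorder(thunk):
        fired.append(thunk._value)

    POST_CALL_HOOKS.append(recorder)
    try:
        lazy = ToyThunk()
        await lazy.avalue()            # fires once via astream
        await lazy.avalue()            # already fired: guard suppresses a repeat
        await ToyThunk("hi").avalue()  # pre-computed: fires via avalue
    finally:
        POST_CALL_HOOKS.remove(recorder)
    return fired


print(asyncio.run(demo()))  # → ['streamed', 'hi']
```

In this toy, the hook for a pre-computed output fires on the first `avalue` call rather than when the backend method returns, so a design along these lines would likely still need the docstring timing claim listed under Acceptance to be revisited.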

Acceptance

  • Repro above prints a non-empty observed list.
  • Test added covering the pre-computed path through _generate_from_context.
  • (Once PR #1181, "refactor(telemetry)!: move backend tracing onto plugin/hook pattern", merges) BackendTracingPlugin._in_flight_spans is empty after a DummyBackend call (regression guard for the span leak).
  • Update the GenerationPostCallPayload docstring at mellea/plugins/hooks/generation.py:38-42. The current "fires before generate_from_context returns" claim is inaccurate and should be rewritten to match the new firing semantics.

Metadata

Labels

  • area/stdlib: Core abstractions: Context, MOT, SamplingStrategy, formatters, serialization
  • area/telemetry: OTel spans, metrics, tracing, semconv
  • bug: Something isn't working
