feat(verifier): record agent trajectories#2131

Open
miguelg719 wants to merge 6 commits into miguelgonzalez/verifier-02-backend-routing from miguelgonzalez/verifier-03-trajectory-recorder

miguelg719 (Collaborator) commented May 15, 2026

Why

The new verifier needs richer evidence than a final screenshot, especially for DOM and Hybrid agent modes where the important facts often live in tool returns, ARIA snapshots, and per-step observations. This PR adds trajectory recording without changing the verifier judgment engine.

What Changed

  • Added typed agent bus events for screenshot, step-finished, step-observed, and final-answer events.
  • Added listener-gated post-step probes for screenshots and ARIA trees.
  • Attached the settled post-turn probe to every tool call in a DOM/Hybrid turn.
  • Added CUA step evidence pairing and final answer capture.
  • Added TrajectoryRecorder persistence and a smoke script for trajectory shape and disk layout.
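
The typed events and listener-gated probes listed above could be sketched roughly as follows. This is a minimal illustration, not the PR's code: the event names and field names come from the review diagram in this thread, while `AgentEventMap`, `TypedBus`, `probeAfterStep`, and every field type are hypothetical. The final-answer event is left out because its payload is not shown here.

```typescript
import { EventEmitter } from "node:events";

// Payload shapes are hypothetical; field names mirror the PR's diagram notes.
interface AgentEventMap {
  agent_step_finished_event: {
    stepIndex: number;
    actionName: string;
    actionArgs: unknown;
    reasoning?: string;
    toolOutput?: unknown;
    finishedAt: number;
  };
  agent_screenshot_taken_event: {
    stepIndex: number;
    screenshot: Buffer;
    url: string;
    evidenceRole: "agent" | "probe";
  };
  agent_step_observed_event: { stepIndex: number; url: string; ariaTree?: string };
}

// Thin typed wrapper over Node's EventEmitter (a stand-in for the real bus).
class TypedBus {
  private readonly emitter = new EventEmitter();

  on<K extends keyof AgentEventMap>(name: K, fn: (e: AgentEventMap[K]) => void): void {
    this.emitter.on(name, fn);
  }

  emit<K extends keyof AgentEventMap>(name: K, e: AgentEventMap[K]): void {
    this.emitter.emit(name, e);
  }

  hasListeners(name: keyof AgentEventMap): boolean {
    return this.emitter.listenerCount(name) > 0;
  }
}

// Listener gating: pay for the screenshot only when someone is subscribed.
async function probeAfterStep(
  bus: TypedBus,
  stepIndex: number,
  url: string,
  takeScreenshot: () => Promise<Buffer>,
): Promise<boolean> {
  if (!bus.hasListeners("agent_screenshot_taken_event")) return false;
  const screenshot = await takeScreenshot();
  bus.emit("agent_screenshot_taken_event", { stepIndex, screenshot, url, evidenceRole: "probe" });
  return true;
}
```

With no subscriber attached, `probeAfterStep` returns without calling `takeScreenshot`, so runs that do not record trajectories would avoid the probe cost under this design.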

Tests

  • pnpm --filter @browserbasehq/stagehand run typecheck
  • pnpm --filter @browserbasehq/stagehand-evals run typecheck
  • node --import tsx packages/evals/scripts/verify-trajectory-recorder.ts
  • git diff --check

changeset-bot Bot commented May 15, 2026

🦋 Changeset detected

Latest commit: 635b3d2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 4 packages
| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |
| @browserbasehq/stagehand-server-v4 | Patch |

cubic-dev-ai Bot (Contributor) left a comment

No issues found across 8 files

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.

Architecture diagram
sequenceDiagram
    participant Agent as Agent Handlers
    participant Bus as Event Bus
    participant Recorder as TrajectoryRecorder
    participant FS as File System
    participant Page as Browser Page

    Note over Agent,FS: NEW: Step-level evidence capture (DOM/Hybrid mode)

    Agent->>Agent: onStepFinish callback fires
    Agent->>Agent: stepCounter++ (per tool call)
    Agent->>Bus: emit agent_step_finished_event
    Note over Bus: stepIndex, actionName, actionArgs, reasoning, toolOutput, finishedAt
    Agent->>Page: page.screenshot() (post-step probe)
    Page-->>Agent: screenshot Buffer
    Agent->>Agent: captureAriaTreeProbe(v3)
    Note over Agent: Best-effort, token-budgeted a11y tree capture
    Agent-->>Agent: ariaTree string | undefined
    loop For each tool call in turn
        Agent->>Bus: emit agent_screenshot_taken_event
        Note over Bus: stepIndex, screenshot, url, evidenceRole: "probe"
        Agent->>Bus: emit agent_step_observed_event
        Note over Bus: stepIndex, url, ariaTree (optional), scroll (optional)
    end
    opt done tool call present
        Agent->>Agent: Build lastFinalAnswer
        Agent->>Bus: emit agent_final_answer_event
    end

    Note over Agent,FS: NEW: Step-level evidence capture (CUA mode)

    Agent->>Agent: screenshotProvider called
    Agent->>Page: page.screenshot()
    Page-->>Agent: screenshot Buffer
    Agent->>Bus: emit agent_screenshot_taken_event
    Note over Bus: stepIndex++, screenshot, url, evidenceRole: "agent"
    Agent->>Agent: executeAction(action)
    Agent->>Agent: emitCuaActionStep()
    Agent->>Bus: emit agent_step_finished_event
    Note over Bus: stepIndex paired with preceding screenshot
    Agent->>Page: page.screenshot() (post-action probe)
    Page-->>Agent: probe screenshot
    Agent->>Bus: emit agent_screenshot_taken_event
    Note over Bus: same stepIndex, screenshot, url, evidenceRole: "probe"
    Agent->>Agent: captureAriaTreeProbe(v3)
    Agent->>Bus: emit agent_step_observed_event
    Note over Bus: stepIndex, url, ariaTree (optional)

    Note over Agent,FS: NEW: Trajectory assembly and persistence

    Recorder->>Bus: subscribe to agent_step_finished_event
    Recorder->>Bus: subscribe to agent_screenshot_taken_event
    Recorder->>Bus: subscribe to agent_step_observed_event
    Recorder->>Bus: subscribe to agent_final_answer_event
    Bus-->>Recorder: events arrive (may be out-of-order)
    Recorder->>Recorder: ensurePartial(stepIndex)
    Recorder->>Recorder: Merge evidence into partial steps

    alt persistEnabled (env-gated by VERIFIER_PERSIST_TRAJECTORIES)
        Recorder->>Recorder: assembleSteps()
        Recorder->>FS: mkdir -p .trajectories/{runId}/{taskId}/
        Recorder->>FS: write trajectory.json
        Recorder->>FS: write core.log
        Recorder->>FS: write task_data.json
        Recorder->>FS: write times.json
        Recorder->>FS: write screenshots/probe/{1..N}.png
        alt verdict provided
            Recorder->>FS: write scores/mmrubric_v1.json
            Recorder->>FS: update task_data.json with verdict
        end
    else persistence disabled
        Recorder->>Recorder: Return in-memory Trajectory only
    end
    Recorder-->>Agent: Trajectory object
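
The assembly half of the diagram (events arriving out of order, `ensurePartial`, then `assembleSteps`) might look something like the sketch below. Only `ensurePartial`, `assembleSteps`, the event field names, and the `.trajectories/{runId}/{taskId}/` layout are taken from the diagram; `RecorderSketch`, `PartialStep`, the handler names, `trajectoryDir`, and the decision to keep only probe screenshots are assumptions made for illustration.

```typescript
import { join } from "node:path";

// Hypothetical step shape; the real Trajectory type is not shown on this page.
interface PartialStep {
  stepIndex: number;
  actionName?: string;
  url?: string;
  ariaTree?: string;
  probeScreenshot?: Buffer;
}

class RecorderSketch {
  private readonly partials = new Map<number, PartialStep>();

  // Create the slot on first sight so arrival order does not matter.
  private ensurePartial(stepIndex: number): PartialStep {
    let partial = this.partials.get(stepIndex);
    if (!partial) {
      partial = { stepIndex };
      this.partials.set(stepIndex, partial);
    }
    return partial;
  }

  onStepFinished(e: { stepIndex: number; actionName: string }): void {
    this.ensurePartial(e.stepIndex).actionName = e.actionName;
  }

  onStepObserved(e: { stepIndex: number; url: string; ariaTree?: string }): void {
    const partial = this.ensurePartial(e.stepIndex);
    partial.url = e.url;
    if (e.ariaTree !== undefined) partial.ariaTree = e.ariaTree;
  }

  onScreenshot(e: { stepIndex: number; screenshot: Buffer; evidenceRole: "agent" | "probe" }): void {
    // Assumption: this sketch stores probe screenshots only.
    if (e.evidenceRole === "probe") this.ensurePartial(e.stepIndex).probeScreenshot = e.screenshot;
  }

  // Order by stepIndex, not by arrival.
  assembleSteps(): PartialStep[] {
    return [...this.partials.values()].sort((a, b) => a.stepIndex - b.stepIndex);
  }
}

// Directory layout from the diagram: .trajectories/{runId}/{taskId}/
function trajectoryDir(runId: string, taskId: string): string {
  return join(".trajectories", runId, taskId);
}
```

Keying partial steps by `stepIndex` means an observation that arrives before its step-finished event still lands in the right step once both have been merged.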

@miguelg719 miguelg719 force-pushed the miguelgonzalez/verifier-03-trajectory-recorder branch from 72774c7 to da0c152 Compare May 15, 2026 21:23
@miguelg719 miguelg719 force-pushed the miguelgonzalez/verifier-02-backend-routing branch from d7d2c59 to 2765781 Compare May 15, 2026 21:23
@miguelg719 miguelg719 force-pushed the miguelgonzalez/verifier-03-trajectory-recorder branch from da0c152 to d77e596 Compare May 15, 2026 21:45
Comment thread packages/core/lib/v3/agent/AnthropicCUAClient.ts Outdated
@miguelg719 miguelg719 force-pushed the miguelgonzalez/verifier-03-trajectory-recorder branch from 8e4fbe2 to 56a3465 Compare May 15, 2026 22:33
Comment thread packages/core/lib/v3/agent/AnthropicCUAClient.ts Outdated
@miguelg719 miguelg719 force-pushed the miguelgonzalez/verifier-03-trajectory-recorder branch 2 times, most recently from 83b4e86 to fd043bc Compare May 16, 2026 04:40
@miguelg719 miguelg719 force-pushed the miguelgonzalez/verifier-03-trajectory-recorder branch from fd043bc to 635b3d2 Compare May 16, 2026 05:50