feat(cache): add custom tool caching and fix replay navigation timing by hoverlover · Pull Request #1562 · browserbase/stagehand

hoverlover · 2026-01-16T22:41:08Z

Summary

This PR adds two improvements to the agent cache replay system:

1. Custom Tool Caching (fixes #1558)

Custom tools (like form-filling helpers) were not being recorded or replayed from cache, causing workflows to fail on cache replay.

Changes:

Add AgentReplayCustomToolStep type for caching custom tool invocations
Wrap custom tools with recording logic via wrapToolsForRecording()
Record custom tool calls in both AISDK (hybrid/dom) and CUA agent modes
Replay custom tools by re-executing with cached arguments
Thread tools parameter through tryReplay and related methods
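
The recording and replay steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `CustomTool` and `CustomToolStep` shapes, the `record` callback, and `replayCustomToolStep` are assumptions made for the example, and the real `wrapToolsForRecording()` and `AgentReplayCustomToolStep` may differ in shape and signature.

```typescript
// Hypothetical shapes for illustration; the real Stagehand types may differ.
type ToolArgs = Record<string, unknown>;

interface CustomTool {
  description: string;
  execute: (args: ToolArgs) => Promise<unknown>;
}

interface CustomToolStep {
  type: "custom_tool";
  toolName: string;
  args: ToolArgs;
}

// Returns a copy of `tools` whose execute() reports each call to `record`
// before delegating to the original tool. The input object is not mutated.
function wrapToolsForRecording(
  tools: Record<string, CustomTool>,
  record: (step: CustomToolStep) => void,
): Record<string, CustomTool> {
  const wrapped: Record<string, CustomTool> = {};
  for (const [toolName, tool] of Object.entries(tools)) {
    wrapped[toolName] = {
      ...tool,
      execute: async (args: ToolArgs) => {
        record({ type: "custom_tool", toolName, args });
        return tool.execute(args);
      },
    };
  }
  return wrapped;
}

// Replays one recorded step by re-executing the tool with its cached arguments.
async function replayCustomToolStep(
  step: CustomToolStep,
  tools: Record<string, CustomTool>,
): Promise<unknown> {
  const tool = tools[step.toolName];
  if (!tool) {
    throw new Error(`Custom tool "${step.toolName}" is not available for replay`);
  }
  return tool.execute(step.args);
}
```

In this sketch the handler would receive the wrapped tools on a cache miss, while replay looks the tool up by name in the original, unwrapped map.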

2. Navigation Waiting (fixes #1561)

Cache replay was executing steps too fast without waiting for page navigation to complete, causing subsequent steps to run on the wrong page.

Changes:

Detect URL changes after actions during replay
Wait for page load (waitForLoadState) before continuing to next step
Apply to replayAgentActStep, replayAgentFillFormStep, replayAgentKeysStep
Prevents race conditions where steps execute before navigation completes
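
A rough sketch of the URL-change check described above. The `PageLike` interface and the helper name `runStepAndAwaitNavigation` are hypothetical; the 10-second default mirrors the timeout shown in the review's sequence diagram, and swallowing a load timeout is an assumption of this example rather than confirmed behavior of the PR.

```typescript
// Minimal page surface assumed for this sketch (not the real Stagehand Page type).
interface PageLike {
  url(): string;
  waitForLoadState(state: "load", timeoutMs: number): Promise<void>;
}

// Runs one replayed action, then waits for the page to load if the URL changed.
// Returns true when a navigation was detected.
async function runStepAndAwaitNavigation(
  page: PageLike,
  action: () => Promise<void>,
  timeoutMs = 10_000,
): Promise<boolean> {
  const urlBefore = page.url();
  await action();
  if (page.url() === urlBefore) return false;
  try {
    await page.waitForLoadState("load", timeoutMs);
  } catch {
    // Assumption: a load timeout does not abort replay; the next step
    // will surface the failure if the page is still not ready.
  }
  return true;
}
```

Comparing the URL before and after the action keeps the common case (no navigation) free of extra waiting.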

Test Plan

Tested with a real-world workflow:

Login flow with custom fillUsername and fillPassword tools
Search flow with custom fillSearch tool
Multi-page navigation (login → home → search → product page)
Verified 0 token usage on cache replay (no LLM calls)
Verified consistent results across 3+ replay runs

Files Changed

packages/core/lib/v3/types/private/cache.ts - Add AgentReplayCustomToolStep type
packages/core/lib/v3/v3.ts - Add wrapToolsForRecording(), update tool threading
packages/core/lib/v3/handlers/v3CuaAgentHandler.ts - Add custom tool recording for CUA mode
packages/core/lib/v3/cache/AgentCache.ts - Add custom tool replay, add navigation waiting

Summary by cubic

Add custom tool caching to agent replay and wait for navigation between steps. This fixes tool-based workflows and prevents steps from running on the wrong page.

New Features
- Record and replay custom tool calls as custom_tool steps.
- Wrap tools to auto-record; replay re-executes with cached args.
- Pass tools through tryReplay and stream replay.
Bug Fixes
- Detect URL changes during replay and wait for page load.
- Applied to act, fillForm, and keys (e.g., Enter submits forms).
- Prevents steps executing on the wrong page.
- Clean up recording state if stream creation fails.

Written for commit 36d3274. Summary will update on new commits.

This commit adds two improvements to the agent cache replay system:

1. Custom Tool Caching (fixes browserbase#1558)

- Add AgentReplayCustomToolStep type for caching custom tool invocations
- Wrap custom tools with recording logic via wrapToolsForRecording()
- Record custom tool calls in both AISDK (hybrid) and CUA agent modes
- Replay custom tools by re-executing with cached arguments
- Thread tools parameter through tryReplay and related methods

2. Navigation Waiting (fixes browserbase#1561)

- Detect URL changes after actions during replay
- Wait for page load (waitForLoadState) before continuing to next step
- Apply to replayAgentActStep, replayAgentFillFormStep, replayAgentKeysStep
- Prevents race conditions where steps execute before navigation completes

These changes enable reliable cache replay for workflows that include:

- Custom form-filling tools (fillUsername, fillPassword, etc.)
- Multi-page navigation sequences
- Form submissions via Enter key

Co-Authored-By: Claude <noreply@anthropic.com>

changeset-bot · 2026-01-16T22:41:12Z

⚠️ No Changeset found

Latest commit: 36d3274

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


greptile-apps · 2026-01-16T22:44:47Z

Greptile Summary

This PR extends agent cache replay to support custom tools and fixes navigation timing issues during replay. The implementation wraps custom tools to record their invocations and detects URL changes to wait for page load completion.

Key Changes:

Added AgentReplayCustomToolStep type for caching custom tool calls
Implemented wrapToolsForRecording() to intercept and record tool executions
Added custom tool replay in replayAgentCustomToolStep()
Navigation detection: capture URL before/after actions, wait for load state if URL changes
Applied navigation waiting to replayAgentActStep, replayAgentFillFormStep, and replayAgentKeysStep

Critical Issue Found:

tryReplay() is called with unwrapped tools instead of wrappedTools in 3 locations (lines 1856, 1944, 1989), causing a mismatch between what's recorded during execution vs. what's available during replay

Confidence Score: 2/5

This PR has a critical bug that will break custom tool replay functionality
The implementation passes unwrapped tools to replay methods instead of wrapped tools, causing tools during replay to lack the recording wrappers present during initial execution. This breaks the core functionality being added.
packages/core/lib/v3/v3.ts requires immediate attention - the tool wrapping logic has a critical bug at lines 1856, 1944, and 1989

Important Files Changed

Filename	Overview
packages/core/lib/v3/v3.ts	Adds `wrapToolsForRecording` to capture custom tool invocations. Critical bug: passes unwrapped `tools` to `tryReplay` instead of `wrappedTools`, causing mismatch between recording and replay.
packages/core/lib/v3/cache/AgentCache.ts	Adds custom tool replay support and navigation waiting logic. Implementation correctly detects URL changes and waits for page load with proper timeout handling.
packages/core/lib/v3/handlers/v3CuaAgentHandler.ts	Adds recording for custom_tool actions in CUA mode. Implementation correctly records tool name and arguments when recording is active.
packages/core/lib/v3/types/private/cache.ts	Adds `AgentReplayCustomToolStep` type to support custom tool caching. Type definition is clean and well-documented.

Sequence Diagram

sequenceDiagram
    participant User
    participant V3
    participant AgentCache
    participant Handler
    participant CustomTool
    participant Page

    User->>V3: agent({ tools, cacheDir })
    V3->>V3: wrapToolsForRecording(tools)
    Note over V3: Wraps tools to record invocations
    
    alt Cache Hit
        V3->>AgentCache: tryReplay(cacheContext, tools)
        AgentCache->>AgentCache: Load cached steps
        loop For each cached step
            alt Custom Tool Step
                AgentCache->>CustomTool: execute(cached args)
                CustomTool-->>AgentCache: result
            else Navigation Action
                AgentCache->>Page: takeDeterministicAction()
                Page-->>AgentCache: action result
                AgentCache->>AgentCache: Check URL change
                alt URL Changed
                    AgentCache->>Page: waitForLoadState("load", 10000)
                    Page-->>AgentCache: load complete
                end
            end
        end
        AgentCache-->>V3: cached result
        V3-->>User: result
    else Cache Miss
        V3->>V3: beginAgentReplayRecording()
        V3->>Handler: execute/stream(wrappedTools)
        loop Agent execution
            Handler->>CustomTool: execute(args)
            Note over CustomTool: Wrapped tool records invocation
            CustomTool->>V3: recordAgentReplayStep({ type: "custom_tool" })
            CustomTool-->>Handler: tool result
            Handler->>Page: perform actions
            Page-->>Handler: action results
        end
        Handler-->>V3: final result
        V3->>V3: endAgentReplayRecording()
        V3->>AgentCache: store(steps, result)
        V3-->>User: result
    end
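
The cache-hit branch of the diagram can be approximated by a small dispatch loop. This is a sketch under stated assumptions: the `ReplayStep` union, the `ReplayDeps` callbacks, and `replaySteps` are hypothetical names for this example, and the real cache has more step types and fields.

```typescript
type Args = Record<string, unknown>;

// Hypothetical step union; only two variants are modeled here.
type ReplayStep =
  | { type: "custom_tool"; toolName: string; args: Args }
  | { type: "act"; instruction: string };

// Hypothetical dependencies injected so the loop can be exercised without a browser.
interface ReplayDeps {
  tools: Record<string, { execute: (args: Args) => Promise<unknown> }>;
  performAction: (instruction: string) => Promise<void>;
  currentUrl: () => string;
  waitForLoad: () => Promise<void>;
}

// Replays cached steps in order and returns a log of what happened.
async function replaySteps(steps: ReplayStep[], deps: ReplayDeps): Promise<string[]> {
  const log: string[] = [];
  for (const step of steps) {
    if (step.type === "custom_tool") {
      const tool = deps.tools[step.toolName];
      if (!tool) throw new Error(`Missing custom tool: ${step.toolName}`);
      await tool.execute(step.args); // re-execute with cached arguments
      log.push(`tool:${step.toolName}`);
      continue;
    }
    const urlBefore = deps.currentUrl();
    await deps.performAction(step.instruction);
    log.push(`act:${step.instruction}`);
    if (deps.currentUrl() !== urlBefore) {
      await deps.waitForLoad(); // only wait when the action navigated
      log.push("waited");
    }
  }
  return log;
}
```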

greptile-apps

4 files reviewed, 3 comments

greptile-apps · 2026-01-16T22:44:46Z

packages/core/lib/v3/v3.ts

+                const replayed = await this.agentCache.tryReplay(
+                  cacheContext,
+                  undefined,
+                  tools,


logic: passing unwrapped tools instead of wrappedTools here - replay will execute the original tools without recording wrappers, causing double-recording if cache miss falls through to re-execution

Suggested change

tools,

tools: wrappedTools,


greptile-apps · 2026-01-16T22:44:46Z

packages/core/lib/v3/v3.ts

              const replayed = await this.agentCache.tryReplayAsStream(
                cacheContext,
                llmClient,
+                tools,


logic: passing unwrapped tools instead of wrappedTools - same issue in streaming mode

Suggested change

tools,

tools: wrappedTools,


greptile-apps · 2026-01-16T22:44:47Z

packages/core/lib/v3/v3.ts

            const replayed = await this.agentCache.tryReplay(
              cacheContext,
              llmClient,


logic: passing unwrapped tools instead of wrappedTools - same issue in non-streaming mode (second agent handler path)

Suggested change

const replayed = await this.agentCache.tryReplay(
cacheContext,
llmClient,

const replayed = await this.agentCache.tryReplay(
cacheContext,
llmClient,
wrappedTools,


cubic-dev-ai

1 issue found across 4 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/lib/v3/v3.ts">

<violation number="1" location="packages/core/lib/v3/v3.ts:1955">
P2: Recording is started before stream creation without error handling; exceptions leave cache stuck in recording state</violation>
</file>


packages/core/lib/v3/v3.ts

Previously, if handler.stream() threw after beginAgentReplayRecording() was called, the recording state was never cleaned up, leaving the cache stuck in recording mode. Added try-catch around stream creation to call discardAgentReplayRecording() on error, matching the error handling in non-streaming mode. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Clarifies why unwrapped tools (not wrappedTools) are passed to tryReplay(). During replay, tools execute with cached arguments - using wrappedTools would cause the recording wrapper to record these replayed calls, leading to duplicate cache entries. Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

hoverlover · 2026-01-19T15:11:25Z

Response to Review Comments

Re: Using `wrappedTools` instead of `tools` in `tryReplay()` calls

After analysis, I believe the current code is correct and should not be changed to use wrappedTools. Here's why:

During replay, tryReplay() executes custom tools with their cached arguments via replayAgentCustomToolStep(), which calls tool.execute().

If we passed wrappedTools:

The recording wrapper would intercept these replayed tool calls
Each replayed call would be recorded as a new step
This would cause duplicate entries if the cache is later updated

The current flow is correct:

Cache miss (initial execution): Handler uses wrappedTools → tool invocations are recorded → steps stored in cache
Cache hit (replay): tryReplay() uses unwrapped tools → cached tools execute without recording → no duplicates
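
The difference between the two flows can be shown with a small self-contained sketch. The `Recorder` class, `replay`, and `compareReplayModes` are hypothetical names for this example, and the wrapper here is assumed to record unconditionally; a wrapper that only records while recording is active would behave differently.

```typescript
type Args = Record<string, unknown>;
type Tool = { execute: (args: Args) => Promise<unknown> };
type Step = { toolName: string; args: Args };

// Hypothetical stand-in for the agent's recording state.
class Recorder {
  readonly steps: Step[] = [];

  // Wraps each tool so every call is appended to `steps` before it runs.
  wrap(tools: Record<string, Tool>): Record<string, Tool> {
    const wrapped: Record<string, Tool> = {};
    for (const [toolName, tool] of Object.entries(tools)) {
      wrapped[toolName] = {
        execute: async (args: Args) => {
          this.steps.push({ toolName, args });
          return tool.execute(args);
        },
      };
    }
    return wrapped;
  }
}

// Re-executes each cached step against the given tool map.
async function replay(cached: readonly Step[], tools: Record<string, Tool>): Promise<void> {
  for (const step of cached) {
    const tool = tools[step.toolName];
    if (!tool) throw new Error(`Missing tool: ${step.toolName}`);
    await tool.execute(step.args);
  }
}

// Counts recorded steps after the initial run and after each replay mode.
async function compareReplayModes(): Promise<{ initial: number; unwrapped: number; wrapped: number }> {
  const tools: Record<string, Tool> = {
    fillSearch: { execute: async (args) => args },
  };
  const recorder = new Recorder();
  const wrappedTools = recorder.wrap(tools);

  // Cache miss: the handler calls the wrapped tool, so one step is recorded.
  await wrappedTools["fillSearch"]!.execute({ query: "shoes" });
  const cached = [...recorder.steps]; // snapshot, as if stored in the cache
  const initial = recorder.steps.length;

  // Cache hit with unwrapped tools: nothing new is recorded.
  await replay(cached, tools);
  const unwrapped = recorder.steps.length;

  // Cache hit with wrapped tools: the replayed call is recorded again.
  await replay(cached, wrappedTools);
  const wrapped = recorder.steps.length;

  return { initial, unwrapped, wrapped };
}
```

Under these assumptions, replaying through the wrapped map grows the recorded step list, while replaying through the original map leaves it unchanged.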

I've added inline comments at all three locations to clarify this intentional design decision (commit 36d3274).

Fixed: Recording state cleanup (P2)

Added try-catch around handler.stream() to call discardAgentReplayRecording() on error, preventing the cache from being stuck in recording mode if stream creation fails (commit 63a6d52).
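
The cleanup pattern described here looks roughly like the sketch below. The `ReplayRecorder` interface and `createStreamWithRecording` helper are hypothetical and exist only for this example; the method names are taken from this discussion, but their real signatures are not shown in the PR text.

```typescript
// Hypothetical recorder surface; signatures are assumed for illustration.
interface ReplayRecorder {
  beginAgentReplayRecording(): void;
  discardAgentReplayRecording(): void;
}

// Starts recording, then creates the stream. If creation throws, the
// recording is discarded before the error is rethrown to the caller.
async function createStreamWithRecording<T>(
  recorder: ReplayRecorder,
  createStream: () => Promise<T>,
): Promise<T> {
  recorder.beginAgentReplayRecording();
  try {
    return await createStream();
  } catch (error) {
    recorder.discardAgentReplayRecording();
    throw error;
  }
}
```

Rethrowing after the discard keeps the original failure visible to the caller while leaving the recorder in a clean state.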


hoverlover and others added 2 commits January 19, 2026 09:07

hoverlover wants to merge 3 commits into browserbase:main from hoverlover:feature/custom-tool-caching
