feat(verifier): add evaluator backend facade #2129
Open
miguelg719 wants to merge 2 commits into
🦋 Changeset detected
Latest commit: 513b9d9
The changes in this PR will be included in the next version bump.
This PR includes changesets to release 4 packages
No issues found across 4 files
Confidence score: 5/5
- Automated review surfaced no issues in the provided summaries.
- No files require special attention.
Architecture diagram
sequenceDiagram
participant Client as Caller Code
participant Facade as V3Evaluator (Facade)
participant Legacy as LegacyV3Evaluator
participant LLM as LLM Client
participant V3 as V3 Instance
participant Page as Page (Browser)
Note over Client,Page: Evaluator Backend Selection Flow
Client->>Facade: new V3Evaluator(v3, { backend })
alt backend = "verifier"
Facade->>Facade: Store backend = "verifier"
Note over Facade: Verifier backend not yet available
else backend = "legacy" (default)
Facade->>Facade: Read STAGEHAND_EVALUATOR_BACKEND env
Facade->>Legacy: Create LegacyV3Evaluator instance
Note over Facade,Legacy: NEW: Delegates all calls to LegacyV3Evaluator
end
Note over Client,Page: ask() - Legacy Backend Flow
Client->>Facade: ask(options)
Facade->>Facade: Check backend
alt backend = "legacy"
Facade->>Legacy: ask(options)
Legacy->>Legacy: Validate question & answer/screenshot
alt screenshot provided as array
Legacy->>Legacy: _evaluateWithMultipleScreenshots()
Legacy->>LLM: createChatCompletion() with multiple images
else screenshot = true (single)
Legacy->>Page: awaitActivePage()
Page-->>Legacy: Page object
Legacy->>Page: screenshot({ fullPage: false })
Page-->>Legacy: imageBuffer
Legacy->>LLM: createChatCompletion() with question + image + answer
else screenshot = false
Legacy->>LLM: createChatCompletion() with question + agentReasoning
end
LLM-->>Legacy: Parsed response (YES/NO + reasoning)
Legacy-->>Facade: EvaluationResult
Facade-->>Client: EvaluationResult
else backend = "verifier"
Facade->>Facade: Throw StagehandInvalidArgumentError
Facade-->>Client: Error: "verifier backend not available"
end
Note over Client,Page: batchAsk() - Legacy Backend Flow
Client->>Facade: batchAsk(options)
Facade->>Legacy: batchAsk(options)
Legacy->>Legacy: Validate questions array
alt screenshot = true
Legacy->>Page: awaitActivePage()
Page-->>Legacy: Page object
Legacy->>Page: screenshot()
Page-->>Legacy: imageBuffer
end
Legacy->>Legacy: Format questions into text
Legacy->>LLM: createChatCompletion() with formatted questions + screenshot
LLM-->>Legacy: Parsed batch response
Legacy-->>Facade: EvaluationResult[]
Facade-->>Client: EvaluationResult[]
Note over Client,Page: Error Handling - LLM Failure
Client->>Facade: ask() / batchAsk()
Facade->>Legacy: Delegate call
Legacy->>LLM: createChatCompletion()
alt LLM returns invalid data
LLM-->>Legacy: Malformed response
Legacy->>Legacy: Catch parsing error
Legacy-->>Facade: { evaluation: "INVALID", reasoning: error }
Facade-->>Client: Fallback result
else LLM client throws
LLM-->>Legacy: Error thrown
Legacy->>Legacy: Catch error
Legacy-->>Facade: { evaluation: "INVALID", reasoning: error }
Facade-->>Client: Fallback result
end
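The delegation and fallback paths in the diagram can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `EvaluatorFacadeSketch`, `LegacyEvaluatorSketch`, `Judge`, and the simplified `AskOptions` and `EvaluationResult` shapes are hypothetical stand-ins, the LLM call is replaced by an injected function, and a plain `Error` stands in for `StagehandInvalidArgumentError`.

```typescript
// Hypothetical, simplified types; the real Stagehand types are richer.
type EvaluatorBackend = "legacy" | "verifier";

interface EvaluationResult {
  evaluation: "YES" | "NO" | "INVALID";
  reasoning: string;
}

interface AskOptions {
  question: string;
  answer?: string;
}

// Stand-in for the LLM round trip (createChatCompletion() in the diagram).
type Judge = (options: AskOptions) => Promise<EvaluationResult>;

class LegacyEvaluatorSketch {
  private readonly judge: Judge;

  constructor(judge: Judge) {
    this.judge = judge;
  }

  async ask(options: AskOptions): Promise<EvaluationResult> {
    try {
      return await this.judge(options);
    } catch (error) {
      // A parsing failure or a thrown client error both become a fallback result.
      const reasoning = error instanceof Error ? error.message : String(error);
      return { evaluation: "INVALID", reasoning };
    }
  }
}

class EvaluatorFacadeSketch {
  private readonly backend: EvaluatorBackend;
  private readonly legacy?: LegacyEvaluatorSketch;

  constructor(judge: Judge, backend: EvaluatorBackend = "legacy") {
    this.backend = backend;
    if (backend === "legacy") {
      this.legacy = new LegacyEvaluatorSketch(judge);
    }
  }

  async ask(options: AskOptions): Promise<EvaluationResult> {
    if (this.backend === "verifier" || !this.legacy) {
      // The diagram shows StagehandInvalidArgumentError; a plain Error stands in here.
      throw new Error("verifier backend not available");
    }
    return this.legacy.ask(options);
  }
}
```

Under these assumptions, callers keep constructing and calling the facade as before. Only an explicit `"verifier"` selection changes behavior, and it fails loudly instead of silently switching pipelines.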
Why
The verifier rewrite needs to coexist with the legacy v3 evaluator so existing evals and callers do not silently change behavior while the new pipeline is reviewed. This PR creates the compatibility boundary that lets us select the evaluator backend explicitly.
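As a rough sketch of what explicit backend selection could look like, the helper below resolves a backend from a constructor option and an environment map. `resolveBackend` and `isBackend` are hypothetical names. The precedence (constructor option, then `STAGEHAND_EVALUATOR_BACKEND`, then `legacy`) and the choice to throw on unrecognized values are assumptions, not behavior confirmed by this PR.

```typescript
type EvaluatorBackend = "legacy" | "verifier";

// Narrow an arbitrary string to a supported backend name.
function isBackend(value: string): value is EvaluatorBackend {
  return value === "legacy" || value === "verifier";
}

// Assumed precedence: explicit option, then environment variable, then "legacy".
function resolveBackend(
  option: EvaluatorBackend | undefined,
  env: Record<string, string | undefined>,
): EvaluatorBackend {
  if (option !== undefined) {
    return option;
  }
  const raw = (env["STAGEHAND_EVALUATOR_BACKEND"] ?? "").trim().toLowerCase();
  if (raw === "") {
    return "legacy"; // Default keeps existing callers on the current evaluator.
  }
  if (!isBackend(raw)) {
    // Assumption: reject unknown values rather than silently falling back.
    throw new Error("Unsupported STAGEHAND_EVALUATOR_BACKEND: " + raw);
  }
  return raw;
}
```

Passing the environment in as a plain object (for example `process.env` in Node) keeps the helper pure, so both selection paths can be unit tested without mutating global state.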
What Changed
- Moves the existing evaluator implementation into LegacyV3Evaluator.
- Keeps V3Evaluator as the public facade.
- Adds STAGEHAND_EVALUATOR_BACKEND=legacy|verifier and constructor backend options.
- Defaults to legacy to preserve current ask() and batchAsk() semantics.
Tests
pnpm --filter @browserbasehq/stagehand run typecheck
pnpm --filter @browserbasehq/stagehand run test:core -- packages/core/dist/esm/tests/unit/public-api/v3-core.test.js
git diff --check