Remove tsup, generate unified re-used ESM and CJS builds with sourcemaps and types, switch CI to new Evals CLI by pirate · Pull Request #1632 · browserbase/stagehand

pirate · 2026-01-28T19:23:23Z

why

what changed

test plan

Summary by cubic

Rebuilt the core into unified ESM and CJS builds with sourcemaps and types, and switched CI to the new Evals CLI running against the ESM dist. This simplifies consumption via export maps, improves test stability, and enhances coverage/reporting.

New Features
- Unified builds: ESM and CJS outputs with export map, sourcemaps, and types; removed tsup. Tests resolve @browserbasehq/stagehand to dist/esm via loader hooks; added build/test scripts for both.
- CI: Switched to the new Evals CLI; consolidated workflows; upload CTRF and V8 coverage artifacts; added coverage normalization and flaky test insights.
- Reliability: Weighted Browserbase region selection, Chromium launch preflight (CHROME_PATH, no-sandbox in CI), and env-based timeouts (LLM_MAX_MS, BROWSERBASE_*); tighter click/input dispatch and an env reporter.
- API surface: AISdkClient and CustomOpenAIClient moved to lib/v3 and exported publicly.
- Server: SEA build now driven by TS; replay schemas expanded; replay endpoint returns an empty success; improved error stacks.

Migration
- Replace TEST_ENV with STAGEHAND_BROWSER_TARGET=local|browserbase. Set CHROME_PATH in CI and BROWSERBASE_REGION_DISTRIBUTION/timeouts as needed.
- Environment variables are no longer loaded via dotenv; set them explicitly in your environment/CI.
- Update imports to use @browserbasehq/stagehand exports for AISdkClient and CustomOpenAIClient.
- If you referenced dist paths or relied on tsup, use the package export map instead (ESM import, CJS require).
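
The first migration item can be sketched as a small guard that reads `STAGEHAND_BROWSER_TARGET` and rejects anything other than `local` or `browserbase`. The helper name `resolveBrowserTarget`, the explicit `env` parameter, and the choice to throw on a missing value are assumptions made for illustration; the PR summary does not show how the test harness actually reads the variable.

```typescript
// Hypothetical helper: not taken from the PR.
type BrowserTarget = "local" | "browserbase";

function resolveBrowserTarget(
  env: Record<string, string | undefined>,
): BrowserTarget {
  const raw = env["STAGEHAND_BROWSER_TARGET"];
  if (raw === "local" || raw === "browserbase") {
    return raw;
  }
  // Point users of the old variable at the new one.
  const hint =
    env["TEST_ENV"] !== undefined
      ? " (TEST_ENV is set but is no longer read)"
      : "";
  throw new Error(
    `STAGEHAND_BROWSER_TARGET must be "local" or "browserbase", got ${String(raw)}${hint}`,
  );
}
```

Failing fast here is a design choice for the sketch: a silent default would hide a CI job that still sets only `TEST_ENV`.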

Written for commit 6b9bb21. Summary will update on new commits.

…fy chrome launches before running tests

changeset-bot · 2026-01-28T19:23:27Z

⚠️ No Changeset found

Latest commit: 6b9bb21

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

greptile-apps · 2026-01-28T19:23:41Z

Too many files changed for review. (101 files found, 100 file limit)

github-actions · 2026-01-28T19:26:13Z

✱ Stainless preview builds

This PR will update the stagehand SDKs with the following commit message.

feat: randomize region used for evals, split out pnpm and turbo cache, veri…

Edit this comment to update it. It will appear in the SDK's changelogs.

✅ stagehand-typescript studio · code · diff

Your SDK built successfully.
generate ✅ → build ✅ → lint ✅ → test ❗

npm install https://pkg.stainless.com/s/stagehand-typescript/8fa00c3ccdcc1787d2a0e948ae3417b6b7c8a952/dist.tar.gz

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-kotlin studio · code · diff

Your SDK built successfully.
generate ✅ → lint ✅ → test ✅

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-ruby studio · code · diff

Your SDK built successfully.
generate ✅ → lint ✅ → test ✅

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-php studio · code · diff

Your SDK built successfully.
generate ✅ → lint ✅ → test ✅

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-csharp studio · code · diff

Your SDK built successfully.
generate ⚠️ → lint ❗ → test ✅

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-python studio · code · diff

Your SDK built successfully.
generate ✅ → build ✅ → lint ❗ → test ❗

pip install https://pkg.stainless.com/s/stagehand-python/92a9d0407c194017261692a2505d7c559b7afb32/stagehand_alpha-3.4.7-py3-none-any.whl

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-openapi studio · code · diff

Your SDK built successfully.
generate ✅

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-go studio · code · diff

Your SDK built successfully.
generate ✅ → lint ✅ → test ✅

go get github.com/stainless-sdks/stagehand-go@a1b8973804961fd21f799b6b53442b478ae05c0e

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

✅ stagehand-java studio · conflict

Your SDK built successfully.

New diagnostics (2 note)

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-01-30 02:34:15 UTC

cubic-dev-ai

16 issues found across 101 files

Confidence score: 2/5

High-risk security issue: packages/server/src/lib/errorHandler.ts exposes stack traces/internal details to clients, which can leak sensitive implementation info.
Concrete behavior bugs in packages/core/lib/v3/external_clients/customOpenAI.ts (wrong inputSchema key and lost message roles) likely break tool calls and mis-handle system/assistant messages.
Multiple breaking API changes lack required integration tests (e.g., packages/core/lib/v3/types/public/api.ts, packages/server/src/lib/errorHandler.ts, packages/server/src/routes/v1/sessions/_id/replay.ts), increasing regression risk.
Pay close attention to packages/server/src/lib/errorHandler.ts, packages/core/lib/v3/external_clients/customOpenAI.ts, packages/core/lib/v3/types/public/api.ts, packages/server/src/routes/v1/sessions/_id/replay.ts - security exposure, API correctness, and untested breaking changes.

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/evals/scoring.ts">

<violation number="1" location="packages/evals/scoring.ts:12">
P2: Logging Error outputs now collapses to "{}" because JSON.stringify on an Error instance returns "{}" (its message and stack properties are non-enumerable), so error details are lost compared to the prior String(Error) behavior. Consider detecting Error and logging its message/stack instead.</violation>
</file>

<file name="packages/server/scripts/build-sea.ts">

<violation number="1" location="packages/server/scripts/build-sea.ts:39">
P2: File descriptor leak: the `WriteStream` is not closed in error paths (redirects, non-200 status, network errors). This can exhaust file descriptors if the script encounters multiple failures or redirects.</violation>
</file>

<file name=".github/workflows/stagehand-server-sea-build.yml">

<violation number="1" location=".github/workflows/stagehand-server-sea-build.yml:50">
P2: The new composite setup action installs dependencies without `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so Playwright browser binaries will download during CI (the previous workflow explicitly skipped this). If the SEA build doesn’t need browsers, this adds large unnecessary downloads and can slow or fail builds. Consider setting the env on the action step to preserve the previous behavior.</violation>
</file>

<file name="packages/server/src/lib/errorHandler.ts">

<violation number="1" location="packages/server/src/lib/errorHandler.ts:63">
P1: Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via `request.log.error(err)`, so debugging capability is preserved. Return a generic message to clients instead.</violation>

<violation number="2" location="packages/server/src/lib/errorHandler.ts:63">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**

Breaking change to error response format lacks integration test coverage. The `withErrorHandling` wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in `packages/server/test` that verifies the error response format.</violation>
</file>

<file name="packages/core/vitest.config.ts">

<violation number="1" location="packages/core/vitest.config.ts:24">
P2: Bug: `String.replace()` replaces the first occurrence, not the file extension. If the extension string (e.g., `.ts`) appears in a directory name, the wrong part of the path will be replaced. Use `candidate.slice(0, -ext.length) + ".js"` to ensure only the actual extension at the end is replaced.</violation>
</file>

<file name=".github/actions/verify-chromium-launch/action.yml">

<violation number="1" location=".github/actions/verify-chromium-launch/action.yml:100">
P2: Duplicate Chrome flag `--disable-component-update` - this flag already appears earlier in the args array (line 69). Remove the duplicate to improve maintainability.</violation>
</file>

<file name="packages/core/lib/v3/external_clients/customOpenAI.ts">

<violation number="1" location="packages/core/lib/v3/external_clients/customOpenAI.ts:71">
P2: Duplicate warning for image/vision support. The `image` variable was already destructured from `options` and checked at line 44-48. This second check is redundant.</violation>

<violation number="2" location="packages/core/lib/v3/external_clients/customOpenAI.ts:137">
P1: Message role is lost when content is a string. The original `message.role` is ignored and always set to `"user"`, which would incorrectly convert system or assistant messages.</violation>

<violation number="3" location="packages/core/lib/v3/external_clients/customOpenAI.ts:186">
P1: Incorrect property name `inputSchema` used instead of `parameters` for OpenAI tool function definition. The OpenAI API expects `parameters`, not `inputSchema`. This will cause tool calls to fail.</violation>
</file>

<file name=".github/actions/select-browserbase-region/action.yml">

<violation number="1" location=".github/actions/select-browserbase-region/action.yml:60">
P2: Validate/sanitize region values before writing to $GITHUB_OUTPUT/$GITHUB_ENV to prevent newline injection. At minimum, reject regions containing newline or characters outside an allowlist (e.g., /^[A-Za-z0-9-]+$/) before setting outputs/env.</violation>
</file>

<file name=".github/actions/publish-ctrf-report/action.yml">

<violation number="1" location=".github/actions/publish-ctrf-report/action.yml:12">
P3: `github-token` is required but never used, forcing callers to provide a meaningless secret. Remove the input or make it optional and wire it into a step if needed.</violation>

<violation number="2" location=".github/actions/publish-ctrf-report/action.yml:71">
P2: The CTRF report check uses `-f` on a quoted glob, so it never detects default `./ctrf/*.json` matches and skips uploads. Use a glob-aware check (e.g., `compgen -G`) to detect any matching files.</violation>
</file>

<file name=".github/workflows/stagehand-server-release.yml">

<violation number="1" location=".github/workflows/stagehand-server-release.yml:34">
P2: The new composite action no longer sets `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so `pnpm install` will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.</violation>
</file>

<file name="packages/core/lib/v3/types/public/api.ts">

<violation number="1" location="packages/core/lib/v3/types/public/api.ts:828">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**

Breaking changes to Replay API schemas (`TokenUsageSchema`, `ReplayActionSchema`, `ReplayPageSchema`, `ReplayResultSchema`) are not covered by integration tests. The `packages/server/test/integration/v3/` directory has no `replay.test.ts` file. Per the rule, any breaking changes to shared request/response shapes in `packages/core/**/types/public/api.ts` must be covered by integration tests under `packages/server/test`.</violation>
</file>

<file name="packages/server/src/routes/v1/sessions/_id/replay.ts">

<violation number="1" location="packages/server/src/routes/v1/sessions/_id/replay.ts:22">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**

The replay endpoint now returns a 200 success response (empty replay) instead of the previous 501 Not Implemented, which is a breaking change to the Stagehand REST API. The rule "Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test" is violated because there is no integration test for /sessions/:id/replay in packages/server/test.</violation>
</file>
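
The `packages/evals/scoring.ts` finding above comes down to `JSON.stringify` skipping an `Error`'s non-enumerable `message` and `stack` properties. A minimal sketch of the suggested detection follows; `formatForLog` is a hypothetical name, not a function from the PR.

```typescript
// Hypothetical helper illustrating the scoring.ts suggestion.
function formatForLog(value: unknown): string {
  if (value instanceof Error) {
    // message and stack are non-enumerable, so JSON.stringify would drop them.
    return value.stack ?? `${value.name}: ${value.message}`;
  }
  if (typeof value === "string") {
    return value;
  }
  try {
    // JSON.stringify returns undefined for values such as functions.
    return JSON.stringify(value) ?? String(value);
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return String(value);
  }
}
```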

Architecture diagram

sequenceDiagram
    participant CI as GitHub Actions
    participant RS as Region Selector
    participant CORE as Stagehand Core (SDK)
    participant BB as Browserbase API
    participant CH as Chromium (Local)
    participant LLM as LLM Client (OpenAI/Anthropic)
    participant REP as CTRF & V8 Reporter

    Note over CI, REP: CI Initialization & Environment Setup
    CI->>RS: NEW: selectRegion(distribution_weights)
    RS-->>CI: return BROWSERBASE_REGION

    CI->>CH: NEW: verifyChromiumLaunch()
    CH-->>CI: confirm CDP connection success

    Note over CI, REP: Runtime Execution (Evals/Tests)
    CI->>CORE: Run Eval Task / Test
    CORE->>CORE: NEW: setEnvTimeouts(LLM_MAX_MS, BB_CREATE_MS)
    
    CORE->>BB: createBrowserbaseSession(region)
    Note right of CORE: CHANGED: Wrapped in withTimeout
    BB-->>CORE: session_id, connect_url

    CORE->>LLM: createChatCompletion()
    Note right of CORE: CHANGED: Moved from examples to core API<br/>NEW: withLlmTimeout execution
    LLM-->>CORE: response data + usage

    Note over CORE, BB: Interaction Flow
    CORE->>BB: Locator.click() / Page.click()
    Note right of CORE: CHANGED: Parallelized input dispatch<br/>(mousePressed + mouseReleased)
    BB-->>CORE: interaction success

    Note over CI, REP: Post-Execution & Reporting
    CORE->>BB: NEW: endBrowserbaseSession()
    Note right of CORE: Best-effort Browser.close cleanup

    CI->>REP: NEW: publish-ctrf-report
    Note right of REP: CHANGED: JUnit to CTRF conversion
    CI->>REP: NEW: upload-v8-coverage
    REP-->>CI: Artifacts stored (Stability & Visibility)

    Note over CI, CORE: Server Replay Flow (Optional)
    CI->>CORE: GET /v1/sessions/:id/replay
    CORE-->>CI: CHANGED: Returns ReplayResult (updated schema with metadata)
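
The diagram notes that session creation is "wrapped in withTimeout" and that timeouts come from environment variables such as `LLM_MAX_MS`. The PR's actual helper is not shown in this conversation, so the following is only a sketch of one common way to write such a wrapper. The signature, the error message, and `readTimeoutMs` are all assumptions.

```typescript
// Sketch only: the real withTimeout in the PR may differ.
function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}

// Hypothetical reader for an env-based timeout with a fallback.
function readTimeoutMs(raw: string | undefined, fallbackMs: number): number {
  const parsed = Number(raw);
  return raw !== undefined && Number.isFinite(parsed) && parsed > 0
    ? parsed
    : fallbackMs;
}
```

Note that racing against a timer only stops waiting; it does not cancel the underlying request, which is presumably why the diagram also shows a best-effort cleanup step.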

cubic-dev-ai · 2026-01-28T19:31:38Z

packages/server/src/lib/errorHandler.ts

+      const errMessage =
+        err instanceof Error ? (err.stack ?? err.message) : String(err);
+      return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);


P1: Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via request.log.error(err), so debugging capability is preserved. Return a generic message to clients instead.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/src/lib/errorHandler.ts, line 63: <comment>Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via `request.log.error(err)`, so debugging capability is preserved. Return a generic message to clients instead.</comment> <file context> @@ -60,11 +60,9 @@ export function withErrorHandling< - "An unexpected error occurred", - StatusCodes.INTERNAL_SERVER_ERROR, - ); + const errMessage = + err instanceof Error ? (err.stack ?? err.message) : String(err); + return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR); </file context>

Suggested change

-      const errMessage =
-        err instanceof Error ? (err.stack ?? err.message) : String(err);
-      return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
+      return error(
+        reply,
+        "An unexpected error occurred",
+        StatusCodes.INTERNAL_SERVER_ERROR,
+      );
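
As a rough illustration of the suggested change, the handler can log the full error on the server and send only a fixed message to the client. The `Logger` and `ErrorBody` shapes and the `toClientError` name below are stand-ins invented for this sketch; the real code uses `request.log.error`, `error(reply, ...)`, and `StatusCodes` as shown in the diff.

```typescript
// Stand-in types for illustration; not the server's real interfaces.
interface Logger {
  error(err: unknown): void;
}

interface ErrorBody {
  statusCode: number;
  message: string;
}

const INTERNAL_SERVER_ERROR = 500;

function toClientError(err: unknown, log: Logger): ErrorBody {
  // Full detail (including the stack) stays in server-side logs.
  log.error(err);
  // The client only ever sees a generic message.
  return {
    statusCode: INTERNAL_SERVER_ERROR,
    message: "An unexpected error occurred",
  };
}
```

An integration test for the 500 response contract could assert the same two things: the body carries the generic message, and nothing from the thrown error appears in it.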

cubic-dev-ai · 2026-01-28T19:31:38Z

packages/core/lib/v3/external_clients/customOpenAI.ts

+            };
+            return formattedMessage;
+          } else if (message.role === "user") {
+            const formattedMessage: ChatCompletionUserMessageParam = {


P1: Message role is lost when content is a string. The original message.role is ignored and always set to "user", which would incorrectly convert system or assistant messages.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 137: <comment>Message role is lost when content is a string. The original `message.role` is ignored and always set to `"user"`, which would incorrectly convert system or assistant messages.</comment> <file context> @@ -0,0 +1,280 @@ + }; + return formattedMessage; + } else if (message.role === "user") { + const formattedMessage: ChatCompletionUserMessageParam = { + ...message, + role: "user", </file context>
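
One way to avoid losing the role is to carry `message.role` through unchanged, so that string content never falls through to a hard-coded `"user"`. The sketch below uses simplified local types rather than the OpenAI SDK's `ChatCompletion*MessageParam` types, and `toOpenAIMessage` is a hypothetical name.

```typescript
// Simplified stand-ins for the SDK message types.
type Role = "system" | "user" | "assistant";

interface ChatMessage {
  role: Role;
  content: string | { type: "text"; text: string }[];
}

interface OpenAIMessage {
  role: Role;
  content: string;
}

function toOpenAIMessage(message: ChatMessage): OpenAIMessage {
  // Flatten structured content to text, but never touch the role.
  const content =
    typeof message.content === "string"
      ? message.content
      : message.content.map((part) => part.text).join("\n");
  return { role: message.role, content };
}
```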

cubic-dev-ai · 2026-01-28T19:31:38Z

packages/core/lib/v3/external_clients/customOpenAI.ts

+        function: {
+          name: tool.name,
+          description: tool.description,
+          inputSchema: tool.parameters,


P1: Incorrect property name inputSchema used instead of parameters for OpenAI tool function definition. The OpenAI API expects parameters, not inputSchema. This will cause tool calls to fail.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 186: <comment>Incorrect property name `inputSchema` used instead of `parameters` for OpenAI tool function definition. The OpenAI API expects `parameters`, not `inputSchema`. This will cause tool calls to fail.</comment> <file context> @@ -0,0 +1,280 @@ + function: { + name: tool.name, + description: tool.description, + inputSchema: tool.parameters, + }, + type: "function", </file context>
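
In the Chat Completions tool format, the JSON Schema for a function's arguments goes under `function.parameters`. A minimal sketch of the corrected mapping follows, using a simplified local `Tool` type (an assumption; the PR's own tool type is not shown in full) and a hypothetical `toOpenAITool` helper.

```typescript
// Simplified local types for illustration.
interface Tool {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

interface OpenAIFunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

function toOpenAITool(tool: Tool): OpenAIFunctionTool {
  return {
    type: "function",
    function: {
      name: tool.name,
      description: tool.description,
      // OpenAI reads the schema from `parameters`, not `inputSchema`.
      parameters: tool.parameters,
    },
  };
}
```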

cubic-dev-ai · 2026-01-28T19:31:38Z

packages/core/lib/v3/types/public/api.ts

 export const ReplayResultSchema = z
  .object({
-    pages: z.array(ReplayPageSchema).optional(),
+    pages: z.array(ReplayPageSchema),


P1: Rule violated: Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test

Breaking changes to Replay API schemas (TokenUsageSchema, ReplayActionSchema, ReplayPageSchema, ReplayResultSchema) are not covered by integration tests. The packages/server/test/integration/v3/ directory has no replay.test.ts file. Per the rule, any breaking changes to shared request/response shapes in packages/core/**/types/public/api.ts must be covered by integration tests under packages/server/test.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/types/public/api.ts, line 828: <comment>Breaking changes to Replay API schemas (`TokenUsageSchema`, `ReplayActionSchema`, `ReplayPageSchema`, `ReplayResultSchema`) are not covered by integration tests. The `packages/server/test/integration/v3/` directory has no `replay.test.ts` file. Per the rule, any breaking changes to shared request/response shapes in `packages/core/**/types/public/api.ts` must be covered by integration tests under `packages/server/test`.</comment> <file context> @@ -795,31 +795,38 @@ export const TokenUsageSchema = z export const ReplayResultSchema = z .object({ - pages: z.array(ReplayPageSchema).optional(), + pages: z.array(ReplayPageSchema), + clientLanguage: z.string().optional(), }) </file context>

cubic-dev-ai · 2026-01-28T19:31:38Z

packages/server/src/lib/errorHandler.ts

-        "An unexpected error occurred",
-        StatusCodes.INTERNAL_SERVER_ERROR,
-      );
+      const errMessage =


P1: Rule violated: Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test

Breaking change to error response format lacks integration test coverage. The withErrorHandling wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in packages/server/test that verifies the error response format.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/src/lib/errorHandler.ts, line 63: <comment>Breaking change to error response format lacks integration test coverage. The `withErrorHandling` wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in `packages/server/test` that verifies the error response format.</comment> <file context> @@ -60,11 +60,9 @@ export function withErrorHandling< - "An unexpected error occurred", - StatusCodes.INTERNAL_SERVER_ERROR, - ); + const errMessage = + err instanceof Error ? (err.stack ?? err.message) : String(err); + return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR); </file context>

cubic-dev-ai · 2026-01-28T19:31:39Z

packages/core/lib/v3/external_clients/customOpenAI.ts

+      },
+    });
+
+    if (options.image) {


P2: Duplicate warning for image/vision support. The image variable was already destructured from options and checked at line 44-48. This second check is redundant.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 71: <comment>Duplicate warning for image/vision support. The `image` variable was already destructured from `options` and checked at line 44-48. This second check is redundant.</comment> <file context> @@ -0,0 +1,280 @@ + }, + }); + + if (options.image) { + console.warn( + "Image provided. Vision is not currently supported for openai", </file context>

.github/actions/select-browserbase-region/action.yml

+          exit 1
+        fi
+        echo "Selected Browserbase region: $chosen"
+        echo "region=$chosen" >> "$GITHUB_OUTPUT"
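
The review's P2 finding for this file asks for region values to be validated before they are written to `$GITHUB_OUTPUT`. The action itself is a shell step; the TypeScript below is only a sketch of the same check, using the allowlist pattern quoted in the review. `formatOutputLine` is a hypothetical name.

```typescript
// Allowlist pattern taken from the review comment.
const REGION_PATTERN = /^[A-Za-z0-9-]+$/;

// Hypothetical helper: builds a `name=value` line for $GITHUB_OUTPUT.
function formatOutputLine(name: string, region: string): string {
  // Rejects empty strings, newlines, and any other unexpected character,
  // so a crafted value cannot inject extra output lines.
  if (!REGION_PATTERN.test(region)) {
    throw new Error(`Invalid Browserbase region: ${JSON.stringify(region)}`);
  }
  return `${name}=${region}\n`;
}
```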


.github/actions/publish-ctrf-report/action.yml

cubic-dev-ai · 2026-01-28T19:31:39Z

.github/workflows/stagehand-server-release.yml

+      - uses: ./.github/actions/setup-node-pnpm-turbo
        with:
-          node-version: 22.x
-          cache: 'pnpm'
-          cache-dependency-path: '**/pnpm-lock.yaml'
-
-      - name: Install dependencies
-        env:
-          PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1"
-        run: pnpm install --frozen-lockfile
+          use-prebuilt-artifacts: "false"


P2: The new composite action no longer sets PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD, so pnpm install will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/stagehand-server-release.yml, line 34: <comment>The new composite action no longer sets `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so `pnpm install` will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.</comment> <file context> @@ -31,19 +31,9 @@ jobs: - - - name: Setup Node.js - uses: actions/setup-node@v6 + - uses: ./.github/actions/setup-node-pnpm-turbo with: - node-version: 22.x </file context>

Suggested change

-      - uses: ./.github/actions/setup-node-pnpm-turbo
-        with:
-          node-version: 22.x
-          cache: 'pnpm'
-          cache-dependency-path: '**/pnpm-lock.yaml'
-      - name: Install dependencies
-        env:
-          PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1"
-        run: pnpm install --frozen-lockfile
-          use-prebuilt-artifacts: "false"
+      - uses: ./.github/actions/setup-node-pnpm-turbo
+        env:
+          PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1"
+        with:
+          use-prebuilt-artifacts: "false"

.github/actions/publish-ctrf-report/action.yml

…om ESM dist

cubic-dev-ai

6 issues found across 67 files (changes from recent commits).

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/scripts/normalize-v8-coverage.ts">

<violation number="1" location="packages/core/scripts/normalize-v8-coverage.ts:267">
P2: SourceMapConsumer cleanup not in a `finally` block. If an error occurs during processing, the `consumer.destroy()` calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.</violation>

<violation number="2" location="packages/core/scripts/normalize-v8-coverage.ts:296">
P2: Unhandled promise rejection in main entry point. The `void main()` call discards the promise without handling potential errors from `normalizeV8Coverage`. Add a `.catch()` handler to log errors and exit with a non-zero code.</violation>
</file>

<file name="packages/core/package.json">

<violation number="1" location="packages/core/package.json:29">
P1: Scripts `e2e:local` and `e2e:bb` reference non-existent `test:e2e:local` and `test:e2e:bb` scripts in this package. These either need to be defined locally with the appropriate `STAGEHAND_BROWSER_TARGET` env var, or the alias scripts should set the env var directly.</violation>
</file>

<file name="packages/server/scripts/test-server.ts">

<violation number="1" location="packages/server/scripts/test-server.ts:89">
P2: The `?? process.env.STAGEHAND_API_URL` fallback is dead code since `baseUrl` is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing `STAGEHAND_API_URL`, this logic is incorrect.</violation>
</file>

<file name="packages/core/scripts/coverage.ts">

<violation number="1" location="packages/core/scripts/coverage.ts:93">
P2: Missing error check for spawn failure. If `spawnSync` fails (e.g., `pnpm` or `c8` not found), `result.error` will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.</violation>
</file>

<file name="packages/evals/scripts/test-evals.ts">

<violation number="1" location="packages/evals/scripts/test-evals.ts:27">
P2: Missing error handling around `JSON.parse`. If `eval-summary.json` exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.</violation>
</file>
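
The `test-evals.ts` finding above suggests treating malformed JSON the same as a missing report. A minimal sketch of that fallback follows, assuming a hypothetical `parseSummary` helper that receives the file contents as a string (or `undefined` when the file is missing).

```typescript
// Hypothetical result shape for illustration.
type SummaryResult =
  | { ok: true; summary: unknown }
  | { ok: false; reason: "missing" | "malformed" };

function parseSummary(raw: string | undefined): SummaryResult {
  if (raw === undefined) {
    return { ok: false, reason: "missing" };
  }
  try {
    return { ok: true, summary: JSON.parse(raw) };
  } catch {
    // Malformed JSON: the caller can still emit a CTRF report that
    // records the failure instead of crashing.
    return { ok: false, reason: "malformed" };
  }
}
```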

packages/core/package.json

cubic-dev-ai · 2026-01-30T01:14:13Z

packages/core/scripts/normalize-v8-coverage.ts

+};
+
+if (import.meta.url === `file://${process.argv[1]}`) {
+  void main();


P2: Unhandled promise rejection in main entry point. The void main() call discards the promise without handling potential errors from normalizeV8Coverage. Add a .catch() handler to log errors and exit with a non-zero code.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/normalize-v8-coverage.ts, line 296: <comment>Unhandled promise rejection in main entry point. The `void main()` call discards the promise without handling potential errors from `normalizeV8Coverage`. Add a `.catch()` handler to log errors and exit with a non-zero code.</comment> <file context> @@ -0,0 +1,297 @@ +}; + +if (import.meta.url === `file://${process.argv[1]}`) { + void main(); +} </file context>
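
A minimal sketch of the suggested `.catch()` handling follows. `runMain` and its injected `onError` and `exit` callbacks are hypothetical; they exist only so the behavior can be exercised without terminating the process.

```typescript
// Hypothetical wrapper: runs an async entry point and reports failures.
function runMain(
  main: () => Promise<void>,
  onError: (err: unknown) => void,
  exit: (code: number) => void,
): Promise<void> {
  return main().catch((err: unknown) => {
    // Surface the failure and signal it to the caller (e.g. CI).
    onError(err);
    exit(1);
  });
}
```

In the script itself this would amount to replacing `void main()` with something like `main().catch((err) => { console.error(err); process.exit(1); })`.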

cubic-dev-ai · 2026-01-30T01:14:13Z

packages/core/scripts/normalize-v8-coverage.ts

+    }
+  }
+
+  for (const ctx of sourceCache.values()) {


P2: SourceMapConsumer cleanup not in a finally block. If an error occurs during processing, the consumer.destroy() calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/normalize-v8-coverage.ts, line 267: <comment>SourceMapConsumer cleanup not in a `finally` block. If an error occurs during processing, the `consumer.destroy()` calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.</comment> <file context> @@ -0,0 +1,297 @@ + } + } + + for (const ctx of sourceCache.values()) { + ctx?.consumer.destroy(); + } </file context>
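
The try/finally shape the comment asks for can be sketched as follows. `Destroyable` is a minimal stand-in for a resource such as `SourceMapConsumer`, and `withCleanup` is a hypothetical wrapper. The sketch is synchronous for brevity; async processing would `await` the work inside the `try` block.

```typescript
// Minimal stand-in for a resource such as SourceMapConsumer.
interface Destroyable {
  destroy(): void;
}

// Hypothetical wrapper: runs `work`, then destroys every cached resource
// whether `work` returned or threw.
function withCleanup<T>(
  cache: Map<string, Destroyable | null>,
  work: () => T,
): T {
  try {
    return work();
  } finally {
    cache.forEach((ctx) => {
      ctx?.destroy();
    });
  }
}
```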

cubic-dev-ai · 2026-01-30T01:14:13Z

packages/server/scripts/test-server.ts

+  parsedBaseUrl.port || (parsedBaseUrl.protocol === "https:" ? "443" : "80");
+
+process.env.PORT = port;
+process.env.STAGEHAND_API_URL = baseUrl ?? process.env.STAGEHAND_API_URL;


P2: The ?? process.env.STAGEHAND_API_URL fallback is dead code since baseUrl is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing STAGEHAND_API_URL, this logic is incorrect.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/scripts/test-server.ts, line 89: <comment>The `?? process.env.STAGEHAND_API_URL` fallback is dead code since `baseUrl` is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing `STAGEHAND_API_URL`, this logic is incorrect.</comment> <file context> @@ -0,0 +1,326 @@ + parsedBaseUrl.port || (parsedBaseUrl.protocol === "https:" ? "443" : "80"); + +process.env.PORT = port; +process.env.STAGEHAND_API_URL = baseUrl ?? process.env.STAGEHAND_API_URL; +process.env.BB_ENV = process.env.BB_ENV ?? "local"; + </file context>
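The two intents named in the comment can be made explicit, as in the sketch below. `resolveApiUrl` and its `preserveExisting` flag are hypothetical names for illustration; the PR itself assigns `process.env.STAGEHAND_API_URL` directly.

```typescript
// Hypothetical helper (not from the PR): choose the STAGEHAND_API_URL value.
// preserveExisting = false: always overwrite with the computed baseUrl.
// preserveExisting = true: keep a value that is already set and fall back
// to baseUrl only when the variable is undefined.
export const resolveApiUrl = (
  baseUrl: string,
  existing: string | undefined,
  preserveExisting: boolean,
): string => (preserveExisting ? (existing ?? baseUrl) : baseUrl);
```

Note that `??` only falls back on `null` or `undefined`, so an environment variable set to an empty string would still count as an existing value.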

cubic-dev-ai · 2026-01-30T01:14:13Z

packages/core/scripts/coverage.ts

+  },
+);
+
+if (result.stdout) {


P2: Missing error check for spawn failure. If spawnSync fails (e.g., pnpm or c8 not found), result.error will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/coverage.ts, line 93: <comment>Missing error check for spawn failure. If `spawnSync` fails (e.g., `pnpm` or `c8` not found), `result.error` will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.</comment> <file context> @@ -0,0 +1,100 @@ + }, +); + +if (result.stdout) { + process.stdout.write(result.stdout); + fs.writeFileSync(path.join(outDir, "coverage-summary.txt"), result.stdout); </file context>
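A sketch of the missing check, assuming `result` comes from `spawnSync` called with a string encoding. The `describeSpawnFailure` helper is a hypothetical name, not part of this PR; it only reads fields that `spawnSync` results are documented to carry (`error`, `status`, `signal`).

```typescript
import type { SpawnSyncReturns } from "node:child_process";

// Hypothetical helper (not from the PR): explain why a spawnSync call failed,
// or return null when the child exited cleanly.
export const describeSpawnFailure = (
  result: Pick<SpawnSyncReturns<string>, "error" | "status" | "signal">,
): string | null => {
  // `error` is set when the child could not be run (for example ENOENT).
  if (result.error) {
    return `failed to run command: ${result.error.message}`;
  }
  if (result.signal) {
    return `command was terminated by signal ${result.signal}`;
  }
  if (result.status !== 0) {
    return `command exited with code ${result.status}`;
  }
  return null;
};
```

The script could call this before touching `result.stdout`, write any non-null message to stderr, and exit non-zero, so a missing `pnpm` or `c8` binary produces a diagnostic instead of a silent failure.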

cubic-dev-ai · 2026-01-30T01:14:13Z

packages/evals/scripts/test-evals.ts

+) => {
+  const timestamp = new Date().toISOString();
+  if (fs.existsSync(summaryPath)) {
+    const summary = JSON.parse(fs.readFileSync(summaryPath, "utf8")) as {


P2: Missing error handling around JSON.parse. If eval-summary.json exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/evals/scripts/test-evals.ts, line 27: <comment>Missing error handling around `JSON.parse`. If `eval-summary.json` exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.</comment> <file context> @@ -0,0 +1,179 @@ +) => { + const timestamp = new Date().toISOString(); + if (fs.existsSync(summaryPath)) { + const summary = JSON.parse(fs.readFileSync(summaryPath, "utf8")) as { + passed?: Array<{ eval: string; model: string; categories?: string[] }>; + failed?: Array<{ eval: string; model: string; categories?: string[] }>; </file context>
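A sketch of the suggested fallback, under the assumption that the caller already has a branch for a missing report. The `EvalSummary` fields are copied from the excerpt above; `parseSummary` and `readSummary` are hypothetical helper names, not code from this PR.

```typescript
import * as fs from "node:fs";

// Shapes copied from the excerpt above; the real type may have more fields.
type EvalResult = { eval: string; model: string; categories?: string[] };
type EvalSummary = { passed?: EvalResult[]; failed?: EvalResult[] };

// Hypothetical helper: parse the summary text, returning null on malformed
// JSON or on a non-object payload instead of throwing.
export const parseSummary = (raw: string): EvalSummary | null => {
  try {
    const parsed: unknown = JSON.parse(raw);
    return typeof parsed === "object" && parsed !== null
      ? (parsed as EvalSummary)
      : null;
  } catch {
    return null;
  }
};

// Hypothetical helper: a missing file and a malformed file both map to null,
// so the caller can reuse its existing "missing report" branch.
export const readSummary = (summaryPath: string): EvalSummary | null => {
  if (!fs.existsSync(summaryPath)) return null;
  return parseSummary(fs.readFileSync(summaryPath, "utf8"));
};
```

With this split, the CTRF report is still generated when `eval-summary.json` is corrupt, and the parsing logic can be tested without touching the filesystem.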

pirate added 30 commits January 26, 2026 13:09

randomize region used for evals, split out pnpm and turbo cache, verify chrome launches before running tests

6c906c8

use github actions native chromium for integration tests

4ccf598

restore playwright browser usage for integration tests

20b6745

add code coverage and flaky test reporting

e07bd93

fix lint

e4e2d63

make sure ctrf artifacts are unique

ecc132d

add missing packages and fix ctrf summaries

2992f77

fix sanitization of vitest artifacts

df3708c

remove custom action for vitest sanitization

68d79c2

only merge coverage in final step

5001deb

check ratelimits

543bf42

log if ratelimits

1faba4f

limit github api calls

e632cb6

fix e2e bb tests running on local and coverage reporting

982699f

fix chromium path used by integration tests

df27619

make sure we disable sandbox in e2e local tests

fd2385a

tune coverage reports

4a1f543

fix chromium version used

524f87e

bump test timeouts to improve flakyness when testing against remote

8ef22d1

fix integration test failures

538081b

fix flushing of server test output

a2c8d78

allow playwright downloads in server integration tests

3002cb7

increase screenshot timeout on prod bb browser

80e6e8a

force disableAPI true in all e2e tests

0efeb40

rename TEST_ENV to STAGEHAND_ENV

da42c9b

up timetouts in screenshot test to 5s

e5a62fc

improve env reporter in tests

ad422cb

lint

a354cee

fix start integration tests not using CHROME_PATH

d60f86a

use non-pretty logs for SEA in CI

bc24428

pirate added 6 commits January 27, 2026 17:51

server: compile integration tests to ESM

af6eb65

ci: run compiled ESM tests and evals

0b95da1

core: map package import to test entry

8e831ca

format: align new ESM paths

e85f5dc

fix esm e2e filters and eval loader

4abfd32

fix replay endpoint behavior

61a6a06

pirate and others added 2 commits January 28, 2026 11:25

Merge branch 'main' into esm-build

31321b8

fix failing stagehand server release

4bd2b8b

cubic-dev-ai bot reviewed Jan 28, 2026

View reviewed changes

pirate added 10 commits January 28, 2026 11:35

use build artifacts for server tests

38d767f

re-use pre-built artifacts

67d0975

fix lint

59a0bdf

fix missing prebuild: false

44066fa

fix server-integration-tests needs to depend on prebuild

5036d9f

avoid experimental test coverage in server integration

c27f26b

ignore dbus and PHONE_REGISTRATION_ERROR lines in chrome stderr output

ce92cf7

increase session creation timeout for bb

a3be6b9

dont forget to upload sea sourcemapped version as artifact

f21743f

build everything to ESM with consistent sourcemaps, then build SEA From ESM dist

ece5638

pirate changed the title ~~randomize region used for evals, split out pnpm and turbo cache, veri…~~ Remove tsup, generate unified re-used ESM and CJS builds with sourcemaps and types, switch CI to new Evals CLI Jan 29, 2026

pirate mentioned this pull request Jan 29, 2026

Replace tsup with esbuild and precompile deterministic Playwright suite #1095

Closed

pirate added 2 commits January 29, 2026 17:09

use new evals cli, unify on esm main build

dc4e40c

remove dotenv support completely for security

df640db

cubic-dev-ai bot reviewed Jan 30, 2026

View reviewed changes

pirate added 3 commits January 29, 2026 18:15

remove dotenv loading and move env to turbo

faf5c32

add missing env vars to turbo

59d721a

only use sea binary name

6b9bb21

Conversation

pirate commented Jan 28, 2026 • edited by cubic-dev-ai bot

changeset-bot bot commented Jan 28, 2026 • edited

⚠️ No Changeset found

greptile-apps bot commented Jan 28, 2026

github-actions bot commented Jan 28, 2026 • edited

✱ Stainless preview builds

