Remove tsup, generate unified re-used ESM and CJS builds with sourcemaps and types, switch CI to new Evals CLI #1632

Open
pirate wants to merge 69 commits into main from esm-build

@pirate
Member

@pirate pirate commented Jan 28, 2026

why

what changed

test plan


Summary by cubic

Rebuilt the core into unified ESM and CJS builds with sourcemaps and types, and switched CI to the new Evals CLI running against the ESM dist. This simplifies consumption via export maps, improves test stability, and enhances coverage/reporting.

  • New Features

    • Unified builds: ESM and CJS outputs with export map, sourcemaps, and types; removed tsup. Tests resolve @browserbasehq/stagehand to dist/esm via loader hooks; added build/test scripts for both.
    • CI: Switched to the new Evals CLI; consolidated workflows; upload CTRF and V8 coverage artifacts; added coverage normalization and flaky test insights.
    • Reliability: Weighted Browserbase region selection, Chromium launch preflight (CHROME_PATH, no-sandbox in CI), and env-based timeouts (LLM_MAX_MS, BROWSERBASE_*); tighter click/input dispatch and an env reporter.
    • API surface: AISdkClient and CustomOpenAIClient moved to lib/v3 and exported publicly.
    • Server: SEA build now driven by TS; replay schemas expanded; replay endpoint returns an empty success; improved error stacks.
  • Migration

    • Replace TEST_ENV with STAGEHAND_BROWSER_TARGET=local|browserbase. Set CHROME_PATH in CI and BROWSERBASE_REGION_DISTRIBUTION/timeouts as needed.
    • Environment variables are no longer loaded via dotenv; set them explicitly in your environment/CI.
    • Update imports to use @browserbasehq/stagehand exports for AISdkClient and CustomOpenAIClient.
    • If you referenced dist paths or relied on tsup, use the package export map instead (ESM import, CJS require).

Written for commit 6b9bb21. Summary will update on new commits. Review in cubic

@changeset-bot

changeset-bot bot commented Jan 28, 2026

⚠️ No Changeset found

Latest commit: 6b9bb21

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@greptile-apps
Contributor

greptile-apps bot commented Jan 28, 2026

Too many files changed for review. (101 files found, 100 file limit)

@github-actions
Contributor

github-actions bot commented Jan 28, 2026

✱ Stainless preview builds

This PR will update the stagehand SDKs with the following commit message.

feat: randomize region used for evals, split out pnpm and turbo cache, veri…

Edit this comment to update it. It will appear in the SDK's changelogs.

stagehand-typescript studio · code · diff

Your SDK built successfully.
generate ✅ build ✅ lint ✅ test ❗

npm install https://pkg.stainless.com/s/stagehand-typescript/8fa00c3ccdcc1787d2a0e948ae3417b6b7c8a952/dist.tar.gz
New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-kotlin studio · code · diff

Your SDK built successfully.
generate ✅ lint ✅ test ✅

New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-ruby studio · code · diff

Your SDK built successfully.
generate ✅ lint ✅ test ✅

New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-php studio · code · diff

Your SDK built successfully.
generate ✅ lint ✅ test ✅

New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-csharp studio · code · diff

Your SDK built successfully.
generate ⚠️ lint ❗ test ✅

New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-python studio · code · diff

Your SDK built successfully.
generate ✅ build ✅ lint ❗ test ❗

pip install https://pkg.stainless.com/s/stagehand-python/92a9d0407c194017261692a2505d7c559b7afb32/stagehand_alpha-3.4.7-py3-none-any.whl
New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-openapi studio · code · diff

Your SDK built successfully.
generate ✅

New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-go studio · code · diff

Your SDK built successfully.
generate ✅ lint ✅ test ✅

go get github.com/stainless-sdks/stagehand-go@a1b8973804961fd21f799b6b53442b478ae05c0e
New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
stagehand-java studio · conflict

Your SDK built successfully.

New diagnostics (2 note)
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.
💡 Schema/IsAmbiguous: This schema does not have at least one of `type`, `oneOf`, `anyOf`, or `allOf`, so its type has been interpreted as `unknown`.

This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
If you push custom code to the preview branch, re-run this workflow to update the comment.
Last updated: 2026-01-30 02:34:15 UTC

Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

16 issues found across 101 files

Confidence score: 2/5

  • High-risk security issue: packages/server/src/lib/errorHandler.ts exposes stack traces/internal details to clients, which can leak sensitive implementation info.
  • Concrete behavior bugs in packages/core/lib/v3/external_clients/customOpenAI.ts (wrong inputSchema key and lost message roles) likely break tool calls and mis-handle system/assistant messages.
  • Multiple breaking API changes lack required integration tests (e.g., packages/core/lib/v3/types/public/api.ts, packages/server/src/lib/errorHandler.ts, packages/server/src/routes/v1/sessions/_id/replay.ts), increasing regression risk.
  • Pay close attention to packages/server/src/lib/errorHandler.ts, packages/core/lib/v3/external_clients/customOpenAI.ts, packages/core/lib/v3/types/public/api.ts, packages/server/src/routes/v1/sessions/_id/replay.ts - security exposure, API correctness, and untested breaking changes.

Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/evals/scoring.ts">

<violation number="1" location="packages/evals/scoring.ts:12">
P2: Logging Error outputs now collapses to "{}" because JSON.stringify(Error) returns an empty object, so error details are lost compared to the prior String(Error) behavior. Consider detecting Error and logging its message/stack instead.</violation>
</file>

<file name="packages/server/scripts/build-sea.ts">

<violation number="1" location="packages/server/scripts/build-sea.ts:39">
P2: File descriptor leak: the `WriteStream` is not closed in error paths (redirects, non-200 status, network errors). This can exhaust file descriptors if the script encounters multiple failures or redirects.</violation>
</file>

<file name=".github/workflows/stagehand-server-sea-build.yml">

<violation number="1" location=".github/workflows/stagehand-server-sea-build.yml:50">
P2: The new composite setup action installs dependencies without `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so Playwright browser binaries will download during CI (the previous workflow explicitly skipped this). If the SEA build doesn’t need browsers, this adds large unnecessary downloads and can slow or fail builds. Consider setting the env on the action step to preserve the previous behavior.</violation>
</file>

<file name="packages/server/src/lib/errorHandler.ts">

<violation number="1" location="packages/server/src/lib/errorHandler.ts:63">
P1: Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via `request.log.error(err)`, so debugging capability is preserved. Return a generic message to clients instead.</violation>

<violation number="2" location="packages/server/src/lib/errorHandler.ts:63">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**

Breaking change to error response format lacks integration test coverage. The `withErrorHandling` wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in `packages/server/test` that verifies the error response format.</violation>
</file>

<file name="packages/core/vitest.config.ts">

<violation number="1" location="packages/core/vitest.config.ts:24">
P2: Bug: `String.replace()` replaces the first occurrence, not the file extension. If the extension string (e.g., `.ts`) appears in a directory name, the wrong part of the path will be replaced. Use `candidate.slice(0, -ext.length) + ".js"` to ensure only the actual extension at the end is replaced.</violation>
</file>

<file name=".github/actions/verify-chromium-launch/action.yml">

<violation number="1" location=".github/actions/verify-chromium-launch/action.yml:100">
P2: Duplicate Chrome flag `--disable-component-update` - this flag already appears earlier in the args array (line 69). Remove the duplicate to improve maintainability.</violation>
</file>

<file name="packages/core/lib/v3/external_clients/customOpenAI.ts">

<violation number="1" location="packages/core/lib/v3/external_clients/customOpenAI.ts:71">
P2: Duplicate warning for image/vision support. The `image` variable was already destructured from `options` and checked at line 44-48. This second check is redundant.</violation>

<violation number="2" location="packages/core/lib/v3/external_clients/customOpenAI.ts:137">
P1: Message role is lost when content is a string. The original `message.role` is ignored and always set to `"user"`, which would incorrectly convert system or assistant messages.</violation>

<violation number="3" location="packages/core/lib/v3/external_clients/customOpenAI.ts:186">
P1: Incorrect property name `inputSchema` used instead of `parameters` for OpenAI tool function definition. The OpenAI API expects `parameters`, not `inputSchema`. This will cause tool calls to fail.</violation>
</file>

<file name=".github/actions/select-browserbase-region/action.yml">

<violation number="1" location=".github/actions/select-browserbase-region/action.yml:60">
P2: Validate/sanitize region values before writing to $GITHUB_OUTPUT/$GITHUB_ENV to prevent newline injection. At minimum, reject regions containing newline or characters outside an allowlist (e.g., /^[A-Za-z0-9-]+$/) before setting outputs/env.</violation>
</file>

<file name=".github/actions/publish-ctrf-report/action.yml">

<violation number="1" location=".github/actions/publish-ctrf-report/action.yml:12">
P3: `github-token` is required but never used, forcing callers to provide a meaningless secret. Remove the input or make it optional and wire it into a step if needed.</violation>

<violation number="2" location=".github/actions/publish-ctrf-report/action.yml:71">
P2: The CTRF report check uses `-f` on a quoted glob, so it never detects default `./ctrf/*.json` matches and skips uploads. Use a glob-aware check (e.g., `compgen -G`) to detect any matching files.</violation>
</file>

<file name=".github/workflows/stagehand-server-release.yml">

<violation number="1" location=".github/workflows/stagehand-server-release.yml:34">
P2: The new composite action no longer sets `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so `pnpm install` will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.</violation>
</file>

<file name="packages/core/lib/v3/types/public/api.ts">

<violation number="1" location="packages/core/lib/v3/types/public/api.ts:828">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**

Breaking changes to Replay API schemas (`TokenUsageSchema`, `ReplayActionSchema`, `ReplayPageSchema`, `ReplayResultSchema`) are not covered by integration tests. The `packages/server/test/integration/v3/` directory has no `replay.test.ts` file. Per the rule, any breaking changes to shared request/response shapes in `packages/core/**/types/public/api.ts` must be covered by integration tests under `packages/server/test`.</violation>
</file>

<file name="packages/server/src/routes/v1/sessions/_id/replay.ts">

<violation number="1" location="packages/server/src/routes/v1/sessions/_id/replay.ts:22">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**

The replay endpoint now returns a 200 success response (empty replay) instead of the previous 501 Not Implemented, which is a breaking change to the Stagehand REST API. The rule "Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test" is violated because there is no integration test for /sessions/:id/replay in packages/server/test.</violation>
</file>
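To illustrate the packages/evals/scoring.ts finding in the list above: `JSON.stringify` skips an Error's `message` and `stack` because they are non-enumerable, so the serialized result is `{}`. The following is a minimal TypeScript sketch of one possible fix, not the PR's code; `formatForLog` is a hypothetical helper name.

```typescript
// Hypothetical helper (not from the PR): renders any value for a log line
// without dropping Error details.
function formatForLog(value: unknown): string {
  if (value instanceof Error) {
    // message and stack are non-enumerable, so JSON.stringify would omit them.
    return value.stack ?? `${value.name}: ${value.message}`;
  }
  if (typeof value === "string") {
    return value;
  }
  try {
    // JSON.stringify can yield undefined at runtime (e.g. for functions).
    return JSON.stringify(value) ?? String(value);
  } catch {
    // Circular structures and BigInt values throw; fall back to String().
    return String(value);
  }
}

const collapsed = JSON.stringify(new Error("boom")); // "{}"
const preserved = formatForLog(new Error("boom"));
console.log(collapsed, preserved);
```

Whether the stack or only the message belongs in eval logs is a judgment call; the sketch prefers the stack when one is available.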
Architecture diagram
sequenceDiagram
    participant CI as GitHub Actions
    participant RS as Region Selector
    participant CORE as Stagehand Core (SDK)
    participant BB as Browserbase API
    participant CH as Chromium (Local)
    participant LLM as LLM Client (OpenAI/Anthropic)
    participant REP as CTRF & V8 Reporter

    Note over CI, REP: CI Initialization & Environment Setup
    CI->>RS: NEW: selectRegion(distribution_weights)
    RS-->>CI: return BROWSERBASE_REGION

    CI->>CH: NEW: verifyChromiumLaunch()
    CH-->>CI: confirm CDP connection success

    Note over CI, REP: Runtime Execution (Evals/Tests)
    CI->>CORE: Run Eval Task / Test
    CORE->>CORE: NEW: setEnvTimeouts(LLM_MAX_MS, BB_CREATE_MS)
    
    CORE->>BB: createBrowserbaseSession(region)
    Note right of CORE: CHANGED: Wrapped in withTimeout
    BB-->>CORE: session_id, connect_url

    CORE->>LLM: createChatCompletion()
    Note right of CORE: CHANGED: Moved from examples to core API<br/>NEW: withLlmTimeout execution
    LLM-->>CORE: response data + usage

    Note over CORE, BB: Interaction Flow
    CORE->>BB: Locator.click() / Page.click()
    Note right of CORE: CHANGED: Parallelized input dispatch<br/>(mousePressed + mouseReleased)
    BB-->>CORE: interaction success

    Note over CI, REP: Post-Execution & Reporting
    CORE->>BB: NEW: endBrowserbaseSession()
    Note right of CORE: Best-effort Browser.close cleanup

    CI->>REP: NEW: publish-ctrf-report
    Note right of REP: CHANGED: JUnit to CTRF conversion
    CI->>REP: NEW: upload-v8-coverage
    REP-->>CI: Artifacts stored (Stability & Visibility)

    Note over CI, CORE: Server Replay Flow (Optional)
    CI->>CORE: GET /v1/sessions/:id/replay
    CORE-->>CI: CHANGED: Returns ReplayResult (updated schema with metadata)

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment on lines +63 to +65
+      const errMessage =
+        err instanceof Error ? (err.stack ?? err.message) : String(err);
+      return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 28, 2026

P1: Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via request.log.error(err), so debugging capability is preserved. Return a generic message to clients instead.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/src/lib/errorHandler.ts, line 63:

<comment>Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via `request.log.error(err)`, so debugging capability is preserved. Return a generic message to clients instead.</comment>

<file context>
@@ -60,11 +60,9 @@ export function withErrorHandling<
-        "An unexpected error occurred",
-        StatusCodes.INTERNAL_SERVER_ERROR,
-      );
+      const errMessage =
+        err instanceof Error ? (err.stack ?? err.message) : String(err);
+      return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
</file context>
Suggested change
-      const errMessage =
-        err instanceof Error ? (err.stack ?? err.message) : String(err);
-      return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
+      return error(
+        reply,
+        "An unexpected error occurred",
+        StatusCodes.INTERNAL_SERVER_ERROR,
+      );

};
return formattedMessage;
} else if (message.role === "user") {
const formattedMessage: ChatCompletionUserMessageParam = {
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 28, 2026

P1: Message role is lost when content is a string. The original message.role is ignored and always set to "user", which would incorrectly convert system or assistant messages.
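If the finding holds, one way to address it is to carry the incoming role through the conversion instead of hard-coding it. The sketch below is a simplified illustration in TypeScript; the `InputMessage` and `FormattedMessage` shapes and the `formatMessage` name are assumptions, not the types used in customOpenAI.ts.

```typescript
// Hypothetical, simplified message shapes; the real types may differ.
type Role = "system" | "user" | "assistant";

interface InputMessage {
  role: Role;
  content: string | Array<{ type: "text"; text: string }>;
}

interface FormattedMessage {
  role: Role;
  content: string;
}

function formatMessage(message: InputMessage): FormattedMessage {
  const content =
    typeof message.content === "string"
      ? message.content
      : message.content.map((part) => part.text).join("\n");
  // Keep the original role rather than always emitting "user".
  return { role: message.role, content };
}

const input: InputMessage[] = [
  { role: "system", content: "You are terse." },
  { role: "user", content: [{ type: "text", text: "Hi" }] },
];
const formatted = input.map(formatMessage);
console.log(formatted.map((m) => m.role));
```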

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 137:

<comment>Message role is lost when content is a string. The original `message.role` is ignored and always set to `"user"`, which would incorrectly convert system or assistant messages.</comment>

<file context>
@@ -0,0 +1,280 @@
+            };
+            return formattedMessage;
+          } else if (message.role === "user") {
+            const formattedMessage: ChatCompletionUserMessageParam = {
+              ...message,
+              role: "user",
</file context>

function: {
name: tool.name,
description: tool.description,
inputSchema: tool.parameters,
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 28, 2026

P1: Incorrect property name inputSchema used instead of parameters for OpenAI tool function definition. The OpenAI API expects parameters, not inputSchema. This will cause tool calls to fail.
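For reference, the Chat Completions function-tool shape carries the JSON Schema under `function.parameters`. The sketch below shows a conversion that emits that key; `ToolLike`, `OpenAIFunctionTool`, and `toOpenAITool` are hypothetical names used only for illustration, and the types are declared locally rather than imported from the `openai` package.

```typescript
// Hypothetical input shape, modeled on the fields visible in the snippet above.
interface ToolLike {
  name: string;
  description: string;
  parameters: Record<string, unknown>;
}

// Local stand-in for the function-tool shape expected by Chat Completions.
interface OpenAIFunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

function toOpenAITool(tool: ToolLike): OpenAIFunctionTool {
  return {
    type: "function",
    function: {
      name: tool.name,
      description: tool.description,
      // The schema goes under `parameters`, not `inputSchema`.
      parameters: tool.parameters,
    },
  };
}

const openAiTool = toOpenAITool({
  name: "click",
  description: "Click an element",
  parameters: {
    type: "object",
    properties: { selector: { type: "string" } },
    required: ["selector"],
  },
});
console.log(Object.keys(openAiTool.function));
```

Declaring the return type explicitly lets the compiler reject a stray `inputSchema` key in the object literal.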

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 186:

<comment>Incorrect property name `inputSchema` used instead of `parameters` for OpenAI tool function definition. The OpenAI API expects `parameters`, not `inputSchema`. This will cause tool calls to fail.</comment>

<file context>
@@ -0,0 +1,280 @@
+        function: {
+          name: tool.name,
+          description: tool.description,
+          inputSchema: tool.parameters,
+        },
+        type: "function",
</file context>

 export const ReplayResultSchema = z
   .object({
-    pages: z.array(ReplayPageSchema).optional(),
+    pages: z.array(ReplayPageSchema),
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 28, 2026

P1: Rule violated: Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test

Breaking changes to Replay API schemas (TokenUsageSchema, ReplayActionSchema, ReplayPageSchema, ReplayResultSchema) are not covered by integration tests. The packages/server/test/integration/v3/ directory has no replay.test.ts file. Per the rule, any breaking changes to shared request/response shapes in packages/core/**/types/public/api.ts must be covered by integration tests under packages/server/test.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/types/public/api.ts, line 828:

<comment>Breaking changes to Replay API schemas (`TokenUsageSchema`, `ReplayActionSchema`, `ReplayPageSchema`, `ReplayResultSchema`) are not covered by integration tests. The `packages/server/test/integration/v3/` directory has no `replay.test.ts` file. Per the rule, any breaking changes to shared request/response shapes in `packages/core/**/types/public/api.ts` must be covered by integration tests under `packages/server/test`.</comment>

<file context>
@@ -795,31 +795,38 @@ export const TokenUsageSchema = z
 export const ReplayResultSchema = z
   .object({
-    pages: z.array(ReplayPageSchema).optional(),
+    pages: z.array(ReplayPageSchema),
+    clientLanguage: z.string().optional(),
   })
</file context>

-        "An unexpected error occurred",
-        StatusCodes.INTERNAL_SERVER_ERROR,
-      );
+      const errMessage =
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 28, 2026

P1: Rule violated: Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test

Breaking change to error response format lacks integration test coverage. The withErrorHandling wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in packages/server/test that verifies the error response format.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/src/lib/errorHandler.ts, line 63:

<comment>Breaking change to error response format lacks integration test coverage. The `withErrorHandling` wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in `packages/server/test` that verifies the error response format.</comment>

<file context>
@@ -60,11 +60,9 @@ export function withErrorHandling<
-        "An unexpected error occurred",
-        StatusCodes.INTERNAL_SERVER_ERROR,
-      );
+      const errMessage =
+        err instanceof Error ? (err.stack ?? err.message) : String(err);
+      return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
</file context>

},
});

if (options.image) {
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 28, 2026

P2: Duplicate warning for image/vision support. The image variable was already destructured from options and checked at line 44-48. This second check is redundant.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 71:

<comment>Duplicate warning for image/vision support. The `image` variable was already destructured from `options` and checked at line 44-48. This second check is redundant.</comment>

<file context>
@@ -0,0 +1,280 @@
+      },
+    });
+
+    if (options.image) {
+      console.warn(
+        "Image provided. Vision is not currently supported for openai",
</file context>

exit 1
fi
echo "Selected Browserbase region: $chosen"
echo "region=$chosen" >> "$GITHUB_OUTPUT"

This comment was marked as resolved.

Comment on lines +34 to +36
+      - uses: ./.github/actions/setup-node-pnpm-turbo
         with:
-          node-version: 22.x
-          cache: 'pnpm'
-          cache-dependency-path: '**/pnpm-lock.yaml'
-
-      - name: Install dependencies
-        env:
-          PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1"
-        run: pnpm install --frozen-lockfile
+          use-prebuilt-artifacts: "false"
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 28, 2026

P2: The new composite action no longer sets PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD, so pnpm install will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/stagehand-server-release.yml, line 34:

<comment>The new composite action no longer sets `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so `pnpm install` will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.</comment>

<file context>
@@ -31,19 +31,9 @@ jobs:
-
-      - name: Setup Node.js
-        uses: actions/setup-node@v6
+      - uses: ./.github/actions/setup-node-pnpm-turbo
         with:
-          node-version: 22.x
</file context>
Suggested change
-      - uses: ./.github/actions/setup-node-pnpm-turbo
-        with:
-          use-prebuilt-artifacts: "false"
+      - uses: ./.github/actions/setup-node-pnpm-turbo
+        env:
+          PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1"
+        with:
+          use-prebuilt-artifacts: "false"

@pirate pirate changed the title from "randomize region used for evals, split out pnpm and turbo cache, veri…" to "Remove tsup, generate unified re-used ESM and CJS builds with sourcemaps and types, switch CI to new Evals CLI" on Jan 29, 2026
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

6 issues found across 67 files (changes from recent commits).

Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="packages/core/scripts/normalize-v8-coverage.ts">

<violation number="1" location="packages/core/scripts/normalize-v8-coverage.ts:267">
P2: SourceMapConsumer cleanup not in a `finally` block. If an error occurs during processing, the `consumer.destroy()` calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.</violation>

<violation number="2" location="packages/core/scripts/normalize-v8-coverage.ts:296">
P2: Unhandled promise rejection in main entry point. The `void main()` call discards the promise without handling potential errors from `normalizeV8Coverage`. Add a `.catch()` handler to log errors and exit with a non-zero code.</violation>
</file>

<file name="packages/core/package.json">

<violation number="1" location="packages/core/package.json:29">
P1: Scripts `e2e:local` and `e2e:bb` reference non-existent `test:e2e:local` and `test:e2e:bb` scripts in this package. These either need to be defined locally with the appropriate `STAGEHAND_BROWSER_TARGET` env var, or the alias scripts should set the env var directly.</violation>
</file>

<file name="packages/server/scripts/test-server.ts">

<violation number="1" location="packages/server/scripts/test-server.ts:89">
P2: The `?? process.env.STAGEHAND_API_URL` fallback is dead code since `baseUrl` is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing `STAGEHAND_API_URL`, this logic is incorrect.</violation>
</file>

<file name="packages/core/scripts/coverage.ts">

<violation number="1" location="packages/core/scripts/coverage.ts:93">
P2: Missing error check for spawn failure. If `spawnSync` fails (e.g., `pnpm` or `c8` not found), `result.error` will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.</violation>
</file>

<file name="packages/evals/scripts/test-evals.ts">

<violation number="1" location="packages/evals/scripts/test-evals.ts:27">
P2: Missing error handling around `JSON.parse`. If `eval-summary.json` exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.</violation>
</file>
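For the packages/evals/scripts/test-evals.ts finding above, a guarded parse could look like the sketch below. It assumes a hypothetical summary shape (`passed` and `failed` counts); the real eval-summary.json fields are not shown in this thread. `parseSummary` returns `null` on malformed input so the caller can take the same path as for a missing report.

```typescript
// Hypothetical summary shape; the real eval-summary.json fields may differ.
interface EvalSummary {
  passed: number;
  failed: number;
}

function parseSummary(raw: string): EvalSummary | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null) {
      return null;
    }
    const candidate = parsed as Partial<EvalSummary>;
    if (typeof candidate.passed !== "number" || typeof candidate.failed !== "number") {
      return null;
    }
    return { passed: candidate.passed, failed: candidate.failed };
  } catch {
    // Malformed JSON: treat it like a missing report instead of crashing.
    return null;
  }
}

const good = parseSummary('{"passed": 2, "failed": 1}');
const bad = parseSummary("{not json");
console.log(good, bad);
```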

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

};

if (import.meta.url === `file://${process.argv[1]}`) {
void main();
Contributor

@cubic-dev-ai cubic-dev-ai bot Jan 30, 2026

P2: Unhandled promise rejection in main entry point. The void main() call discards the promise without handling potential errors from normalizeV8Coverage. Add a .catch() handler to log errors and exit with a non-zero code.
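A minimal sketch of the suggested `.catch()` pattern follows. The `normalizeV8Coverage` body here is a stand-in that always fails, and `runEntryPoint` is a hypothetical wrapper that reports failure through a callback so the example stays testable without terminating the process.

```typescript
// Stand-in for the script's real work; it always fails here so the
// error path is exercised.
async function normalizeV8Coverage(): Promise<void> {
  throw new Error("coverage directory not found");
}

async function main(): Promise<void> {
  await normalizeV8Coverage();
}

// Hypothetical wrapper: handles the rejection instead of discarding it,
// then hands a non-zero exit code to the caller.
function runEntryPoint(
  entry: () => Promise<void>,
  onFailure: (exitCode: number) => void,
): Promise<void> {
  return entry().catch((err: unknown) => {
    console.error(err instanceof Error ? err.message : String(err));
    onFailure(1);
  });
}

let exitCode = 0;
const finished = runEntryPoint(main, (code) => {
  exitCode = code;
});
```

In a Node script the callback would typically be `(code) => { process.exitCode = code; }`, which lets pending output flush before the process exits.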

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/normalize-v8-coverage.ts, line 296:

<comment>Unhandled promise rejection in main entry point. The `void main()` call discards the promise without handling potential errors from `normalizeV8Coverage`. Add a `.catch()` handler to log errors and exit with a non-zero code.</comment>

<file context>
@@ -0,0 +1,297 @@
+};
+
+if (import.meta.url === `file://${process.argv[1]}`) {
+  void main();
+}
</file context>
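
One possible shape for the fix suggested above, as a minimal sketch rather than the script's actual code: the entry point is routed through a small wrapper that logs a rejection and resolves to an exit code instead of dropping the promise. `runEntry` is a hypothetical helper name, and the script's `main` is assumed to return a promise.

```typescript
// Hypothetical helper: runs an async entry point and converts a rejection
// into a logged message plus a non-zero exit code.
const runEntry = (
  entry: () => Promise<void>,
  log: (message: string) => void = console.error,
): Promise<number> =>
  entry().then(
    () => 0,
    (error: unknown) => {
      // Prefer the stack trace when available so CI logs show the origin.
      log(error instanceof Error ? (error.stack ?? error.message) : String(error));
      return 1;
    },
  );

// Intended use inside the existing import.meta.url guard (assumed, not verified):
// void runEntry(main).then((code) => { process.exitCode = code; });
```

The shorter equivalent is `main().catch((error) => { console.error(error); process.exit(1); })`; the wrapper form is used here only because it returns the exit code and is easier to test.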

}
}

for (const ctx of sourceCache.values()) {

@cubic-dev-ai cubic-dev-ai bot Jan 30, 2026

P2: SourceMapConsumer cleanup not in a `finally` block. If an error occurs during processing, the `consumer.destroy()` calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/normalize-v8-coverage.ts, line 267:

<comment>SourceMapConsumer cleanup not in a `finally` block. If an error occurs during processing, the `consumer.destroy()` calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.</comment>

<file context>
@@ -0,0 +1,297 @@
+    }
+  }
+
+  for (const ctx of sourceCache.values()) {
+    ctx?.consumer.destroy();
+  }
</file context>
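
A minimal sketch of the try/finally structure this comment asks for. `CachedSource` and `withSourceCache` are hypothetical names; the cache entry shape (a nullable object holding a `consumer` with `destroy()`) is inferred from the `ctx?.consumer.destroy()` call in the diff and may not match the real types.

```typescript
// Assumed shape of a cache entry; the real entries hold a SourceMapConsumer.
type CachedSource = { consumer: { destroy(): void } } | null;

// Hypothetical wrapper: runs the processing step and always releases
// every cached consumer, whether the step resolves or throws.
const withSourceCache = async (
  sourceCache: Map<string, CachedSource>,
  work: () => Promise<void>,
): Promise<void> => {
  try {
    await work();
  } finally {
    // Same cleanup loop as in the diff, now guaranteed to run.
    for (const ctx of sourceCache.values()) {
      ctx?.consumer.destroy();
    }
  }
};
```

Awaiting `work()` inside the `try` matters: returning the promise without `await` would let the `finally` block run before processing has finished.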

parsedBaseUrl.port || (parsedBaseUrl.protocol === "https:" ? "443" : "80");

process.env.PORT = port;
process.env.STAGEHAND_API_URL = baseUrl ?? process.env.STAGEHAND_API_URL;

@cubic-dev-ai cubic-dev-ai bot Jan 30, 2026

P2: The `?? process.env.STAGEHAND_API_URL` fallback is dead code since `baseUrl` is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing `STAGEHAND_API_URL`, this logic is incorrect.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/scripts/test-server.ts, line 89:

<comment>The `?? process.env.STAGEHAND_API_URL` fallback is dead code since `baseUrl` is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing `STAGEHAND_API_URL`, this logic is incorrect.</comment>

<file context>
@@ -0,0 +1,326 @@
+  parsedBaseUrl.port || (parsedBaseUrl.protocol === "https:" ? "443" : "80");
+
+process.env.PORT = port;
+process.env.STAGEHAND_API_URL = baseUrl ?? process.env.STAGEHAND_API_URL;
+process.env.BB_ENV = process.env.BB_ENV ?? "local";
+
</file context>
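
The two intents named in this comment lead to different assignments. The sketch below shows both side by side; `overwriteApiUrl`, `preserveApiUrl`, and the `Env` type are hypothetical names used for illustration, and which intent the script actually wants is not stated in the PR.

```typescript
// Narrow view of the environment object; process.env is assumed to fit it.
type Env = { STAGEHAND_API_URL?: string | undefined };

// Intent A: the computed base URL always wins, so no fallback is needed.
const overwriteApiUrl = (env: Env, baseUrl: string): void => {
  env.STAGEHAND_API_URL = baseUrl;
};

// Intent B: an existing value is kept and the computed URL is only a default.
// This mirrors the BB_ENV line that follows in the diff.
const preserveApiUrl = (env: Env, baseUrl: string): void => {
  env.STAGEHAND_API_URL = env.STAGEHAND_API_URL ?? baseUrl;
};
```

Note that `??` only falls back on `null` or `undefined`, so under intent B an empty-string `STAGEHAND_API_URL` would be preserved as-is.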

},
);

if (result.stdout) {

@cubic-dev-ai cubic-dev-ai bot Jan 30, 2026

P2: Missing error check for spawn failure. If `spawnSync` fails (e.g., `pnpm` or `c8` not found), `result.error` will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/coverage.ts, line 93:

<comment>Missing error check for spawn failure. If `spawnSync` fails (e.g., `pnpm` or `c8` not found), `result.error` will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.</comment>

<file context>
@@ -0,0 +1,100 @@
+  },
+);
+
+if (result.stdout) {
+  process.stdout.write(result.stdout);
+  fs.writeFileSync(path.join(outDir, "coverage-summary.txt"), result.stdout);
</file context>
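
A minimal sketch of the check described above, written as a pure function so it can be exercised without spawning anything. `SpawnOutcome` and `describeSpawnFailure` are hypothetical names; the fields mirror the `error`, `status`, and `signal` properties of Node's `spawnSync` result, and the command label passed in is illustrative.

```typescript
// Subset of the spawnSync result fields that matter for diagnostics.
type SpawnOutcome = {
  error?: Error;
  status: number | null;
  signal: string | null;
};

// Returns a diagnostic message when spawning failed or the child was killed
// by a signal, or null when the process ran and exited on its own.
const describeSpawnFailure = (command: string, result: SpawnOutcome): string | null => {
  if (result.error) {
    return `failed to run ${command}: ${result.error.message}`;
  }
  if (result.signal) {
    return `${command} was terminated by signal ${result.signal}`;
  }
  return null;
};

// Intended use right after the spawnSync call (assumed placement):
// const failure = describeSpawnFailure("pnpm", result);
// if (failure) { console.error(failure); process.exit(1); }
```

A non-zero `status` is deliberately not treated as a spawn failure here, since in that case the child ran and its own stdout/stderr carry the explanation.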

) => {
const timestamp = new Date().toISOString();
if (fs.existsSync(summaryPath)) {
const summary = JSON.parse(fs.readFileSync(summaryPath, "utf8")) as {

@cubic-dev-ai cubic-dev-ai bot Jan 30, 2026

P2: Missing error handling around `JSON.parse`. If `eval-summary.json` exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/evals/scripts/test-evals.ts, line 27:

<comment>Missing error handling around `JSON.parse`. If `eval-summary.json` exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.</comment>

<file context>
@@ -0,0 +1,179 @@
+) => {
+  const timestamp = new Date().toISOString();
+  if (fs.existsSync(summaryPath)) {
+    const summary = JSON.parse(fs.readFileSync(summaryPath, "utf8")) as {
+      passed?: Array<{ eval: string; model: string; categories?: string[] }>;
+      failed?: Array<{ eval: string; model: string; categories?: string[] }>;
</file context>
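
A minimal sketch of the guarded parse suggested here. `parseEvalSummary`, `EvalEntry`, and `EvalSummary` are hypothetical names; the field shapes are copied from the cast in the diff. Returning `null` is an assumption about how the caller would fall back to its existing "missing report" branch.

```typescript
// Shapes taken from the type assertion in the diff context.
type EvalEntry = { eval: string; model: string; categories?: string[] };
type EvalSummary = { passed?: EvalEntry[]; failed?: EvalEntry[] };

// Returns the parsed summary, or null when the text is not valid JSON
// or is not a plain JSON object.
const parseEvalSummary = (raw: string): EvalSummary | null => {
  try {
    const parsed: unknown = JSON.parse(raw);
    const isObject = typeof parsed === "object" && parsed !== null && !Array.isArray(parsed);
    return isObject ? (parsed as EvalSummary) : null;
  } catch {
    // Malformed JSON: signal the caller to use the missing-report path.
    return null;
  }
};
```

The caller would then read the file, pass its contents to `parseEvalSummary`, and treat a `null` result the same way as a missing `eval-summary.json`, so a CTRF report is still produced.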
