Remove tsup, generate unified re-used ESM and CJS builds with sourcemaps and types, switch CI to new Evals CLI #1632
Conversation
…fy chrome launches before running tests
Too many files changed for review.
✱ Stainless preview builds
This PR will update the … Edit this comment to update it. It will appear in the SDK's changelogs.
✅ stagehand-typescript studio · code · diff
✅ stagehand-kotlin studio · code · diff
✅ stagehand-ruby studio · code · diff
✅ stagehand-php studio · code · diff
✅ stagehand-csharp studio · code · diff
✅ stagehand-python studio · code · diff
✅ stagehand-openapi studio · code · diff
✅ stagehand-go studio · code · diff
✅ stagehand-java studio · conflict
This comment is auto-generated by GitHub Actions and is automatically kept up to date as you push.
16 issues found across 101 files
Confidence score: 2/5
- High-risk security issue: `packages/server/src/lib/errorHandler.ts` exposes stack traces/internal details to clients, which can leak sensitive implementation info.
- Concrete behavior bugs in `packages/core/lib/v3/external_clients/customOpenAI.ts` (wrong `inputSchema` key and lost message roles) likely break tool calls and mis-handle system/assistant messages.
- Multiple breaking API changes lack required integration tests (e.g., `packages/core/lib/v3/types/public/api.ts`, `packages/server/src/lib/errorHandler.ts`, `packages/server/src/routes/v1/sessions/_id/replay.ts`), increasing regression risk.
- Pay close attention to `packages/server/src/lib/errorHandler.ts`, `packages/core/lib/v3/external_clients/customOpenAI.ts`, `packages/core/lib/v3/types/public/api.ts`, `packages/server/src/routes/v1/sessions/_id/replay.ts`: security exposure, API correctness, and untested breaking changes.
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packages/evals/scoring.ts">
<violation number="1" location="packages/evals/scoring.ts:12">
P2: Logging Error outputs now collapses to "{}" because JSON.stringify(Error) returns an empty object, so error details are lost compared to the prior String(Error) behavior. Consider detecting Error and logging its message/stack instead.</violation>
</file>
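To illustrate the `scoring.ts` note: an `Error` has no enumerable own properties, so `JSON.stringify` turns it into `{}`. The sketch below shows one way to keep the details. `formatForLog` is a hypothetical helper name, not something taken from the PR.

```typescript
// Hypothetical helper (not from the PR): format any logged value as a string.
function formatForLog(value: unknown): string {
  if (value instanceof Error) {
    // message and stack are non-enumerable, so JSON.stringify would drop them.
    return value.stack ?? `${value.name}: ${value.message}`;
  }
  if (typeof value === "string") {
    return value;
  }
  try {
    // JSON.stringify returns undefined for values such as undefined or functions.
    const json: string | undefined = JSON.stringify(value);
    return json ?? String(value);
  } catch {
    // Circular structures and BigInt values make JSON.stringify throw.
    return String(value);
  }
}

console.log(JSON.stringify(new Error("boom"))); // prints {}
console.log(formatForLog(new Error("boom")).split("\n")[0]);
```

Plain objects still go through `JSON.stringify`, so only the `Error` case changes behavior.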
<file name="packages/server/scripts/build-sea.ts">
<violation number="1" location="packages/server/scripts/build-sea.ts:39">
P2: File descriptor leak: the `WriteStream` is not closed in error paths (redirects, non-200 status, network errors). This can exhaust file descriptors if the script encounters multiple failures or redirects.</violation>
</file>
<file name=".github/workflows/stagehand-server-sea-build.yml">
<violation number="1" location=".github/workflows/stagehand-server-sea-build.yml:50">
P2: The new composite setup action installs dependencies without `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so Playwright browser binaries will download during CI (the previous workflow explicitly skipped this). If the SEA build doesn’t need browsers, this adds large unnecessary downloads and can slow or fail builds. Consider setting the env on the action step to preserve the previous behavior.</violation>
</file>
<file name="packages/server/src/lib/errorHandler.ts">
<violation number="1" location="packages/server/src/lib/errorHandler.ts:63">
P1: Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via `request.log.error(err)`, so debugging capability is preserved. Return a generic message to clients instead.</violation>
<violation number="2" location="packages/server/src/lib/errorHandler.ts:63">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**
Breaking change to error response format lacks integration test coverage. The `withErrorHandling` wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in `packages/server/test` that verifies the error response format.</violation>
</file>
<file name="packages/core/vitest.config.ts">
<violation number="1" location="packages/core/vitest.config.ts:24">
P2: Bug: `String.replace()` replaces the first occurrence, not the file extension. If the extension string (e.g., `.ts`) appears in a directory name, the wrong part of the path will be replaced. Use `candidate.slice(0, -ext.length) + ".js"` to ensure only the actual extension at the end is replaced.</violation>
</file>
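A small sketch of the `vitest.config.ts` extension issue, assuming a helper named `swapExtension` (hypothetical, not in the PR). It contrasts `String.replace()`, which rewrites the first match, with slicing the suffix off the end as the review suggests.

```typescript
// Hypothetical helper: replace only a trailing extension, never an earlier match.
function swapExtension(candidate: string, ext: string, replacement = ".js"): string {
  if (ext.length === 0 || !candidate.endsWith(ext)) {
    return candidate; // nothing to swap
  }
  return candidate.slice(0, -ext.length) + replacement;
}

const candidate = "/repo/my.ts-helpers/index.ts";

// String.replace() hits the directory name first:
console.log(candidate.replace(".ts", ".js")); // prints /repo/my.js-helpers/index.ts

// Slicing touches only the real extension:
console.log(swapExtension(candidate, ".ts")); // prints /repo/my.ts-helpers/index.js
```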
<file name=".github/actions/verify-chromium-launch/action.yml">
<violation number="1" location=".github/actions/verify-chromium-launch/action.yml:100">
P2: Duplicate Chrome flag `--disable-component-update` - this flag already appears earlier in the args array (line 69). Remove the duplicate to improve maintainability.</violation>
</file>
<file name="packages/core/lib/v3/external_clients/customOpenAI.ts">
<violation number="1" location="packages/core/lib/v3/external_clients/customOpenAI.ts:71">
P2: Duplicate warning for image/vision support. The `image` variable was already destructured from `options` and checked at line 44-48. This second check is redundant.</violation>
<violation number="2" location="packages/core/lib/v3/external_clients/customOpenAI.ts:137">
P1: Message role is lost when content is a string. The original `message.role` is ignored and always set to `"user"`, which would incorrectly convert system or assistant messages.</violation>
<violation number="3" location="packages/core/lib/v3/external_clients/customOpenAI.ts:186">
P1: Incorrect property name `inputSchema` used instead of `parameters` for OpenAI tool function definition. The OpenAI API expects `parameters`, not `inputSchema`. This will cause tool calls to fail.</violation>
</file>
<file name=".github/actions/select-browserbase-region/action.yml">
<violation number="1" location=".github/actions/select-browserbase-region/action.yml:60">
P2: Validate/sanitize region values before writing to $GITHUB_OUTPUT/$GITHUB_ENV to prevent newline injection. At minimum, reject regions containing newline or characters outside an allowlist (e.g., /^[A-Za-z0-9-]+$/) before setting outputs/env.</violation>
</file>
<file name=".github/actions/publish-ctrf-report/action.yml">
<violation number="1" location=".github/actions/publish-ctrf-report/action.yml:12">
P3: `github-token` is required but never used, forcing callers to provide a meaningless secret. Remove the input or make it optional and wire it into a step if needed.</violation>
<violation number="2" location=".github/actions/publish-ctrf-report/action.yml:71">
P2: The CTRF report check uses `-f` on a quoted glob, so it never detects default `./ctrf/*.json` matches and skips uploads. Use a glob-aware check (e.g., `compgen -G`) to detect any matching files.</violation>
</file>
<file name=".github/workflows/stagehand-server-release.yml">
<violation number="1" location=".github/workflows/stagehand-server-release.yml:34">
P2: The new composite action no longer sets `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so `pnpm install` will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.</violation>
</file>
<file name="packages/core/lib/v3/types/public/api.ts">
<violation number="1" location="packages/core/lib/v3/types/public/api.ts:828">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**
Breaking changes to Replay API schemas (`TokenUsageSchema`, `ReplayActionSchema`, `ReplayPageSchema`, `ReplayResultSchema`) are not covered by integration tests. The `packages/server/test/integration/v3/` directory has no `replay.test.ts` file. Per the rule, any breaking changes to shared request/response shapes in `packages/core/**/types/public/api.ts` must be covered by integration tests under `packages/server/test`.</violation>
</file>
<file name="packages/server/src/routes/v1/sessions/_id/replay.ts">
<violation number="1" location="packages/server/src/routes/v1/sessions/_id/replay.ts:22">
P1: Rule violated: **Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test**
The replay endpoint now returns a 200 success response (empty replay) instead of the previous 501 Not Implemented, which is a breaking change to the Stagehand REST API. The rule "Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test" is violated because there is no integration test for /sessions/:id/replay in packages/server/test.</violation>
</file>
Architecture diagram
sequenceDiagram
participant CI as GitHub Actions
participant RS as Region Selector
participant CORE as Stagehand Core (SDK)
participant BB as Browserbase API
participant CH as Chromium (Local)
participant LLM as LLM Client (OpenAI/Anthropic)
participant REP as CTRF & V8 Reporter
Note over CI, REP: CI Initialization & Environment Setup
CI->>RS: NEW: selectRegion(distribution_weights)
RS-->>CI: return BROWSERBASE_REGION
CI->>CH: NEW: verifyChromiumLaunch()
CH-->>CI: confirm CDP connection success
Note over CI, REP: Runtime Execution (Evals/Tests)
CI->>CORE: Run Eval Task / Test
CORE->>CORE: NEW: setEnvTimeouts(LLM_MAX_MS, BB_CREATE_MS)
CORE->>BB: createBrowserbaseSession(region)
Note right of CORE: CHANGED: Wrapped in withTimeout
BB-->>CORE: session_id, connect_url
CORE->>LLM: createChatCompletion()
Note right of CORE: CHANGED: Moved from examples to core API<br/>NEW: withLlmTimeout execution
LLM-->>CORE: response data + usage
Note over CORE, BB: Interaction Flow
CORE->>BB: Locator.click() / Page.click()
Note right of CORE: CHANGED: Parallelized input dispatch<br/>(mousePressed + mouseReleased)
BB-->>CORE: interaction success
Note over CI, REP: Post-Execution & Reporting
CORE->>BB: NEW: endBrowserbaseSession()
Note right of CORE: Best-effort Browser.close cleanup
CI->>REP: NEW: publish-ctrf-report
Note right of REP: CHANGED: JUnit to CTRF conversion
CI->>REP: NEW: upload-v8-coverage
REP-->>CI: Artifacts stored (Stability & Visibility)
Note over CI, CORE: Server Replay Flow (Optional)
CI->>CORE: GET /v1/sessions/:id/replay
CORE-->>CI: CHANGED: Returns ReplayResult (updated schema with metadata)
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| const errMessage = | ||
| err instanceof Error ? (err.stack ?? err.message) : String(err); | ||
| return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR); |
P1: Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via request.log.error(err), so debugging capability is preserved. Return a generic message to clients instead.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/src/lib/errorHandler.ts, line 63:
<comment>Exposing stack traces and internal error messages to clients is a security vulnerability. This can reveal internal file paths, library versions, and implementation details that help attackers. The error is already logged server-side via `request.log.error(err)`, so debugging capability is preserved. Return a generic message to clients instead.</comment>
<file context>
@@ -60,11 +60,9 @@ export function withErrorHandling<
- "An unexpected error occurred",
- StatusCodes.INTERNAL_SERVER_ERROR,
- );
+ const errMessage =
+ err instanceof Error ? (err.stack ?? err.message) : String(err);
+ return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
</file context>
Suggested change:
- const errMessage =
- err instanceof Error ? (err.stack ?? err.message) : String(err);
- return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
+ return error(
+ reply,
+ "An unexpected error occurred",
+ StatusCodes.INTERNAL_SERVER_ERROR,
+ );
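The pattern behind this suggestion can be sketched on its own: log the full error server-side and hand the client a fixed message. The `ErrorLogger` and `ClientErrorBody` shapes below are assumptions for illustration. They stand in for `request.log.error(err)` and for whatever the server's `error(...)` helper actually returns.

```typescript
// Assumed shapes for illustration; the real server types may differ.
interface ErrorLogger {
  error(err: unknown): void;
}

interface ClientErrorBody {
  success: false;
  message: string;
}

const GENERIC_MESSAGE = "An unexpected error occurred";

// Keep the details in the server log and return only a generic message.
function toClientErrorBody(err: unknown, log: ErrorLogger): ClientErrorBody {
  log.error(err); // full stack trace stays server-side
  return { success: false, message: GENERIC_MESSAGE };
}

const logged: unknown[] = [];
const body = toClientErrorBody(new Error("db password rejected at /srv/app/db.ts"), {
  error: (err) => {
    logged.push(err);
  },
});
console.log(body.message);
```

An integration test for the 500 response could then assert that the body never contains a stack frame or file path.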
| }; | ||
| return formattedMessage; | ||
| } else if (message.role === "user") { | ||
| const formattedMessage: ChatCompletionUserMessageParam = { |
P1: Message role is lost when content is a string. The original message.role is ignored and always set to "user", which would incorrectly convert system or assistant messages.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 137:
<comment>Message role is lost when content is a string. The original `message.role` is ignored and always set to `"user"`, which would incorrectly convert system or assistant messages.</comment>
<file context>
@@ -0,0 +1,280 @@
+ };
+ return formattedMessage;
+ } else if (message.role === "user") {
+ const formattedMessage: ChatCompletionUserMessageParam = {
+ ...message,
+ role: "user",
</file context>
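A minimal sketch of role-preserving formatting, using simplified local types rather than the OpenAI SDK's message param types. The names `InputMessage`, `FormattedMessage`, and `formatMessage` are hypothetical. The point is only that the output role is copied from the input instead of being hard-coded to `"user"`.

```typescript
// Simplified stand-ins for the SDK message types (assumption, not the real types).
type Role = "system" | "user" | "assistant";

interface TextPart {
  type: "text";
  text: string;
}

interface InputMessage {
  role: Role;
  content: string | TextPart[];
}

interface FormattedMessage {
  role: Role;
  content: string;
}

function formatMessage(message: InputMessage): FormattedMessage {
  const content =
    typeof message.content === "string"
      ? message.content
      : message.content.map((part) => part.text).join("\n");
  // Carry the original role through; do not overwrite it with "user".
  return { role: message.role, content };
}

const formatted = [
  formatMessage({ role: "system", content: "You are a browser agent." }),
  formatMessage({ role: "assistant", content: [{ type: "text", text: "Clicked the button." }] }),
];
console.log(formatted.map((m) => m.role).join(","));
```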
| function: { | ||
| name: tool.name, | ||
| description: tool.description, | ||
| inputSchema: tool.parameters, |
P1: Incorrect property name inputSchema used instead of parameters for OpenAI tool function definition. The OpenAI API expects parameters, not inputSchema. This will cause tool calls to fail.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 186:
<comment>Incorrect property name `inputSchema` used instead of `parameters` for OpenAI tool function definition. The OpenAI API expects `parameters`, not `inputSchema`. This will cause tool calls to fail.</comment>
<file context>
@@ -0,0 +1,280 @@
+ function: {
+ name: tool.name,
+ description: tool.description,
+ inputSchema: tool.parameters,
+ },
+ type: "function",
</file context>
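For reference, a sketch of the tool shape the comment describes: in the OpenAI Chat Completions tool format, the JSON schema sits under `function.parameters`. The local `ToolSpec` and `OpenAIFunctionTool` types and the `toOpenAITool` name are illustrative assumptions, not the SDK's own definitions.

```typescript
// Local illustrative types; the real code would use the SDK's tool types.
interface ToolSpec {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

interface OpenAIFunctionTool {
  type: "function";
  function: {
    name: string;
    description: string;
    parameters: Record<string, unknown>;
  };
}

function toOpenAITool(tool: ToolSpec): OpenAIFunctionTool {
  return {
    type: "function",
    function: {
      name: tool.name,
      description: tool.description,
      parameters: tool.parameters, // key is "parameters", not "inputSchema"
    },
  };
}

const clickTool = toOpenAITool({
  name: "click",
  description: "Click an element by selector",
  parameters: {
    type: "object",
    properties: { selector: { type: "string" } },
    required: ["selector"],
  },
});
console.log(Object.keys(clickTool.function).join(","));
```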
| export const ReplayResultSchema = z | ||
| .object({ | ||
| pages: z.array(ReplayPageSchema).optional(), | ||
| pages: z.array(ReplayPageSchema), |
P1: Rule violated: Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test
Breaking changes to Replay API schemas (TokenUsageSchema, ReplayActionSchema, ReplayPageSchema, ReplayResultSchema) are not covered by integration tests. The packages/server/test/integration/v3/ directory has no replay.test.ts file. Per the rule, any breaking changes to shared request/response shapes in packages/core/**/types/public/api.ts must be covered by integration tests under packages/server/test.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/types/public/api.ts, line 828:
<comment>Breaking changes to Replay API schemas (`TokenUsageSchema`, `ReplayActionSchema`, `ReplayPageSchema`, `ReplayResultSchema`) are not covered by integration tests. The `packages/server/test/integration/v3/` directory has no `replay.test.ts` file. Per the rule, any breaking changes to shared request/response shapes in `packages/core/**/types/public/api.ts` must be covered by integration tests under `packages/server/test`.</comment>
<file context>
@@ -795,31 +795,38 @@ export const TokenUsageSchema = z
export const ReplayResultSchema = z
.object({
- pages: z.array(ReplayPageSchema).optional(),
+ pages: z.array(ReplayPageSchema),
+ clientLanguage: z.string().optional(),
})
</file context>
| "An unexpected error occurred", | ||
| StatusCodes.INTERNAL_SERVER_ERROR, | ||
| ); | ||
| const errMessage = |
P1: Rule violated: Any breaking changes to Stagehand REST API client / server implementation must be covered by an integration test under packages/server/test
Breaking change to error response format lacks integration test coverage. The withErrorHandling wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in packages/server/test that verifies the error response format.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/src/lib/errorHandler.ts, line 63:
<comment>Breaking change to error response format lacks integration test coverage. The `withErrorHandling` wrapper now exposes error stack traces to clients instead of a generic message. This changes the API response contract for 500 errors. Per the rule, this server behavior change should have an integration test in `packages/server/test` that verifies the error response format.</comment>
<file context>
@@ -60,11 +60,9 @@ export function withErrorHandling<
- "An unexpected error occurred",
- StatusCodes.INTERNAL_SERVER_ERROR,
- );
+ const errMessage =
+ err instanceof Error ? (err.stack ?? err.message) : String(err);
+ return error(reply, errMessage, StatusCodes.INTERNAL_SERVER_ERROR);
</file context>
| }, | ||
| }); | ||
|
|
||
| if (options.image) { |
P2: Duplicate warning for image/vision support. The image variable was already destructured from options and checked at line 44-48. This second check is redundant.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/external_clients/customOpenAI.ts, line 71:
<comment>Duplicate warning for image/vision support. The `image` variable was already destructured from `options` and checked at line 44-48. This second check is redundant.</comment>
<file context>
@@ -0,0 +1,280 @@
+ },
+ });
+
+ if (options.image) {
+ console.warn(
+ "Image provided. Vision is not currently supported for openai",
</file context>
| exit 1 | ||
| fi | ||
| echo "Selected Browserbase region: $chosen" | ||
| echo "region=$chosen" >> "$GITHUB_OUTPUT" |
This comment was marked as resolved.
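The action itself is shell, but the allowlist check recommended for region values in the review summary can be sketched in TypeScript. The pattern `/^[A-Za-z0-9-]+$/` comes from that review comment. The helper names `isSafeRegion` and `toOutputLine` are hypothetical.

```typescript
// Allowlist from the review comment: letters, digits, and hyphens only.
const REGION_PATTERN = /^[A-Za-z0-9-]+$/;

function isSafeRegion(region: string): boolean {
  return REGION_PATTERN.test(region);
}

// Build the "key=value" line only when the value cannot smuggle in a newline.
function toOutputLine(region: string): string {
  if (!isSafeRegion(region)) {
    throw new Error(`Refusing to emit unsafe region value: ${JSON.stringify(region)}`);
  }
  return `region=${region}`;
}

console.log(toOutputLine("us-west-2")); // prints region=us-west-2
console.log(isSafeRegion("us-east-1\ninjected=1")); // prints false
```

A value containing a newline would otherwise add a second `key=value` line to `$GITHUB_OUTPUT` or `$GITHUB_ENV`.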
| - uses: ./.github/actions/setup-node-pnpm-turbo | ||
| with: | ||
| node-version: 22.x | ||
| cache: 'pnpm' | ||
| cache-dependency-path: '**/pnpm-lock.yaml' | ||
|
|
||
| - name: Install dependencies | ||
| env: | ||
| PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1" | ||
| run: pnpm install --frozen-lockfile | ||
| use-prebuilt-artifacts: "false" |
P2: The new composite action no longer sets PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD, so pnpm install will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/stagehand-server-release.yml, line 34:
<comment>The new composite action no longer sets `PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD`, so `pnpm install` will download Playwright browsers during this workflow. That adds avoidable time/bandwidth to the release job; keep the env var on this step.</comment>
<file context>
@@ -31,19 +31,9 @@ jobs:
-
- - name: Setup Node.js
- uses: actions/setup-node@v6
+ - uses: ./.github/actions/setup-node-pnpm-turbo
with:
- node-version: 22.x
</file context>
Suggested change:
+ - uses: ./.github/actions/setup-node-pnpm-turbo
+   env:
+     PLAYWRIGHT_SKIP_BROWSER_DOWNLOAD: "1"
+   with:
+     use-prebuilt-artifacts: "false"
6 issues found across 67 files (changes from recent commits).
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="packages/core/scripts/normalize-v8-coverage.ts">
<violation number="1" location="packages/core/scripts/normalize-v8-coverage.ts:267">
P2: SourceMapConsumer cleanup not in a `finally` block. If an error occurs during processing, the `consumer.destroy()` calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.</violation>
<violation number="2" location="packages/core/scripts/normalize-v8-coverage.ts:296">
P2: Unhandled promise rejection in main entry point. The `void main()` call discards the promise without handling potential errors from `normalizeV8Coverage`. Add a `.catch()` handler to log errors and exit with a non-zero code.</violation>
</file>
<file name="packages/core/package.json">
<violation number="1" location="packages/core/package.json:29">
P1: Scripts `e2e:local` and `e2e:bb` reference non-existent `test:e2e:local` and `test:e2e:bb` scripts in this package. These either need to be defined locally with the appropriate `STAGEHAND_BROWSER_TARGET` env var, or the alias scripts should set the env var directly.</violation>
</file>
<file name="packages/server/scripts/test-server.ts">
<violation number="1" location="packages/server/scripts/test-server.ts:89">
P2: The `?? process.env.STAGEHAND_API_URL` fallback is dead code since `baseUrl` is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing `STAGEHAND_API_URL`, this logic is incorrect.</violation>
</file>
<file name="packages/core/scripts/coverage.ts">
<violation number="1" location="packages/core/scripts/coverage.ts:93">
P2: Missing error check for spawn failure. If `spawnSync` fails (e.g., `pnpm` or `c8` not found), `result.error` will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.</violation>
</file>
<file name="packages/evals/scripts/test-evals.ts">
<violation number="1" location="packages/evals/scripts/test-evals.ts:27">
P2: Missing error handling around `JSON.parse`. If `eval-summary.json` exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.</violation>
</file>
| }; | ||
|
|
||
| if (import.meta.url === `file://${process.argv[1]}`) { | ||
| void main(); |
P2: Unhandled promise rejection in main entry point. The void main() call discards the promise without handling potential errors from normalizeV8Coverage. Add a .catch() handler to log errors and exit with a non-zero code.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/normalize-v8-coverage.ts, line 296:
<comment>Unhandled promise rejection in main entry point. The `void main()` call discards the promise without handling potential errors from `normalizeV8Coverage`. Add a `.catch()` handler to log errors and exit with a non-zero code.</comment>
<file context>
@@ -0,0 +1,297 @@
+};
+
+if (import.meta.url === `file://${process.argv[1]}`) {
+ void main();
+}
</file context>
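One way to handle the rejection, sketched with a hypothetical `runEntry` wrapper. The exit-code setter is injected so the sketch stays free of Node globals. In the real script it would presumably be `(code) => { process.exitCode = code; }`.

```typescript
// Hypothetical wrapper: run an async entry point and report failures
// instead of discarding the promise with `void main()`.
async function runEntry(
  main: () => Promise<void>,
  setExitCode: (code: number) => void,
): Promise<void> {
  try {
    await main();
  } catch (err) {
    console.error(err); // surface the failure in CI logs
    setExitCode(1); // make the process exit non-zero
  }
}

// Stand-in for the real main(); it fails on purpose for the demo.
async function demoMain(): Promise<void> {
  throw new Error("coverage normalization failed");
}

let demoExitCode = 0;
const demoRun = runEntry(demoMain, (code) => {
  demoExitCode = code;
});
```

Calling `main().catch(...)` directly at the entry point would work just as well; the wrapper only makes the behavior easy to test.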
| } | ||
| } | ||
|
|
||
| for (const ctx of sourceCache.values()) { |
P2: SourceMapConsumer cleanup not in a finally block. If an error occurs during processing, the consumer.destroy() calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/normalize-v8-coverage.ts, line 267:
<comment>SourceMapConsumer cleanup not in a `finally` block. If an error occurs during processing, the `consumer.destroy()` calls on lines 269-271 will be skipped, causing memory leaks. Wrap the processing logic in try/finally to ensure cleanup.</comment>
<file context>
@@ -0,0 +1,297 @@
+ }
+ }
+
+ for (const ctx of sourceCache.values()) {
+ ctx?.consumer.destroy();
+ }
</file context>
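The try/finally shape being asked for, as a generic sketch. `Destroyable` stands in for anything with a `destroy()` method, such as the cached source-map consumers. `withCleanup` is a hypothetical name.

```typescript
// Anything that must be released after use (assumed shape).
interface Destroyable {
  destroy(): void;
}

// Run the work and destroy every resource afterwards, even if the work throws.
function withCleanup<R>(resources: readonly Destroyable[], work: () => R): R {
  try {
    return work();
  } finally {
    for (const resource of resources) {
      resource.destroy();
    }
  }
}

const destroyedIds: string[] = [];
const fakeConsumers: Destroyable[] = ["a", "b"].map((id) => ({
  destroy: () => {
    destroyedIds.push(id);
  },
}));

let processingFailed = false;
try {
  withCleanup(fakeConsumers, () => {
    throw new Error("processing failed midway");
  });
} catch {
  processingFailed = true;
}
console.log(processingFailed, destroyedIds.join(","));
```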
| parsedBaseUrl.port || (parsedBaseUrl.protocol === "https:" ? "443" : "80"); | ||
|
|
||
| process.env.PORT = port; | ||
| process.env.STAGEHAND_API_URL = baseUrl ?? process.env.STAGEHAND_API_URL; |
P2: The ?? process.env.STAGEHAND_API_URL fallback is dead code since baseUrl is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing STAGEHAND_API_URL, this logic is incorrect.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/server/scripts/test-server.ts, line 89:
<comment>The `?? process.env.STAGEHAND_API_URL` fallback is dead code since `baseUrl` is always defined (never nullish). If the intent is to always overwrite, remove the fallback. If the intent is to preserve an existing `STAGEHAND_API_URL`, this logic is incorrect.</comment>
<file context>
@@ -0,0 +1,326 @@
+ parsedBaseUrl.port || (parsedBaseUrl.protocol === "https:" ? "443" : "80");
+
+process.env.PORT = port;
+process.env.STAGEHAND_API_URL = baseUrl ?? process.env.STAGEHAND_API_URL;
+process.env.BB_ENV = process.env.BB_ENV ?? "local";
+
</file context>
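The two possible intents can be written out explicitly. This is a sketch with a hypothetical `resolveApiUrl` helper; which branch the script actually wants is not stated in the PR.

```typescript
// Hypothetical helper that makes the intent explicit.
function resolveApiUrl(
  baseUrl: string,
  existing: string | undefined,
  preferExisting: boolean,
): string {
  // With `baseUrl ?? existing`, the fallback never runs because baseUrl is always a string.
  return preferExisting ? (existing ?? baseUrl) : baseUrl;
}

// Intent A: always overwrite with the computed base URL.
console.log(resolveApiUrl("http://127.0.0.1:3107", "https://api.example.test", false));

// Intent B: keep an already-configured URL and fall back to the computed one.
console.log(resolveApiUrl("http://127.0.0.1:3107", "https://api.example.test", true));
```

The URLs are placeholders chosen for the demo.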
| }, | ||
| ); | ||
|
|
||
| if (result.stdout) { |
P2: Missing error check for spawn failure. If spawnSync fails (e.g., pnpm or c8 not found), result.error will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/scripts/coverage.ts, line 93:
<comment>Missing error check for spawn failure. If `spawnSync` fails (e.g., `pnpm` or `c8` not found), `result.error` will be set but ignored, and the script exits silently with code 1. Add error handling before processing stdout/stderr to provide diagnostic output.</comment>
<file context>
@@ -0,0 +1,100 @@
+ },
+);
+
+if (result.stdout) {
+ process.stdout.write(result.stdout);
+ fs.writeFileSync(path.join(outDir, "coverage-summary.txt"), result.stdout);
</file context>
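A sketch of the missing check. `SpawnResultLike` mirrors the `error`, `status`, and `stderr` fields of the object Node's `spawnSync` returns, declared locally so the example stands alone. `describeSpawnFailure` is a hypothetical helper name.

```typescript
// Local mirror of the spawnSync result fields used here (assumption for the sketch).
interface SpawnResultLike {
  error?: Error; // set when the process could not be started at all
  status: number | null; // exit code, or null if the process never ran or was killed
  stderr: string;
}

// Return a diagnostic message for a failed run, or null on success.
function describeSpawnFailure(command: string, result: SpawnResultLike): string | null {
  if (result.error) {
    return `Failed to start "${command}": ${result.error.message}`;
  }
  if (result.status !== 0) {
    return `"${command}" exited with status ${String(result.status)}: ${result.stderr.trim()}`;
  }
  return null;
}

const missingBinary = describeSpawnFailure("pnpm", {
  error: new Error("spawnSync pnpm ENOENT"),
  status: null,
  stderr: "",
});
console.log(missingBinary);
```

The script could print this message before exiting so a missing `pnpm` or `c8` binary is visible in the CI log.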
| ) => { | ||
| const timestamp = new Date().toISOString(); | ||
| if (fs.existsSync(summaryPath)) { | ||
| const summary = JSON.parse(fs.readFileSync(summaryPath, "utf8")) as { |
P2: Missing error handling around JSON.parse. If eval-summary.json exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/evals/scripts/test-evals.ts, line 27:
<comment>Missing error handling around `JSON.parse`. If `eval-summary.json` exists but contains malformed JSON, the script will crash without generating the CTRF report, potentially breaking CI silently. Consider wrapping the JSON parsing in a try/catch and falling back to the missing report case on parse errors.</comment>
<file context>
@@ -0,0 +1,179 @@
+) => {
+ const timestamp = new Date().toISOString();
+ if (fs.existsSync(summaryPath)) {
+ const summary = JSON.parse(fs.readFileSync(summaryPath, "utf8")) as {
+ passed?: Array<{ eval: string; model: string; categories?: string[] }>;
+ failed?: Array<{ eval: string; model: string; categories?: string[] }>;
</file context>
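A sketch of the guarded parse. The `EvalSummary` shape follows the type cast visible in the snippet above. `parseSummary` is a hypothetical helper that returns `null` so the caller can take the same path as for a missing report.

```typescript
// Shape taken from the cast in the reviewed snippet.
interface EvalEntry {
  eval: string;
  model: string;
  categories?: string[];
}

interface EvalSummary {
  passed?: EvalEntry[];
  failed?: EvalEntry[];
}

// Parse the summary text; return null for malformed or non-object JSON.
function parseSummary(raw: string): EvalSummary | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
      return null;
    }
    return parsed as EvalSummary;
  } catch {
    return null; // malformed JSON: treat like a missing report
  }
}

const goodSummary = parseSummary('{"passed":[{"eval":"login","model":"m1"}],"failed":[]}');
const badSummary = parseSummary('{"passed": [');
console.log(goodSummary?.passed?.length, badSummary);
```

The cast does not validate individual entries; a schema check would be needed for that.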
why
what changed
test plan
Summary by cubic
Rebuilt the core into unified ESM and CJS builds with sourcemaps and types, and switched CI to the new Evals CLI running against the ESM dist. This simplifies consumption via export maps, improves test stability, and enhances coverage/reporting.
New Features
Migration
Written for commit 6b9bb21. Summary will update on new commits. Review in cubic