[STG-1232] Speed up PR GitHub Actions checks, add code coverage, fix e2e bb tests not running in bb mode by pirate · Pull Request #1600 · browserbase/stagehand

pirate · 2026-01-22T23:18:20Z

why

- unit tests, e2e integration tests, and evals were taking 10-15 minutes per PR
- finding the specific failing eval or test was tedious and required reading through tons of log output
- we weren't leveraging the pnpm or turbo caches, or GitHub Actions' native chromium install
- tests were flaky because CDP would randomly fail to connect when the GitHub Actions runner was overloaded

what changed

- parallelized all CI tests and broke out evals and tests into individual matrix jobs
- created a unified node+pnpm+turbo setup action that all other actions call
- verified that vanilla chromium can launch inside every local e2e job before running any of our own code, to catch resource-contention flakiness

Summary by cubic

Runs the Vitest and Playwright E2E tests for packages/core in parallel by discovering each test file and running it as its own matrix job. Evals reuse the shared build and run after the Browserbase E2E tests to avoid contention. CI enforces timeouts on LLM calls and Browserbase connections to keep runs fast, and publishes CTRF reports and V8 coverage to Insights.
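As a rough illustration of an env-configurable timeout, here is a minimal TypeScript sketch. It is not the code from this PR: `readMsEnv` and `withTimeout` are hypothetical helper names, the fallback value is arbitrary, and only the variable name `LLM_MAX_MS` comes from the summary. The env object is passed in explicitly so the helpers stay easy to test; in CI it would typically be `process.env`.

```typescript
// Hypothetical helpers, not the PR's implementation.
type Env = Record<string, string | undefined>;

/** Read a positive millisecond budget from an env-like object, else use the fallback. */
function readMsEnv(env: Env, name: string, fallbackMs: number): number {
  const parsed = Number(env[name]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallbackMs;
}

/**
 * Reject if `work` has not settled within `ms` milliseconds.
 * Note: this only stops waiting; it does not cancel the underlying work.
 */
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`${label} exceeded ${ms}ms`)),
      ms,
    );
    work.then(
      (value) => {
        clearTimeout(timer); // clear so the timer cannot keep the process alive
        resolve(value);
      },
      (error) => {
        clearTimeout(timer);
        reject(error);
      },
    );
  });
}

// Example wiring (the 120000 fallback is an assumption for illustration):
// const llmMaxMs = readMsEnv(process.env, "LLM_MAX_MS", 120000);
// const reply = await withTimeout(callModel(), llmMaxMs, "LLM call");
```

Because the deadline is read once from the environment, a CI workflow can tighten or loosen it per job without touching test code.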

Refactors
- Added discover-core-tests and core-unit-tests matrix to run packages/core/tests/*.test.ts with Vitest; fail-fast on.
- Added discover-e2e-tests and a per-spec matrix for v3 Playwright (local and Browserbase), gated on core changes. CI selects a Browserbase region from an env distribution and passes it to tests and evals. Local runs verify Chromium/CDP before tests and use the runner’s native Chromium via CHROME_PATH. Playwright reporters emit JUnit when CTRF_JUNIT_PATH is set.
- Introduced a single Lint & Build job that uploads shared artifacts with Turbo caching; unit/e2e/evals download and reuse them; prepare script skips build in CI; regression evals fail CI if score < 90%.
- Made timeouts and parallelism configurable via env (LLM_MAX_MS, BROWSERBASE_CDP_CONNECT_MAX_MS, BROWSERBASE_SESSION_CREATE_MAX_MS, LOCAL_SESSION_LIMIT_PER_E2E_TEST, BROWSERBASE_SESSION_LIMIT_PER_E2E_TEST, EVAL_MAX_CONCURRENCY, EVAL_TRIAL_COUNT, EVAL_AGENT_MAX_CONCURRENCY, EVAL_AGENT_TRIAL_COUNT, CHROME_PATH); enforced LLM call and Browserbase connect/session timeouts; reduced a flaky test’s iterations for stability.
- Explicitly release Browserbase sessions at the end of tests and evals to free capacity.
- Added composite actions for Node+pnpm+Turbo caching, Chromium launch verification, weighted Browserbase region selection, CTRF publishing (JUnit→CTRF and eval CTRF), and V8 coverage upload.
- Renamed TEST_ENV to STAGEHAND_ENV across E2E configs to ensure Browserbase specs run in BB mode; added an env reporter to surface test settings in CI logs.
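The weighted Browserbase region selection mentioned above could look roughly like the following TypeScript sketch. The summary does not show the distribution format, so the `region=weight` comma-separated syntax, the function names, and the region names in the usage comment are all assumptions made for illustration.

```typescript
// Hypothetical sketch of weighted region selection; the spec format is assumed.
type WeightedRegion = { region: string; weight: number };

/** Parse "region=weight,region=weight" into weighted entries. */
function parseRegionWeights(spec: string): WeightedRegion[] {
  return spec
    .split(",")
    .map((pair) => pair.trim())
    .filter((pair) => pair.length > 0)
    .map((pair) => {
      const [region, rawWeight] = pair.split("=");
      const weight = Number(rawWeight);
      if (!region || !Number.isFinite(weight) || weight <= 0) {
        throw new Error(`Invalid region weight: "${pair}"`);
      }
      return { region: region.trim(), weight };
    });
}

/** Pick a region with probability proportional to its weight. */
function pickRegion(
  entries: WeightedRegion[],
  random: () => number = Math.random,
): string {
  if (entries.length === 0) throw new Error("No regions configured");
  const total = entries.reduce((sum, entry) => sum + entry.weight, 0);
  let threshold = random() * total;
  for (const entry of entries) {
    threshold -= entry.weight;
    if (threshold < 0) return entry.region;
  }
  // Guards against floating-point drift when random() is very close to 1.
  return entries[entries.length - 1].region;
}

// Example (illustrative values only):
// pickRegion(parseRegionWeights("us-west-2=50,us-east-1=30,eu-central-1=20"));
```

Injecting `random` keeps the picker deterministic in tests while defaulting to `Math.random` in CI.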

Written for commit 80fad7e. Summary will update on new commits.

changeset-bot · 2026-01-22T23:18:24Z

⚠️ No Changeset found

Latest commit: 3819ec6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

greptile-apps · 2026-01-22T23:20:23Z

Greptile Summary

Replaced monolithic vitest execution with parallel test runs by introducing a matrix strategy that discovers test files dynamically and runs each *.test.ts file in a separate CI job. This change splits the previous single Run Vitest step into two new jobs: discover-core-tests (discovers all test files and creates a matrix) and core-unit-tests (runs each test file in parallel using the matrix). The implementation uses shell scripting to build a JSON array of test files and leverages GitHub Actions' fromJson() to create parallel jobs with fail-fast: false.

- Parallelizes 24 unit test files across separate GitHub Actions runners
- Maintains proper dependency chain by requiring run-build to complete before tests
- Downloads build artifacts to each test runner to ensure tests use the latest build
- Preserves test output visibility by naming each job core/${{ matrix.test.name }}
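The summary says the workflow builds the matrix with shell scripting. As a language-neutral illustration of the same idea, the TypeScript sketch below turns a list of discovered files into `{path, name}` entries and formats the two job outputs named in the sequence diagram (`core-tests`, `has-core-tests`). The function names and the example file paths are hypothetical; only the output names and the `{path, name}` shape come from this page.

```typescript
// Hypothetical sketch; the real workflow does this in shell.
type MatrixEntry = { path: string; name: string };

/** Keep only test files and derive a short job name from each file name. */
function buildTestMatrix(files: string[], suffix = ".test.ts"): MatrixEntry[] {
  return files
    .filter((file) => file.endsWith(suffix))
    .sort() // stable ordering keeps job names predictable between runs
    .map((file) => {
      const base = file.slice(file.lastIndexOf("/") + 1);
      return { path: file, name: base.slice(0, -suffix.length) };
    });
}

/** Format `name=value` lines as they would be appended to $GITHUB_OUTPUT. */
function toGithubOutput(matrix: MatrixEntry[]): string[] {
  return [
    `core-tests=${JSON.stringify(matrix)}`,
    `has-core-tests=${matrix.length > 0 ? "true" : "false"}`,
  ];
}

// Example with made-up file names:
// toGithubOutput(buildTestMatrix([
//   "packages/core/tests/b.test.ts",
//   "packages/core/tests/a.test.ts",
// ]));
```

A downstream job can then expand the JSON with `fromJson()` in its `strategy.matrix`, and skip itself when `has-core-tests` is not `'true'`, since GitHub Actions rejects an empty matrix.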

Confidence Score: 5/5

- Safe to merge: straightforward CI parallelization with proper job dependencies.
- The change is well-structured and follows GitHub Actions best practices. The matrix strategy correctly handles test file discovery, maintains proper job dependencies through needs:, and preserves the build artifact flow. The fail-fast: false setting ensures all tests run even if some fail, providing complete test coverage feedback. No logic changes to test code itself.
- No files require special attention.

Important Files Changed

Filename	Overview
.github/workflows/ci.yml	Replaced single vitest job with parallel test execution using GitHub Actions matrix strategy to run 24 test files concurrently

Sequence Diagram

sequenceDiagram
    participant DC as determine-changes
    participant DB as run-build
    participant DT as discover-core-tests
    participant CU as core-unit-tests
    participant E2E as run-e2e-*-tests
    
    DC->>DB: core == 'true'
    DC->>DT: core == 'true'
    
    Note over DT: Find all *.test.ts files<br/>in packages/core/tests
    DT->>DT: Build JSON matrix array<br/>of test files
    DT->>DT: Set outputs:<br/>core-tests, has-core-tests
    
    DB->>DB: Build packages
    DB->>DB: Upload build artifacts
    
    DT-->>CU: has-core-tests == 'true'
    DB-->>CU: Provides build artifacts
    
    Note over CU: Matrix strategy creates<br/>parallel jobs for each test
    
    par Test File 1
        CU->>CU: Setup environment
        CU->>CU: Download build artifacts
        CU->>CU: vitest run test1.test.ts
    and Test File 2
        CU->>CU: Setup environment
        CU->>CU: Download build artifacts
        CU->>CU: vitest run test2.test.ts
    and Test File N
        CU->>CU: Setup environment
        CU->>CU: Download build artifacts
        CU->>CU: vitest run testN.test.ts
    end
    
    CU-->>E2E: All tests complete
    
    Note over E2E: E2E tests run after<br/>build completes

cubic-dev-ai

No issues found across 1 file

Confidence score: 5/5

Automated review surfaced no issues in the provided summaries.
No files require special attention.

Architecture diagram

sequenceDiagram
    participant GH as GitHub Actions
    participant Disc as Job: discover-core-tests
    participant Worker as Job: core-unit-tests (Matrix)
    participant Store as Artifacts & Cache

    Note over GH,Store: CHANGED: Parallelization Strategy

    GH->>Disc: Trigger if core changed
    
    Note right of Disc: Shell Script Execution
    Disc->>Disc: NEW: find ./tests -name "*.test.ts"
    Disc->>Disc: NEW: Construct JSON array [{path, name}...]
    Disc-->>GH: Output: core-tests JSON

    GH->>GH: Parse JSON & Expand Matrix
    
    par NEW: Parallel Execution
        GH->>Worker: Spawn Worker 1 (Test A)
        GH->>Worker: Spawn Worker N (Test B)
    end

    Note right of Worker: Runs concurrently per file
    
    loop Setup & Run
        Worker->>Store: Restore pnpm store (Cache)
        Worker->>Store: Download 'build-artifacts'
        Store-->>Worker: lib/ files
        
        Worker->>Worker: NEW: vitest run [matrix.test.path]
        
        alt Test Fails
            Worker-->>GH: Mark job failed
            Note over Worker,GH: fail-fast: false (Other workers continue)
        else Test Passes
            Worker-->>GH: Mark job success
        end
    end

packages/core/lib/v3/tests/context-addInitScript.spec.ts

packages/core/lib/v3/tests/page-addInitScript.spec.ts

packages/core/lib/v3/tests/shadow-iframe-spif.spec.ts

packages/core/lib/v3/tests/v3.bb.config.ts

cubic-dev-ai bot reviewed Jan 22, 2026

pirate changed the title ~~run vitests in parallel in ci~~ Massively speed up PR Github Actions checks by running vitests and e2e tests in parallel instead of in one long-running job Jan 22, 2026

This comment was marked as resolved.

randomize region used for evals, split out pnpm and turbo cache, verify chrome launches before running tests

6c906c8

pirate force-pushed the parallel-ci-tests branch from 388a1e6 to 6c906c8 Compare January 26, 2026 21:10

pirate added 13 commits January 26, 2026 13:14

use github actions native chromium for integration tests

4ccf598

restore playwright browser usage for integration tests

20b6745

add code coverage and flaky test reporting

e07bd93

fix lint

e4e2d63

make sure ctrf artifacts are unique

ecc132d

add missing packages and fix ctrf summaries

2992f77

fix sanitization of vitest artifacts

df3708c

remove custom action for vitest sanitization

68d79c2

only merge coverage in final step

5001deb

check ratelimits

543bf42

log if ratelimits

1faba4f

limit github api calls

e632cb6

fix e2e bb tests running on local and coverage reporting

982699f

This comment was marked as resolved.

fix chromium path used by integration tests

df27619

fix chromium version used

524f87e

pirate force-pushed the parallel-ci-tests branch from 3668b1f to 524f87e Compare January 27, 2026 02:01

pirate added 10 commits January 26, 2026 18:08

bump test timeouts to improve flakyness when testing against remote

8ef22d1

fix integration test failures

538081b

fix flushing of server test output

a2c8d78

allow playwright downloads in server integration tests

3002cb7

increase screenshot timeout on prod bb browser

80e6e8a

force disableAPI true in all e2e tests

0efeb40

rename TEST_ENV to STAGEHAND_ENV

da42c9b

up timetouts in screenshot test to 5s

e5a62fc

improve env reporter in tests

ad422cb

lint

a354cee

pirate changed the title ~~Massively speed up PR Github Actions checks by running vitests and e2e tests in parallel instead of in one long-running job~~ Speed up PR Github Actions checks, add code coverage, fix e2e bb tests running not running in bb mode Jan 27, 2026

pirate added 6 commits January 27, 2026 11:18

fix start integration tests not using CHROME_PATH

d60f86a

use non-pretty logs for SEA in CI

bc24428

lint

1fbacd6

fix init script tests being broken by tsx-loader changing

ec4af40

force stagehand-server to use github actions chromium

742e76b

fix init script and click test flakyness

8854ee3

This comment was marked as resolved.

pirate added 7 commits January 27, 2026 14:27

tweak screenshot timeout in CI

821d4d9

fix cubic comment about testing init script args

7bfce55

fix eslint

c3c0069

enable sourcemaps and coverage for SEA binaries

ead4590

fix eval task result logging

6513e88

add c8 deve dependency

80fad7e

fix rename

3819ec6

pirate changed the title ~~Speed up PR Github Actions checks, add code coverage, fix e2e bb tests running not running in bb mode~~ [STG-1232] Speed up PR Github Actions checks, add code coverage, fix e2e bb tests running not running in bb mode Jan 28, 2026

monadoid approved these changes Jan 29, 2026

seanmcguire12 reviewed Jan 29, 2026
