[STG-1232] Speed up PR Github Actions checks, add code coverage, fix e2e bb tests running not running in bb mode #1600

Open
pirate wants to merge 41 commits into main from parallel-ci-tests

pirate (Member) commented Jan 22, 2026

why

  • unit tests, e2e integration tests, and evals were taking 10-15 minutes per PR
  • finding the specific failing eval or test was tedious and required reading through large amounts of log output
  • we weren't leveraging the pnpm or turbo caches, or the native Chromium install on GitHub Actions runners
  • tests were flaky because CDP would randomly fail to connect when the GitHub Actions runner was overloaded

what changed

  • parallelized all CI tests and broke out evals and tests into individual matrix jobs
  • created a unified node+pnpm+turbo setup action that all other actions call
  • each e2e local job now verifies that vanilla Chromium can launch before running any of our own code, to catch resource-contention flakiness
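
The Chromium launch check in the last bullet could look roughly like the sketch below. It is a minimal TypeScript illustration, not the composite action from this PR: `waitForCdp`, the `Probe` type, and the retry counts are hypothetical. The probe is injected so the retry logic can be exercised without a real browser; in CI it might query Chrome's `/json/version` DevTools endpoint, with the port being an assumption.

```typescript
// Hypothetical helper: poll a CDP endpoint until it answers or attempts run out.
// `probe` stands in for whatever actually talks to the browser.
type CdpVersion = { webSocketDebuggerUrl?: string };
type Probe = () => Promise<CdpVersion>;

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function waitForCdp(
  probe: Probe,
  attempts = 5,
  delayMs = 200,
): Promise<string> {
  let lastError: unknown = new Error("no attempts made");
  for (let i = 0; i < attempts; i++) {
    try {
      const info = await probe();
      // A healthy browser reports the WebSocket URL that clients connect to.
      if (info.webSocketDebuggerUrl) return info.webSocketDebuggerUrl;
      lastError = new Error("CDP responded without webSocketDebuggerUrl");
    } catch (err) {
      lastError = err;
    }
    // Wait between attempts, but not after the final one.
    if (i < attempts - 1) await sleep(delayMs);
  }
  throw new Error(
    `Chromium CDP not reachable after ${attempts} attempts: ${String(lastError)}`,
  );
}
```

Failing this step before any project code runs separates "the runner could not start a browser" from "our tests are broken", which is the distinction the bullet above is after.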

Summary by cubic

Run Vitest and Playwright E2E tests for packages/core in parallel by discovering each file and running them as matrix jobs. Evals reuse the shared build and run after Browserbase E2E to avoid contention, with CI-enforced timeouts for LLM and Browserbase to keep runs fast, and publish CTRF reports and V8 coverage to Insights.
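
One way to enforce a CI timeout such as the LLM limit mentioned above is to race the call against a timer whose duration comes from the environment. The sketch below is an assumption about the general shape, not the code in this PR: `readLimitMs` and `withTimeout` are hypothetical helper names, and only the variable name `LLM_MAX_MS` comes from the summary.

```typescript
// Read a millisecond limit from an env-like map, falling back to a default
// when the variable is missing, empty, non-numeric, or not positive.
function readLimitMs(
  env: Record<string, string | undefined>,
  name: string,
  fallback: number,
): number {
  const parsed = Number(env[name]);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : fallback;
}

// Reject if `work` has not settled within `ms` milliseconds.
// Note: this does not cancel the underlying work; it only stops waiting for it.
async function withTimeout<T>(
  work: Promise<T>,
  ms: number,
  label: string,
): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} exceeded ${ms}ms`)), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Always clear the timer so it cannot keep the process alive.
    if (timer !== undefined) clearTimeout(timer);
  }
}
```

A caller could then write something like `withTimeout(callModel(), readLimitMs(process.env, "LLM_MAX_MS", 60000), "LLM call")`, where `callModel` and the 60-second default are placeholders.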

  • Refactors
    • Added discover-core-tests and core-unit-tests matrix to run packages/core/tests/*.test.ts with Vitest; fail-fast on.
    • Added discover-e2e-tests and per-spec matrix for v3 Playwright (local and Browserbase); gated on core changes; CI selects a Browserbase region from an env distribution and passes it to tests and evals; local runs verify Chromium/CDP before tests and use the runner’s native Chromium via CHROME_PATH; Playwright reporters emit JUnit when CTRF_JUNIT_PATH is set.
    • Introduced a single Lint & Build job that uploads shared artifacts with Turbo caching; unit/e2e/evals download and reuse them; prepare script skips build in CI; regression evals fail CI if score < 90%.
    • Made timeouts and parallelism configurable via env (LLM_MAX_MS, BROWSERBASE_CDP_CONNECT_MAX_MS, BROWSERBASE_SESSION_CREATE_MAX_MS, LOCAL_SESSION_LIMIT_PER_E2E_TEST, BROWSERBASE_SESSION_LIMIT_PER_E2E_TEST, EVAL_MAX_CONCURRENCY, EVAL_TRIAL_COUNT, EVAL_AGENT_MAX_CONCURRENCY, EVAL_AGENT_TRIAL_COUNT, CHROME_PATH); enforced LLM call and Browserbase connect/session timeouts; reduced a flaky test’s iterations for stability.
    • Explicitly release Browserbase sessions at the end of tests and evals to free capacity.
    • Added composite actions for Node+pnpm+Turbo caching, Chromium launch verification, weighted Browserbase region selection, CTRF publishing (JUnit→CTRF and eval CTRF), and V8 coverage upload.
    • Renamed TEST_ENV to STAGEHAND_ENV across E2E configs to ensure Browserbase specs run in BB mode; added an env reporter to surface test settings in CI logs.
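
The weighted Browserbase region selection mentioned in the list above can be illustrated with a small sketch. The format of the env distribution is not given in this summary, so the `region=weight,region=weight` syntax, the function names, and the region strings in the usage note are all assumptions made for illustration.

```typescript
type Weighted = { region: string; weight: number };

// Parse a hypothetical "region=weight,region=weight" distribution string.
function parseDistribution(spec: string): Weighted[] {
  return spec
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [region, rawWeight] = entry.split("=");
      const weight = Number(rawWeight);
      if (!region || !Number.isFinite(weight) || weight < 0) {
        throw new Error(`invalid distribution entry: "${entry}"`);
      }
      return { region: region.trim(), weight };
    });
}

// Pick a region with probability proportional to its weight.
// `random` is injectable so the choice can be made deterministic in tests.
function pickRegion(
  options: Weighted[],
  random: () => number = Math.random,
): string {
  const positive = options.filter((o) => o.weight > 0);
  if (positive.length === 0) {
    throw new Error("distribution has no positive weights");
  }
  const total = positive.reduce((sum, o) => sum + o.weight, 0);
  let threshold = random() * total;
  for (const option of positive) {
    threshold -= option.weight;
    if (threshold < 0) return option.region;
  }
  // Guard against floating-point drift at the very top of the range.
  return positive[positive.length - 1]!.region;
}
```

For example, `pickRegion(parseDistribution("us-west-2=3,us-east-1=1"))` would return the first region about three times as often as the second, which spreads sessions across regions in the configured proportions.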

Written for commit 80fad7e. Summary will update on new commits. Review in cubic

changeset-bot (bot) commented Jan 22, 2026

⚠️ No Changeset found

Latest commit: 3819ec6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

greptile-apps (bot, Contributor) commented Jan 22, 2026

Greptile Summary

Replaced monolithic vitest execution with parallel test runs by introducing a matrix strategy that discovers test files dynamically and runs each *.test.ts file in a separate CI job. This change splits the previous single Run Vitest step into two new jobs: discover-core-tests (discovers all test files and creates a matrix) and core-unit-tests (runs each test file in parallel using the matrix). The implementation uses shell scripting to build a JSON array of test files and leverages GitHub Actions' fromJson() to create parallel jobs with fail-fast: false.

  • Parallelizes 24 unit test files across separate GitHub Actions runners
  • Maintains proper dependency chain by requiring run-build to complete before tests
  • Downloads build artifacts to each test runner to ensure tests use the latest build
  • Preserves test output visibility by naming each job core/${{ matrix.test.name }}
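
The discovery step described above builds its matrix with shell scripting. The same idea is sketched here in TypeScript to make the data shape concrete, so this is a stand-in for the workflow's script rather than a copy of it. The `{path, name}` entry shape and the `core-tests` / `has-core-tests` output names appear in the diagrams in this thread; the function names and the rule for deriving `name` from the file name are assumptions.

```typescript
type MatrixEntry = { path: string; name: string };

// Turn a list of file paths into matrix entries, one per *.test.ts file.
function buildTestMatrix(files: string[]): MatrixEntry[] {
  return files
    .filter((file) => file.endsWith(".test.ts"))
    .sort() // stable ordering keeps job names in a predictable order
    .map((file) => {
      const base = file.split("/").pop() ?? file;
      // Assumed naming rule: the job name is the file name minus ".test.ts".
      return { path: file, name: base.replace(/\.test\.ts$/, "") };
    });
}

// Produce the two string outputs the discovery job would publish.
function toWorkflowOutputs(
  files: string[],
): { "core-tests": string; "has-core-tests": string } {
  const matrix = buildTestMatrix(files);
  return {
    "core-tests": JSON.stringify(matrix),
    "has-core-tests": matrix.length > 0 ? "true" : "false",
  };
}
```

The `core-tests` string is what the downstream job would pass through `fromJson()` to expand into one job per entry, and `has-core-tests` lets that job be skipped when nothing was discovered.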

Confidence Score: 5/5

  • Safe to merge - straightforward CI parallelization with proper job dependencies
  • The change is well-structured and follows GitHub Actions best practices. The matrix strategy correctly handles test file discovery, maintains proper job dependencies through needs:, and preserves the build artifact flow. The fail-fast: false setting ensures all tests run even if some fail, providing complete test coverage feedback. No logic changes to test code itself.
  • No files require special attention

Important Files Changed

| Filename | Overview |
| --- | --- |
| .github/workflows/ci.yml | Replaced single vitest job with parallel test execution using GitHub Actions matrix strategy to run 24 test files concurrently |

Sequence Diagram

sequenceDiagram
    participant DC as determine-changes
    participant DB as run-build
    participant DT as discover-core-tests
    participant CU as core-unit-tests
    participant E2E as run-e2e-*-tests
    
    DC->>DB: core == 'true'
    DC->>DT: core == 'true'
    
    Note over DT: Find all *.test.ts files<br/>in packages/core/tests
    DT->>DT: Build JSON matrix array<br/>of test files
    DT->>DT: Set outputs:<br/>core-tests, has-core-tests
    
    DB->>DB: Build packages
    DB->>DB: Upload build artifacts
    
    DT-->>CU: has-core-tests == 'true'
    DB-->>CU: Provides build artifacts
    
    Note over CU: Matrix strategy creates<br/>parallel jobs for each test
    
    par Test File 1
        CU->>CU: Setup environment
        CU->>CU: Download build artifacts
        CU->>CU: vitest run test1.test.ts
    and Test File 2
        CU->>CU: Setup environment
        CU->>CU: Download build artifacts
        CU->>CU: vitest run test2.test.ts
    and Test File N
        CU->>CU: Setup environment
        CU->>CU: Download build artifacts
        CU->>CU: vitest run testN.test.ts
    end
    
    CU-->>E2E: All tests complete
    
    Note over E2E: E2E tests run after<br/>build completes

cubic-dev-ai (bot, Contributor) left a comment

No issues found across 1 file

Confidence score: 5/5

  • Automated review surfaced no issues in the provided summaries.
  • No files require special attention.
Architecture diagram
sequenceDiagram
    participant GH as GitHub Actions
    participant Disc as Job: discover-core-tests
    participant Worker as Job: core-unit-tests (Matrix)
    participant Store as Artifacts & Cache

    Note over GH,Store: CHANGED: Parallelization Strategy

    GH->>Disc: Trigger if core changed
    
    Note right of Disc: Shell Script Execution
    Disc->>Disc: NEW: find ./tests -name "*.test.ts"
    Disc->>Disc: NEW: Construct JSON array [{path, name}...]
    Disc-->>GH: Output: core-tests JSON

    GH->>GH: Parse JSON & Expand Matrix
    
    par NEW: Parallel Execution
        GH->>Worker: Spawn Worker 1 (Test A)
        GH->>Worker: Spawn Worker N (Test B)
    end

    Note right of Worker: Runs concurrently per file
    
    loop Setup & Run
        Worker->>Store: Restore pnpm store (Cache)
        Worker->>Store: Download 'build-artifacts'
        Store-->>Worker: lib/ files
        
        Worker->>Worker: NEW: vitest run [matrix.test.path]
        
        alt Test Fails
            Worker-->>GH: Mark job failed
            Note over Worker,GH: fail-fast: false (Other workers continue)
        else Test Passes
            Worker-->>GH: Mark job success
        end
    end

pirate changed the title from "run vitests in parallel in ci" to "Massively speed up PR Github Actions checks by running vitests and e2e tests in parallel instead of in one long-running job" on Jan 22, 2026
cubic-dev-ai[bot] left 9 comments, each marked as resolved.

pirate force-pushed the parallel-ci-tests branch from 388a1e6 to 6c906c8 on January 26, 2026 21:10
cubic-dev-ai[bot]

This comment was marked as resolved.

pirate force-pushed the parallel-ci-tests branch from 3668b1f to 524f87e on January 27, 2026 02:01
pirate changed the title from "Massively speed up PR Github Actions checks by running vitests and e2e tests in parallel instead of in one long-running job" to "Speed up PR Github Actions checks, add code coverage, fix e2e bb tests running not running in bb mode" on Jan 27, 2026
cubic-dev-ai[bot]

This comment was marked as resolved.

pirate changed the title from "Speed up PR Github Actions checks, add code coverage, fix e2e bb tests running not running in bb mode" to "[STG-1232] Speed up PR Github Actions checks, add code coverage, fix e2e bb tests running not running in bb mode" on Jan 28, 2026