feat: opt-in parallel test execution (N=2) — experimental #6

Draft
kevinccbsg wants to merge 8 commits into main from feat/parallel-execution

@kevinccbsg
Member

Summary

Adds an opt-in parallel: true flag to twd.config.json that runs the TWD suite across two isolated Puppeteer browser contexts in parallel. Measured ~1.8× wallclock speedup on a 60-test suite (63.7 s → 34.8 s) on a developer laptop. Zero behavior change for users who don't set the flag.

This ships as experimental / beta — worker count is fixed at 2 for now; higher counts are gated on a future twd-js change that makes test IDs deterministic.

How to use

{
  "url": "http://localhost:5173",
  "parallel": true,        // ← opt-in, default false
  "retryCount": 2          // existing; honored per worker
}

All existing config (coverage, contracts, timeout, puppeteerArgs, headless, retryCount) works unchanged.

What changes for existing users

Nothing. Without parallel: true:

  • runTests() skips the new branch and runs the existing serial body unchanged.
  • The serial code itself was not refactored.
  • All 9 pre-existing runTests.test.js tests pass unchanged; a new test explicitly asserts the serial path is selected when the flag is absent and that anti-throttle flags are NOT in the launch args.
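
The branch selection described above can be pictured with a minimal sketch. `runSerial`, the stubbed `runParallel`, and the exact shape of `DEFAULT_CONFIG` are stand-ins invented for this example; only the `parallel: false` default and the early return on a truthy flag come from this PR.

```javascript
// Hypothetical stand-in for the existing serial body of runTests().
async function runSerial(config) {
  // The serial path launches with exactly the args the user configured.
  return { mode: 'serial', launchArgs: [...config.puppeteerArgs] };
}

// Hypothetical stub for src/runParallel.js; the real module does far more.
async function runParallel(config) {
  return { mode: 'parallel', workers: 2 };
}

// Assumed defaults; only `parallel: false` is stated in the PR.
const DEFAULT_CONFIG = { parallel: false, puppeteerArgs: [] };

async function runTests(userConfig = {}) {
  const config = { ...DEFAULT_CONFIG, ...userConfig };
  // Early return: the parallel path is taken only when the flag is truthy.
  if (config.parallel) return runParallel(config);
  // Flag absent or false: fall through to the untouched serial path.
  return runSerial(config);
}
```

With this shape, a config that omits `parallel` behaves the same as one that sets it to `false`, which is the property the new branch-selection tests are described as asserting.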

What's inside

New:

  • src/runParallel.js — orchestrates the parallel flow (~200 LoC): launch → 2 browser contexts → Promise.all of runByIds self-filtered by idx % N === workerIndex → per-worker coverage dump → merged mock validation → per-worker result trees + combined summary.
  • src/mergeMocks.js — pure utility that combines per-worker mock maps with worker-index-prefixed keys (defense-in-depth against twd-js random-ID collisions).
  • tests/runParallel.test.js — 15 unit tests covering launch args, anti-throttle flags, navigation, retry pass-through, pass/fail aggregation, coverage writes (including on failures), .nyc_output cleanup, contract mock exposure, merged-mock validation, and contract error propagation.
  • tests/mergeMocks.test.js — 5 unit tests for the merge utility.
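
As an illustration of the `idx % N === workerIndex` self-filter mentioned in the list above, here is a small sketch. It models the registered handlers as a plain array of `{ id, name }` objects, which is an assumption; the real `__TWD_STATE__.handlers` structure may differ, and `selectOwnIds`, `registerTests`, and `namesFor` are names made up for this example.

```javascript
// Pick the IDs this worker should run, based only on registration order.
// `handlers` is an assumed flat array standing in for __TWD_STATE__.handlers.
function selectOwnIds(handlers, workerIndex, workerCount) {
  return handlers
    .filter((_, idx) => idx % workerCount === workerIndex)
    .map((handler) => handler.id);
}

// Simulate one browser context: same tests, same order, fresh random IDs.
function registerTests(names) {
  return names.map((name) => ({
    id: Math.random().toString(36).slice(2),
    name,
  }));
}

// Helper for the demo: map selected IDs back to test names.
const namesFor = (handlers, ids) =>
  handlers.filter((h) => ids.includes(h.id)).map((h) => h.name);

const names = ['login', 'cart', 'checkout', 'profile', 'logout'];
const contextA = registerTests(names); // what worker 0's page would see
const contextB = registerTests(names); // what worker 1's page would see

// Each worker filters its own list, so IDs never cross contexts.
console.log(namesFor(contextA, selectOwnIds(contextA, 0, 2))); // login, checkout, logout
console.log(namesFor(contextB, selectOwnIds(contextB, 1, 2))); // cart, profile
```

Because every worker derives its share from its own registration order, the two shares are disjoint and together cover the whole suite even though the IDs differ between contexts.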

Modified:

  • src/config.js — parallel: false added to DEFAULT_CONFIG.
  • src/index.js — early-return branch: if (config.parallel) return runParallel(...). Serial path below unchanged.
  • tests/runTests.test.js — 2 new tests asserting branch selection.
  • README.md — documents the new field with expected speedup and current limitations.
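
The worker-index-prefixed merge attributed to src/mergeMocks.js might look roughly like the sketch below. The `w<index>:` key format and the fields on each mock are assumptions for illustration; the PR only states that keys are prefixed with the worker index and that each merged mock carries a `workerIndex` field.

```javascript
// perWorkerMocks: array of Maps, where the array position is the worker index.
// Returns one Map whose keys cannot collide across workers.
function mergeMocks(perWorkerMocks) {
  const merged = new Map();
  perWorkerMocks.forEach((mocks, workerIndex) => {
    for (const [key, mock] of mocks) {
      // Assumed key format: "w<workerIndex>:<original key>".
      merged.set(`w${workerIndex}:${key}`, { ...mock, workerIndex });
    }
  });
  return merged;
}

// Two workers that happened to generate the same random test ID.
const worker0 = new Map([['abc123:getUser', { alias: 'getUser', testId: 'abc123' }]]);
const worker1 = new Map([['abc123:getUser', { alias: 'getUser', testId: 'abc123' }]]);

const merged = mergeMocks([worker0, worker1]);
console.log(merged.size); // 2
```

Without the prefix, the second entry would silently overwrite the first; with it, both survive and each records which worker's handler tree it belongs to.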

Key design decisions

  1. Self-filter by index, not probe + distribute. Initial design probed one context, split IDs in Node, and passed chunks to each worker. That failed because twd-js generates test IDs via Math.random() — IDs from one context don't exist in another, so runByIds silently matched zero tests in workers ≥1. Each worker now enumerates its own __TWD_STATE__.handlers and takes the slots where idx % N === workerIndex. Registration order is stable across contexts; IDs are not.

  2. Anti-throttle flags appended automatically. --disable-background-timer-throttling, --disable-renderer-backgrounding, --disable-backgrounding-occluded-windows are added to puppeteerArgs unless the user already set them. In the POC, these flags materially reduced waitFor timeouts at N≥2.

  3. Coverage always dumps, even on failures. The serial path skips coverage on test failures; the parallel path always dumps, so a failure in one worker does not discard the other worker's coverage data. A standard nyc report run merges the per-worker files automatically.

  4. Per-worker reports (not unified). Each worker's results render with reportResults independently, followed by a combined summary. Unified reporting needs canonical test identity across contexts, which is blocked on deterministic IDs in twd-js — tracked as a follow-up.
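
For design decision 2, the "append unless already set" rule can be sketched as follows. The three flag strings are the ones listed in this PR; the function name `withAntiThrottleFlags` is hypothetical and not taken from the implementation.

```javascript
const ANTI_THROTTLE_FLAGS = [
  '--disable-background-timer-throttling',
  '--disable-renderer-backgrounding',
  '--disable-backgrounding-occluded-windows',
];

// Returns a new args array: the user's args first, then any missing flags.
function withAntiThrottleFlags(puppeteerArgs = []) {
  const missing = ANTI_THROTTLE_FLAGS.filter(
    (flag) => !puppeteerArgs.includes(flag),
  );
  return [...puppeteerArgs, ...missing];
}

// The user already set one of the three flags, so only two are appended.
const args = withAntiThrottleFlags([
  '--no-sandbox',
  '--disable-renderer-backgrounding',
]);
console.log(args.length); // 4
```

Returning a new array rather than mutating the input keeps the user's puppeteerArgs untouched, which matters if the same config object is reused for a later serial run.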

POC evidence

The approach was validated in a throwaway POC at poc/parallel/:

  • (a) SW isolation: 60/60 tests pass at N=2 on a real app with overlapping mocks across contexts — no cross-contamination.
  • (b) Coverage split: .nyc_output/out-0.json + out-1.json merge cleanly via nyc report at 85.91% statements — identical to serial baseline.
  • (c) Full suite completion: 60/60 with a 1.83× wallclock speedup (63.7 s serial → 34.8 s parallel).

See poc/parallel/README.md for the full findings, including the discovery of the random-ID issue and the concurrency ceiling at N≥3.
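
To make the coverage split in (b) concrete, here is a rough sketch of a per-worker dump. The function name `dumpCoverage` and its signature are assumptions; in the real flow the coverage object would presumably be read from the page with something like `page.evaluate(() => window.__coverage__)`, which is left out here so the example runs without a browser.

```javascript
// Writes one worker's coverage object to <outputDir>/out-<workerIndex>.json.
// Returns the file path, or null when the app exposes no coverage.
async function dumpCoverage(coverage, workerIndex, outputDir = '.nyc_output') {
  // Dynamic imports keep the sketch loadable from CommonJS and ES modules.
  const { mkdir, writeFile } = await import('node:fs/promises');
  const { join } = await import('node:path');

  // An uninstrumented app has no __coverage__, so there is nothing to write.
  if (!coverage) return null;

  await mkdir(outputDir, { recursive: true });
  const file = join(outputDir, `out-${workerIndex}.json`);
  await writeFile(file, JSON.stringify(coverage));
  return file;
}
```

Calling this once per worker, regardless of whether that worker's tests passed, leaves one file per worker in the output directory for `nyc report` to pick up.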

Known limitations / follow-ups

  • N is fixed at 2. Configurable workers: N is gated on deterministic test IDs in twd-js.
  • Per-worker trees, not one unified tree. Also gated on deterministic IDs.
  • At N=3+, the 1-second waitFor default gets tight under CPU contention. Retries (which are fully supported) absorb most of this; a per-test timeout override is a future improvement.
  • test-example-app is not instrumented with istanbul, so the manual smoke logs no __coverage__ on window — the coverage code path is fully unit-tested. Real coverage behavior was verified against an externally instrumented app in the POC.
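
Since results stay per-worker for now, the combined summary reduces to adding up each worker's counts. The sketch below assumes a `{ passed, failed, skipped }` shape per worker; that shape, the function name `combineSummaries`, and the sample numbers are illustrative and not taken from the implementation.

```javascript
// Sums pass/fail/skip counts across workers into one combined summary.
function combineSummaries(workerSummaries) {
  return workerSummaries.reduce(
    (total, summary) => ({
      passed: total.passed + summary.passed,
      failed: total.failed + summary.failed,
      skipped: total.skipped + summary.skipped,
    }),
    { passed: 0, failed: 0, skipped: 0 },
  );
}

// Made-up counts for two workers.
const combined = combineSummaries([
  { passed: 30, failed: 0, skipped: 0 },
  { passed: 28, failed: 1, skipped: 1 },
]);
console.log(combined); // { passed: 58, failed: 1, skipped: 1 }
```

Counts can be merged this way without any notion of test identity, which is why a combined summary is possible today while a unified result tree still waits on deterministic IDs.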

Test plan

  • Unit tests: 199 passed (199) across 10 test files (npx vitest run)
  • Manual smoke against test-example-app with parallel: true: 71/71 pass, 67 mocks validated, contract report written to .twd/contract-report.md
  • Manual smoke against test-example-app without parallel: identical output to previous release — no behavior change
  • Try on the actual target app and compare wallclock to serial baseline in CI
  • Confirm npx nyc report on merged .nyc_output/out-<i>.json files works on CI runner
  • Decide on beta release tagging strategy (pre-release version vs. direct 1.2.0)

Related docs (in-tree)

  • Spec: docs/superpowers/specs/2026-04-21-parallel-test-execution-production-design.md
  • Plan: docs/superpowers/plans/2026-04-22-parallel-test-execution-production.md
  • POC spec/plan/findings: docs/superpowers/specs/2026-04-21-parallel-test-execution-poc-design.md, docs/superpowers/plans/2026-04-21-parallel-test-execution-poc.md, poc/parallel/README.md

🤖 Generated with Claude Code

kevinccbsg and others added 8 commits April 22, 2026 09:03
Proves three POC criteria against the holafly/web-checkout app:

- Service-worker isolation works across browser.createBrowserContext();
  60/60 tests pass at N=2 with overlapping mocks and zero contamination.
- Per-worker __coverage__ dumps to .nyc_output/out-<i>.json merge cleanly
  via nyc report (85.91% statements, identical to serial baseline).
- Full suite completes with a 1.83× wallclock speedup (63.7s → 34.8s).

Includes anti-throttle Chromium flags, which reduced flakiness at
moderate N values. Documents findings and the random-test-ID discovery
that forced a self-filter-by-index pivot during implementation.

Throwaway script; production feature will be implemented under src/ in
a follow-up commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Specifies the opt-in `parallel: true` config flag and the supporting
architecture: new src/runParallel.js module, src/mergeMocks.js utility,
and a thin branch in src/index.js. Serial path remains byte-identical
when the flag is absent or false. N=2 is hardcoded in this release;
deterministic test IDs, unified reporting, and configurable worker
counts are documented follow-ups.

The plan decomposes the work into six TDD-style tasks (config default,
mergeMocks utility, runParallel core, contract handling, index.js
branch, manual smoke + README).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `parallel` boolean field to the config schema with a default of
false. No behavior change — subsequent commits wire the flag into a new
runParallel module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Worker-index-prefixed keys prevent silent collisions if two browser
contexts happen to generate the same twd-js random testId. Each
merged mock carries its workerIndex field for downstream testName
resolution.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduces src/runParallel.js. Launches one Puppeteer browser with
anti-throttle flags, creates two isolated browser contexts, navigates
each to the configured URL, runs a test chunk via runByIds (self-filtered
by idx % N === workerIndex inside page.evaluate), dumps per-worker
window.__coverage__ to .nyc_output/out-<i>.json, and aggregates pass/
fail/skip counts. Contract mock collection and merging land in the next
commit. Not yet wired into src/index.js.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each worker exposes its own __twdCollectMock via page.exposeFunction,
writing into a per-worker Map. After both workers complete, mergeMocks
combines them with worker-indexed keys. Each mock carries workerIndex
so buildTestPath can pick the correct handler tree for testName
resolution. Validation and markdown reporting reuse the existing
serial pipeline unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an early-return branch at the top of runTests(): if
config.parallel is truthy, load contract validators and hand off to
runParallel. Serial code path below is textually unchanged and runs
when parallel is absent or false.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds the `parallel` boolean to the Configuration Options table and a
new section covering how the feature works, the 1.8× measured speedup,
and current limitations (N=2 fixed, per-worker reporting, CI tuning).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions

TWD Contract Validation

| Spec | Passed | Failed | Warnings | Mode |
| --- | --- | --- | --- | --- |
| ./contracts/users-3.0.json | 2 | 3 | 1 | warn |
| ./contracts/posts-3.1.json | 2 | 2 | 0 | warn |
| ./contracts/products-3.0.json | 13 | 23 | 2 | warn |
| ./contracts/events-3.1.json | 6 | 13 | 0 | warn |

23 passed · 41 failed · 3 warnings · 1 skipped

Failed validations

./contracts/users-3.0.json

  • GET /users/{userId} (200) — mock getUserNoAddress — in "Contract Validation - Mismatches > should fail: missing nested address field"
    • response.address: missing required property "address"
  • GET /users/{userId} (200) — mock getUserBadAddress — in "Contract Validation - Mismatches > should fail: nested address missing required city"
    • response.address.city: missing required property "city"
    • response.address.country: missing required property "country"
  • GET /users/{userId} (200) — mock getUserBadRole — in "Contract Validation - Mismatches > should fail: oneOf role with invalid variant"
    • response.role: oneOf best match (branch 2 of 2) failed: must be one of: "viewer"

./contracts/posts-3.1.json

  • GET /posts/{postId} (200) — mock getPostNoAuthor — in "Contract Validation - Mismatches > should fail: post missing nested author object"
    • response.author: missing required property "author"
  • GET /posts/{postId} (200) — mock getPostBadMeta — in "Contract Validation - Mismatches > should fail: post oneOf metadata matches neither variant"
    • response.metadata: oneOf best match (branch 1 of 2) failed: missing required property "category", unexpected property "duration", must be one of: "article"

./contracts/products-3.0.json

  • GET /products (200) — mock getProductEmptyName — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: empty name violates minLength"
    • response[0].name: must NOT have fewer than 1 characters
  • GET /products (200) — mock getProductBadSku — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid SKU pattern"
    • response[0].sku: must match pattern "^[A-Z]{2,4}-\d{4,8}$"
  • GET /products (200) — mock getProductBadUuid — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid uuid format for id"
    • response[0].id: must match format "uuid"
  • GET /products (200) — mock getProductBadDateTime — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid date-time format"
    • response[0].createdAt: must match format "date-time"
  • GET /products (200) — mock getProductBadDate — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid date format"
    • response[0].releaseDate: must match format "date"
  • GET /products (200) — mock getProductBadEmail — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid email format"
    • response[0].contactEmail: must match format "email"
  • GET /products (200) — mock getProductBadUri — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid uri format"
    • response[0].website: must match format "uri"
  • GET /products (200) — mock getProductBadIp — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid ipv4 format"
    • response[0].serverIp: must match format "ipv4"
  • GET /products (200) — mock getProductBadIpV6 — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid ipv6 format"
    • response[0].serverIpV6: must match format "ipv6"
  • GET /products (200) — mock getProductZeroPrice — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: price of 0 violates exclusiveMinimum"
    • response[0].price: must be > 0
  • GET /products (200) — mock getProductNegQty — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: negative quantity violates minimum"
    • response[0].quantity: must be >= 0
  • GET /products (200) — mock getProductOverQty — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: quantity exceeds maximum"
    • response[0].quantity: must be <= 999999
  • GET /products (200) — mock getProductBadWeight — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: weight not multipleOf 0.01"
    • response[0].weight: must be multiple of 0.01
  • GET /products (200) — mock getProductBadRating — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: rating above maximum (5)"
    • response[0].rating: must be <= 5
  • GET /products (200) — mock getProductBadCurrency — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid enum value for currency"
    • response[0].currency: must be one of: "USD", "EUR", "GBP", "JPY"
  • GET /products (200) — mock getProductBadCategory — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid enum value for category"
    • response[0].category: must be one of: "electronics", "clothing", "food", "books", "toys"
  • GET /products (200) — mock getProductBadBool — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: string value for boolean inStock"
    • response[0].inStock: expected boolean, got string
  • GET /products (200) — mock getProductDupTags — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: duplicate tags violates uniqueItems"
    • response[0].tags: must NOT have duplicate items (items ## 1 and 0 are identical)
  • GET /products (200) — mock getProductTooManyTags — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: tags exceeds maxItems (10)"
    • response[0].tags: must NOT have more than 10 items
  • GET /products (200) — mock getProductBadMeta — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: non-string value in metadata additionalProperties"
    • response[0].metadata.count: expected string, got number
  • GET /settings (200) — mock getSettingsBadExtra — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: extra property on Settings (additionalProperties: false)"
    • response.extraField: unexpected property "extraField"
  • GET /settings (200) — mock getSettingsBadLang — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: invalid language pattern in Settings"
    • response.language: must match pattern "^[a-z]{2}(-[A-Z]{2})?$"
  • GET /products (200) — mock getProductBadNullable — in "Contract Validation - Products Mismatches (OpenAPI 3.0 — error mode) > should fail: wrong type for nullable description (number instead of string|null)"
    • response[0].description: expected string,null, got number

./contracts/events-3.1.json

  • GET /events (200) — mock getEventsEmpty — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: empty events array violates minItems (1)"
    • response: must NOT have fewer than 1 items
  • GET /events (200) — mock getEventShortName — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: event name too short (minLength: 3)"
    • response[0].name: must NOT have fewer than 3 characters
  • GET /events (200) — mock getEventBadDate — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: invalid date-time format for startDate"
    • response[0].startDate: must match format "date-time"
  • GET /events (200) — mock getEventFloatId — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: float value for integer id"
    • response[0].id: expected integer, got number
    • response[0].id: must match format "int64"
  • GET /events (200) — mock getEventBadBool — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: number value for boolean active"
    • response[0].active: expected boolean, got number
  • GET /events (200) — mock getEventBadStatus — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: invalid enum value for status"
    • response[0].status: must be one of: "draft", "published", "archived"
  • GET /events (200) — mock getEventScoreMax — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: score at exclusiveMaximum boundary (100)"
    • response[0].score: must be < 100
  • GET /events (200) — mock getEventLowPriority — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: priority below minimum (1)"
    • response[0].priority: must be >= 1
  • GET /events (200) — mock getEventHighPriority — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: priority above maximum (5)"
    • response[0].priority: must be <= 5
  • GET /events (200) — mock getEventDupAttendees — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: duplicate attendees violates uniqueItems"
    • response[0].attendees: must NOT have duplicate items (items ## 1 and 0 are identical)
  • GET /events (200) — mock getEventNoAttendees — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: empty attendees array violates minItems (1)"
    • response[0].attendees: must NOT have fewer than 1 items
  • GET /events (200) — mock getEventBadAttendee — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: invalid email format in attendees"
    • response[0].attendees[0]: must match format "email"
  • GET /events/{eventId} (200) — mock getEventBadNullable — in "Contract Validation - Events Mismatches (OpenAPI 3.1 — error mode) > should fail: wrong type for nullable description (number instead of string|null)"
    • response.description: expected string,null, got number
