test: 5s default timeout flakes across multiple e2e test files #1173

@christso

Symptom

Multiple integration/e2e tests reliably time out at the 5000 ms default when run as part of the full `bun test` suite under contention. They all share the same root cause: real e2e operations (subprocess spawns, workspace materialisation, pool acquisition) whose wall-clock time under suite load exceeds Bun's 5 s default per-test timeout.

Same bug class as #1169 (fixed in #1170 for `pipeline-e2e.test.ts`) and the `pipeline input` block (fixed in #1176). New occurrences keep surfacing as more PRs get pushed — every push attempt this week has hit a different file.

Known offending tests (as of 2026-04-27)

`packages/core/test/evaluation/orchestrator.test.ts` (and related workspace test files):

  • `WorkspacePoolManager > slot acquisition > throws when all slots are locked`
  • `RepoManager > materializeAll > materializes multiple repos`
  • workspace lifecycle tests
  • `--workspace` flag tests

`apps/cli/test/`:

  • `eval.integration.test.ts` — multiple cases
  • `commands/results/serve.test.ts`
  • `pipeline grade — builtin assertions` tests
  • `agentv eval assert > exits 0 when grader returns ...`
  • `agentv eval assert > exits 1 when grader returns ...`
  • `trend command > ...`

This list is non-exhaustive — any test that spawns subprocesses or materialises workspaces is a candidate. Recommend a sweep rather than filing per-file issues.

Why this matters

`validate.yml` (CI) does not run `bun test`, but the local prek pre-push hook does, and it is the only test gate before push. PRs #1167, #1168, #1174, and #1175 all required a `--no-verify` bypass. PR #1176 needed seven push attempts before catching a contention-free run. That undermines the safety the hook is supposed to provide and trains contributors to ignore it.

Suggested fix

Same one-liner pattern as #1170 / #1176: bump per-test timeout to 30000 ms using Bun's numeric third-arg form:

```ts
it('test name', async () => { ... }, 30_000);
```

(The files import from `'vitest'` but are run by `bun test` — the numeric form works; the vitest options-object does not.)

When all tests in a `describe` block share the same risk profile, prefer setting it once at the describe level.
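
One way to set the timeout once per block is sketched below. It relies only on the numeric third-argument form shown above. `withTimeout`, `E2E_TIMEOUT_MS`, and the `RegisterTest` type are hypothetical names introduced for illustration, not existing helpers in the repo, and the typing of the real `it` may need adjusting.

```ts
// Hypothetical helper (not part of the repo): binds one timeout to a test
// registrar so every test registered through it shares that timeout.
type TestFn = () => void | Promise<void>;

// Assumed shape of the numeric third-argument form: it(name, fn, timeoutMs).
type RegisterTest = (name: string, fn: TestFn, timeoutMs?: number) => void;

// 30 s, the value suggested in this issue.
const E2E_TIMEOUT_MS = 30_000;

function withTimeout(
  register: RegisterTest,
  timeoutMs: number = E2E_TIMEOUT_MS,
): (name: string, fn: TestFn) => void {
  // Forward name and body unchanged; only the timeout argument is added.
  return (name, fn) => register(name, fn, timeoutMs);
}

// Usage inside a test file (assumes `describe` and `it` are already imported
// the way the file imports them today):
//
// describe('RepoManager > materializeAll', () => {
//   const e2eIt = withTimeout(it);
//   e2eIt('materializes multiple repos', async () => { /* ... */ });
// });
```

Binding the timeout once keeps the 30 s value in a single place per block, so a later change to the budget is a one-line edit rather than another sweep.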

Approach

Recommend one sweep PR that walks the repo, identifies every test using `execa`, subprocess spawning, or `materializeAll`/`WorkspacePoolManager`, and applies the 30s timeout uniformly. The list above is a starting point; the sweep should grep more broadly:

```bash
# -E enables extended regex so `|` is treated as alternation, not a literal character
grep -rnE 'execa|materializeAll|WorkspacePoolManager' apps/cli/test packages/core/test
```

This is preferable to per-file PRs because:

  • Each new flake costs a PR and a `--no-verify` bypass.
  • The fix is mechanical with no architectural risk.
  • Future contributors stop hitting the bypass-needed pattern.
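
If the sweep is scripted rather than done by hand, the same search can be expressed as a small pure function. This is a minimal sketch. `findRiskyLines`, `RISK_PATTERN`, and `Hit` are hypothetical names, and the pattern only covers the three identifiers named in this issue, so it would likely need widening for other subprocess helpers.

```ts
// Identifiers named in this issue as markers of slow, real e2e tests.
// No `g` flag, so RegExp.prototype.test stays stateless between calls.
const RISK_PATTERN = /execa|materializeAll|WorkspacePoolManager/;

interface Hit {
  line: number; // 1-based line number within the source text
  text: string; // trimmed content of the matching line
}

// Returns every line of `source` that mentions one of the risky identifiers.
function findRiskyLines(source: string): Hit[] {
  return source
    .split(/\r?\n/)
    .map((text, index) => ({ line: index + 1, text: text.trim() }))
    .filter((hit) => RISK_PATTERN.test(hit.text));
}

// Usage (assumes Node-compatible `fs`; the path is one file named in this issue):
//
// import { readFileSync } from 'node:fs';
// const path = 'packages/core/test/evaluation/orchestrator.test.ts';
// for (const hit of findRiskyLines(readFileSync(path, 'utf8'))) {
//   console.log(`${path}:${hit.line}: ${hit.text}`);
// }
```

A file with any hits is a candidate for the 30 s timeout. Deciding which individual tests need it still takes a manual read.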

Handoff context
