test(e2e-prod): per-suite correctness sweeps + nits by jiashuoz · Pull Request #169 · Mnexa-AI/e2a

jiashuoz · 2026-05-27T05:07:12Z

Summary

Companion to #168 (the harness + critical assertion hardening). This is the broader correctness sweep — smaller issues that aren't critical-class but each undermines a specific test's ability to verify what its name claims.

Per-suite changes

Suite	Issue	Fix
01-basic	unused `before` import	removed
02-authz	"cross-tenant DELETE" had dead code — the `assert.ok` above already threw on non-4xx, so the explicit 200/204 `fail()` was unreachable	reordered: hard-fail on 200/204 first via `assert.fail()`, then 4xx assertion
08-mcp / 12-mcp-extended	hardcoded `/Users/joshzhang/Desktop/e2a/mcp/dist/index.js` broke for any other contributor / CI	repo-relative default via `new URL(...).pathname` + `E2A_MCP_DIST` env override
09-postfix	"/send 429 Retry-After" was a no-op recording an `info()` skip but counted as a passing test, inflating the green count	`test.skip()` so it shows correctly in the reporter
11-messaging	reply-empty test pulled from user-scoped `/messages` with no inbound filter — usually picked outbound → 404 on `/reply` (which only operates on inbound) → `>= 400` assert passed without ever testing "empty body returns 400"	now uses agent-scoped inbound listing + asserts `status === 400` exactly; 404-on-listed-id surfaces as info() (would be a listing/storage skew bug, not the thing we're trying to test)
13-billing-surface	usage-vs-list drift tolerance was purely informational at any size	±1 is silently fine (timing race); >1 records info() with detail; >3 hard-fails as counter staleness

Why two PRs

PR #168 is the high-leverage harness + critical-class assertions — narrow and easy to land. This PR is broader (touches 7 files) and worth reviewing independently.

Test plan

Smoke against prod with all changes — clean
Once both PRs are merged, run the full 13-suite e2e against prod and confirm everything stays green (any new failure would be a real regression the previous suite was hiding)

Companion to PR #168 (the harness + contract-assertion hardening). This is the broader correctness sweep from the same review: smaller issues that aren't critical-class but undermine specific tests' ability to actually verify what their names claim. Per-suite fixes: 01-basic.test.ts - Remove unused `before` import. 02-authz.test.ts - "cross-tenant DELETE" had dead code: the assert.ok above already throws on non-4xx, so the explicit 200/204 check could never run. Reordered: hard-fail on 200/204 first (and assert.fail() so node:test sees the regression), then the 4xx ok-range assertion. 08-mcp.test.ts + 12-mcp-extended.test.ts - Replace hardcoded absolute path to mcp/dist/index.js with a repo-relative default + E2A_MCP_DIST env override. The hardcoded path was `/Users/joshzhang/Desktop/e2a/...` — broke for any other contributor or CI environment. 09-postfix.test.ts - The "/send 429 Retry-After" test was a pure no-op recording an `info()` skip — but it counted as a passing test in the totals and inflated the green count. Marked test.skip() so it shows up correctly in the spec-reporter output. The reason (avoiding fan- out of 60+ HITL notification emails) is preserved in the comment. 11-messaging.test.ts (reply-empty) - Previous candidate selection (`/api/v1/messages` user-scoped + fallback to any direction) routinely picked an outbound message, which 404'd at the /reply endpoint (reply only operates on inbound). The 404 satisfied the `>= 400 && < 500` assert so the test passed without ever testing the "empty body returns 400" branch. Now uses the agent-scoped inbound listing and asserts the exact 400; if a listing-vs-storage skew causes a 404 on an id we just listed, that surfaces as info() rather than masking the empty-body check. 13-billing-surface.test.ts - usage-agents drift tolerance was purely informational at any size. Now: ±1 is silently fine (timing race), >1 records info() with detail, but >3 hard-fails as a real counter-staleness regression. Smoke against prod passes with all changes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

jiashuoz merged commit 5269e1d into main May 28, 2026
10 checks passed

jiashuoz deleted the test/e2e-prod-correctness-sweeps branch May 28, 2026 00:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test(e2e-prod): per-suite correctness sweeps + nits#169

test(e2e-prod): per-suite correctness sweeps + nits#169
jiashuoz merged 1 commit into
mainfrom
test/e2e-prod-correctness-sweeps

jiashuoz commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jiashuoz commented May 27, 2026

Summary

Per-suite changes

Why two PRs

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant