Skip to content

test(e2e-prod): harden harness redirects + 8 contract-check assertions#168

Merged
jiashuoz merged 1 commit into
mainfrom
test/e2e-prod-harness-hardening
May 28, 2026
Merged

test(e2e-prod): harden harness redirects + 8 contract-check assertions#168
jiashuoz merged 1 commit into
mainfrom
test/e2e-prod-harness-hardening

Conversation

@jiashuoz
Copy link
Copy Markdown
Member

Summary

Independent reviewer found two correctness gaps in the e2e-prod harness that let critical-class regressions silently pass node:test. Both addressed here.

A — `fetch`'s default `redirect: "follow"` defeats CSRF checks

The `/api/billing/checkout` and `/portal` CSRF tests assert "GET must return 4xx," but if the sidecar ever 302'd to Stripe, fetch transparently followed and the test asserted against Stripe's response. Now: `redirect: "manual"` on every request — any 3xx is a real signal the test must see.

B — `fail() + return` silently swallowed contract violations

The harness's `fail()` writes severity to the JSON report but doesn't throw. Used correctly for soft observations; used wrongly (with `+ return`) at hard-contract-check sites, it degraded to "green node:test + buried fail in JSON." Eight such sites identified:

Suite Site Why it matters
13-billing-surface GET /checkout, GET /portal CSRF bypass goes silent
13-billing-surface POST /checkout without auth auth-gate bypass goes silent
11-messaging pagination-duplicate-ids data integrity regression goes silent
11-messaging idem-key-not-replayed financial-stakes double-send regression goes silent
10-domains register-non-201 primary CRUD entry regression goes silent
03-concurrency parallel-delete accepted all-4 succeeding without flagging design assumptions
06-reliability 3× WS auth tests auth-gate bypass goes silent
02-authz 401-leak oracle side-channel goes silent

Pattern at each site: keep the existing `fail()` call (so the JSON report captures triage detail), then `assert.fail()` immediately after so node:test marks red. The two work in tandem.

Notable additions beyond the bare `assert.fail`

  • 03-concurrency parallel DELETE: test renamed from "one succeeds, rest 4xx" to "is idempotent under contention (no 5xx)" — the previous name was a lie. Added a GET-after-delete check that the agent is actually gone regardless of which design (first-writer-wins vs idempotent) the server chose.
  • 11-messaging idem-key-not-replayed: added cleanup of both message_ids before the throw so HITL queue doesn't leak when the test fails.
  • 02-authz 401 oracle: now hard-asserts byte-identical 401 bodies between malformed-shape and bogus-shape keys.

Test plan

  • `node smoke.ts` against prod with new `redirect: "manual"` — clean
  • After merge: full suite against prod — every test should still pass green (the only fails would be real regressions the previous suite was hiding)

Independent reviewer found two correctness gaps in the e2e-prod
harness that let critical-class regressions silently pass node:test:

A) fetch's default redirect: "follow" defeats CSRF discipline checks.
   The /api/billing/checkout + /portal CSRF tests assert "GET must
   return 4xx," but if the sidecar ever started 302-ing to Stripe,
   fetch transparently followed and the test asserted against
   Stripe's response instead of ours. Set redirect: "manual" on every
   request so any 3xx is a real signal the test must see.

   Side-benefit: catches any future API endpoint that grows an
   unexpected redirect — those are almost always a sign of either
   broken routing or unsafe cross-origin behavior.

B) The harness has a fail() helper that writes a severity entry to
   the JSON report but does NOT throw. Used correctly for soft
   observations ("here's an interesting edge case"); used with `+
   return` for what's meant to be a contract check, it silently
   degrades to "green node:test dot + buried fail in JSON."

   Eight tests had this anti-pattern at hard-contract-check sites:

   - 13-billing-surface: GET /checkout, GET /portal, POST /checkout
     without auth (3 sites) — CSRF + auth-gate breaks would pass
   - 11-messaging: pagination-duplicate-ids (silent data integrity
     regression) + idem-key-not-replayed (silent double-send risk
     on financial-stakes Idempotency-Key replay)
   - 10-domains: register-non-201 (primary CRUD entry — a 500 on
     register would pass)
   - 03-concurrency: parallel-delete only asserted ok.length >= 1
     (renamed to match what the assertion actually verifies, plus
     added a GET-after-delete check that the agent is actually gone)
   - 06-reliability: 3 WS auth tests (no key / wrong key /
     cross-tenant) — auth-gate bypass would pass
   - 02-authz: 401-leak oracle was purely info(); a real divergence
     between malformed vs well-shaped-bogus-key bodies is a side-
     channel that should hard-fail

   Pattern at each site: keep the existing fail() call (so the JSON
   report still records it), then assert.fail() immediately after so
   node:test marks the test red. The two work in tandem — the report
   captures detail for triage, the assertion gates CI.

Smoke against prod confirms redirect:manual doesn't break existing
flows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jiashuoz jiashuoz merged commit c7a12b2 into main May 28, 2026
10 checks passed
@jiashuoz jiashuoz deleted the test/e2e-prod-harness-hardening branch May 28, 2026 00:55
jiashuoz added a commit that referenced this pull request May 28, 2026
Companion to PR #168 (the harness + contract-assertion hardening).
This is the broader correctness sweep from the same review: smaller
issues that aren't critical-class but undermine specific tests'
ability to actually verify what their names claim.

Per-suite fixes:

01-basic.test.ts
 - Remove unused `before` import.

02-authz.test.ts
 - "cross-tenant DELETE" had dead code: the assert.ok above already
   throws on non-4xx, so the explicit 200/204 check could never run.
   Reordered: hard-fail on 200/204 first (and assert.fail() so node:test
   sees the regression), then the 4xx ok-range assertion.

08-mcp.test.ts + 12-mcp-extended.test.ts
 - Replace hardcoded absolute path to mcp/dist/index.js with a
   repo-relative default + E2A_MCP_DIST env override. The hardcoded
   path was `/Users/joshzhang/Desktop/e2a/...` — broke for any other
   contributor or CI environment.

09-postfix.test.ts
 - The "/send 429 Retry-After" test was a pure no-op recording an
   `info()` skip — but it counted as a passing test in the totals
   and inflated the green count. Marked test.skip() so it shows up
   correctly in the spec-reporter output. The reason (avoiding fan-
   out of 60+ HITL notification emails) is preserved in the comment.

11-messaging.test.ts (reply-empty)
 - Previous candidate selection (`/api/v1/messages` user-scoped +
   fallback to any direction) routinely picked an outbound message,
   which 404'd at the /reply endpoint (reply only operates on inbound).
   The 404 satisfied the `>= 400 && < 500` assert so the test passed
   without ever testing the "empty body returns 400" branch. Now uses
   the agent-scoped inbound listing and asserts the exact 400; if a
   listing-vs-storage skew causes a 404 on an id we just listed, that
   surfaces as info() rather than masking the empty-body check.

13-billing-surface.test.ts
 - usage-agents drift tolerance was purely informational at any size.
   Now: ±1 is silently fine (timing race), >1 records info() with
   detail, but >3 hard-fails as a real counter-staleness regression.

Smoke against prod passes with all changes.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant