runtime: surface transient /verify timeouts as retryable 502 availability failures #29

Merged
GsCommand merged 1 commit into main from codex/update-verify-behavior-for-502-responses
Mar 20, 2026

Conversation

@GsCommand
Contributor

Motivation

  • /verify requests that timed out were reported as generic application failures that were indistinguishable from cryptographic proof failures, so callers treated valid receipts as invalid.
  • The runtime must distinguish availability failures from cryptographic failures so frontends can show a retryable message instead of marking receipts invalid.
  • This repo contains only the runtime /verify surface (no browser frontend), so the minimal change is to the API response and logging, which lets clients identify transient failures.

Description

  • Change /verify timeout handling in server.mjs: when verification times out (error verify_timeout), the server returns HTTP 502 with failure_type: "availability", retryable: true, reason: "verify_service_unavailable", and a user-facing message recommending a retry, and it sets the proof check fields to null.
  • Add a transient availability log entry with context (want_ens, want_schema, refresh, strict_kid, verb, signer_id) to help operators diagnose ENS, schema, proxy, or process stalls.
  • Preserve existing semantics for cryptographic failures and schema errors: signature/hash/schema failures still produce the canonical verification shape and status codes.
  • Add a unit/integration test in runtime/tests/runtime-signing.test.mjs that simulates a stalled ENS/RPC path and asserts the 502 availability response and new fields.
  • Document the new behavior in README.md and docs/OPERATIONS.md so clients/operators treat 502 as transient availability (not proof invalidation) and know likely runtime causes to inspect.
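
The timeout branch described above could be sketched roughly as follows. This is a minimal illustration, not the code in server.mjs: the function name toVerifyFailure, the err.code property, the message wording, and the field names inside checks are assumptions made for the example. Only the status code and the ok, failure_type, retryable, and reason values come from this PR.

```javascript
// Hypothetical mapper: turns a verify error into an HTTP status plus JSON body.
// Returns null for anything that is not a timeout, so the caller can fall
// through to the existing cryptographic/schema failure handling.
function toVerifyFailure(err) {
  // Assumption: the timeout is identified by err.code === "verify_timeout".
  if (!err || err.code !== "verify_timeout") {
    return null;
  }
  return {
    status: 502,
    body: {
      ok: false,
      failure_type: "availability",
      retryable: true,
      reason: "verify_service_unavailable",
      // Assumed wording; the real user-facing message may differ.
      message: "Verification is temporarily unavailable. Please retry.",
      // Hypothetical field names: the PR only says proof check fields are null.
      checks: {
        signature_valid: null,
        hash_matches: null,
        schema_valid: null,
      },
    },
  };
}
```

Returning null for non-timeout errors keeps the existing failure paths untouched, which matches the stated goal of preserving semantics for signature, hash, and schema failures.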

Testing

  • Ran the repo test suite with npm test and it completed successfully (all tests passed, including smoke checks).
  • Ran the targeted runtime tests with node --test runtime/tests/runtime-signing.test.mjs and the new availability test passed.
  • The new test verifies that a stalled ENS/RPC path with ens=1&refresh=1 plus a short VERIFY_MAX_MS results in a 502 response with ok: false, failure_type: "availability", retryable: true, and proof check fields set to null.
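
The fields asserted by the new test are the ones a client would branch on. As a hedged sketch of that client-side logic: the function name classifyVerifyResponse and the outcome labels are hypothetical, and treating ok: true as a successful verification is an assumption about the existing response shape rather than something this PR states.

```javascript
// Hypothetical client helper: decides how to present a /verify result.
// status is the HTTP status code; body is the parsed JSON body (or null).
function classifyVerifyResponse(status, body) {
  // Transient outage: show a retry message, never mark the receipt invalid.
  if (
    status === 502 &&
    body &&
    body.failure_type === "availability" &&
    body.retryable === true
  ) {
    return "retry";
  }
  // Assumption: a successful verification carries ok: true.
  if (body && body.ok === true) {
    return "valid";
  }
  // An explicit ok: false outside the availability case is a real failure.
  if (body && body.ok === false) {
    return "invalid";
  }
  // No recognizable body (for example a proxy error page): make no claim.
  return "unknown";
}
```

The separate "unknown" outcome is a deliberate choice in this sketch: a response without a recognizable body is not evidence that the receipt is invalid.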

Codex Task

Why: verify timeouts must surface as retryable service outages so callers do not mislabel valid receipts as invalid.
Contract impact: none
@GsCommand merged commit b18a699 into main on Mar 20, 2026
1 check passed
