Skip to content

fix: open-mandate consume-once locking + commit-uncertain raise#2

Merged
amavashev merged 6 commits into
mainfrom
fix/open-mandate-consume-once-and-commit-uncertain
May 13, 2026
Merged

fix: open-mandate consume-once locking + commit-uncertain raise#2
amavashev merged 6 commits into
mainfrom
fix/open-mandate-consume-once-and-commit-uncertain

Conversation

@amavashev
Copy link
Copy Markdown
Contributor

@amavashev amavashev commented May 13, 2026

Follow-up to PR #1 addressing the positioning-review findings. Stays on v0.1.0 (still pre-PyPI).

P0-A — Open-mandate consume-once locking

Earlier rounds keyed idempotency on transaction_id only, which breaks the AP2 spec's normative consume-once defense (§6) against an autonomous agent presenting subsequent open mandates without a rejection receipt: two distinct transactions derived from the same open mandate produced DIFFERENT idempotency keys, so Cycles would authorize both. The defense we claimed didn't actually exist for the human-not-present flow.

Fix: when AP2Mandate.open_mandate_hash is present, key the lock on it instead of transaction_id. Scope namespace embedded in the key so the two buckets never collide server-side:

Mandate carries Key shape
open_mandate_hash (HNP) ap2:open_mandate:{sha256(open_mandate_hash)[:32]}:{phase}[:{suffix}]
only transaction_id (default) ap2:tx:{sha256(transaction_id)[:32]}:{phase}[:{suffix}]

New mapping.consume_once_input() returns the (scope, raw_value) pair. idempotency_key() signature changed from (transaction_id: str, ...) to (mandate: AP2Mandate, ...) for automatic scope selection.

Wire-shape change (idempotency keys now include scope namespace). Pre-PyPI, no migration cost.

Regression coverage: new TestIdempotencyKeyOpenMandateScope class + new test_open_mandate_overuse_collapses_distinct_transactions_onto_one_key in the double-spend file.

P0-B — Post-PSP commit failures now raise

RESERVATION_FINALIZED, RESERVATION_EXPIRED, and IDEMPOTENCY_MISMATCH previously logged at warning level and returned silently. Caller's only signal was guard.committed == False — too easy to miss. RESERVATION_EXPIRED in particular is dangerous: the PSP may have charged while Cycles reclaimed the budget on TTL.

Fix: new AP2GuardCommitUncertain exception (carries error_code + request_id + reservation_id). Raised from _handle_commit for all three terminal-status codes. Still no auto-release (a prior commit may already have settled).

Three existing tests flipped from "asserts silent" to "asserts raise"; new RESERVATION_EXPIRED test added.

Polish (positioning review medium items)

  • README: "Independent project — not affiliated with Google" disclaimer near the top.
  • README: new "What this does NOT do" section (no signature verification, no mandate creation, no PSP, only runtime authority gating).
  • README: tightened headline with bulleted "prevents these failure modes" list (the reviewer's recommended GTM framing).
  • README: softened "any AP2-compatible SDK" → "AP2-style PaymentMandate objects via a small adapter layer; tested against the current AP2 public example shapes".
  • README: lifecycle + idempotency tables updated for both new behaviors.
  • New tests/test_ap2_shape_adapter.py: exercises AP2Mandate.from_ap2() against local dataclasses mirroring the upstream AP2 sample-type layout, so adapter brittleness surfaces here first instead of at integration time.

Test plan

  • ruff check . && ruff format --check . — clean
  • mypy --strict runcycles_ap2 — 0 errors (8 files)
  • pytest --cov=runcycles_ap2 --cov-fail-under=95100 passed, 98.35% coverage (up from 90)
  • python -m build — sdist + wheel build cleanly
  • CI green on this PR before merge
  • Optional: manual run of examples/ap2_human_not_present.py against a local Cycles server

Public API additions

  • AP2GuardCommitUncertain exception class (exported)

Wire-shape change (acceptable: pre-PyPI)

Idempotency keys now include a scope namespace: ap2:tx:... or ap2:open_mandate:... (was ap2:...). No callers in the wild yet — v0.1.0 was merged but never published.

amavashev added 2 commits May 13, 2026 10:23
P0-A — Open-mandate consume-once locking
=======================================
Earlier rounds keyed idempotency on transaction_id only, which breaks the
AP2 spec's normative defense (specification §6) against an autonomous
agent presenting subsequent open mandates without a rejection receipt:
two distinct transactions derived from the same open mandate produced
DIFFERENT idempotency keys, so both would be authorized.

Fix: when AP2Mandate.open_mandate_hash is present, key the lock on
open_mandate_hash instead of transaction_id. Scope namespace embedded in
the key so the two buckets never collide server-side:
  - HNP:     ap2:open_mandate:{sha256(open_mandate_hash)[:32]}:{phase}
  - default: ap2:tx:{sha256(transaction_id)[:32]}:{phase}

New mapping.consume_once_input() picks the (scope, raw_value) pair.
idempotency_key() signature changed from (transaction_id, ...) to
(mandate, ...) for automatic scope selection. Wire-shape change, but
pre-PyPI so no migration cost.

New TestIdempotencyKeyOpenMandateScope class covers:
  - open_mandate scope used when hash present
  - different transactions / same open_mandate share key
  - tx and open_mandate namespaces never collide
  - open_mandate takes precedence over transaction_id
Plus new test_open_mandate_overuse_collapses_distinct_transactions_onto_one_key
in test_double_spend_same_transaction.

P0-B — Post-PSP commit failures now raise
==========================================
RESERVATION_FINALIZED, RESERVATION_EXPIRED, and IDEMPOTENCY_MISMATCH
previously returned silently after a warning log. Caller's only signal
was guard.committed == False — too easy to miss. RESERVATION_EXPIRED in
particular is dangerous: PSP may have charged while Cycles reclaimed
the budget on TTL.

New AP2GuardCommitUncertain exception (carries error_code, request_id,
reservation_id). Raised from _handle_commit for all three terminal-
status codes. Still no auto-release (a prior commit may already have
settled).

Three existing tests flipped from "asserts silent" to "asserts raise":
  - test_commit_idempotency_mismatch_raises_uncertain
  - test_reservation_finalized_raises_uncertain
  - test_reservation_expired_raises_uncertain (new)

Polish
======
- README: "Independent project / not affiliated with Google" disclaimer.
- README: new "What this does NOT do" section (no signature verify,
  no mandate creation, no PSP, etc.).
- README: tightened headline value-prop with bulleted "prevents these
  failure modes" list.
- README: softened "any AP2-compatible SDK" → "AP2-style PaymentMandate
  objects via a small adapter layer; tested against the current AP2
  public example shapes".
- README: lifecycle + idempotency tables updated for both new behaviors.
- New tests/test_ap2_shape_adapter.py: exercises AP2Mandate.from_ap2()
  against local dataclasses mirroring the upstream AP2 sample-types
  layout, so adapter brittleness surfaces here first.

Test posture: 100 tests (up from 90), 98.35% coverage. ruff + mypy
strict clean. `python -m build` produces sdist + wheel cleanly.

Public API additions:
  - AP2GuardCommitUncertain (exported)

Wire-shape change (acceptable because pre-PyPI):
  - idempotency keys now include a scope namespace: ap2:tx:... or
    ap2:open_mandate:... (was ap2:...)

AUDIT.md and CHANGELOG.md updated.
…rve test

Per positioning-review round 2 on PR #2.

1) "Collapses onto one reservation" was an overclaim. The code makes the
   idempotency key collide; the payload may still differ (distinct
   transaction_id / checkout_hash on each AP2 checkout from the same
   open mandate). Cycles semantics: same key + same payload replays;
   same key + different payload → 409 IDEMPOTENCY_MISMATCH (surfaced as
   AP2GuardDenied). Both block a second valid reservation, but the
   mechanism differs. Reworded in README, AUDIT, and two test docstrings.

2) GuardedPayment.__doc__ and the README lifecycle table still said the
   retry/lock key is "transaction_id". Generalized to "consume-once
   key — open_mandate_hash when present, otherwise transaction_id".

3) New regression test:
   test_open_mandate_divergent_payload_reserve_mismatch_raises_denied
   exercises the second half of the consume-once contract — same open
   mandate hash, distinct transaction_id, first reserve allowed, second
   reserve gets 409 IDEMPOTENCY_MISMATCH from the server and the guard
   raises AP2GuardDenied(reason_code="IDEMPOTENCY_MISMATCH"). Documents
   what the spec really gives you for divergent payloads.

4) Reject empty open_mandate_hash / checkout_hash via pydantic
   min_length=1. Previously an empty string passed validation and
   consume_once_input's truthy check silently fell back to the tx scope
   — data corruption disguised as the default. Two new tests cover the
   rejection paths.

Test posture: 103 tests (up from 100), 98.35% coverage. ruff + mypy
strict clean. `python -m build` produces sdist + wheel cleanly. No
public API additions.

AUDIT.md and CHANGELOG.md updated.
@amavashev
Copy link
Copy Markdown
Contributor Author

Follow-up fixes pushed (1797db4)

All three pre-merge requirements addressed plus the minor polish item.

# Finding Fix
1 "Collapses onto one reservation" overclaim — actual semantics are replay (same payload) or IDEMPOTENCY_MISMATCH (divergent payload) Reworded in README, AUDIT, and the two test docstrings to describe the real contract: same idempotency bucket; replay or mismatch blocks the second reservation
2 Stale "same transaction_id on a retry" wording in GuardedPayment.__doc__ and the README lifecycle commit row Both now read "consume-once key (open_mandate_hash when present, otherwise transaction_id)"
3 Missing test for divergent-payload IDEMPOTENCY_MISMATCH on second reserve under shared open-mandate scope New test_open_mandate_divergent_payload_reserve_mismatch_raises_denied — first reserve allowed, second reserve gets 409 → AP2GuardDenied(reason_code="IDEMPOTENCY_MISMATCH"), body never runs, no second reservation
polish Empty open_mandate_hash / checkout_hash silently fell back to tx scope (truthy check) Pydantic min_length=1 on both fields; rejected at model construction. Two new tests

Gates re-verified locally (all green):

  • ruff check . && ruff format --check . — clean
  • mypy --strict runcycles_ap2 — 0 errors (8 files)
  • pytest --cov=runcycles_ap2 --cov-fail-under=95103 passed, 98.35% coverage (up from 100)
  • python -m build — sdist + wheel build cleanly

No public API additions in this round. AUDIT.md + CHANGELOG.md updated.

P0 — transport errors and 5xx on commit are uncertain, not failed
=================================================================
Previously a transport-level failure or 5xx response on commit fell
through to the "unrecognized rejection" branch: _handle_release was
called and AP2GuardCommitFailed raised with "reservation released" in
the message regardless of outcome. Wrong: the commit POST may have
reached Cycles and mutated state before the failure, so auto-release
risks undoing a successful settle.

New branch ordering in _handle_commit:
  1. success
  2. transport error / 5xx       → AP2GuardCommitUncertain, NO release
  3. RESERVATION_FINALIZED /     → AP2GuardCommitUncertain, NO release
     RESERVATION_EXPIRED /          (existing behavior)
     IDEMPOTENCY_MISMATCH
  4. 4xx with unrecognized code  → release + AP2GuardCommitFailed
                                    (the server explicitly rejected,
                                    release is safe)

Synthetic error_code values for the new branches:
  - TRANSPORT_ERROR — CyclesResponse.transport_error / status=-1
  - SERVER_ERROR   — 5xx without a parseable error code in body
  - specific code propagates when present (e.g. INTERNAL_ERROR)

P1 — bare exception during commit POST now wraps as uncertain
=============================================================
The `try / except Exception / raise` re-raised the raw exception,
losing the reservation_id and bypassing the reconciliation contract.
Wrapped as:
  raise AP2GuardCommitUncertain(
      error_code="COMMIT_RAISED",
      reservation_id=...,
  ) from exc
The original exception is preserved via __cause__.

Four new regression tests cover:
  - 5xx with body          → uncertain, specific code, no release
  - 5xx without body       → uncertain, synthetic SERVER_ERROR, no release
  - transport error        → uncertain, TRANSPORT_ERROR, no release
  - commit raises          → uncertain wrapping original, COMMIT_RAISED,
                              chained __cause__, no release

Existing test_commit_failed_records_release_non_success still passes —
it's about the release POST returning 5xx (not the commit POST), which
remains a release-and-raise flow.

AP2GuardCommitUncertain.__doc__ updated to enumerate all the new
conditions and the error_code values callers can branch on.
README lifecycle + exception tables updated.

Test posture: 107 tests (up from 103), 99.18% coverage. ruff + mypy
strict clean. `python -m build` produces sdist + wheel cleanly. No
public API additions — only an expanded contract on the existing
AP2GuardCommitUncertain.

AUDIT.md and CHANGELOG.md updated.
@amavashev
Copy link
Copy Markdown
Contributor Author

Round-3 fixes pushed (4642a8d)

Both findings addressed.

# Finding Fix
P0 Transport errors and 5xx on commit fell into the "unrecognized rejection" → release branch and were reported as "reservation released" — wrong, since the POST may have mutated Cycles before the failure New branch ordering: success → transport/5xx → terminal codes → 4xx-unrecognized. First three all raise AP2GuardCommitUncertain with NO release. Synthetic error_code values: TRANSPORT_ERROR, SERVER_ERROR, plus specific codes when present
P1 Bare except Exception re-raised the raw exception, losing reservation_id and bypassing the reconciliation contract Wrapped as AP2GuardCommitUncertain(error_code="COMMIT_RAISED", reservation_id=...) from exc — original preserved via __cause__

Four new regression tests:

  • test_commit_5xx_raises_uncertain_no_release — 5xx with body, specific code propagated
  • test_commit_5xx_without_body_uses_synthetic_code — 502 with body=NoneSERVER_ERROR
  • test_commit_transport_error_raises_uncertain_no_releaseCyclesResponse.transport_error() path
  • test_commit_raises_exception_surfaces_as_uncertainConnectionError raised by client → wrapped + chained

AP2GuardCommitUncertain.__doc__ now enumerates all five conditions and the error_code values callers can branch on. README lifecycle + exception tables updated.

Gates re-verified locally (all green):

  • ruff check . && ruff format --check . — clean
  • mypy --strict runcycles_ap2 — 0 errors (8 files)
  • pytest --cov=runcycles_ap2 --cov-fail-under=95107 passed, 99.18% coverage (up from 103)
  • python -m build — sdist + wheel build cleanly

No public API additions — only an expanded contract on the existing AP2GuardCommitUncertain. AUDIT.md + CHANGELOG.md updated.

Three README spots still implied the consume-once boundary is always
transaction_id and that distinct attempts "collapse onto one
reservation". Both are wrong with the open-mandate scope and the
real Cycles idempotency semantics (replay or IDEMPOTENCY_MISMATCH).

1. "How the guard responds" table — the HTTP/transport-error-on-reserve
   row said "same transaction_id ⇒ same reserve key". Generalized to
   "same consume-once scope (open_mandate_hash when present, otherwise
   transaction_id) ⇒ same reserve key".

2. AP2 mapping table — the transaction_id row said "also feeds
   idempotency keys" unconditionally. Now: "feeds the idempotency key
   only when open_mandate_hash is absent (otherwise the open mandate
   is the consume-once scope)".

3. Features bullet — "duplicate workers on the same mandate collapse
   onto one reservation server-side" replaced with the accurate
   "share one idempotency bucket server-side; identical replays
   return the original reservation, divergent attempts are rejected
   with IDEMPOTENCY_MISMATCH".

Code unchanged. Gates still green: ruff + mypy strict + 107 tests.
@amavashev
Copy link
Copy Markdown
Contributor Author

README cleanup pushed (88f2524)

All three stale lines fixed:

Location Before After
Lifecycle table, transport-error-on-reserve row "same transaction_id ⇒ same reserve key" "same consume-once scope (open_mandate_hash when present, otherwise transaction_id) ⇒ same reserve key"
AP2 mapping table, PaymentMandate.transaction_id row "also feeds idempotency keys" "feeds the idempotency key only when open_mandate_hash is absent (otherwise the open mandate is the consume-once scope)"
Features bullet "duplicate workers on the same mandate collapse onto one reservation server-side" "duplicate workers on the same mandate share one idempotency bucket server-side; identical replays return the original reservation, divergent attempts are rejected with IDEMPOTENCY_MISMATCH"

Code unchanged. Gates re-verified locally (ruff + ruff format clean, mypy strict 0 errors, 107 tests pass).

…stale docstrings

P2 — Idempotency-key suffix accepted non-ASCII letters
======================================================
safe_suffix used `c.isalnum()`, but Python's str.isalnum is
Unicode-aware: "É".isalnum() is True. A non-ASCII suffix (e.g. a
localized exception class name like "Échec") would have reached the
Idempotency-Key HTTP header. RFC 7230 requires ASCII tokens, so httpx
would reject the request.

Tightened the predicate to `(c.isascii() and c.isalnum()) or c in
("_", "-", ".")`. Regression test asserts the key is isascii() even
for a non-ASCII suffix.

P3 — from_ap2() masked empty checkout_hash as None
==================================================
The `or` short-circuit at
  getattr(cm, "hash", None) or getattr(cm, "checkout_hash", None)
silently dropped upstream `hash=""` to None, bypassing the model's
new `min_length=1` rejection. Replaced with explicit None-check on
the first attribute; fall back only when the first is genuinely
missing. Empty strings now propagate to the model and get rejected
at construction.

Two regression tests:
  - hash="" → ValidationError at AP2Mandate(...)
  - alternate naming (checkout_hash attr only, no hash attr) still
    falls through correctly

Kept payee.website / .identifier fallback unchanged — that one is
intentional alternates-not-aliases behavior with a dedicated test.

P3 — Stale docstrings refreshed
===============================
1. mapping.build_commit_body.__doc__: said "key derived from
   transaction_id" — generalized to the consume-once scope
   (open_mandate_hash when present, otherwise transaction_id),
   matching the actual implementation.

2. AP2GuardCommitFailed.__doc__: still described RESERVATION_FINALIZED
   / RESERVATION_EXPIRED / IDEMPOTENCY_MISMATCH as "benign replay"
   that this exception is NOT raised for. Misleading — those raise
   AP2GuardCommitUncertain (a different exception), not a quiet no-op.
   Rewrote to scope this exception to 4xx-explicit-rejection only and
   point readers at AP2GuardCommitUncertain for the unknown-outcome
   conditions.

Test posture: 110 tests (up from 107), 99.19% coverage. ruff + mypy
strict clean. `python -m build` produces sdist + wheel cleanly.

No public API additions — only behavior tightening on existing paths
and documentation sync.

AUDIT.md and CHANGELOG.md updated.
@amavashev
Copy link
Copy Markdown
Contributor Author

Round-4 fixes pushed (00835b5)

All three findings addressed.

# Finding Fix
P2 c.isalnum() in the suffix sanitizer is Unicode-aware ("É".isalnum() == True), so non-ASCII chars could reach the Idempotency-Key header — RFC 7230 requires ASCII tokens, httpx would reject the request Tightened to (c.isascii() and c.isalnum()) or c in ("_", "-", "."). New test_non_ascii_suffix_is_sanitized_to_ascii asserts the key is isascii() even for a non-ASCII suffix like "Échec"
P3 from_ap2 used getattr(..., "hash", None) or getattr(..., "checkout_hash", None)hash="" short-circuited to None, bypassing min_length=1 model validation Explicit None-check on the first attribute; fall back only when the first is genuinely missing. Empty strings now propagate and the model rejects. Two new tests: empty hash=""ValidationError; alternate naming still falls through correctly
P3 build_commit_body.__doc__ said "derived from transaction_id"; AP2GuardCommitFailed.__doc__ claimed terminal codes are "benign replay" not raised by this exception (misleading — they raise AP2GuardCommitUncertain instead) Both rewritten to match current behavior. AP2GuardCommitFailed is now scoped to 4xx-explicit-rejection only and points readers at AP2GuardCommitUncertain for unknown-outcome conditions

Gates re-verified locally (all green):

  • ruff check . && ruff format --check . — clean
  • mypy --strict runcycles_ap2 — 0 errors (8 files)
  • pytest --cov=runcycles_ap2 --cov-fail-under=95110 passed, 99.19% coverage (up from 107)
  • python -m build — sdist + wheel build cleanly

No public API additions; only behavior tightening on existing paths and documentation sync. AUDIT.md + CHANGELOG.md updated.

…ternals

Earlier passes fixed README and feature-bullet wording but missed two
internal docstrings/comments that still used the inaccurate phrasing:

  - runcycles_ap2/_constants.py — the IDEMPOTENCY_SCOPE_OPEN_MANDATE
    comment block said all checkouts "collapse onto one reservation"
  - runcycles_ap2/mapping.py — consume_once_input.__doc__ said the
    server "collapses them onto one reservation"

Replaced both with the accurate "share one (tenant, endpoint,
idempotency_key) bucket; identical payloads replay, divergent
payloads hit IDEMPOTENCY_MISMATCH; either way no second valid
reservation" framing.

Also:
  - tests/test_mapping.py — TestIdempotencyKeyOpenMandateScope class
    docstring still said "must collapse onto one reservation"; reworded
  - AUDIT.md — stale test-name reference
    (test_open_mandate_overuse_collapses_distinct_transactions_onto_one_key
    was renamed in a follow-up to ...shares_idempotency_bucket...);
    updated and annotated.

The two remaining "collapse" matches in AUDIT.md (lines 50 and 71) are
intentional — they quote the old phrasing inside historical entries that
document the wording fix itself, not current behavior claims.

No behavior change. Gates green: ruff + mypy strict + 110 tests pass.
@amavashev
Copy link
Copy Markdown
Contributor Author

Round-5 fix pushed (7f0d8db)

Doc-only cleanup — the implementation already matched the accurate wording; only internal docstrings and one comment still carried the old "collapse onto one reservation" overclaim.

File Spot Fix
runcycles_ap2/_constants.py IDEMPOTENCY_SCOPE_OPEN_MANDATE comment block Replaced "all checkouts collapse onto one reservation" with the accurate "land in one (tenant, endpoint, idempotency_key) bucket; identical payloads replay, divergent payloads hit IDEMPOTENCY_MISMATCH; either way no second valid reservation"
runcycles_ap2/mapping.py consume_once_input.__doc__ Same rewording
tests/test_mapping.py TestIdempotencyKeyOpenMandateScope class docstring Generalized to "lands in one bucket" framing
AUDIT.md Stale test-name reference Updated to test_open_mandate_overuse_shares_idempotency_bucket_across_transactions (was renamed in a follow-up pass) and annotated

Two remaining "collapse" matches in AUDIT.md (lines 50 and 71) are intentional — they quote the old phrasing inside historical entries that document the wording fix itself, not current behavior.

Gates re-verified locally (all green):

  • ruff check . && ruff format --check . — clean
  • mypy --strict runcycles_ap2 — 0 errors (8 files)
  • pytest — 110 passed
  • (build skipped — no code changes)

No behavior change, no new tests, no public API change.

@amavashev amavashev merged commit 4810680 into main May 13, 2026
3 checks passed
@amavashev amavashev deleted the fix/open-mandate-consume-once-and-commit-uncertain branch May 13, 2026 15:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant