feat(email-recovery): recovery flow — prepare_delegation + delegation stamping + get_delegation (PR 7 of email-recovery stack)#3881
Open
sea-snake wants to merge 6 commits into
sea-snake-translation-bot pushed a commit to sea-snake-translation-bot/internet-identity that referenced this pull request May 12, 2026
…ck) (dfinity#3877)

## Summary

PR 2 of the email-recovery stack (`docs/ongoing/email-recovery.md` §10 Phase 0). Stacks on top of PR 3838 (DNSSEC verifier). Lands a hand-rolled RFC 6376 DKIM verifier that consumes a parsed `SmtpRequest` plus an already-trusted DKIM TXT record and returns a per-step `EmailVerificationStatus`.

**Note:** This PR targets `main` but includes PR 3838's commits (DNSSEC verifier) as its base. Review the DKIM-specific changes by looking at commits after `9bbd8717` (the last PR 3838 commit). Once PR 3838 merges, this PR's diff will shrink to just the DKIM additions.

## Why hand-rolled

The design originally specified `mail-auth` (Stalwart's well-tested DKIM library), but mail-auth pulls a non-optional `hickory-resolver` dep that fails to compile for `wasm32-unknown-unknown` (transitive: tokio + mio). Forking + patching mail-auth would be possible but creates perpetual rebase burden. We hand-roll instead: "the right way, no shortcuts" was the explicit guidance.

## What's in this PR

### `src/internet_identity_interface/src/internet_identity/types/smtp.rs`

Brings forward the SMTP gateway protocol types from PoC PR 3760: `SmtpRequest`/`SmtpResponse`/`SmtpHeader`/`SmtpMessage`/`SmtpAddress`/`SmtpEnvelope`, the size bounds, and the input-bound validation (`format_address` lowercases both halves; `truncate_at_char_boundary` clamps to the previous UTF-8 boundary so a multi-byte subject can't trap the canister). Drops postbox-specific bits (PostboxEmail, ValidatedSmtpRequest, anchor-number parser).

### `src/internet_identity/src/dkim/`

- **`types.rs`**: Algorithm (RsaSha256, Ed25519Sha256), HeaderCanon/BodyCanon (Relaxed, Simple), DkimCheck/DkimCheckName/DkimCheckStatus per-step diagnostics, EmailVerificationStatus / VerificationFailReason result shape.
- **`parse.rs`** (RFC 6376 §3.5): DKIM-Signature header tag-list parser. Splits structurally on `;` first then on the *first* `=` per element, so a literal `b=` substring inside another tag's base64 doesn't get misread as a new tag start (the bug class the PoC PR review specifically flagged). Folded whitespace inside base64 values is stripped before decoding. Tag names case-insensitive; duplicates rejected.
- **`canonicalize.rs`** (§3.4.2 / §3.4.4): relaxed header canon (lowercase name, unfold continuations, collapse WSP+ to single SP, strip trailing WSP, strip WSP around colon) and relaxed body canon (per-line WSP cleanup, drop trailing empty lines, ensure non-empty output ends in exactly one CRLF).
- **`dns_record.rs`** (§3.6.2): DKIM TXT record parser. Tag names case-insensitive (`P=` vs `p=` was a PoC bug), whitespace inside `p=` tolerated (multi-chunk DNS TXT records), `t=y`/`t=s` flags honoured, unknown tags ignored.
- **`signature.rs`**: RSA-SHA256 (RFC 5702 / RFC 8301) and Ed25519-SHA256 (RFC 8463) signature verification on top of `rsa`+`sha2`+`ed25519-dalek` from PR 1's deps. Enforces 1024-bit RSA minimum per design §5.6. Ed25519 path wraps in SHA-256 per RFC 8463. Plus `body_hash_sha256` with optional `l=` truncation per §3.4.5.
- **`verify.rs`**: orchestration. Multi-signature loop per §5.5 (accept on first pass), tag enforcement per design §5.4 (c=relaxed/* only, x= expiration, i= alignment with d=, k= match, t=y testing-mode), bottom-up header selection per §5.4 when h= lists a name multiple times, b=value blanking that's structural-position-aware so it doesn't mis-target an internal substring.
- **`test_vectors.rs`**: `#[cfg(test)]` .eml loader + 8 end-to-end tests against committed fixtures.

### `test_vectors/dkim/`

- 3 synthetic .eml files generated offline with dkimpy + a 2048-bit RSA key (`relaxed/relaxed`, `relaxed/simple`, `simple/simple`).
- The matching DKIM TXT record (public key only).
- README documenting provenance: the throwaway private key is **not** committed.

## Test plan

- [x] `cargo check -p internet_identity --target wasm32-unknown-unknown`: clean.
- [x] `cargo test -p internet_identity --bin internet_identity dkim`: 75 tests pass (parse 14, canonicalize 18, dns_record 16, signature 7, verify 12, end-to-end 8).
- [x] `cargo test -p internet_identity --bin internet_identity`: 313 tests pass total (was 238 before this PR; +75 DKIM, plus a few in smtp types).
- [x] `cargo test -p internet_identity_interface --lib`: 52 tests pass (was 42; +10 SMTP type tests).
- [x] `cargo clippy -p internet_identity --bin internet_identity --tests -- -D warnings`: clean.
- [x] `cargo fmt --check`: clean (modulo pre-existing diffs unrelated to this PR).

## Stack

This is PR 2 of a 12-PR series. Includes PR 3838's commits as its base; once PR 3838 merges, the diff shrinks to just the DKIM additions. Subsequent PRs:

- **PR 3**: DMARC alignment.
- **PR 4**: DoH outcall fallback for unsigned domains (Gmail / Outlook / iCloud; see the design doc §7.6 and the team Slack writeup).
- **PRs 5–9**: storage + Candid + behavior for email recovery.
- **PRs 10–12**: frontend.
## PR Stack

| # | PR | Description | Status |
|---|---|---|---|
| 0 | [dfinity#3836](dfinity#3836) | Design doc | Open |
| 1 | [dfinity#3838](dfinity#3838) | DNSSEC verifier scaffold | Open |
| 2 | [dfinity#3877](dfinity#3877) | DKIM verifier (RFC 6376) | Open |
| 3 | [dfinity#3878](dfinity#3878) | DMARC alignment (RFC 7489) | Open |
| 4 | [dfinity#3879](dfinity#3879) | DoH fallback | Open |
| 5+6 | [dfinity#3880](dfinity#3880) | Setup flow (storage + smtp_request) | Open |
| 7 | [dfinity#3881](dfinity#3881) | Recovery flow (delegation) | Open |
| 8 | [dfinity#3882](dfinity#3882) | Frontend + feature flag | Open |
| 9 | [dfinity#3883](dfinity#3883) | Deploy/upgrade scripts: dnssec_config + doh_config | Open |
| 10 | [dfinity#3884](dfinity#3884) | Email-recovery UX overhaul | Open |

---------

Co-authored-by: Arshavir Ter-Gabrielyan <arshavir.ter.gabrielyan@dfinity.org>
Co-authored-by: Claude <noreply@anthropic.com>
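For readers following the `parse.rs` description in the commit above, here is a minimal sketch of the "split on `;` first, then on the first `=`" rule. It is not the PR's parser: `parse_tag_list` and `strip_fws` are hypothetical names, the error type is a plain `String`, and skipping empty elements is an assumption made for brevity.

```rust
use std::collections::HashMap;

/// Parse a DKIM-style tag list such as "v=1; a=rsa-sha256; b=abc=".
/// Hypothetical helper for illustration, not the PR's `parse.rs`.
fn parse_tag_list(input: &str) -> Result<HashMap<String, String>, String> {
    let mut tags = HashMap::new();
    // Step 1: split structurally on ';' so each element holds exactly one tag.
    for element in input.split(';') {
        let element = element.trim();
        if element.is_empty() {
            continue; // assumption: skip empty elements, e.g. after a trailing ';'
        }
        // Step 2: split on the FIRST '=' only. Any later '=' (base64 padding,
        // or a literal "b=" inside a value) stays part of the value.
        let (name, value) = element
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in element: {element}"))?;
        // Tag names are compared case-insensitively, as the commit describes.
        let name = name.trim().to_ascii_lowercase();
        if name.is_empty() {
            return Err("empty tag name".to_string());
        }
        if tags.contains_key(&name) {
            return Err(format!("duplicate tag: {name}"));
        }
        tags.insert(name, value.trim().to_string());
    }
    Ok(tags)
}

/// Remove folding whitespace from a base64 value before decoding it.
fn strip_fws(value: &str) -> String {
    value.chars().filter(|c| !c.is_ascii_whitespace()).collect()
}
```

With this shape, a `bh=` value that happens to end in `b=` is kept whole, because the only split points are `;` and the first `=` of each element.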
Brings the recovery half (`prepare_delegation` → `smtp_request` →
`submit_dkim_leaf` (DNSSEC) or finished synchronously (DoH) →
`get_delegation`) onto the new two-phase storage-and-smtp base. The
recovery flow shares all the heavy lifting with the setup flow —
prepare validation, skeleton chain caching, DKIM-signature parsing,
body-hash check, partial-verification stash, leaf admission, DKIM
crypto verify, DMARC alignment — and only diverges at finalization.
Setup and recovery now share:
- One `prepare_common` validation core. Setup parks
`PendingKind::Register{anchor}`; recovery parks
`PendingKind::Recover{session_pk}` after capping `session_pk` to
`MAX_SESSION_KEY_BYTES = 1024`.
- One `smtp_request` dispatcher. Recipient (`register@id.ai` vs
`recover@id.ai`) is cross-checked against the entry's
`PendingKind` so a forged `to:` can't run the wrong flow.
- One DNSSEC partial-verification path: parse → bh= → digest →
cache → flip `NeedDkimLeaf{selector}`. Recovery is no different
from setup at this step.
- One `submit_dkim_leaf` path: leaf admission against the cached
zone DNSKEY, DKIM crypto verify (prehash), DMARC alignment.
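The recipient cross-check and the session-key cap from the list above can be sketched as follows. This is only an illustration under stated assumptions: `Flow`, `park_recovery`, and `dispatch` are hypothetical names, the anchor is modelled as a `u64`, "capping" is interpreted as rejecting oversized keys, and the recipient's local part is assumed to be lowercased already.

```rust
// Value taken from the commit message; everything else here is illustrative.
const MAX_SESSION_KEY_BYTES: usize = 1024;

#[derive(Debug, PartialEq)]
enum PendingKind {
    Register { anchor: u64 },
    Recover { session_pk: Vec<u8> },
}

#[derive(Debug, PartialEq)]
enum Flow {
    Setup,
    Recovery,
}

/// Park a recovery challenge, refusing session keys above the size cap.
/// Assumption: oversized keys are rejected rather than truncated.
fn park_recovery(session_pk: Vec<u8>) -> Result<PendingKind, String> {
    if session_pk.len() > MAX_SESSION_KEY_BYTES {
        return Err(format!(
            "session_pk is {} bytes, limit is {MAX_SESSION_KEY_BYTES}",
            session_pk.len()
        ));
    }
    Ok(PendingKind::Recover { session_pk })
}

/// Pick the flow only when the recipient's local part agrees with the kind
/// that was parked at prepare time, so a forged `to:` cannot switch flows.
fn dispatch(recipient_local_part: &str, kind: &PendingKind) -> Result<Flow, String> {
    match (recipient_local_part, kind) {
        ("register", PendingKind::Register { .. }) => Ok(Flow::Setup),
        ("recover", PendingKind::Recover { .. }) => Ok(Flow::Recovery),
        _ => Err("recipient does not match the pending entry's kind".to_string()),
    }
}
```

The point of matching on the pair is that neither input alone decides the flow: mail to `recover@…` against an entry parked as `Register` is rejected instead of being routed by the header.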
What diverges at finalization:
- **Setup**: `bind_credential(anchor, address)` writes the
`EmailRecoveryCredential` to the named anchor.
- **Recovery**: `stamp_recovery_delegation` looks up the anchor
from the verified `From:` via the reverse-address index
(memory ID 24, hashed-key map), derives the seed
`H(salt || "email-recovery" || lowercase(address) || anchor)`,
adds the canister signature for `(session_pk, expiration)`, and
caches a `RecoveryOutcome { user_key, expiration, anchor_number,
seed }` on the pending entry. Polling then surfaces
`RecoveryReady{user_key, expiration, anchor_number}`.
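A rough sketch of the seed derivation `H(salt || "email-recovery" || lowercase(address) || anchor)` named above. Several details are assumptions, not taken from the PR: the salt is modelled as 32 bytes, the anchor is encoded as a big-endian `u64`, the parts are concatenated without length prefixes (a real implementation may well length-prefix each part to avoid ambiguity), and the hash `H` is passed in as a closure so the example stays free of external crates. `seed_preimage` and `derive_seed` are hypothetical names.

```rust
/// Build the byte string that gets hashed into the seed.
/// Assumptions: plain concatenation, big-endian anchor, 32-byte salt.
fn seed_preimage(salt: &[u8; 32], address: &str, anchor: u64) -> Vec<u8> {
    // Lowercasing first means "Alice@Example.com" and "alice@example.com"
    // derive the same seed.
    let address = address.to_lowercase();
    let mut out = Vec::with_capacity(32 + 14 + address.len() + 8);
    out.extend_from_slice(salt);
    out.extend_from_slice(b"email-recovery");
    out.extend_from_slice(address.as_bytes());
    out.extend_from_slice(&anchor.to_be_bytes());
    out
}

/// Apply a caller-supplied 32-byte hash `H` to the preimage.
/// In a real canister `hash` would be a cryptographic hash such as SHA-256.
fn derive_seed<H>(hash: H, salt: &[u8; 32], address: &str, anchor: u64) -> [u8; 32]
where
    H: Fn(&[u8]) -> [u8; 32],
{
    hash(&seed_preimage(salt, address, anchor))
}
```

Because the anchor is part of the preimage, the same address bound to a different anchor yields a different seed, and therefore a different principal.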
This finalization fork lives in two places:
- `smtp.rs` for the DoH path (verification finishes synchronously
inside `smtp_request`).
- `submit_leaf.rs` for the DNSSEC path (verification finishes
inside `email_recovery_submit_dkim_leaf` after the FE submits
the leaf).
Both call `stamp_recovery_delegation` on the recovery branch and
`bind_credential` on the setup branch.
Other scoped pieces:
- `EmailRecoveryStatus::RecoveryReady` carries `anchor_number` so
the FE seeds its auth store without a separate lookup.
- `PendingChallenge.recovery_outcome: Option<RecoveryOutcome>`
caches the seed + anchor + user_key for `get_delegation`.
- `email_recovery_get_delegation(args)` query mirrors
`openid_get_delegation` in shape — uses the cached seed to
retrieve the canister signature.
- `recovery_seed_for_nonce(nonce)` exposes the cached seed to
`get_delegation` without re-deriving from the anchor.
- Reverse address index: `SHA-256(lowercase(address)) →
AnchorNumber`, memory ID 24, kept in sync with anchor writes.
- `IdentityInfo.email_recovery: Option<EmailRecoveryCredential>`
surfaced on `identity_info` so the manage page renders the
recovery-email card without a second canister call.
- `check_authorization` recognises an additional principal kind:
delegations rooted in `H(salt || "email-recovery" ||
lowercase(address) || anchor)` are accepted as authenticating
the matching anchor. After a recovery completes the FE's
session keypair holds such a delegation; this lets the user
call `identity_info` and the rest of the authenticated surface
immediately, without re-mint.
- Archive operations: payload-free `AddEmailRecovery` /
`RemoveEmailRecovery` variants on `Operation` so audit
consumers can answer "who changed their recovery email when?"
without leaking the address (§8.2).
- Activity-stats counter for email-recovery delegation issuance,
alongside the existing per-issuer OpenID counter.
- `archive.did`, `internet_identity.did` updated; FE bindings
regenerated.
444 unit tests pass (3 new). The integration-test happy-path
fixture still needs a follow-up rewrite to exercise the new
two-phase shape end-to-end (prepare → smtp_request[NeedDkimLeaf]
→ submit_dkim_leaf), but compiles clean against the new types.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two follow-ups from PR #3855 review on the reverse address index:
- `Storage::update_email_recovery_lookup` now returns `Result<(), AnchorNumber>` and rejects when the new address is already bound to a *different* anchor (Err carries the existing anchor). Same-anchor rebinds remain idempotent so a user retrying the wizard against their own anchor still works.
- A new `StorageError::EmailRecoveryAddressAlreadyBound { existing_anchor }` variant carries that conflict back through `Storage::write`. `email_recovery::smtp::bind_credential` matches it and surfaces the user-facing `EmailRecoveryError::AddressAlreadyRegistered` instead of the InternalCanisterError catch-all the previous `format!("write anchor: {e:?}")` produced.
- `StorableEmailRecoveryAddressHash::from_bytes` no longer panics on unexpected input. Switched to the same `slice_to_bounded_32` helper `StorableApplication`'s hash uses, so corrupted stable memory zero-pads / truncates to 32 instead of trapping mid-call.
444 unit tests pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
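To make the rebind rule concrete, here is a minimal in-memory sketch of a reverse index whose update returns `Result<(), AnchorNumber>`. It uses a `HashMap` in place of stable memory, and `ReverseIndex` is a hypothetical type. The `slice_to_bounded_32` body is a guess at what such a helper might do, based only on the "zero-pads / truncates to 32" wording above.

```rust
use std::collections::HashMap;

type AnchorNumber = u64;
type AddressHash = [u8; 32];

/// Hypothetical stand-in for the stable-memory reverse index.
struct ReverseIndex {
    map: HashMap<AddressHash, AnchorNumber>,
}

impl ReverseIndex {
    /// Bind `key` to `anchor`. A rebind to the same anchor is a no-op success;
    /// a bind that would steal the address from another anchor is refused and
    /// the existing owner is returned in `Err`.
    fn update(&mut self, key: AddressHash, anchor: AnchorNumber) -> Result<(), AnchorNumber> {
        if let Some(&existing) = self.map.get(&key) {
            if existing != anchor {
                return Err(existing);
            }
        }
        self.map.insert(key, anchor);
        Ok(())
    }
}

/// Copy up to 32 bytes, zero-padding short input and truncating long input,
/// so malformed bytes never cause a panic.
fn slice_to_bounded_32(bytes: &[u8]) -> [u8; 32] {
    let mut out = [0u8; 32];
    let n = bytes.len().min(32);
    out[..n].copy_from_slice(&bytes[..n]);
    out
}
```

Returning the existing anchor in the error is what lets the caller map the conflict to a specific user-facing error instead of a catch-all.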
Real-world DKIM resolution often crosses zone boundaries via CNAME —
proton.me → proton.ch, tutanota.com → tutanota.de, M365 custom
domains, etc. DNSSEC signatures don't span zones, so the verifier
needs the DNSKEY chain for every zone touched and must authenticate
each hop independently against its own zone.
Bundle shape (interface + internal):
DnsProofBundle {
    root_dnskey: SignedRRset,
    chains: Vec<DelegationChain>,  // one per signing zone
    hops: Vec<SignedRRset>,        // CNAME, …, final TXT
}
Verification (verify.rs):
- verify_root_dnskey_with_clock — root vs. trust anchors + freshness
- verify_chain_with_clock(chain, root_dnskey) — walk one chain
- verify_extra_chains_with_clock — populate (zone → DNSKEY) map
- verify_hops_with_clock — per-hop signature under signer_name's zone,
CNAME chain coherence (first owner == requested_name, intermediates
are CNAMEs whose target equals next owner, final type matches,
no loops, ≤ MAX_CNAME_HOPS = 4)
- verify_bundle_with_clock — top-level convenience
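The CNAME chain coherence rules listed for `verify_hops_with_clock` can be sketched in isolation, leaving out signatures and clocks entirely. This is an illustration, not the PR's code: `Hop`, `RData`, and `check_hop_coherence` are hypothetical, names are assumed to be normalized (no trailing dots), and the cap is assumed to count CNAME records rather than all hops.

```rust
use std::collections::HashSet;

// Value taken from the commit message.
const MAX_CNAME_HOPS: usize = 4;

#[derive(Debug)]
enum RData {
    Cname(String),
    Txt(String),
}

/// Simplified hop: just an owner name and its record data (no RRSIG).
struct Hop {
    owner: String,
    rdata: RData,
}

/// Check that `hops` forms one unbroken chain from `requested_name`
/// to a final TXT record, and return that TXT payload.
fn check_hop_coherence<'a>(requested_name: &str, hops: &'a [Hop]) -> Result<&'a str, String> {
    let (last, cnames) = hops.split_last().ok_or("empty hop list")?;
    if cnames.len() > MAX_CNAME_HOPS {
        return Err("too many CNAME hops".to_string());
    }
    let mut expected_owner = requested_name.to_ascii_lowercase();
    let mut seen = HashSet::new();
    for hop in cnames {
        let owner = hop.owner.to_ascii_lowercase();
        // First owner must be the requested name; later owners must be the
        // previous CNAME's target.
        if owner != expected_owner {
            return Err(format!("owner mismatch: expected {expected_owner}, got {owner}"));
        }
        if !seen.insert(owner) {
            return Err("CNAME loop".to_string());
        }
        match &hop.rdata {
            RData::Cname(target) => expected_owner = target.to_ascii_lowercase(),
            RData::Txt(_) => return Err("intermediate hop is not a CNAME".to_string()),
        }
    }
    let last_owner = last.owner.to_ascii_lowercase();
    if last_owner != expected_owner {
        return Err(format!("owner mismatch: expected {expected_owner}, got {last_owner}"));
    }
    if !seen.insert(last_owner) {
        return Err("CNAME loop".to_string());
    }
    match &last.rdata {
        RData::Txt(txt) => Ok(txt.as_str()),
        RData::Cname(_) => Err("final hop type mismatch".to_string()),
    }
}
```

In the real verifier each hop would additionally be checked against the DNSKEY of its signer's zone; this sketch covers only the structural checks.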
Cached pending-challenge state (pending.rs):
- cached_root_dnskey + cached_zones (ZoneKeysMap) replace the old
single cached_zone_dnskey. The map starts with one zone (apex)
for Gmail-style and grows at submit_dkim_leaf time when the DKIM
CNAME chain crosses into a new signed zone.
submit_dkim_leaf API (interface + .did):
EmailRecoverySubmitDkimLeafArg {
    nonce, hops, extra_chains
}
The canister re-validates the cached root DNSKEY, walks any
extra_chains under it, validates each hop against the resulting
zone-keys map, then resolves the hop sequence to the final TXT for
DKIM verification.
Live.com case: apex signed but DKIM CNAMEs into unsigned territory.
The FE walker abandons on the first missing-RRSIG hop and falls
through to the DoH path — see scripts/default-doh-domains.bash on
the deploy-scripts branch.
Tests: 18 unit tests pass, including 5 new ones covering CNAME chain
coherence, duplicate-zone rejection, hop cap, owner mismatch, and
type mismatch.
The mod.rs preamble said the recovery half lives in a follow-up PR (`feat/email-recovery-flow`, #3843). On the cumulative diff that copilot reviews, that PR is part of the same stack and the recovery half is right here in this module — so the docstring read as out of date. Rewrite to describe both halves as living together. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The off-chain SMTP gateway routes mail at a domain that varies per deploy: id.ai on prod, beta.id.ai on beta. The canister was hardcoding `id.ai` everywhere (recipient dispatch, the validate query, and the user-facing label returned from prepare), so on the beta canister mail to `register@beta.id.ai` reached the canister but failed the recipient match and was silently dropped.
Drop the hardcoded constant. Derive the accepted mailbox domains from `related_origins`, which is already a per-deploy arg the deploy scripts wire through (and the same one used for security headers + the FE's `getPrimaryOrigin`). All entries are treated as equal aliases: recipient dispatch and the `smtp_request_validate` query accept `register@<host>` / `recover@<host>` for any host listed in `related_origins`. So a prod deploy with `id.ai` + the `*.icp0.io` aliases accepts mail at all of them; a beta deploy with `beta.id.ai` accepts that one.
Drop the `mailbox` field from `EmailRecoveryChallenge` too. The FE already knows which origin the user is on (`window.location.hostname`), so it pairs that with `register` / `recover` to render the label. Each tab automatically shows the alias matching the origin the user is on, and the canister never has to single one out as canonical.
Empty / unset `related_origins` → no domains accepted; the canister drops every inbound recipient. Real deploys always configure this.
Tests: extended `email_recovery::smtp::tests` with `set_related_origins` helper + 15 cases (single host, multi-alias prod, beta-only, unknown user, wrong domain, no-origins-configured); all pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
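A small sketch of what "derive the accepted mailbox domains from `related_origins`" could look like. The names `Mailbox`, `origin_host`, and `classify_recipient` are hypothetical, origins are assumed to be `https://<host>` strings, and `identity.icp0.io` in the usage note is a made-up alias used only for illustration.

```rust
#[derive(Debug, PartialEq)]
enum Mailbox {
    Register,
    Recover,
}

/// Extract the host from an origin such as "https://id.ai".
/// Assumption: origins always use the https scheme.
fn origin_host(origin: &str) -> Option<String> {
    let rest = origin.strip_prefix("https://")?;
    // Drop any port or path that follows the host.
    let host = rest.split(|c: char| c == '/' || c == ':').next()?;
    if host.is_empty() {
        None
    } else {
        Some(host.to_ascii_lowercase())
    }
}

/// Accept `register@<host>` / `recover@<host>` for any host in
/// `related_origins`; everything else is dropped (`None`).
fn classify_recipient(recipient: &str, related_origins: &[&str]) -> Option<Mailbox> {
    let (local, domain) = recipient.rsplit_once('@')?;
    let domain = domain.to_ascii_lowercase();
    // All configured origins are equal aliases; none is canonical.
    let accepted = related_origins
        .iter()
        .filter_map(|origin| origin_host(origin))
        .any(|host| host == domain);
    if !accepted {
        return None;
    }
    match local.to_ascii_lowercase().as_str() {
        "register" => Some(Mailbox::Register),
        "recover" => Some(Mailbox::Recover),
        _ => None,
    }
}
```

With `["https://id.ai", "https://identity.icp0.io"]` configured, mail to `register@id.ai` and `recover@identity.icp0.io` is classified, while an empty list rejects every recipient, which matches the "no domains accepted" behaviour described above.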
After the dnssec/* commits got absorbed into PR 1 (rebase took ours for those conflicts), the only remaining PR 7 fixups are at the canister-method boundary:
- internet_identity.did: EmailRecoverySubmitDkimLeafArg gained `hops: vec SignedRRset` + `extra_chains: vec DelegationChain` replacing the single `dkim_leaf: SignedRRset` field.
- submit_leaf.rs: use the multi-zone variant from PR 7 (was lost during rebase when 'take ours' replaced it with the storage-and-smtp single-leaf version).
- integration tests: update the EmailRecoverySubmitDkimLeafArg literal to the new shape.
463 unit tests pass; clippy clean.
aterga added a commit that referenced this pull request May 13, 2026
…of email-recovery stack) (#3878)

## Summary

PR 3 of the email-recovery stack (`docs/ongoing/email-recovery.md` §6). Stacks on top of #3877 (DKIM verifier). Lands a hand-rolled DMARC alignment check and reshapes the verifier API: `dkim::verify_dkim` becomes a DKIM-only primitive, and the new `dmarc::verify_email` is the public top-level entry point that produces the combined `EmailVerificationStatus`.

**Note:** This PR targets `main` but includes PRs 1+2's commits as its base. Review the DMARC-specific changes by looking at commits on top of `ec371aae3` (PR 2's tip). Once PRs 1+2 merge, this PR's diff shrinks to just the DMARC additions.

## What's in this PR

### `src/internet_identity/src/dmarc/`

- **`types.rs`**: `DmarcOutcome` (Aligned / Misaligned / NoRecord / Malformed), `DmarcPolicy` (None / Quarantine / Reject), `AlignmentMode` (Strict / Relaxed), `DmarcRecord`, plus the combined `EmailVerificationStatus` that carries both DKIM diagnostics and the DMARC outcome on success.
- **`parse.rs`** (RFC 7489 §6.3): DMARC TXT record parser. Enforces `v=DMARC1` must be first, `p=` must be one of {none, quarantine, reject}, `pct=` 0..=100, rejects duplicate tags, ignores unknown / reporting tags. 12 unit tests.
- **`from_header.rs`** (RFC 5322 / RFC 7489 §3.1.1): single-mailbox From-header parser. Accepts bare addr-spec, name-addr, and quoted-display-name forms; rejects zero/multiple From: headers, address-lists, group syntax. Tolerates comma/colon inside quoted display names. 16 unit tests.
- **`alignment.rs`**: strict (exact match) + relaxed (exact match OR label-aligned subdomain in either direction). Stricter than RFC-compliant relaxed alignment because we deliberately don't consult the PSL; design doc §6.4 documents the trust + asymmetric-failure-mode reasoning. The dot anchor on the subdomain check prevents `evilexample.com` from aliasing `example.com`. 8 unit tests.
- **`verify.rs`**: orchestration. DKIM first; on failure, surface the DKIM reason verbatim. On DKIM pass, parse From and check DMARC alignment. Accepted iff Aligned, OR NoRecord with `dkim_domain == from_domain`. 8 unit tests.
- **`test_vectors.rs`**: 5 end-to-end tests reusing PR 2's synthetic .eml fixtures.

### `src/internet_identity/src/dkim/types.rs` (rename + new variants)

- Renamed `EmailVerificationStatus` → `DkimVerifyResult` (DKIM-only). The combined verdict moved to `dmarc::EmailVerificationStatus` so it can carry the `DmarcOutcome`.
- Added `MalformedFromHeader(String)`, `DmarcMalformed(String)`, `DmarcMisaligned` to `VerificationFailReason`.

### `src/internet_identity/src/dkim/mod.rs`

- Re-exports `verify` as `verify_dkim` so downstream callers (the dmarc layer) don't have to deal with both a `dkim::verify` and `dmarc::verify` in scope at the same time.

## Test plan

- [x] `cargo check -p internet_identity --target wasm32-unknown-unknown`: clean.
- [x] `cargo test -p internet_identity --bin internet_identity dmarc`: 49 tests pass (12 parse + 16 from_header + 8 alignment + 8 verify + 5 e2e).
- [x] `cargo test -p internet_identity --bin internet_identity`: 365 tests pass total (was 313 with PR 2; +49 dmarc + 3 small reshape adjustments).
- [x] `cargo clippy -p internet_identity --bin internet_identity --tests -- -D warnings`: clean.
- [x] `cargo fmt --check`: clean (modulo pre-existing unrelated diffs).
## PR Stack

| # | PR | Description | Status |
|---|---|---|---|
| 0 | [#3836](#3836) | Design doc | Open |
| 1 | [#3838](#3838) | DNSSEC verifier scaffold | Open |
| 2 | [#3877](#3877) | DKIM verifier (RFC 6376) | Open |
| 3 | [#3878](#3878) | DMARC alignment (RFC 7489) | Open |
| 4 | [#3879](#3879) | DoH fallback | Open |
| 5+6 | [#3880](#3880) | Setup flow (storage + smtp_request) | Open |
| 7 | [#3881](#3881) | Recovery flow (delegation) | Open |
| 8 | [#3882](#3882) | Frontend + feature flag | Open |
| 9 | [#3883](#3883) | Deploy/upgrade scripts: dnssec_config + doh_config | Open |
| 10 | [#3884](#3884) | Email-recovery UX overhaul | Open |

---------

Co-authored-by: Arshavir Ter-Gabrielyan <arshavir.ter.gabrielyan@dfinity.org>
Co-authored-by: Claude <noreply@anthropic.com>
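The alignment rule described for `alignment.rs` in the commit above (exact match, or a label-aligned subdomain in either direction, with a dot anchor and no PSL lookup) can be sketched like this. The function names are hypothetical and the sketch assumes domains arrive without trailing dots.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum AlignmentMode {
    Strict,
    Relaxed,
}

/// True when `child` is `<labels>.<parent>`. The byte just before the parent
/// suffix must be '.', which is what stops "evilexample.com" from matching
/// "example.com".
fn is_subdomain_of(child: &str, parent: &str) -> bool {
    !parent.is_empty()
        && child.len() > parent.len()
        && child.ends_with(parent)
        && child.as_bytes()[child.len() - parent.len() - 1] == b'.'
}

/// Strict: domains must be equal. Relaxed: equal, or one is a subdomain of
/// the other. No Public Suffix List is consulted, so sibling subdomains
/// never align.
fn is_aligned(dkim_domain: &str, from_domain: &str, mode: AlignmentMode) -> bool {
    let d = dkim_domain.to_ascii_lowercase();
    let f = from_domain.to_ascii_lowercase();
    if d == f {
        return true;
    }
    match mode {
        AlignmentMode::Strict => false,
        AlignmentMode::Relaxed => is_subdomain_of(&d, &f) || is_subdomain_of(&f, &d),
    }
}
```

Under this rule `mail.example.com` aligns with `example.com` in relaxed mode, while `a.example.com` and `b.example.com` do not align with each other, which is one way the check ends up stricter than PSL-based relaxed alignment.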
Summary
Recovery flow on top of the two-phase DNSSEC architecture. Stacked on #3880 (setup flow).
- `email_recovery_prepare_delegation(dns_input, session_pk)`: anonymous. Same as setup-prepare plus a FE-generated session public key that the eventual delegation will be bound to.
- `email_recovery_get_delegation(nonce, session_key, expiration)`: query. After `RecoveryReady`, the FE fetches the `SignedDelegation`.
- `address → AnchorNumber` stable index (memory ID 24) for resolving the verified `From:` to an anchor at recovery time.
- `RecoveryReady` status variant carries `anchor_number` so the FE seeds its auth store directly.
- `check_authorization` via a new `AuthorizationKey::EmailRecoveryAddress` variant.
PR Stack
Changes during review
- `RecoveryReady` gained an `anchor_number` field. The recovery flow already knows the anchor at smtp time (address → anchor reverse index), so surfacing it on the status payload lets the FE seed its auth store directly instead of running a recovery-phrase-keyed lookup against an email-recovery delegation. (See the discussion on PR #3882, feat(email-recovery): frontend wizard + EMAIL_RECOVERY flag (beta.id.ai default-on).)
- `bind_credential_to_anchor` now refuses cross-anchor rebinds (`EmailRecoveryError::AddressAlreadyRegistered`); same-anchor rebinds remain idempotent.
- `check_authorization`: `AuthorizationKey` gains an `EmailRecoveryAddress(String)` variant and the authz check derives the canister-sig principal from `H(salt || "email-recovery" || lowercase(address) || anchor)` for each bound credential.
- `activity_bookkeeping` updates `last_used` on the matched credential, and the daily/monthly stats counter gets an `email_recovery_counter`.
🤖 Generated with Claude Code