feat(dmarc): RFC 7489 alignment + combined DKIM+DMARC verifier (PR 3 of email-recovery stack) #3840
sea-snake wants to merge 2 commits into
Pull request overview
This stacked PR advances the email-recovery pipeline by adding the low-level DNSSEC, DKIM, and DMARC verification pieces that later recovery endpoints will rely on. In this diff, the public verification surface shifts toward a combined DKIM+DMARC verdict while also introducing the DNSSEC trust-anchor/config plumbing and supporting test fixtures.
Changes:
- Adds a hand-rolled DNSSEC verifier, DKIM verifier, and DMARC parser/alignment/verifier flow for email recovery.
- Extends canister/interface config with DNSSEC trust anchors and adds SMTP gateway wire types.
- Adds captured DNSSEC fixtures, synthetic DKIM fixtures, end-to-end/unit tests, and a DNSSEC capture helper script.
Reviewed changes
Copilot reviewed 38 out of 39 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| `test_vectors/dnssec/iana-root-anchors-2026-05.json` | Root trust-anchor fixture metadata/config sample. |
| `test_vectors/dnssec/cloudflare-com-2026-05.json` | Captured DNSSEC proof bundle fixture for verifier tests. |
| `test_vectors/dkim/synth-rsa-test1._domainkey.test.example.com.txt` | Synthetic DKIM TXT record fixture. |
| `test_vectors/dkim/synth-rsa-simple-simple.eml` | Synthetic DKIM message fixture (simple/simple). |
| `test_vectors/dkim/synth-rsa-relaxed-simple.eml` | Synthetic DKIM message fixture (relaxed/simple). |
| `test_vectors/dkim/synth-rsa-relaxed-relaxed.eml` | Synthetic DKIM message fixture (relaxed/relaxed). |
| `test_vectors/dkim/README.md` | Fixture provenance and regeneration notes. |
| `src/internet_identity/src/storage/storable/storable_persistent_state.rs` | Persists DNSSEC config through stable storage. |
| `src/internet_identity/src/state.rs` | Adds DNSSEC config to persistent canister state. |
| `src/internet_identity/src/main.rs` | Wires DNSSEC config into config()/install-arg handling and registers new modules. |
| `src/internet_identity/src/dnssec/verify.rs` | DNSSEC verification entry point and chain-walk logic. |
| `src/internet_identity/src/dnssec/types.rs` | DNSSEC proof/result/error data types. |
| `src/internet_identity/src/dnssec/test_vectors.rs` | DNSSEC fixture loader for tests. |
| `src/internet_identity/src/dnssec/signature.rs` | DNSSEC signature and DS-digest verification routines. |
| `src/internet_identity/src/dnssec/mod.rs` | DNSSEC module exports. |
| `src/internet_identity/src/dnssec/canonical.rs` | DNSSEC canonical wire-format helpers. |
| `src/internet_identity/src/dmarc/verify.rs` | Combined DKIM+DMARC verification orchestration. |
| `src/internet_identity/src/dmarc/types.rs` | DMARC result/policy/alignment types. |
| `src/internet_identity/src/dmarc/test_vectors.rs` | End-to-end DMARC tests using DKIM fixtures. |
| `src/internet_identity/src/dmarc/parse.rs` | DMARC TXT parser. |
| `src/internet_identity/src/dmarc/mod.rs` | DMARC module exports. |
| `src/internet_identity/src/dmarc/from_header.rs` | From: header parser for DMARC alignment. |
| `src/internet_identity/src/dmarc/alignment.rs` | DMARC strict/relaxed alignment checks. |
| `src/internet_identity/src/dkim/verify.rs` | DKIM verification orchestration and header/body hashing. |
| `src/internet_identity/src/dkim/types.rs` | DKIM result/check/failure types. |
| `src/internet_identity/src/dkim/test_vectors.rs` | DKIM fixture loader and end-to-end tests. |
| `src/internet_identity/src/dkim/signature.rs` | DKIM RSA/Ed25519 signature verification and body hashing. |
| `src/internet_identity/src/dkim/parse.rs` | DKIM-Signature parser. |
| `src/internet_identity/src/dkim/mod.rs` | DKIM module exports. |
| `src/internet_identity/src/dkim/dns_record.rs` | DKIM TXT record parser. |
| `src/internet_identity/src/dkim/canonicalize.rs` | DKIM relaxed canonicalization helpers. |
| `src/internet_identity/internet_identity.did` | Exposes DNSSEC config types in Candid. |
| `src/internet_identity/Cargo.toml` | Adds verifier crypto dependencies to canister crate. |
| `src/internet_identity_interface/src/internet_identity/types/smtp.rs` | Adds SMTP gateway wire/validation types. |
| `src/internet_identity_interface/src/internet_identity/types/dnssec.rs` | Adds interface-level DNSSEC config types. |
| `src/internet_identity_interface/src/internet_identity/types.rs` | Re-exports SMTP/DNSSEC types and extends init args. |
| `scripts/capture-dnssec-chain.py` | Helper script to capture DNSSEC proof bundles. |
| `Cargo.toml` | Adds workspace crypto dependencies for DNSSEC/DKIM. |
| `Cargo.lock` | Locks newly added/transitive dependencies. |
    is_subdomain_of(dkim_domain, from_domain) || is_subdomain_of(from_domain, dkim_domain)
    }
This one is intentional and called out in the design doc (§6.4). We deliberately don't ship a Public Suffix List in the canister:
- Size: the PSL is ~250 KB compressed, ~1 MB uncompressed, and grows monotonically.
- Freshness: it changes weekly, so we'd need a continuous upgrade-arg refresh path or a separate fetch+verify mechanism.
- Trust root: the PSL is community-maintained without a strong cryptographic provenance story; pulling it through the deploy arg means trusting whoever curated this deploy's blob.
The mitigation we do enforce is the label-anchored dot check (so evilexample.com cannot align with example.com), and the co.uk-style asymmetric case is fail-closed for the From-domain owner (DMARC misaligned), not fail-open. The threat model is "spoofer claims to be user@co.uk" — co.uk doesn't publish DMARC, so receiver semantics fall back to "no DMARC, accept iff DKIM d= matches From". I think that's the safer default for a verifier targeting major mailbox providers (Gmail, Outlook, etc.) which all publish proper DMARC.
Open to revisiting if we'd rather ship a PSL — happy to discuss in a separate thread. Leaving as-is for now.
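For readers following the thread, the label-anchored check described in this reply could look roughly like the sketch below. It is an illustration only: the name `is_subdomain_of` appears in the reviewed snippet, but its body, argument order, and the helpers `strict_aligned` / `relaxed_aligned` are assumptions and not the code in `alignment.rs`.

```rust
/// Assumed semantics: true when `child` is a proper subdomain of `parent`,
/// i.e. `child` ends with "." followed by `parent`.
fn is_subdomain_of(child: &str, parent: &str) -> bool {
    !parent.is_empty()
        && child.len() > parent.len() + 1
        && child.ends_with(parent)
        // The byte just before the suffix must be a label separator, so
        // "evilexample.com" does not count as a subdomain of "example.com".
        && child.as_bytes()[child.len() - parent.len() - 1] == b'.'
}

/// Strict mode: the two domains must match exactly (case-insensitively).
fn strict_aligned(dkim_domain: &str, from_domain: &str) -> bool {
    dkim_domain.eq_ignore_ascii_case(from_domain)
}

/// Relaxed mode without a PSL: exact match, or one domain is a
/// label-aligned subdomain of the other, in either direction.
fn relaxed_aligned(dkim_domain: &str, from_domain: &str) -> bool {
    let d = dkim_domain.to_ascii_lowercase();
    let f = from_domain.to_ascii_lowercase();
    d == f || is_subdomain_of(&d, &f) || is_subdomain_of(&f, &d)
}
```

Under this sketch, sibling subdomains such as `a.example.com` and `b.example.com` do not align, because no organizational domain is ever computed.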
PR 3 of the email-recovery stack (docs/ongoing/email-recovery.md §6).
Stacks on top of PR 2 (DKIM verifier) and reshapes the verifier API:
dkim::verify_dkim now returns a DKIM-only DkimVerifyResult, and the new
dmarc::verify_email is the public top-level entry point that produces
the combined EmailVerificationStatus.
New module src/internet_identity/src/dmarc/:
- types.rs: DmarcOutcome (Aligned / Misaligned / NoRecord / Malformed),
DmarcPolicy, AlignmentMode, DmarcRecord, plus the combined
EmailVerificationStatus that carries both the DKIM diagnostic and
the DMARC outcome on success.
- parse.rs: DMARC TXT record parser per RFC 7489 §6.3. Enforces
v=DMARC1 must be first, p= must be one of {none, quarantine, reject},
pct= 0..=100, rejects duplicate tags, ignores unknown tags. 12 unit
tests.
- from_header.rs: RFC 5322 single-mailbox From: parser. Accepts
bare addr-spec, name-addr, and quoted-display-name forms; rejects
zero/multiple From: headers, address-lists, group syntax. Tolerates
comma/colon inside quoted display names. 16 unit tests.
- alignment.rs: strict (exact match) + relaxed (exact match OR
label-aligned subdomain in either direction). Stricter than RFC-
compliant relaxed alignment because we deliberately don't consult
the PSL — see design doc §6.4. The dot anchor on the subdomain check
prevents 'evilexample.com' from aliasing 'example.com'. 8 unit tests.
- verify.rs: orchestration. DKIM first; on failure, surface the DKIM
reason verbatim. On DKIM pass, parse From and check DMARC alignment.
Accepted iff Aligned, OR NoRecord with dkim_domain == from_domain.
8 unit tests.
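
The acceptance rule stated for verify.rs can be sketched as a single match. This is a simplified illustration that assumes DKIM has already passed; the variant names follow the list above, while `Verdict` and `decide` are hypothetical and the real `DmarcOutcome` may carry extra data such as the alignment mode.

```rust
/// Variant names taken from the commit message; payloads omitted.
#[derive(Debug, Clone, PartialEq, Eq)]
enum DmarcOutcome {
    Aligned,
    Misaligned,
    NoRecord,
    Malformed,
}

/// Hypothetical result type for this sketch only.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Accepted,
    Rejected(&'static str),
}

/// Accepted iff Aligned, OR NoRecord with dkim_domain == from_domain.
fn decide(outcome: &DmarcOutcome, dkim_domain: &str, from_domain: &str) -> Verdict {
    match outcome {
        DmarcOutcome::Aligned => Verdict::Accepted,
        // No DMARC record published: fall back to an exact d= / From match.
        DmarcOutcome::NoRecord if dkim_domain.eq_ignore_ascii_case(from_domain) => {
            Verdict::Accepted
        }
        DmarcOutcome::NoRecord => Verdict::Rejected("no DMARC record and d= differs from From"),
        DmarcOutcome::Misaligned => Verdict::Rejected("DmarcMisaligned"),
        DmarcOutcome::Malformed => Verdict::Rejected("DmarcMalformed"),
    }
}
```
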
dkim/types.rs:
- Renamed EmailVerificationStatus -> DkimVerifyResult (DKIM-only).
The combined verdict moved to dmarc::EmailVerificationStatus so it
can carry the DmarcOutcome.
- Added MalformedFromHeader, DmarcMalformed, DmarcMisaligned to
VerificationFailReason.
44 DMARC tests pass (12 parse + 16 from_header + 8 alignment + 8 verify),
on top of the existing 78 DKIM tests. Wasm32 build clean.
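
As an illustration of the parse.rs rules listed earlier in this message (v=DMARC1 first, p= restricted to three values, pct= in 0..=100, duplicates rejected, unknown tags ignored), a minimal parser might look like the following. The names `Policy`, `Record`, and `parse_dmarc` are hypothetical and the error strings are placeholders; this is not the parser from the PR.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
enum Policy {
    None,
    Quarantine,
    Reject,
}

#[derive(Debug, PartialEq, Eq)]
struct Record {
    policy: Policy,
    pct: u8,
}

fn parse_dmarc(txt: &str) -> Result<Record, String> {
    let mut seen = HashSet::new();
    let mut policy = None;
    let mut pct: u8 = 100; // RFC 7489 default when pct= is absent

    let tags = txt.split(';').map(str::trim).filter(|t| !t.is_empty());
    for (i, tag) in tags.enumerate() {
        let (name, value) = tag
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in {tag:?}"))?;
        let (name, value) = (name.trim().to_ascii_lowercase(), value.trim());

        // The version tag must be the first tag in the record.
        if i == 0 && (name != "v" || value != "DMARC1") {
            return Err("v=DMARC1 must be first".to_string());
        }
        if !seen.insert(name.clone()) {
            return Err(format!("duplicate tag {name}"));
        }
        match name.as_str() {
            "p" => {
                policy = Some(match value.to_ascii_lowercase().as_str() {
                    "none" => Policy::None,
                    "quarantine" => Policy::Quarantine,
                    "reject" => Policy::Reject,
                    other => return Err(format!("invalid p={other}")),
                })
            }
            "pct" => {
                pct = value
                    .parse::<u8>()
                    .ok()
                    .filter(|n| *n <= 100)
                    .ok_or_else(|| format!("invalid pct={value}"))?
            }
            _ => {} // v (already checked), reporting tags, unknown tags
        }
    }
    let policy = policy.ok_or_else(|| "missing p= tag".to_string())?;
    Ok(Record { policy, pct })
}
```
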
5 end-to-end tests in dmarc/test_vectors.rs reusing the synthetic .eml
fixtures from PR 2 (alice@test.example.com signed with
d=test.example.com, exact match → trivially aligned regardless of mode).

Covers:
- no DMARC record + dkim==from → Verified(NoRecord)
- aligned DMARC strict → Verified(Aligned, Strict)
- aligned DMARC relaxed (default) → Verified(Aligned, Relaxed)
- malformed DMARC TXT → Unverified(DmarcMalformed)
- From: with address-list → Unverified

49 dmarc tests total (44 unit + 5 e2e). 365 total in the II suite.

The parse_eml helper is duplicated from dkim/test_vectors.rs because the
original is #[cfg(test)] and not pub-visible across modules; the
duplication is contained to test code.
Replaced with a new PR from an upstream branch (enables direct collaboration). Same content, new PR number.
DKIM (PR 2's code):
- canonicalize.rs: relaxed_body empty body → CRLF per RFC 6376 §3.4.3
- verify.rs: simple_body empty body → CRLF (same rule)
- parse.rs: enforce that h= includes the From header per RFC 6376 §5.4
  (without this, DMARC alignment can be defeated by a post-sign rewrite)
- types.rs: doc comment now points at dmarc::EmailVerificationStatus
  rather than the misleading DkimVerifyResult

DNSSEC (PR 1's code):
- verify.rs: try ALL matching trust-anchor candidates before giving up.
  During KSK rollovers operators configure both the rolling-out and
  rolling-in keys; first-match early-exit could fail under the inactive
  one and never try the active one.
- mod.rs: doc comment no longer claims the email-recovery stack has no
  HTTP outcalls (the DoH fallback module does).
- iana-root-anchors-2026-05.json: corrected stale "_comment" that claimed
  the historical 19036 KSK was included.

DMARC (PR 3's code):
- parse.rs: enforce p= immediately after v=DMARC1 per RFC 7489 §6.3.
  Real-world records all do this; OpenDMARC and other reference parsers
  reject the alternative.
- from_header.rs: honour backslash escapes inside quoted-string display
  names so `"Alice \"Ops, Inc\"" <a@e.com>` parses correctly.
- verify.rs: removed empty placeholder test that asserted nothing; the
  e2e path is exercised by test_vectors::*.

DoH (PR 4's code):
- parser.rs: encode_name now rejects invalid names (label > 63, total >
  255, empty labels) rather than silently truncating; truncation would
  change which name we ask for, which is a real correctness issue.
- mod.rs: fetch_txt now enforces a label-anchored suffix match between
  `name` and `registered_domain` (defence-in-depth: a caller bug
  otherwise lets an allowlisted registered_domain authorise an outcall
  for an unrelated FQDN).
- mod.rs: transform_doh now signals parse failure via HTTP status 422 +
  empty body instead of a sentinel string in the body. The prior sentinel
  could collide with a (legal-but-unusual) TXT payload and silently turn
  valid records into "malformed".
- types.rs / quorum.rs / interface doc: fixed stale comments that said
  "three providers" or implied parallelism in the sync test helper (the
  production async path uses futures::future::join_all; run_quorum is
  sequential).
- New error variants: DohError::InvalidName, NameOutsideRegisteredDomain.
- New unit tests cover the new validation paths.

Test counts: 408 → 414 (added 6).

Note on the alignment.rs / no-PSL flag: the Copilot comment that relaxed
alignment is "fail-open without a PSL" is correct in isolation but
reflects a documented design decision. We deliberately don't ship a PSL
in the canister (size, freshness, and trust-root concerns; see design
doc §6.4). The mitigation is the dot-anchored suffix check that prevents
`evilexample.com` from matching `example.com`. Left unchanged; will reply
with the design rationale.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
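
The encode_name validation described in the DoH section above (reject rather than truncate) can be sketched as follows. The commit only names a single `DohError::InvalidName` variant; the finer-grained `NameError` enum and this function body are assumptions made for illustration.

```rust
/// Hypothetical error type; the PR reports these cases as DohError::InvalidName.
#[derive(Debug, PartialEq, Eq)]
enum NameError {
    EmptyLabel,
    LabelTooLong,
    NameTooLong,
}

/// Encodes a domain name in DNS wire format (length-prefixed labels ending
/// in a zero byte), rejecting invalid input instead of truncating it.
fn encode_name(name: &str) -> Result<Vec<u8>, NameError> {
    // A single trailing dot (fully qualified form) is accepted.
    let trimmed = name.strip_suffix('.').unwrap_or(name);
    let mut out = Vec::with_capacity(trimmed.len() + 2);
    if !trimmed.is_empty() {
        for label in trimmed.split('.') {
            if label.is_empty() {
                return Err(NameError::EmptyLabel);
            }
            if label.len() > 63 {
                return Err(NameError::LabelTooLong);
            }
            out.push(label.len() as u8);
            out.extend_from_slice(label.as_bytes());
        }
    }
    out.push(0); // root label terminator
    if out.len() > 255 {
        return Err(NameError::NameTooLong);
    }
    Ok(out)
}
```
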
…iring (dfinity#3838)

## Summary

First PR in the email-recovery stack (`docs/ongoing/email-recovery.md` §10 Phase 0). Lands a working RFC-4035-compliant DNSSEC verifier for caller-supplied DNS proof bundles, plus the trust-anchor wiring that drives it. PR 2 (DKIM verifier) and PRs 4–9 (storage + recovery methods) build on this.

## What's in this PR

### Verifier core

- New `dnssec/` module under `src/internet_identity/src/`:
  - `types.rs`: `DnsProofBundle`, `SignedRRset`, `DelegationLink`, `Rrsig`, `DnsName`, `DnssecError`, `VerifiedRecord`.
  - `canonical.rs`: owner-name canonicalization, RR canonical form, RRSIG signed-data construction (RFC 4034 §3.1.8.1, §6.2, §6.3), DS digest input.
  - `signature.rs`: algorithm dispatch + DS digest matching (SHA-256).
  - `verify.rs`: four-step algorithm (root anchor match → chain walk → leaf RRSIG → freshness).
- Algorithm coverage (RFC 8624 MUST set):
  - **alg 8**, RSA-SHA256 (RFC 5702): root, com., most legacy zones.
  - **alg 13**, ECDSA-P256-SHA256 (RFC 6605): most TLDs, Cloudflare, modern zones.
  - **alg 15**, Ed25519 (RFC 8080): rare in production but mandatory.
  - Anything else returns `UnsupportedAlgorithm`.

### Wiring

- New `DnssecConfig` and `DnssecRootAnchor` types in `internet_identity_interface`, exposed at the top of `InternetIdentityInit` as `dnssec_config: opt opt DnssecConfig` (set/clear semantics matching `analytics_config` and `dummy_auth`).
- Trust-anchor list plumbed through `init`/`post_upgrade` into `PersistentState.dnssec_config` (and `StorablePersistentState` for cross-upgrade persistence).
- `internet_identity.did` updated.

### Tests

13 unit tests in `dnssec/` covering:

- Real cloudflare.com chain verifies end-to-end (exercises alg 8 at root and alg 13 at com → cloudflare.com → leaf).
- Empty trust-anchor list rejected with `NoTrustAnchors`.
- Wrong trust anchor rejected with `RootAnchorMismatch`.
- Flipped byte in root DNSKEY → `RootAnchorMismatch` or `BadSignature`.
- Flipped byte in leaf signature → `BadSignature`.
- Stale signature (clock advanced past expiration) → `StaleOrFutureSignature`.
- Unsupported algorithm (alg 5 / RSA-SHA1) → `UnsupportedAlgorithm(5)`.
- Plus canonical-encoding + RFC 3110 RSA key parsing unit tests.

### Test infrastructure

- `test_vectors/dnssec/cloudflare-com-2026-05.json`: real DoH-captured chain (root DNSKEY + 2 delegation links + leaf TXT).
- `test_vectors/dnssec/iana-root-anchors-2026-05.json`: IANA root KSK trust anchors (Klajeyz/2017 + Kmyv6jo/2024).
- `scripts/capture-dnssec-chain.py`: reproducible capture script using dnspython + DoH wire format.

Tests use a frozen now from the capture's metadata so freshness checks stay stable indefinitely.

### New deps

- `domain` (NLnet Labs, pure Rust): referenced in docstrings for canonicalisation primitives; signature verification is hand-rolled on top of RustCrypto.
- `p256`: ECDSA P-256 verification.
- `ed25519-dalek`: Ed25519 verification.

All three build cleanly for wasm32-unknown-unknown.

## What's deferred to later PRs in the stack

- Captures for additional providers (proton.me, protonmail.com, tutanota.com; gmail.com / icloud.com / outlook.com / fastmail.com don't sign with DNSSEC, which is acknowledged in design doc §7.6).
- Synthetic Ed25519 (alg 15) test vector: most production zones are alg 8 or 13; alg 15 is structurally exercised by the dispatch logic but doesn't have a real captured chain in this PR.

## Test plan

- [x] `cargo check -p internet_identity --target wasm32-unknown-unknown`: clean (no warnings).
- [x] `cargo test -p internet_identity --bin internet_identity`: 238 tests pass (was 227 pre-PR).
- [x] `cargo test -p internet_identity_interface --lib`: 42 tests pass.
- [x] `cargo clippy -p internet_identity --bin internet_identity --tests -- -D warnings`: clean.
- [x] `cargo fmt --check`: clean (modulo a pre-existing diff in attributes.rs unrelated to this PR).
- [x] CI on dfinity#3838: fully green.

## Design doc

https://github.com/sea-snake/internet-identity/blob/design/email-recovery/docs/ongoing/email-recovery.md (PR pending review on dfinity/internet-identity)

## Stack

This is PR 1 of a 12-PR series. Subsequent PRs:

- **PR 2**: mail-auth-backed DKIM verifier, consuming DnsProofBundle from this PR.
- **PR 3**: DMARC alignment.
- **PRs 4–9**: storage + Candid + behavior for email recovery.
- **PRs 10–12**: frontend (DoH walker, Manage UI, recovery wizard).

## PR Stack

| # | PR | Description | Status |
|---|---|---|---|
| 0 | [dfinity#3836](dfinity#3836) | Design doc | Open |
| 1 | [dfinity#3838](dfinity#3838) | DNSSEC verifier scaffold | Open |
| 2 | [dfinity#3839](dfinity#3839) | DKIM verifier (RFC 6376) | Open |
| 3 | [dfinity#3840](dfinity#3840) | DMARC alignment (RFC 7489) | Open |
| 4 | [dfinity#3841](dfinity#3841) | DoH fallback | Open |
| 5+6 | [dfinity#3842](dfinity#3842) | Setup flow (storage + smtp_request) | Open |
| 7 | [dfinity#3843](dfinity#3843) | Recovery flow (delegation) | Open |
| 8 | [dfinity#3844](dfinity#3844) | Frontend + feature flag | Open |
| 9 | [dfinity#3855](dfinity#3855) | Deploy/upgrade scripts: dnssec_config + doh_config | Open |
| 10 | [dfinity#3857](dfinity#3857) | Email-recovery UX overhaul | Open |
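
For the freshness step of the four-step algorithm in the verifier core above, a minimal sketch is shown here. It assumes the RRSIG inception and expiration fields are compared with 32-bit serial-number arithmetic, as RFC 4034 §3.1.5 specifies; the function names are hypothetical, the error is reduced to a string, and whether `verify.rs` handles wraparound this way is not stated in the PR.

```rust
/// RFC 1982 style comparison on 32-bit timestamps: true when a <= b,
/// allowing for wraparound of the seconds counter.
fn serial_le(a: u32, b: u32) -> bool {
    (b.wrapping_sub(a) as i32) >= 0
}

/// Accepts a signature only while inception <= now <= expiration.
fn check_freshness(now: u32, inception: u32, expiration: u32) -> Result<(), &'static str> {
    if serial_le(inception, now) && serial_le(now, expiration) {
        Ok(())
    } else {
        Err("StaleOrFutureSignature")
    }
}
```

Passing `now` in explicitly is what lets tests pin the clock to the capture time, in the spirit of the frozen-now approach described under test infrastructure.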
## Summary

PR 3 of the email-recovery stack (`docs/ongoing/email-recovery.md` §6). Stacks on top of #3839 (DKIM verifier). Lands a hand-rolled DMARC alignment check and reshapes the verifier API: `dkim::verify_dkim` becomes a DKIM-only primitive, and the new `dmarc::verify_email` is the public top-level entry point that produces the combined `EmailVerificationStatus`.

Note: This PR targets `main` but includes PRs 1+2's commits as its base. Review the DMARC-specific changes by looking at commits on top of `ec371aae3` (PR 2's tip). Once PRs 1+2 merge, this PR's diff shrinks to just the DMARC additions.

## What's in this PR

### New module `src/internet_identity/src/dmarc/`

- `types.rs`: `DmarcOutcome` (Aligned / Misaligned / NoRecord / Malformed), `DmarcPolicy` (None / Quarantine / Reject), `AlignmentMode` (Strict / Relaxed), `DmarcRecord`, plus the combined `EmailVerificationStatus` that carries both DKIM diagnostics and the DMARC outcome on success.
- `parse.rs` (RFC 7489 §6.3): DMARC TXT record parser. Enforces that `v=DMARC1` is first, that `p=` is one of {none, quarantine, reject}, and that `pct=` is in 0..=100; rejects duplicate tags; ignores unknown / reporting tags. 12 unit tests.
- `from_header.rs` (RFC 5322 / RFC 7489 §3.1.1): single-mailbox From-header parser. Accepts bare addr-spec, name-addr, and quoted-display-name forms; rejects zero/multiple From: headers, address-lists, group syntax. Tolerates comma/colon inside quoted display names. 16 unit tests.
- `alignment.rs`: strict (exact match) + relaxed (exact match OR label-aligned subdomain in either direction). Stricter than RFC-compliant relaxed alignment because we deliberately don't consult the PSL; design doc §6.4 documents the trust + asymmetric-failure-mode reasoning. The dot anchor on the subdomain check prevents `evilexample.com` from aliasing `example.com`. 8 unit tests.
- `verify.rs`: orchestration. DKIM first; on failure, surface the DKIM reason verbatim. On DKIM pass, parse From and check DMARC alignment. Accepted iff Aligned, OR NoRecord with `dkim_domain == from_domain`. 8 unit tests.
- `test_vectors.rs`: 5 end-to-end tests reusing PR 2's synthetic .eml fixtures.

### `src/internet_identity/src/dkim/types.rs` (rename + new variants)

- Renamed `EmailVerificationStatus` → `DkimVerifyResult` (DKIM-only). The combined verdict moved to `dmarc::EmailVerificationStatus` so it can carry the `DmarcOutcome`.
- Added `MalformedFromHeader(String)`, `DmarcMalformed(String)`, `DmarcMisaligned` to `VerificationFailReason`.

### `src/internet_identity/src/dkim/mod.rs`

- Re-exports `verify` as `verify_dkim` so downstream callers (the dmarc layer) don't have to deal with both a `dkim::verify` and `dmarc::verify` in scope at the same time.

## Test plan

- `cargo check -p internet_identity --target wasm32-unknown-unknown`: clean.
- `cargo test -p internet_identity --bin internet_identity dmarc`: 49 tests pass (12 parse + 16 from_header + 8 alignment + 8 verify + 5 e2e).
- `cargo test -p internet_identity --bin internet_identity`: 365 tests pass total (was 313 with PR 2; +49 dmarc + 3 small reshape adjustments).
- `cargo clippy -p internet_identity --bin internet_identity --tests -- -D warnings`: clean.
- `cargo fmt --check`: clean (modulo pre-existing unrelated diffs).

## PR Stack