Skip to content

feat(doh): DoH fallback for DKIM/DMARC TXT records on unsigned domains#3879

Open
sea-snake wants to merge 8 commits into
mainfrom
feat/doh-fallback
Open

feat(doh): DoH fallback for DKIM/DMARC TXT records on unsigned domains#3879
sea-snake wants to merge 8 commits into
mainfrom
feat/doh-fallback

Conversation

@sea-snake
Copy link
Copy Markdown
Contributor

@sea-snake sea-snake commented May 12, 2026

Summary

PR 4 of the email-recovery stack. Adds the DoH (DNS-over-HTTPS) fallback path for fetching DKIM/DMARC TXT records when a domain isn't DNSSEC-signed (Gmail, Outlook, iCloud, etc.). Stacks on #3878 (DMARC alignment).

  • Allowlist-gated outcalls. DohConfig.allowed_domains (deploy/upgrade arg, persisted in PersistentState). Domains not on the list return DomainNotAllowed without touching the network.
  • 3-of-5 quorum across 4 jurisdictions. Cloudflare 🇺🇸 + Google 🇺🇸 + Quad9 🇨🇭 + CIRA 🇨🇦 + IIJ 🇯🇵. Strict majority; no single jurisdiction can reach quorum on its own (asserted in unit tests).
  • In-flight dedup. Hand-rolled oneshot-style Waker primitive collapses concurrent fetches for the same FQDN to one outcall fan-out.
  • IC-correct outcall path. http_request_with_closure with a transform query function that reduces the wire DNS response to its TXT RDATA (drops TTL/transaction-ID drift across replicas, so consensus succeeds). max_response_bytes = 4 KiB, cycle budget mirrors the OIDC discovery outcall.
  • Cache hygiene. Stale-pending TTL (PENDING_STALE_AFTER_SECS = 120) recovers from the IC-specific hazard where a trapped post-yield continuation leaves an InFlight entry stuck forever (state across .await is committed, not transactional). Lazy eviction on lookup + opportunistic sweep on publish bound the value-cache size to {unique FQDNs queried in max_cache_age_secs}.

Public API: pub async fn fetch_txt(name, registered_domain) -> Result<Vec<u8>, DohError>. Allowlist match is case-insensitive on the registered domain. Steady-state outcall load for a continuous Gmail flow at default 1h TTL: ~10 outcalls/hr per domain (2 fan-outs × 5 providers), independent of email volume.

PR 8 onwards will wire this into the DKIM/DMARC verifiers from PRs 2/3 alongside the DNSSEC chain from PR 1.

Test plan

  • 43 new doh unit tests: parser (round-trip, malformed, NXDOMAIN), bucketing (3-of-5/2-of-5 split-bucket/all-fail/all-malformed), dedup wakeups, stale-pending eviction wakes orphan waiters with AllProvidersFailed, late-publisher no-op after stale takeover, lazy eviction on lookup, sweep on publish, end-to-end fetch_txt with NotConfigured / DomainNotAllowed / case-insensitive allowlist / cache hit-then-refetch / TTL cap / quorum-failure-not-cached.
  • Full unit test suite green (408 tests).
  • cargo check --target wasm32-unknown-unknown clean.
  • cargo clippy --tests -- -D warnings clean.
  • cargo fmt --check clean.
  • Smoke test on local replica: fetch_txt against a real allowlisted domain, verify provider fan-out + transform + consensus end-to-end (deferred to integration testing once gateway plumbing lands in PR 8).
  • Verify IIJ DoH endpoint URL (public.dns.iij.jp) with a real outcall before deploy.

PR Stack

# PR Description Status
0 #3836 Design doc Open
1 #3838 DNSSEC verifier scaffold Open
2 #3877 DKIM verifier (RFC 6376) Open
3 #3878 DMARC alignment (RFC 7489) Open
4 #3879 DoH fallback Open
5+6 #3880 Setup flow (storage + smtp_request) Open
7 #3881 Recovery flow (delegation) Open
8 #3882 Frontend + feature flag Open
9 #3883 Deploy/upgrade scripts: dnssec_config + doh_config Open
10 #3884 Email-recovery UX overhaul Open

@sea-snake sea-snake requested a review from a team as a code owner May 12, 2026 11:52
@sea-snake sea-snake force-pushed the feat/dmarc-alignment branch from 42ee57d to a252969 Compare May 12, 2026 12:00
@sea-snake sea-snake force-pushed the feat/doh-fallback branch from c14b0c9 to 5d5f686 Compare May 12, 2026 12:00
sea-snake added a commit that referenced this pull request May 12, 2026
Three fixes originally landed bundled with DoH review feedback
(commit 5d5f686 on PR #3879). They belong here in PR 3's DMARC
module since they're parser / spec-compliance fixes:

- parse.rs: enforce p= immediately after v=DMARC1 per
  RFC 7489 §6.3. Real-world records all do this; OpenDMARC and
  other reference parsers reject the alternative.
- from_header.rs: honour backslash escapes inside quoted-string
  display names so `"Alice \"Ops, Inc\"" <a@e.com>` parses
  correctly.
- verify.rs: removed empty placeholder test that asserted
  nothing; the e2e path is exercised by test_vectors::*.
@sea-snake sea-snake force-pushed the feat/dmarc-alignment branch from a252969 to 15811ad Compare May 12, 2026 12:24
@sea-snake sea-snake force-pushed the feat/doh-fallback branch from 5d5f686 to 3a8e9a4 Compare May 12, 2026 12:24
@sea-snake sea-snake force-pushed the feat/dmarc-alignment branch from 15811ad to 9208444 Compare May 12, 2026 13:01
@sea-snake sea-snake force-pushed the feat/doh-fallback branch from 3a8e9a4 to 33b1582 Compare May 12, 2026 13:01
aterga added a commit that referenced this pull request May 12, 2026
Addresses the four "easy-to-change improvements" from Grok's review on
#3877
(#3877 (review)).
Targets `feat/dkim-verifier` so it can be merged in as a follow-up
commit to that PR.

## What changed

1. **`types.rs` — `NoSignature` doc-comment.** The distinct variant Grok
asked for (`NoSignaturePresent` or equivalent) already exists as
`VerificationFailReason::NoSignature` at types.rs:84 and is returned
from `verify()` when no DKIM-Signature headers are present
(verify.rs:54, verify.rs:67). The frontend can already distinguish it
from a generic failure. Expanded the doc-comment to make the UX intent
("this provider doesn't use DKIM") explicit so future readers (and AI
reviewers) don't re-flag this.

2. **`parse.rs` — explicit unknown-tag-ignored test.** RFC 6376 §3.5
requires implementations to ignore unrecognised tags. The parser already
does this (it's lookup-by-name in `split_tag_list` → `get(name)`;
unknown tags pass straight through — the existing `happy_value()`
fixture even contains an unknown `q=dns/txt`). Pinned the behaviour with
a dedicated `unknown_tags_are_ignored` test covering `z=` and a
synthetic `zz=`.

3. **`verify.rs` — SECURITY comment at the trust boundary.** Added an
inline `// SECURITY:` block at the `parse_dkim_txt` call site stating
that `dkim_txt` is trusted, sourced from the DNSSEC verifier (#3838) or
pinned-host DoH outcall (#3879).

## Not actioned

- **CI job rename (`dkim-test-vectors` → `dkim-verifier-tests`).** No
such job exists in `.github/workflows/`; PR #3877 doesn't add one.
Nothing to rename.
- **`main.rs::handle_email_recovery` match arm.** That endpoint hasn't
been added yet (it's PRs 5–7 of the stack). No-op for this PR.

## Tests

- `cargo test -p internet_identity --bin internet_identity dkim::` — 79
pass (was 78, +1 for the new unknown-tag test).
- `cargo clippy -p internet_identity --bin internet_identity --tests --
-D warnings` — clean.
- `cargo check -p internet_identity --target wasm32-unknown-unknown` —
clean.
- `cargo fmt --check` on the three modified files — clean (other
pre-existing fmt drifts in `dnssec/`, `openid/`, `tests/integration/`
are unrelated and predate this branch).


---
_Generated by [Claude
Code](https://claude.ai/code/session_01NZmvbHgzN5NQc7Hqx9ZzpP)_

Co-authored-by: Claude <noreply@anthropic.com>
sea-snake-translation-bot pushed a commit to sea-snake-translation-bot/internet-identity that referenced this pull request May 12, 2026
…ck) (dfinity#3877)

## Summary

PR 2 of the email-recovery stack (`docs/ongoing/email-recovery.md` §10
Phase 0). Stacks on top of PR 3838 (DNSSEC verifier). Lands a
hand-rolled RFC 6376 DKIM verifier that consumes a parsed `SmtpRequest`
plus an already-trusted DKIM TXT record and returns a per-step
`EmailVerificationStatus`.

**Note:** This PR targets `main` but includes PR 3838's commits (DNSSEC
verifier) as its base. Review the DKIM-specific changes by looking at
commits after `9bbd8717` (the last PR 3838 commit). Once PR 3838 merges,
this PR's diff will shrink to just the DKIM additions.

## Why hand-rolled

The design originally specified `mail-auth` (Stalwart's well-tested DKIM
library), but mail-auth pulls a non-optional `hickory-resolver` dep that
fails to compile for `wasm32-unknown-unknown` (transitive: tokio + mio).
Forking + patching mail-auth would be possible but creates perpetual
rebase burden. We hand-roll instead — "the right way, no shortcuts" was
the explicit guidance.

## What's in this PR

###
`src/internet_identity_interface/src/internet_identity/types/smtp.rs`
Brings forward the SMTP gateway protocol types from PoC PR 3760:
`SmtpRequest`/`SmtpResponse`/`SmtpHeader`/`SmtpMessage`/`SmtpAddress`/`SmtpEnvelope`,
the size bounds, and the input-bound validation (`format_address`
lowercases both halves; `truncate_at_char_boundary` clamps to the
previous UTF-8 boundary so a multi-byte subject can't trap the
canister). Drops postbox-specific bits (PostboxEmail,
ValidatedSmtpRequest, anchor-number parser).

### `src/internet_identity/src/dkim/`
- **`types.rs`** — Algorithm (RsaSha256, Ed25519Sha256),
HeaderCanon/BodyCanon (Relaxed, Simple),
DkimCheck/DkimCheckName/DkimCheckStatus per-step diagnostics,
EmailVerificationStatus / VerificationFailReason result shape.
- **`parse.rs`** (RFC 6376 §3.5) — DKIM-Signature header tag-list
parser. Splits structurally on `;` first then on the *first* `=` per
element, so a literal `b=` substring inside another tag's base64 doesn't
get misread as a new tag start (the bug class the PoC PR review
specifically flagged). Folded whitespace inside base64 values is
stripped before decoding. Tag names case-insensitive; duplicates
rejected.
- **`canonicalize.rs`** (§3.4.2 / §3.4.4) — relaxed header canon
(lowercase name, unfold continuations, collapse WSP+ to single SP, strip
trailing WSP, strip WSP around colon) and relaxed body canon (per-line
WSP cleanup, drop trailing empty lines, ensure non-empty output ends in
exactly one CRLF).
- **`dns_record.rs`** (§3.6.2) — DKIM TXT record parser. Tag names
case-insensitive (`P=` vs `p=` was a PoC bug), whitespace inside `p=`
tolerated (multi-chunk DNS TXT records), `t=y`/`t=s` flags honoured,
unknown tags ignored.
- **`signature.rs`** — RSA-SHA256 (RFC 5702 / RFC 8301) and
Ed25519-SHA256 (RFC 8463) signature verification on top of
`rsa`+`sha2`+`ed25519-dalek` from PR 1's deps. Enforces 1024-bit RSA
minimum per design §5.6. Ed25519 path wraps in SHA-256 per RFC 8463.
Plus `body_hash_sha256` with optional `l=` truncation per §3.4.5.
- **`verify.rs`** — orchestration. Multi-signature loop per §5.5 (accept
on first pass), tag enforcement per design §5.4 (c=relaxed/* only, x=
expiration, i= alignment with d=, k= match, t=y testing-mode), bottom-up
header selection per §5.4 when h= lists a name multiple times, b=value
blanking that's structural-position-aware so it doesn't mis-target an
internal substring.
- **`test_vectors.rs`** — `#[cfg(test)]` .eml loader + 8 end-to-end
tests against committed fixtures.

### `test_vectors/dkim/`
- 3 synthetic .eml files generated offline with dkimpy + a 2048-bit RSA
key (`relaxed/relaxed`, `relaxed/simple`, `simple/simple`).
- The matching DKIM TXT record (public key only).
- README documenting provenance — the throwaway private key is **not**
committed.

## Test plan

- [x] `cargo check -p internet_identity --target wasm32-unknown-unknown`
— clean.
- [x] `cargo test -p internet_identity --bin internet_identity dkim` —
75 tests pass (parse 14, canonicalize 18, dns_record 16, signature 7,
verify 12, end-to-end 8).
- [x] `cargo test -p internet_identity --bin internet_identity` — 313
tests pass total (was 238 before this PR; +75 DKIM, plus a few in smtp
types).
- [x] `cargo test -p internet_identity_interface --lib` — 52 tests pass
(was 42; +10 SMTP type tests).
- [x] `cargo clippy -p internet_identity --bin internet_identity --tests
-- -D warnings` — clean.
- [x] `cargo fmt --check` — clean (modulo pre-existing diffs unrelated
to this PR).

## Stack

This is PR 2 of a 12-PR series. Includes PR 3838's commits as its base;
once PR 3838 merges, the diff shrinks to just the DKIM additions.

Subsequent PRs:
- **PR 3** — DMARC alignment.
- **PR 4** — DoH outcall fallback for unsigned domains (Gmail / Outlook
/ iCloud — see the design doc §7.6 and the team Slack writeup).
- **PRs 5–9** — storage + Candid + behavior for email recovery.
- **PRs 10–12** — frontend.

## PR Stack
| # | PR | Description | Status |
|---|---|---|---|
| 0 | [dfinity#3836](dfinity#3836) |
Design doc | Open |
| 1 | [dfinity#3838](dfinity#3838) |
DNSSEC verifier scaffold | Open |
| 2 | [dfinity#3877](dfinity#3877) |
DKIM verifier (RFC 6376) | Open |
| 3 | [dfinity#3878](dfinity#3878) |
DMARC alignment (RFC 7489) | Open |
| 4 | [dfinity#3879](dfinity#3879) |
DoH fallback | Open |
| 5+6 | [dfinity#3880](dfinity#3880)
| Setup flow (storage + smtp_request) | Open |
| 7 | [dfinity#3881](dfinity#3881) |
Recovery flow (delegation) | Open |
| 8 | [dfinity#3882](dfinity#3882) |
Frontend + feature flag | Open |
| 9 | [dfinity#3883](dfinity#3883) |
Deploy/upgrade scripts: dnssec_config + doh_config | Open |
| 10 | [dfinity#3884](dfinity#3884) |
Email-recovery UX overhaul | Open |

---------

Co-authored-by: Arshavir Ter-Gabrielyan <arshavir.ter.gabrielyan@dfinity.org>
Co-authored-by: Claude <noreply@anthropic.com>
sea-snake and others added 5 commits May 13, 2026 10:41
PR 3 of the email-recovery stack (docs/ongoing/email-recovery.md §6).
Stacks on top of PR 2 (DKIM verifier) and reshapes the verifier API:
dkim::verify_dkim now returns a DKIM-only DkimVerifyResult, and the new
dmarc::verify_email is the public top-level entry point that produces
the combined EmailVerificationStatus.

New module src/internet_identity/src/dmarc/:
- types.rs: DmarcOutcome (Aligned / Misaligned / NoRecord / Malformed),
  DmarcPolicy, AlignmentMode, DmarcRecord, plus the combined
  EmailVerificationStatus that carries both the DKIM diagnostic and
  the DMARC outcome on success.
- parse.rs: DMARC TXT record parser per RFC 7489 §6.3. Enforces
  v=DMARC1 must be first, p= must be one of {none, quarantine, reject},
  pct= 0..=100, rejects duplicate tags, ignores unknown tags. 12 unit
  tests.
- from_header.rs: RFC 5322 single-mailbox From: parser. Accepts
  bare addr-spec, name-addr, and quoted-display-name forms; rejects
  zero/multiple From: headers, address-lists, group syntax. Tolerates
  comma/colon inside quoted display names. 16 unit tests.
- alignment.rs: strict (exact match) + relaxed (exact match OR
  label-aligned subdomain in either direction). Stricter than RFC-
  compliant relaxed alignment because we deliberately don't consult
  the PSL — see design doc §6.4. The dot anchor on the subdomain check
  prevents 'evilexample.com' from aliasing 'example.com'. 8 unit tests.
- verify.rs: orchestration. DKIM first; on failure, surface the DKIM
  reason verbatim. On DKIM pass, parse From and check DMARC alignment.
  Accepted iff Aligned, OR NoRecord with dkim_domain == from_domain.
  8 unit tests.

dkim/types.rs:
- Renamed EmailVerificationStatus -> DkimVerifyResult (DKIM-only).
  The combined verdict moved to dmarc::EmailVerificationStatus so it
  can carry the DmarcOutcome.
- Added MalformedFromHeader, DmarcMalformed, DmarcMisaligned to
  VerificationFailReason.

44 DMARC tests pass (12 parse + 16 from_header + 8 alignment + 8 verify),
on top of the existing 78 DKIM tests. Wasm32 build clean.
5 end-to-end tests in dmarc/test_vectors.rs reusing the synthetic .eml
fixtures from PR 2 (alice@test.example.com signed with d=test.example.com,
exact match → trivially aligned regardless of mode). Covers:
- no DMARC record + dkim==from → Verified(NoRecord)
- aligned DMARC strict → Verified(Aligned, Strict)
- aligned DMARC relaxed (default) → Verified(Aligned, Relaxed)
- malformed DMARC TXT → Unverified(DmarcMalformed)
- From: with address-list → Unverified

49 dmarc tests total (44 unit + 5 e2e). 365 total in the II suite. The
parse_eml helper is duplicated from dkim/test_vectors.rs because the
original is #[cfg(test)] and not pub-visible across modules; the
duplication is contained to test code.
Three fixes originally landed bundled with DoH review feedback
(commit 5d5f686 on PR #3879). They belong here in PR 3's DMARC
module since they're parser / spec-compliance fixes:

- parse.rs: enforce p= immediately after v=DMARC1 per
  RFC 7489 §6.3. Real-world records all do this; OpenDMARC and
  other reference parsers reject the alternative.
- from_header.rs: honour backslash escapes inside quoted-string
  display names so `"Alice \"Ops, Inc\"" <a@e.com>` parses
  correctly.
- verify.rs: removed empty placeholder test that asserted
  nothing; the e2e path is exercised by test_vectors::*.
Addresses the two actionable "easy-to-change improvements" from Grok's
review on #3878
(#3878 (comment)).
Targets `feat/dmarc-alignment` so it can land as a follow-up commit on
that PR.

(Tried to push directly to `feat/dmarc-alignment` per the maintainer's
note from #3877 / #3885, but my account got a 403 against that branch —
falling back to a stacked draft PR.)

## What changed

1. **`dmarc/mod.rs` — consolidated `# Security model` docblock.** Calls
out the three deliberate deviations from "stock" DMARC, each with a
one-paragraph rationale:
- **No Public Suffix List** — label-anchored suffix check prevents
`evilexample.com`-style spoofs at the cost of closing multi-domain orgs
(`gmail.com` ↔ `googlemail.com`). Safe direction for a recovery surface.
Design doc §6.4.
- **No SPF** — recovery proves mailbox control, not path-of-delivery.
DKIM gives the cryptographic binding; SPF would need a source IP the
gateway payload doesn't carry. Design doc §6.5.
- **Fail-closed everywhere** — every malformed/unknown step collapses to
`Unverified`. No quarantine, no downgrade.

2. **`dmarc/test_vectors.rs` —
`verifies_end_to_end_when_dmarc_record_has_unknown_tags`.** Runs the
full `verify_email` against a DMARC record carrying `rua=`, `ruf=`,
`fo=`, and a synthetic `vendorext=` tag. Pins the ignore-unknown-tags
behaviour through the public entry point, not just the parser unit
(which already has a `unknown_tags_are_ignored` test in
`dmarc/parse.rs`).

## Not actioned

- **Grok item #1 ("parser must ignore unknown tags").** Already
implemented and unit-tested before this PR. `dmarc/parse.rs` is
lookup-by-name, so unknown tags pass through silently. The PR
description even calls this out: "ignores unknown / reporting tags". The
new end-to-end test is the explicit confirmation Grok asked for under
item #3.

The four strategic recommendations (observability metrics, frontend
error surface, SPF wiring, README) are explicitly "nice-to-have" /
"future" and belong in later PRs of the stack.

## Tests

- `cargo test -p internet_identity --bin internet_identity dmarc::` — 49
pass (was 48, +1 e2e unknown-tag test).
- `cargo clippy -p internet_identity --bin internet_identity --tests --
-D warnings` — clean.
- `cargo check -p internet_identity --target wasm32-unknown-unknown` —
clean.
- `cargo fmt --check` on the two touched files — clean (other
pre-existing fmt drifts in `dnssec/`, `openid/`, `tests/integration/`,
`types/attributes.rs` are unrelated and predate this branch).


---
_Generated by [Claude
Code](https://claude.ai/code/session_01NZmvbHgzN5NQc7Hqx9ZzpP)_

Co-authored-by: Claude <noreply@anthropic.com>
Addresses the five inline Copilot review comments across #3878 and its
rebased twin #3887.

dmarc/from_header.rs
- Reject internal whitespace in the bare addr-spec form (no angle
  brackets). Previously `"Alice alice@example.com"` and
  `"alice @example.com"` slipped through with domain=example.com because
  only the post-`@` domain part was whitespace-checked.
- Reject `>` without a matching `<`, multiple `>`, and content after the
  closing `>`. Previously `alice@example.com>` was parsed as a bare
  addr-spec with the `>` ending up in the extracted domain.
- 5 new regression tests covering the above.

dmarc/verify.rs
- The `email.message == None` branch after a successful DKIM is dead in
  practice (verify_dkim returns Unverified(NoSignature) otherwise), but
  we keep the defensive branch rather than .expect()/unreachable!() —
  canister code on the IC must never trap, an invariant violation should
  surface as a structured fail-closed verdict instead. Updated the
  inline comment and switched the placeholder reason from the misleading
  MalformedFromHeader to NoSignature.

dkim/verify.rs
- On a successful signature, return only that signature's per-step
  checks in DkimVerifyResult::Verified.checks, matching the doc
  ("checks for the winning signature"). Previously the verifier
  concatenated every failed attempt's checks onto the winner's.
- Update the module-level doc snippet to reference `verify_dkim` (the
  re-exported public name) instead of `verify`.
@sea-snake sea-snake force-pushed the feat/dmarc-alignment branch from 271edb2 to 7d157bd Compare May 13, 2026 10:53
sea-snake and others added 3 commits May 13, 2026 11:05
Lays the foundation for PR 4 (DoH outcall fallback for unsigned
domains, design doc §7.6). Two pieces in this commit:

1. Wiring layer:
   - new DohConfig type in internet_identity_interface (allowed_domains,
     max_cache_age_secs)
   - new InternetIdentityInit.doh_config: opt opt DohConfig (set/clear
     pattern, like dnssec_config)
   - new PersistentState.doh_config + StorablePersistentState mirror
   - apply_install_arg + config() round-trip wired
   - .did file updated; existing 365-test bin suite still passes

2. doh/ module scaffold:
   - parser.rs: minimal wire-format DNS message helpers — build_txt_query
     constructs a query for <name> IN TXT, parse_txt_response walks the
     answer section and extracts the TXT RDATA character-strings
     concatenated. Compression-pointer skipping has a hop cap so a
     malicious response can't loop us. 9 unit tests.
   - types.rs: DohProvider + the three providers we'll quorum across
     (Quad9 CH, CIRA Canadian Shield CA, Cloudflare US — three
     jurisdictions, three independent operators, all running as long-
     term public services), QUORUM_THRESHOLD=2, default + cap on cache
     age. DohError enum. 4 unit tests.

Cache, quorum, and the public fetch_txt API land in subsequent commits.
Wires up the body of the DoH fallback for DKIM/DMARC TXT records on
unsigned domains:

- Wire-format DNS query/parse (parser.rs)
- 3-of-5 quorum across Cloudflare/Google/Quad9/CIRA/IIJ — strict
  majority across 4 jurisdictions; no single jurisdiction can reach
  quorum on its own (quorum.rs)
- Heap cache with concurrent-fetch dedup via a hand-rolled Waker
  primitive; PENDING_STALE_AFTER_SECS guards against committed-state
  hangs from a trapped post-await continuation; lazy eviction on
  lookup + opportunistic sweep on publish bound cache size (cache.rs)
- Public `fetch_txt(name, registered_domain)` async API: allowlist
  gate, cache lookup, parallel outcalls via futures::future::join_all,
  quorum decision, publish (mod.rs)
- Production outcall path uses ic_cdk's http_request_with_closure with
  a transform query function that reduces wire responses to TXT bytes
  for replica consensus (TTLs in raw responses drift across replicas)

Test coverage: 43 unit tests covering parser, bucketing, dedup,
stale-pending eviction, sweep-on-publish, and end-to-end fetch_txt
flow including allowlist, cache TTL, MAX_CACHE_AGE_SECS cap, and
quorum-failure-not-cached.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The DKIM/DMARC portions of the original Copilot review commit moved
down to PRs 2 and 3 where those modules are introduced. What stays
here are the DoH-specific fixes:

- parser.rs: encode_name now rejects invalid names (label > 63,
  total > 255, empty labels) rather than silently truncating.
  Truncation would change which name we ask for, which is a real
  correctness issue.
- mod.rs: fetch_txt enforces a label-anchored suffix match between
  `name` and `registered_domain` (defence-in-depth: a caller bug
  otherwise lets an allowlisted registered_domain authorise an
  outcall for an unrelated FQDN).
- mod.rs: transform_doh signals parse failure via HTTP status 422
  + empty body instead of a sentinel string in the body. The prior
  sentinel could collide with a (legal-but-unusual) TXT payload and
  silently turn valid records into "malformed".
- types.rs / quorum.rs / interface doc: fixed stale comments that
  said "three providers" or implied parallelism in the sync test
  helper (the production async path uses futures::future::join_all;
  run_quorum is sequential).
- New error variants: DohError::InvalidName, NameOutsideRegisteredDomain.
- New unit tests cover the new validation paths.

Note on the alignment.rs / no-PSL flag: the Copilot comment that
relaxed alignment is "fail-open without a PSL" is correct in isolation
but reflects a documented design decision. We deliberately don't ship
a PSL in the canister (size, freshness, and trust-root concerns — see
design doc §6.4). The mitigation is the dot-anchored suffix check that
prevents `evilexample.com` from matching `example.com`. Left unchanged;
will reply with the design rationale.
@sea-snake sea-snake force-pushed the feat/doh-fallback branch from 33b1582 to 071f85c Compare May 13, 2026 11:25
Base automatically changed from feat/dmarc-alignment to main May 13, 2026 14:08
aterga added a commit that referenced this pull request May 13, 2026
…of email-recovery stack) (#3878)

## Summary

PR 3 of the email-recovery stack (`docs/ongoing/email-recovery.md` §6).
Stacks on top of #3877 (DKIM verifier). Lands a hand-rolled DMARC
alignment check and reshapes the verifier API: `dkim::verify_dkim`
becomes a DKIM-only primitive, and the new `dmarc::verify_email` is the
public top-level entry point that produces the combined
`EmailVerificationStatus`.

**Note:** This PR targets `main` but includes PRs 1+2's commits as its
base. Review the DMARC-specific changes by looking at commits on top of
`ec371aae3` (PR 2's tip). Once PRs 1+2 merge, this PR's diff shrinks to
just the DMARC additions.

## What's in this PR

### `src/internet_identity/src/dmarc/`

- **`types.rs`** — `DmarcOutcome` (Aligned / Misaligned / NoRecord /
Malformed), `DmarcPolicy` (None / Quarantine / Reject), `AlignmentMode`
(Strict / Relaxed), `DmarcRecord`, plus the combined
`EmailVerificationStatus` that carries both DKIM diagnostics and the
DMARC outcome on success.
- **`parse.rs`** (RFC 7489 §6.3) — DMARC TXT record parser. Enforces
`v=DMARC1` must be first, `p=` must be one of {none, quarantine,
reject}, `pct=` 0..=100, rejects duplicate tags, ignores unknown /
reporting tags. 12 unit tests.
- **`from_header.rs`** (RFC 5322 / RFC 7489 §3.1.1) — single-mailbox
From-header parser. Accepts bare addr-spec, name-addr, and
quoted-display-name forms; rejects zero/multiple From: headers,
address-lists, group syntax. Tolerates comma/colon inside quoted display
names. 16 unit tests.
- **`alignment.rs`** — strict (exact match) + relaxed (exact match OR
label-aligned subdomain in either direction). Stricter than
RFC-compliant relaxed alignment because we deliberately don't consult
the PSL — design doc §6.4 documents the trust + asymmetric-failure-mode
reasoning. The dot anchor on the subdomain check prevents
`evilexample.com` from aliasing `example.com`. 8 unit tests.
- **`verify.rs`** — orchestration. DKIM first; on failure, surface the
DKIM reason verbatim. On DKIM pass, parse From and check DMARC
alignment. Accepted iff Aligned, OR NoRecord with `dkim_domain ==
from_domain`. 8 unit tests.
- **`test_vectors.rs`** — 5 end-to-end tests reusing PR 2's synthetic
.eml fixtures.

### `src/internet_identity/src/dkim/types.rs` (rename + new variants)

- Renamed `EmailVerificationStatus` → `DkimVerifyResult` (DKIM-only).
The combined verdict moved to `dmarc::EmailVerificationStatus` so it can
carry the `DmarcOutcome`.
- Added `MalformedFromHeader(String)`, `DmarcMalformed(String)`,
`DmarcMisaligned` to `VerificationFailReason`.

### `src/internet_identity/src/dkim/mod.rs`

- Re-exports `verify` as `verify_dkim` so downstream callers (the dmarc
layer) don't have to deal with both a `dkim::verify` and `dmarc::verify`
in scope at the same time.

## Test plan

- [x] `cargo check -p internet_identity --target wasm32-unknown-unknown`
— clean.
- [x] `cargo test -p internet_identity --bin internet_identity dmarc` —
49 tests pass (12 parse + 16 from_header + 8 alignment + 8 verify + 5
e2e).
- [x] `cargo test -p internet_identity --bin internet_identity` — 365
tests pass total (was 313 with PR 2; +49 dmarc + 3 small reshape
adjustments).
- [x] `cargo clippy -p internet_identity --bin internet_identity --tests
-- -D warnings` — clean.
- [x] `cargo fmt --check` — clean (modulo pre-existing unrelated diffs).

## PR Stack
| # | PR | Description | Status |
|---|---|---|---|
| 0 | [#3836](#3836) |
Design doc | Open |
| 1 | [#3838](#3838) |
DNSSEC verifier scaffold | Open |
| 2 | [#3877](#3877) |
DKIM verifier (RFC 6376) | Open |
| 3 | [#3878](#3878) |
DMARC alignment (RFC 7489) | Open |
| 4 | [#3879](#3879) |
DoH fallback | Open |
| 5+6 | [#3880](#3880)
| Setup flow (storage + smtp_request) | Open |
| 7 | [#3881](#3881) |
Recovery flow (delegation) | Open |
| 8 | [#3882](#3882) |
Frontend + feature flag | Open |
| 9 | [#3883](#3883) |
Deploy/upgrade scripts: dnssec_config + doh_config | Open |
| 10 | [#3884](#3884) |
Email-recovery UX overhaul | Open |

---------

Co-authored-by: Arshavir Ter-Gabrielyan <arshavir.ter.gabrielyan@dfinity.org>
Co-authored-by: Claude <noreply@anthropic.com>
@aterga
Copy link
Copy Markdown
Collaborator

aterga commented May 13, 2026

PR Overview (pragmatic lens)
This is a clean, well-scoped addition: a DNS-over-HTTPS (DoH) fallback path for fetching DKIM/DMARC TXT records on domains that lack DNSSEC (the vast majority of consumer email providers). It is gated, resilient, IC-native, and low-overhead. The design choices (allowlist, 3-of-5 cross-jurisdiction quorum, transform-based deterministic outcalls, in-flight deduplication, and cache-with-stale-pending hygiene) directly address the real risks of outcalls on the Internet Computer.

43 new unit tests + clean cargo check/clippy/fmt + explicit test plan for local-replica smoke test in PR 8 make this low-risk to merge once the small items below are addressed. No breaking changes; everything is additive and configuration-gated.

Blockers (must-fix before merge)

None identified. The security posture is already stronger than many production outcall patterns in the IC ecosystem.

Major Security Gaps

None.

  • Outcall surface is strictly allowlist-gated on the registered domain (case-insensitive) before any network activity → DomainNotAllowed is returned with zero cycles spent.
  • Quorum (3-of-5 across Cloudflare 🇺🇸, Google 🇺🇸, Quad9 🇨🇭, CIRA 🇨🇦, IIJ 🇯🇵) plus the unit-test assertion that “no single jurisdiction can reach quorum on its own” eliminates single-provider or single-country poisoning.
  • Response canonicalization via http_request_with_closure + transform query function (drops TTL/transaction-ID variance) guarantees deterministic consensus.
  • Cache hygiene explicitly mitigates the IC-specific “trapped continuation after yield” hazard (state is committed, not rolled back).
  • Parsing/validation is defensive (invalid DNS names rejected, label-anchored suffix match between queried name and registered domain).

This is materially safer than a naïve http_request to arbitrary DoH endpoints.

Easy-to-Change Improvements (quick wins, <30 min each)

  1. Expose the provider list and DoH URLs as constants with comments (in doh/mod.rs or types.rs).

    • Action: Add a const PROVIDERS: &[(&str, &str)] (name + URL) and a short comment explaining the jurisdiction distribution and why two US providers are acceptable given the quorum rule.
    • Why: Reviewers/deployers will want to know exactly which endpoints are called; makes future rotation trivial.
  2. Make PENDING_STALE_AFTER_SECS (120 s) a named const at the top of cache.rs with a one-line comment referencing the IC yield hazard.

    • Also add a matching integration-test case that forces a continuation trap (or mocks it) to prove eviction works.
  3. Document the exact steady-state outcall cost in the PR description / module docs.

    • You already have it internally (“~10 outcalls/hr per domain at 1 h TTL”). Surface it as a comment on fetch_txt so operators know the cycle budget impact before enabling email recovery at scale.
  4. Add one explicit test for the allowlist edge case where the registered domain matches but the queried name does not suffix-match (should be rejected before outcall).

    • The defensive validation is mentioned; just make the test visible so future refactors can’t regress it.
  5. Tiny Cargo.toml hygiene (optional but nice): the new deps (futures alloc-only, url, identity_jose) are all justified by the stack, but add a one-line comment next to each explaining why it is pulled in by this PR (avoids “why is this here?” questions later).

Strategic Recommendations

  1. Rate-limit / back-pressure story (post-merge, before PR 8 wiring).
    Even though per-domain load is tiny, a popular anchor tenant using Gmail/Outlook could still generate bursts. Consider a simple global token-bucket (or just a last_outcall timestamp + cooldown) in a follow-up PR so one misbehaving recovery flow cannot starve the rest of the canister.

  2. Observability hooks.
    Expose a DohMetrics struct (or just increment counters via ic_cdk::api::call::call_perform counters if available) for:

    • cache hit/miss
    • quorum success/failure (and which providers disagreed)
    • allowlist denials
      This will be gold when email recovery goes live.
  3. Configuration defaults & upgrade safety.
    The DohConfig fields are optional in both upgrade arg and persistent state — perfect. In the next config PR (or in main.rs init), consider a sensible default allowlist (vec!["gmail.com".to_string(), "outlook.com".to_string(), ...]) or at least document the empty-list behaviour clearly so mainnet upgrades don’t silently disable the fallback.

  4. Long-term: consider a hybrid DNSSEC + DoH path (already planned in the stack).
    Once DNSSEC path (PR 1) is wired, the DoH path becomes the “unsigned” branch. Document the decision matrix (DNSSEC → DoH → fail) in one place so the full email-recovery verification story is easy to audit.

  5. Testing strategy for the full stack.
    When PR 8 wires this in, add at least one end-to-end test that uses a real Gmail/Outlook test account (with known DKIM/DMARC records) against the local replica. The current unit-test coverage is excellent for the DoH layer; the integration surface is where subtle bugs hide.

Overall verdict: Ship after the five easy items above (mostly documentation + one test). This is a high-quality, defensively-written piece of infrastructure that materially unblocks email recovery for the most common real-world domains. Great work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants