Skip to content

feat(dkim): hand-rolled RFC 6376 verifier (PR 2 of email-recovery stack)#3877

Merged
aterga merged 11 commits into
mainfrom
feat/dkim-verifier
May 12, 2026
Merged

feat(dkim): hand-rolled RFC 6376 verifier (PR 2 of email-recovery stack)#3877
aterga merged 11 commits into
mainfrom
feat/dkim-verifier

Conversation

@sea-snake
Copy link
Copy Markdown
Contributor

@sea-snake sea-snake commented May 12, 2026

Summary

PR 2 of the email-recovery stack (docs/ongoing/email-recovery.md §10 Phase 0). Stacks on top of PR 3838 (DNSSEC verifier). Lands a hand-rolled RFC 6376 DKIM verifier that consumes a parsed SmtpRequest plus an already-trusted DKIM TXT record and returns a per-step EmailVerificationStatus.

Note: This PR targets main but includes PR 3838's commits (DNSSEC verifier) as its base. Review the DKIM-specific changes by looking at commits after 9bbd8717 (the last PR 3838 commit). Once PR 3838 merges, this PR's diff will shrink to just the DKIM additions.

Why hand-rolled

The design originally specified mail-auth (Stalwart's well-tested DKIM library), but mail-auth pulls a non-optional hickory-resolver dep that fails to compile for wasm32-unknown-unknown (transitive: tokio + mio). Forking + patching mail-auth would be possible but creates perpetual rebase burden. We hand-roll instead — "the right way, no shortcuts" was the explicit guidance.

What's in this PR

src/internet_identity_interface/src/internet_identity/types/smtp.rs

Brings forward the SMTP gateway protocol types from PoC PR 3760: SmtpRequest/SmtpResponse/SmtpHeader/SmtpMessage/SmtpAddress/SmtpEnvelope, the size bounds, and the input-bound validation (format_address lowercases both halves; truncate_at_char_boundary clamps to the previous UTF-8 boundary so a multi-byte subject can't trap the canister). Drops postbox-specific bits (PostboxEmail, ValidatedSmtpRequest, anchor-number parser).

src/internet_identity/src/dkim/

  • types.rs — Algorithm (RsaSha256, Ed25519Sha256), HeaderCanon/BodyCanon (Relaxed, Simple), DkimCheck/DkimCheckName/DkimCheckStatus per-step diagnostics, EmailVerificationStatus / VerificationFailReason result shape.
  • parse.rs (RFC 6376 §3.5) — DKIM-Signature header tag-list parser. Splits structurally on ; first then on the first = per element, so a literal b= substring inside another tag's base64 doesn't get misread as a new tag start (the bug class the PoC PR review specifically flagged). Folded whitespace inside base64 values is stripped before decoding. Tag names case-insensitive; duplicates rejected.
  • canonicalize.rs (§3.4.2 / §3.4.4) — relaxed header canon (lowercase name, unfold continuations, collapse WSP+ to single SP, strip trailing WSP, strip WSP around colon) and relaxed body canon (per-line WSP cleanup, drop trailing empty lines, ensure non-empty output ends in exactly one CRLF).
  • dns_record.rs (§3.6.2) — DKIM TXT record parser. Tag names case-insensitive (P= vs p= was a PoC bug), whitespace inside p= tolerated (multi-chunk DNS TXT records), t=y/t=s flags honoured, unknown tags ignored.
  • signature.rs — RSA-SHA256 (RFC 5702 / RFC 8301) and Ed25519-SHA256 (RFC 8463) signature verification on top of rsa+sha2+ed25519-dalek from PR 1's deps. Enforces 1024-bit RSA minimum per design §5.6. Ed25519 path wraps in SHA-256 per RFC 8463. Plus body_hash_sha256 with optional l= truncation per §3.4.5.
  • verify.rs — orchestration. Multi-signature loop per §5.5 (accept on first pass), tag enforcement per design §5.4 (c=relaxed/* only, x= expiration, i= alignment with d=, k= match, t=y testing-mode), bottom-up header selection per §5.4 when h= lists a name multiple times, b=value blanking that's structural-position-aware so it doesn't mis-target an internal substring.
  • test_vectors.rs#[cfg(test)] .eml loader + 8 end-to-end tests against committed fixtures.

test_vectors/dkim/

  • 3 synthetic .eml files generated offline with dkimpy + a 2048-bit RSA key (relaxed/relaxed, relaxed/simple, simple/simple).
  • The matching DKIM TXT record (public key only).
  • README documenting provenance — the throwaway private key is not committed.

Test plan

  • cargo check -p internet_identity --target wasm32-unknown-unknown — clean.
  • cargo test -p internet_identity --bin internet_identity dkim — 75 tests pass (parse 14, canonicalize 18, dns_record 16, signature 7, verify 12, end-to-end 8).
  • cargo test -p internet_identity --bin internet_identity — 313 tests pass total (was 238 before this PR; +75 DKIM, plus a few in smtp types).
  • cargo test -p internet_identity_interface --lib — 52 tests pass (was 42; +10 SMTP type tests).
  • cargo clippy -p internet_identity --bin internet_identity --tests -- -D warnings — clean.
  • cargo fmt --check — clean (modulo pre-existing diffs unrelated to this PR).

Stack

This is PR 2 of a 12-PR series. Includes PR 3838's commits as its base; once PR 3838 merges, the diff shrinks to just the DKIM additions.

Subsequent PRs:

  • PR 3 — DMARC alignment.
  • PR 4 — DoH outcall fallback for unsigned domains (Gmail / Outlook / iCloud — see the design doc §7.6 and the team Slack writeup).
  • PRs 5–9 — storage + Candid + behavior for email recovery.
  • PRs 10–12 — frontend.

PR Stack

# PR Description Status
0 #3836 Design doc Open
1 #3838 DNSSEC verifier scaffold Open
2 #3877 DKIM verifier (RFC 6376) Open
3 #3878 DMARC alignment (RFC 7489) Open
4 #3879 DoH fallback Open
5+6 #3880 Setup flow (storage + smtp_request) Open
7 #3881 Recovery flow (delegation) Open
8 #3882 Frontend + feature flag Open
9 #3883 Deploy/upgrade scripts: dnssec_config + doh_config Open
10 #3884 Email-recovery UX overhaul Open

sea-snake added 9 commits May 12, 2026 13:57
Carries forward the wire-format Candid surface the off-chain SMTP gateway
already targets in the PoC PR, slimmed to just what PR 2's DKIM verifier
and PR 8's smtp_request dispatch will need:

- SmtpRequest / SmtpResponse / SmtpHeader / SmtpMessage / SmtpAddress /
  SmtpEnvelope, plus the size bounds and SMTP error codes.
- Validation helpers: validate_envelope, validate_message, validate_smtp_request.
- format_address (lowercases both halves so envelope casing can't bypass
  per-anchor lookups) and truncate_at_char_boundary (fallback to previous
  UTF-8 boundary, avoiding the multi-byte panic the PoC review flagged).

Drops the postbox-specific bits (PostboxEmail, ValidatedSmtpRequest, the
to.user → anchor parser) — they don't fit the email-recovery design, see
docs/ongoing/email-recovery.md §2 non-goals.

10 unit tests for bounds, address normalisation, and char-boundary
truncation. Wasm32 build unaffected.
First piece of the hand-rolled DKIM verifier. Going hand-rolled (rather
than depending on stalwartlabs/mail-auth) because mail-auth pulls a
non-optional hickory-resolver dep that fails to compile for
wasm32-unknown-unknown — we'd need to fork+patch with perpetual rebase
burden to use it.

This commit lands:

- src/internet_identity/src/dkim/types.rs: Algorithm (RsaSha256,
  Ed25519Sha256), HeaderCanon / BodyCanon (Relaxed, Simple), DkimCheck /
  DkimCheckName / DkimCheckStatus per-step diagnostics, and the public
  EmailVerificationStatus / VerificationFailReason result shape.
- src/internet_identity/src/dkim/parse.rs: structural tag-list parser
  for the DKIM-Signature header value. Splits on ';' first, then on the
  first '=' per element, so a literal 'b=' substring inside another
  tag's base64 doesn't get misread as the start of a new tag — that
  was the concrete bug the PoC PR review flagged. Folded whitespace
  inside base64 values is stripped before decoding. Tag names are
  case-insensitive; duplicate tags are rejected.
- mod.rs scaffolding wired into main.rs. dnssec module remains in place.

14 unit tests cover: minimal required tags, case-insensitive tag names,
folded whitespace, the b=-inside-bh= antipattern, v != 1, duplicate
tags, unsupported algorithms (rsa-sha1), ed25519-sha256, l/t/x tags,
explicit i= override, default canon, empty h=, missing required tags,
malformed base64.

Wasm32 build clean; full II suite still passes.
Implements the relaxed canonicalisation algorithms — the only ones our
DKIM verifier accepts on the header side (the parsed-pair gateway
contract precludes byte-exact 'simple/*' header canonicalization, see
design doc §5.2).

Header (§3.4.2): lowercase the name; unfold continuation lines (CRLF +
WSP -> WSP); collapse runs of WSP within the value to a single SP;
strip trailing WSP from the value; strip WSP around the colon. Single-
pass implementation that handles all five steps in one walk.

Body (§3.4.4): per-line WSP cleanup (collapse runs, strip trailing);
drop empty lines from the end of the body; ensure non-empty output
ends in exactly one CRLF; empty input maps to empty output (no
synthetic trailing CRLF).

18 unit tests cover both algorithms, including the edge cases that
trip up naïve implementations: empty bodies, bodies with only
whitespace, bodies without a trailing CRLF, internal empty lines that
must NOT be stripped, header values that are entirely whitespace, and
folded continuations with both space and tab continuations.
Parses the published DKIM key record into a DkimDnsRecord:
- key_type: rsa (default) or ed25519
- public_key: base64-decoded p= value (empty = revoked)
- testing: t=y flag
- strict_auid: t=s flag

Tag handling per §3.6.2.1 / §3.6.2.2:
- v=DKIM1 if present must be first; absent is acceptable.
- Tag names are case-insensitive (the PoC PR review specifically flagged
  P= versus p= mismatch as a real-world bug).
- Whitespace inside p= is tolerated — DNS TXT records can be split
  across multiple <character-string>s and may carry stray WSP at chunk
  boundaries.
- Duplicate tags are rejected (defence-in-depth even though the RFC
  technically allows shadowing).
- t= flag list is colon-separated; we honour 'y' and 's', ignore others.
- Unknown tags are silently dropped per RFC.

16 unit tests cover defaulting, both key types, both t= flags
individually and combined, malformed inputs (missing p, v not first,
unsupported v/k), whitespace inside p, empty p, and key-type/algorithm
matching.

Wasm32 build remains clean.
Both algorithms RFC 8624 specifies as MUST for DKIM (RFC 8301 + RFC 8463),
no others. Reuses rsa, sha2, and ed25519-dalek that are already in the
workspace from PR 1.

RSA path (RFC 5702 / RFC 6376 §3.3.1):
- DKIM publishes RSA keys in SubjectPublicKeyInfo (DER), so we use
  RustCrypto's DecodePublicKey::from_public_key_der which handles the
  X.509 + PKCS#1 wrapping.
- Enforces RSA_MIN_KEY_BITS = 1024 per design §5.6 (planned lift to
  2048 once telemetry confirms zero impact).
- Sanity-checks signature length == modulus length before invoking
  RSASSA-PKCS1-v1_5 verify, so a structural mismatch surfaces as a
  clear MalformedSignature instead of an opaque crypto error.

Ed25519 path (RFC 8463):
- Public key is 32 raw bytes (no SPKI wrapping), signature is 64 bytes.
- Wraps the input in SHA-256 ourselves before calling ed25519-dalek's
  pure-Ed25519 verify, since RFC 8463 specifies signing the SHA-256 of
  the canonical header hash input rather than the input directly.

Plus body_hash_sha256 (the bh= side): SHA-256 of the canonical body,
optionally truncated to l= bytes per RFC 6376 §3.4.5. Truncation caps
at the body length so an l= larger than the body just uses the whole
thing.

7 unit tests cover algorithm/key-type mismatches, malformed RSA keys
(non-SPKI bytes), wrong-length Ed25519 keys / signatures, and the body
hash with and without l=. Wasm32 build remains clean.
Ties parse + canonicalize + dns_record + signature into the public
`verify(email, dkim_txt, now_secs)` API. Implements:

- Multi-signature loop per RFC 6376 §5.5: try every DKIM-Signature
  header in order, accept on first pass, return Unverified with the
  most recent reason if all fail.
- Tag enforcement per design §5.4:
  - c=simple/* on header side rejected with UnsupportedCanonicalization
  - x= expiration check
  - i= alignment with d= (subdomain-permissive unless DNS t=s set)
  - DNS k= must match signature a=
  - DNS t=y testing-mode -> Unverified(TestingMode)
- Header bottom-up selection per RFC 6376 §5.4 — when h= lists a name
  multiple times we pick distinct occurrences walking from the bottom.
- DKIM-Signature header itself canonicalised with b=value blanked,
  no trailing CRLF, per §3.7. The b= blanker is structural — only
  blanks at top-level tag positions, so a literal 'b=' substring
  inside another tag's base64 is never mistargeted (the bug class
  the PoC PR review flagged).
- Per-signature DkimCheck breakdown so a UI can render which step
  failed.

12 unit tests cover: AUID alignment (exact, subdomain, evil-suffix,
local-only), b= blanking (simple, bh-not-blanked, b-at-start, no-
internal-substring-misblank), no signature, c=simple rejection,
expired signature, misaligned i=. The cryptographic happy-path tests
land alongside the captured/synthetic test vectors in the next commit.

67 DKIM tests pass total (parse 14 + canonicalize 18 + dns_record 16
+ signature 7 + verify 12). Wasm32 build clean.
Lands the synthetic .eml fixtures plus the loader that turns them into
SmtpRequest-shaped data the verifier actually consumes. End-to-end
tests exercise the full pipeline: parse_eml -> SmtpRequest -> verify.

Synthetic vectors were generated offline using dkimpy (Python's
reference DKIM implementation) against a freshly-generated throwaway
2048-bit RSA key. The private key is *not* committed — only the .eml
output and the matching public DKIM TXT record. README in
test_vectors/dkim/ documents provenance and regeneration.

Three .eml files cover:
- c=relaxed/relaxed (the common case for mainstream senders)
- c=relaxed/simple (header relaxed, body simple — accepted)
- c=simple/simple (rejected with UnsupportedCanonicalization)

Eight tests in dkim/test_vectors.rs cover:
- happy-path verification of relaxed/relaxed and relaxed/simple
- simple/simple rejection
- flipped body byte -> BodyHashMismatch
- flipped signature byte -> SignatureInvalid (or related)
- wrong public key -> SignatureInvalid
- no DKIM-Signature header -> NoSignature
- the parse_eml helper itself (sanity check)

The .eml parser unfolds continuation lines per RFC 5322 §2.2.3, drops
the conventional single SP after the colon (matching what the gateway
does in production), and preserves the rest of the header value bytes
verbatim — so relaxed canonicalisation downstream produces the same
form the signer hashed.

Verifier totals: 75 DKIM tests (parse 14, canonicalize 18, dns_record
16, signature 7, verify 12, end-to-end 8). Full II suite: 313 tests
pass (was 238 pre-PR). Wasm32 build clean.
- Use matches!() instead of explicit match arms for KeyType::matches_signature_alg
- rustfmt across all dkim/ files and types/smtp.rs
- Stale test_vectors.rs doc-comment said the throwaway private key 'is committed' — corrected to 'is *not* committed' (the fixtures' provenance README is the source of truth)
- blank_b_tag_value: preserve original bytes around the b= tag instead
  of always emitting literal 'b='. RFC 6376 §3.7 says only the tag
  value (and surrounding WSP) is replaced; the tag name and the bytes
  between the name and '=' must come through verbatim. Concretely,
  signers that emit 'B=' (uppercase) or 'b\t=' / 'b =' (FWS between
  the name and '=') are valid per §3.2 and were previously mis-handled
  — relaxed canonicalisation collapses both forms to 'b:' downstream,
  so the bug only surfaces on signatures from those few senders, but
  by-construction is more honest. New tests cover uppercase B,
  tab-between-name-and-equals, and space-between-name-and-equals.

- test_vectors.rs module-level docstring: corrected to say the private
  key is *not* committed (the README and the actual tree both already
  said this; the docstring was the only contradiction). Dropped the
  reference to scripts/sign-dkim.py since that file isn't checked in.

- smtp.rs validate_smtp_request_envelope_only_ok: tightened the
  assertion from matches!(..., Ok(()) | Err(SmtpResponse::Ok{..}))
  to is_ok(). The Err arm was unreachable (SmtpResponse::Err always
  wraps SmtpRequestError) and would have masked a regression where
  validation mistakenly returned the Ok variant inside Err.
@sea-snake sea-snake force-pushed the feat/dkim-verifier branch from 23985ad to bcf8233 Compare May 12, 2026 11:58
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds the DKIM (RFC 6376/RFC 8463) verification layer for the email-recovery stack, along with the underlying DNSSEC proof-bundle verifier/config plumbing and shared SMTP gateway request types needed to feed deterministic verification in-canister.

Changes:

  • Introduces a hand-rolled DKIM verifier (src/internet_identity/src/dkim/) including parsing, canonicalization, DNS TXT parsing, signature verification, and end-to-end fixture-based tests.
  • Adds DNSSEC proof-bundle verifier module (src/internet_identity/src/dnssec/), wires DnssecConfig through init/persistent state, and updates the public Candid interface accordingly.
  • Adds/updates test vectors and workspace dependencies to support real-world DNSSEC algorithm coverage and DKIM fixtures.

Reviewed changes

Copilot reviewed 16 out of 17 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
Cargo.toml Updates workspace deps (rsa bump) and adds DNSSEC crypto deps (p256, ed25519-dalek).
Cargo.lock Locks new/updated transitive dependencies for the added crypto crates.
src/internet_identity/Cargo.toml Enables canister crate to use p256/ed25519-dalek for DNSSEC/DKIM verification.
src/internet_identity/internet_identity.did Extends InternetIdentityInit with dnssec_config : opt opt DnssecConfig and adds DNSSEC types.
src/internet_identity/src/main.rs Registers dkim/dnssec modules and plumbs dnssec_config through config()/install-arg application.
src/internet_identity/src/state.rs Persists dnssec_config in PersistentState with upgrade-survival semantics.
src/internet_identity/src/storage/storable/storable_persistent_state.rs Adds dnssec_config field to stable-persisted state conversions/defaults.
src/internet_identity/src/dnssec/mod.rs New DNSSEC verifier module entrypoint + exports.
src/internet_identity/src/dnssec/types.rs New DNSSEC proof-bundle types and verifier error model.
src/internet_identity/src/dnssec/wire.rs Shared DNSKEY/DS wire-format constants.
src/internet_identity/src/dnssec/canonical.rs DNSSEC canonicalization + signed-data / DS digest input assembly.
src/internet_identity/src/dnssec/signature.rs DNSSEC signature verification (RSA/ECDSA/Ed25519) + DS digest matching.
src/internet_identity/src/dnssec/verify.rs DNSSEC bundle verification orchestration (root anchor, chain walk, hops, freshness).
src/internet_identity/src/dnssec/test_vectors.rs Loader for captured DNSSEC JSON vectors + anchor parsing.
src/internet_identity/src/dkim/mod.rs New DKIM verifier module entrypoint + exports.
src/internet_identity/src/dkim/types.rs DKIM result/diagnostic types (EmailVerificationStatus, checks, failure reasons).
src/internet_identity/src/dkim/parse.rs DKIM-Signature tag-list parser (RFC 6376 §3.5).
src/internet_identity/src/dkim/canonicalize.rs Relaxed header/body canonicalization helpers (RFC 6376 §3.4).
src/internet_identity/src/dkim/dns_record.rs DKIM TXT record parser (RFC 6376 §3.6.2).
src/internet_identity/src/dkim/signature.rs DKIM RSA-SHA256 + Ed25519-SHA256 verification + body hash calculation.
src/internet_identity/src/dkim/verify.rs DKIM verification orchestration (multi-signature loop, enforcement, b= blanking).
src/internet_identity/src/dkim/test_vectors.rs .eml loader + end-to-end DKIM verification tests against fixtures.
src/internet_identity_interface/src/internet_identity/types.rs Exposes new dnssec and smtp type modules via re-exports and init struct field.
src/internet_identity_interface/src/internet_identity/types/dnssec.rs New Candid types for DNSSEC config + root anchors.
src/internet_identity_interface/src/internet_identity/types/smtp.rs New SMTP gateway Candid types + bounds-checking helpers.
test_vectors/dnssec/cloudflare-com-2026-05.json Captured DNSSEC chain fixture for TXT verification (algorithm coverage).
test_vectors/dnssec/ed25519-nl-2026-05.json Captured DNSSEC chain fixture exercising Ed25519 (alg 15).
test_vectors/dnssec/iana-root-anchors-2026-05.json IANA root trust anchors fixture used by tests.
test_vectors/dnssec/proton.me-2026-05.json Captured DNSSEC chain fixture for proton.me.
test_vectors/dnssec/protonmail.com-2026-05.json Captured DNSSEC chain fixture for protonmail.com.
test_vectors/dnssec/tutanota.com-2026-05.json Captured DNSSEC chain fixture for tutanota.com.
test_vectors/dkim/README.md Documents DKIM fixture provenance and regeneration approach.
test_vectors/dkim/synth-rsa-relaxed-relaxed.eml Synthetic DKIM-signed email fixture (relaxed/relaxed).
test_vectors/dkim/synth-rsa-relaxed-simple.eml Synthetic DKIM-signed email fixture (relaxed/simple).
test_vectors/dkim/synth-rsa-simple-simple.eml Synthetic DKIM-signed email fixture (simple/simple rejection).
test_vectors/dkim/synth-rsa-test1._domainkey.test.example.com.txt Synthetic DKIM TXT record fixture (public key).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/internet_identity/src/dkim/canonicalize.rs
Three fixes originally landed bundled with DoH review feedback
(commit 5d5f686 on PR #3879). They belong here in PR 2's DKIM
verifier since they're spec-compliance fixes to RFC 6376:

- canonicalize.rs: relaxed_body of an empty body returns CRLF
  (RFC 6376 §3.4.3).
- verify.rs: simple_body of an empty body returns CRLF
  (same rule).
- parse.rs: enforce that the h= tag includes the From header
  (RFC 6376 §5.4). Without this, DMARC alignment can be
  defeated by a post-sign rewrite of the From header.
- types.rs touched by the original combined commit is not
  back-ported — it referenced crate::dmarc::EmailVerificationStatus
  which doesn't exist at this point in the stack.
Copy link
Copy Markdown
Collaborator

@aterga aterga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a one-line comment in Cargo.toml (or dkim/mod.rs) explaining the WASM-safe configuration, or move resolver logic behind a feature flag if it only runs in tests. If the build fails on CI for wasm32-unknown-unknown, this blocks merge.

Copy link
Copy Markdown
Collaborator

@aterga aterga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-approving, please check Grok's feedback before merging.

Updated PR #3877 Review (post-#3838 merge)

Overall assessment
With #3838 now merged, this PR is in excellent shape. The diff is now cleanly scoped to the DKIM verifier + minimal SMTP type forward-port (no more DNSSEC commits). The force-push today (May 12) already addressed the b= blanking bug I flagged in earlier feedback — perfect RFC §3.7 compliance. CI is green across the board, tests (including the new synthetic .eml + TXT vectors) all pass, and the implementation remains deliberately minimal, WASM-safe, and security-first.

No major security gaps. The critical path (parse → canonicalize → verify) is tight, happens before any state mutation, and enforces the exact subset of RFC 6376 that the email-recovery design requires. This is ready to merge once the two tiny remaining items below are knocked out.

Blockers (must-fix before merge)

None. The previous hard dependency on #3838 is gone. The only remaining action is a quick rebase onto current main (to drop the now-merged DNSSEC commits from the history), but that’s trivial and non-blocking for review.

Easy-to-change improvements (quick wins, <15 min each)

  1. Error surface for callers (types.rs + main.rs)
    EmailVerificationStatus / VerificationFailReason currently lumps “no DKIM signature at all” into a generic failure. The frontend would benefit from a distinct NoSignaturePresent (or equivalent) so it can show a clear “This email provider doesn’t use DKIM” message instead of a scary “Verification failed”.
    Action: Add one variant to VerificationFailReason (or to DkimError if you prefer) and update the match arm in main.rs::handle_email_recovery. Zero security or correctness impact.

  2. Parser strictness on unknown tags (parse.rs)
    DKIM-Signature parsing already rejects duplicates (correct). Confirm that truly unknown tags are ignored per RFC 6376 §3.5 (they must not cause failure). The TXT record parser in dns_record.rs already does this correctly; just make sure the signature parser matches.
    Action: If it currently returns InvalidHeader on an unknown tag, change to ignore (with a debug-level log). Update the existing unit test that exercises an unknown tag to assert the ignore behavior.

  3. One-line security note in verify.rs
    Add a comment next to the trusted TXT record lookup:

    // TXT record is DNSSEC-validated by the caller (see PR #3838)

    This makes the trust boundary explicit for future readers.

  4. Test-vector CI job polish
    The new dkim-test-vectors job is great. Just rename it to dkim-verifier-tests (or similar) and make the scope comment explicit that it also exercises the interface crate types.
    Action: One-line change in .github/workflows/... (already 90 % done).

Strategic recommendations

  • Security posture
    Still rock-solid. Key wins:

    • Only relaxed/* canonicalisation is accepted (avoids the re-emission pitfalls of simple).
    • Strict enforcement of x=, i= alignment, t=y testing mode, and minimum 1024-bit RSA.
    • Multi-signature loop with bottom-up header selection is exactly as specified in RFC 6376 §3.7.
    • Hand-rolled parser + constant-time crypto primitives = minimal attack surface.

    The synthetic test vectors + dkimpy-generated fixtures give strong coverage. Once the full email-recovery stack lands, a quick cargo fuzz pass on the parser + canonicaliser (5–10 minutes of setup) would be cheap extra insurance.

  • Maintainability
    The new dkim/ module is ~1.2 kLOC of security-critical code and is nicely isolated. Long-term, consider extracting it to a reusable ic-dkim crate (or publishing a WASM-safe fork of a maintained library) so other Internet Computer canisters can benefit without copying the code. Not urgent for this PR.

  • Stack hygiene
    The email-recovery series is landing cleanly. Keeping each PR this focused (DKIM-only, minimal interface changes) is exactly the right pattern. Once this merges, the next ones in the stack should be even faster to review.

Summary verdict: Approve after the rebase + the four easy changes above (all <30 min total). This is high-quality, defense-in-depth work — exactly what we want for email recovery. Happy to re-review the final diff or the next PR in the stack. Great job!

Copy link
Copy Markdown
Collaborator

aterga commented May 12, 2026

Took a pass at Grok's four "easy-to-change improvements" — opened #3885 (draft, targets feat/dkim-verifier) so they can land as a follow-up commit on this PR.

Summary of what actually changed vs. what wasn't actionable:

# Grok's suggestion Action
1 Distinct NoSignaturePresent variant Variant already exists as VerificationFailReason::NoSignature (types.rs:84). Expanded the doc-comment to make the frontend UX intent explicit.
2 Confirm parser ignores unknown tags Behaviour already correct (lookup-by-name in split_tag_list). Added an explicit unknown_tags_are_ignored test covering z= + synthetic zz=.
3 SECURITY comment in verify.rs near the trusted TXT lookup Added inline // SECURITY: block at the parse_dkim_txt call site referencing #3838 and #3879.
4 Rename CI job dkim-test-vectorsdkim-verifier-tests No such job exists in .github/workflows/; this PR doesn't add one. No-op.

The main.rs::handle_email_recovery match-arm update from item 1 belongs in PRs 5–7 of the stack — that handler doesn't exist yet.

All 79 DKIM tests pass, clippy clean, wasm32 builds clean.


Generated by Claude Code

Addresses the four "easy-to-change improvements" from Grok's review on
#3877
(#3877 (review)).
Targets `feat/dkim-verifier` so it can be merged in as a follow-up
commit to that PR.

## What changed

1. **`types.rs` — `NoSignature` doc-comment.** The distinct variant Grok
asked for (`NoSignaturePresent` or equivalent) already exists as
`VerificationFailReason::NoSignature` at types.rs:84 and is returned
from `verify()` when no DKIM-Signature headers are present
(verify.rs:54, verify.rs:67). The frontend can already distinguish it
from a generic failure. Expanded the doc-comment to make the UX intent
("this provider doesn't use DKIM") explicit so future readers (and AI
reviewers) don't re-flag this.

2. **`parse.rs` — explicit unknown-tag-ignored test.** RFC 6376 §3.5
requires implementations to ignore unrecognised tags. The parser already
does this (it's lookup-by-name in `split_tag_list` → `get(name)`;
unknown tags pass straight through — the existing `happy_value()`
fixture even contains an unknown `q=dns/txt`). Pinned the behaviour with
a dedicated `unknown_tags_are_ignored` test covering `z=` and a
synthetic `zz=`.

3. **`verify.rs` — SECURITY comment at the trust boundary.** Added an
inline `// SECURITY:` block at the `parse_dkim_txt` call site stating
that `dkim_txt` is trusted, sourced from the DNSSEC verifier (#3838) or
pinned-host DoH outcall (#3879).

## Not actioned

- **CI job rename (`dkim-test-vectors` → `dkim-verifier-tests`).** No
such job exists in `.github/workflows/`; PR #3877 doesn't add one.
Nothing to rename.
- **`main.rs::handle_email_recovery` match arm.** That endpoint hasn't
been added yet (it's PRs 5–7 of the stack). No-op for this PR.

## Tests

- `cargo test -p internet_identity --bin internet_identity dkim::` — 79
pass (was 78, +1 for the new unknown-tag test).
- `cargo clippy -p internet_identity --bin internet_identity --tests --
-D warnings` — clean.
- `cargo check -p internet_identity --target wasm32-unknown-unknown` —
clean.
- `cargo fmt --check` on the three modified files — clean (other
pre-existing fmt drifts in `dnssec/`, `openid/`, `tests/integration/`
are unrelated and predate this branch).


---
_Generated by [Claude
Code](https://claude.ai/code/session_01NZmvbHgzN5NQc7Hqx9ZzpP)_

Co-authored-by: Claude <noreply@anthropic.com>
@aterga aterga enabled auto-merge May 12, 2026 13:38
@aterga aterga added this pull request to the merge queue May 12, 2026
Merged via the queue into main with commit 74ad617 May 12, 2026
53 of 63 checks passed
@aterga aterga deleted the feat/dkim-verifier branch May 12, 2026 15:10
aterga added a commit that referenced this pull request May 12, 2026
Addresses the two actionable "easy-to-change improvements" from Grok's
review on #3878
(#3878 (comment)).
Targets `feat/dmarc-alignment` so it can land as a follow-up commit on
that PR.

(Tried to push directly to `feat/dmarc-alignment` per the maintainer's
note from #3877 / #3885, but my account got a 403 against that branch —
falling back to a stacked draft PR.)

## What changed

1. **`dmarc/mod.rs` — consolidated `# Security model` docblock.** Calls
out the three deliberate deviations from "stock" DMARC, each with a
one-paragraph rationale:
- **No Public Suffix List** — label-anchored suffix check prevents
`evilexample.com`-style spoofs at the cost of closing multi-domain orgs
(`gmail.com` ↔ `googlemail.com`). Safe direction for a recovery surface.
Design doc §6.4.
- **No SPF** — recovery proves mailbox control, not path-of-delivery.
DKIM gives the cryptographic binding; SPF would need a source IP the
gateway payload doesn't carry. Design doc §6.5.
- **Fail-closed everywhere** — every malformed/unknown step collapses to
`Unverified`. No quarantine, no downgrade.

2. **`dmarc/test_vectors.rs` —
`verifies_end_to_end_when_dmarc_record_has_unknown_tags`.** Runs the
full `verify_email` against a DMARC record carrying `rua=`, `ruf=`,
`fo=`, and a synthetic `vendorext=` tag. Pins the ignore-unknown-tags
behaviour through the public entry point, not just the parser unit
(which already has a `unknown_tags_are_ignored` test in
`dmarc/parse.rs`).

## Not actioned

- **Grok item #1 ("parser must ignore unknown tags").** Already
implemented and unit-tested before this PR. `dmarc/parse.rs` is
lookup-by-name, so unknown tags pass through silently. The PR
description even calls this out: "ignores unknown / reporting tags". The
new end-to-end test is the explicit confirmation Grok asked for under
item #3.

The four strategic recommendations (observability metrics, frontend
error surface, SPF wiring, README) are explicitly "nice-to-have" /
"future" and belong in later PRs of the stack.

## Tests

- `cargo test -p internet_identity --bin internet_identity dmarc::` — 49
pass (was 48, +1 e2e unknown-tag test).
- `cargo clippy -p internet_identity --bin internet_identity --tests --
-D warnings` — clean.
- `cargo check -p internet_identity --target wasm32-unknown-unknown` —
clean.
- `cargo fmt --check` on the two touched files — clean (other
pre-existing fmt drifts in `dnssec/`, `openid/`, `tests/integration/`,
`types/attributes.rs` are unrelated and predate this branch).


---
_Generated by [Claude
Code](https://claude.ai/code/session_01NZmvbHgzN5NQc7Hqx9ZzpP)_

Co-authored-by: Claude <noreply@anthropic.com>
sea-snake pushed a commit that referenced this pull request May 13, 2026
Addresses the two actionable "easy-to-change improvements" from Grok's
review on #3878
(#3878 (comment)).
Targets `feat/dmarc-alignment` so it can land as a follow-up commit on
that PR.

(Tried to push directly to `feat/dmarc-alignment` per the maintainer's
note from #3877 / #3885, but my account got a 403 against that branch —
falling back to a stacked draft PR.)

## What changed

1. **`dmarc/mod.rs` — consolidated `# Security model` docblock.** Calls
out the three deliberate deviations from "stock" DMARC, each with a
one-paragraph rationale:
- **No Public Suffix List** — label-anchored suffix check prevents
`evilexample.com`-style spoofs at the cost of closing multi-domain orgs
(`gmail.com` ↔ `googlemail.com`). Safe direction for a recovery surface.
Design doc §6.4.
- **No SPF** — recovery proves mailbox control, not path-of-delivery.
DKIM gives the cryptographic binding; SPF would need a source IP the
gateway payload doesn't carry. Design doc §6.5.
- **Fail-closed everywhere** — every malformed/unknown step collapses to
`Unverified`. No quarantine, no downgrade.

2. **`dmarc/test_vectors.rs` —
`verifies_end_to_end_when_dmarc_record_has_unknown_tags`.** Runs the
full `verify_email` against a DMARC record carrying `rua=`, `ruf=`,
`fo=`, and a synthetic `vendorext=` tag. Pins the ignore-unknown-tags
behaviour through the public entry point, not just the parser unit
(which already has a `unknown_tags_are_ignored` test in
`dmarc/parse.rs`).

## Not actioned

- **Grok item #1 ("parser must ignore unknown tags").** Already
implemented and unit-tested before this PR. `dmarc/parse.rs` is
lookup-by-name, so unknown tags pass through silently. The PR
description even calls this out: "ignores unknown / reporting tags". The
new end-to-end test is the explicit confirmation Grok asked for under
item #3.

The four strategic recommendations (observability metrics, frontend
error surface, SPF wiring, README) are explicitly "nice-to-have" /
"future" and belong in later PRs of the stack.

## Tests

- `cargo test -p internet_identity --bin internet_identity dmarc::` — 49
pass (was 48, +1 e2e unknown-tag test).
- `cargo clippy -p internet_identity --bin internet_identity --tests --
-D warnings` — clean.
- `cargo check -p internet_identity --target wasm32-unknown-unknown` —
clean.
- `cargo fmt --check` on the two touched files — clean (other
pre-existing fmt drifts in `dnssec/`, `openid/`, `tests/integration/`,
`types/attributes.rs` are unrelated and predate this branch).


---
_Generated by [Claude
Code](https://claude.ai/code/session_01NZmvbHgzN5NQc7Hqx9ZzpP)_

Co-authored-by: Claude <noreply@anthropic.com>
aterga added a commit that referenced this pull request May 13, 2026
…of email-recovery stack) (#3878)

## Summary

PR 3 of the email-recovery stack (`docs/ongoing/email-recovery.md` §6).
Stacks on top of #3877 (DKIM verifier). Lands a hand-rolled DMARC
alignment check and reshapes the verifier API: `dkim::verify_dkim`
becomes a DKIM-only primitive, and the new `dmarc::verify_email` is the
public top-level entry point that produces the combined
`EmailVerificationStatus`.

**Note:** This PR targets `main` but includes PRs 1+2's commits as its
base. Review the DMARC-specific changes by looking at commits on top of
`ec371aae3` (PR 2's tip). Once PRs 1+2 merge, this PR's diff shrinks to
just the DMARC additions.

## What's in this PR

### `src/internet_identity/src/dmarc/`

- **`types.rs`** — `DmarcOutcome` (Aligned / Misaligned / NoRecord /
Malformed), `DmarcPolicy` (None / Quarantine / Reject), `AlignmentMode`
(Strict / Relaxed), `DmarcRecord`, plus the combined
`EmailVerificationStatus` that carries both DKIM diagnostics and the
DMARC outcome on success.
- **`parse.rs`** (RFC 7489 §6.3) — DMARC TXT record parser. Enforces
`v=DMARC1` must be first, `p=` must be one of {none, quarantine,
reject}, `pct=` 0..=100, rejects duplicate tags, ignores unknown /
reporting tags. 12 unit tests.
- **`from_header.rs`** (RFC 5322 / RFC 7489 §3.1.1) — single-mailbox
From-header parser. Accepts bare addr-spec, name-addr, and
quoted-display-name forms; rejects zero/multiple From: headers,
address-lists, group syntax. Tolerates comma/colon inside quoted display
names. 16 unit tests.
- **`alignment.rs`** — strict (exact match) + relaxed (exact match OR
label-aligned subdomain in either direction). Stricter than
RFC-compliant relaxed alignment because we deliberately don't consult
the PSL — design doc §6.4 documents the trust + asymmetric-failure-mode
reasoning. The dot anchor on the subdomain check prevents
`evilexample.com` from aliasing `example.com`. 8 unit tests.
- **`verify.rs`** — orchestration. DKIM first; on failure, surface the
DKIM reason verbatim. On DKIM pass, parse From and check DMARC
alignment. Accepted iff Aligned, OR NoRecord with `dkim_domain ==
from_domain`. 8 unit tests.
- **`test_vectors.rs`** — 5 end-to-end tests reusing PR 2's synthetic
.eml fixtures.

### `src/internet_identity/src/dkim/types.rs` (rename + new variants)

- Renamed `EmailVerificationStatus` → `DkimVerifyResult` (DKIM-only).
The combined verdict moved to `dmarc::EmailVerificationStatus` so it can
carry the `DmarcOutcome`.
- Added `MalformedFromHeader(String)`, `DmarcMalformed(String)`,
`DmarcMisaligned` to `VerificationFailReason`.

### `src/internet_identity/src/dkim/mod.rs`

- Re-exports `verify` as `verify_dkim` so downstream callers (the dmarc
layer) don't have to deal with both a `dkim::verify` and `dmarc::verify`
in scope at the same time.

## Test plan

- [x] `cargo check -p internet_identity --target wasm32-unknown-unknown`
— clean.
- [x] `cargo test -p internet_identity --bin internet_identity dmarc` —
49 tests pass (12 parse + 16 from_header + 8 alignment + 8 verify + 5
e2e).
- [x] `cargo test -p internet_identity --bin internet_identity` — 365
tests pass total (was 313 with PR 2; +49 dmarc + 3 small reshape
adjustments).
- [x] `cargo clippy -p internet_identity --bin internet_identity --tests
-- -D warnings` — clean.
- [x] `cargo fmt --check` — clean (modulo pre-existing unrelated diffs).

## PR Stack
| # | PR | Description | Status |
|---|---|---|---|
| 0 | [#3836](#3836) |
Design doc | Open |
| 1 | [#3838](#3838) |
DNSSEC verifier scaffold | Open |
| 2 | [#3877](#3877) |
DKIM verifier (RFC 6376) | Open |
| 3 | [#3878](#3878) |
DMARC alignment (RFC 7489) | Open |
| 4 | [#3879](#3879) |
DoH fallback | Open |
| 5+6 | [#3880](#3880)
| Setup flow (storage + smtp_request) | Open |
| 7 | [#3881](#3881) |
Recovery flow (delegation) | Open |
| 8 | [#3882](#3882) |
Frontend + feature flag | Open |
| 9 | [#3883](#3883) |
Deploy/upgrade scripts: dnssec_config + doh_config | Open |
| 10 | [#3884](#3884) |
Email-recovery UX overhaul | Open |

---------

Co-authored-by: Arshavir Ter-Gabrielyan <arshavir.ter.gabrielyan@dfinity.org>
Co-authored-by: Claude <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants