
doctor: validate fronting domain TLS certificate #549

Open
dolonet wants to merge 1 commit into master from doctor-tls-cert-check


@dolonet commented May 29, 2026

Fixes #518. Thanks @bam80 for the report.

Problem

The doctor's "Validate fronting domain connectivity" step only opened a bare TCP connection:

Validate fronting domain connectivity
  ✅ web:8443 is reachable

A missing, expired, untrusted, or wrong-host certificate still produced a green check — exactly the misleading result @bam80 described.

Change

After the existing TCP dial, checkFrontingDomain now performs a default crypto/tls client handshake against the fronting endpoint with ServerName set to the secret host. Standard verification (InsecureSkipVerify=false) validates everything in one shot:

  • chain against the system roots,
  • leaf SAN against the secret host,
  • validity period.

An expired / untrusted / wrong-host certificate surfaces as a descriptive x509: error (certificate has expired, signed by unknown authority, certificate is valid for …, not …).

The dial target still honors the domain-fronting.host override while the SNI stays the secret host — matching what domain fronting actually puts on the wire (in the contrib/sni-router setup the dial host is an internal name like web, but the cert is issued for the secret domain).

New output:

Validate fronting domain connectivity
  ✅ web:8443 is reachable
  ✅ TLS certificate for example.com is valid

No new config, no new dependencies, internal/cli/doctor.go only (plus tests).

v1 scope / deferred PROXY-protocol path

When proxy-protocol is enabled, the fronting listener expects a PROXY v2 header before the TLS ClientHello (Caddy's proxy_protocol listener wrapper in the recommended sni-router setup). A bare handshake would hang or be rejected, and the doctor would report a false negative: the same kind of misleading result this issue is about, just in the opposite direction. Rather than hand-write and ship an untested client PROXY header, v1 skips the certificate probe in that mode with a clear note:

  ⏭ TLS certificate check skipped: proxy-protocol is enabled (the listener expects a PROXY header that mtg doctor does not send yet)

Adding a real PROXY-v2-aware probe is a sensible follow-up, but it is near-untestable without the live sni-router/Caddy stack, so I deliberately split it out to keep this PR verifiable.

Reviewer decision — hard ❌ vs soft ⚠️ on expiry

An expired cert fails the handshake hard (red ❌), not as a soft ⚠️ warning. I think hard-fail is correct: it matches the issue's intent ("misleading if the cert is invalid"), and Caddy/certbot auto-renew well in advance, so a live expired cert is already a real outage. If you'd rather expiry be a separate warning tier (distinct from chain/SAN failures), that needs explicit out-of-band expiry parsing — say the word and I'll split it out.

Tests

The probe is factored into a testable probeFrontingTLS(ctx, dialer, dialAddress, sniHost, rootCAs) so the cases run against in-process self-signed TLS listeners — no real network:

  • valid cert/host → success
  • wrong host (SAN mismatch) → failure
  • untrusted CA → failure
  • expired cert → failure
  • override case: dial an IP:port that differs from the SNI, verify against the secret host name → success

go build ./... and go test ./internal/cli/... -race are green.

The fronting-domain step only opened a bare TCP connection, so a missing,
expired, untrusted or wrong-host certificate still reported a green check.
That is exactly the misleading result reported in #518.

After the TCP dial, perform a default crypto/tls handshake against the
fronting endpoint with ServerName set to the secret host. Standard
verification validates the chain against the system roots, checks the leaf
SAN against the secret host, and enforces the validity period in one step,
so expired/untrusted/wrong-host certificates surface as descriptive x509
errors.

The dial target still honors the domain-fronting.host override while SNI
stays the secret host, matching what domain fronting puts on the wire.

When proxy-protocol is enabled the listener expects a PROXY header before
the ClientHello, which doctor does not emit yet; the certificate probe is
skipped with an informational note instead of reporting a false negative.
@dolonet force-pushed the doctor-tls-cert-check branch from f8d9194 to d2722ef on May 29, 2026 20:28

Development

Successfully merging this pull request may close these issues.

TLS cert validity is not checked on doctor run
