doctor: validate fronting domain TLS certificate#549
Open
dolonet wants to merge 1 commit into
Open
Conversation
The fronting-domain step only opened a bare TCP connection, so a missing, expired, untrusted or wrong-host certificate still reported a green check. That is exactly the misleading result reported in #518. After the TCP dial, perform a default crypto/tls handshake against the fronting endpoint with ServerName set to the secret host. Standard verification validates the chain against the system roots, checks the leaf SAN against the secret host, and enforces the validity period in one step, so expired/untrusted/wrong-host certificates surface as descriptive x509 errors. The dial target still honors the domain-fronting.host override while SNI stays the secret host, matching what domain fronting puts on the wire. When proxy-protocol is enabled the listener expects a PROXY header before the ClientHello, which doctor does not emit yet; the certificate probe is skipped with an informational note instead of reporting a false negative.
f8d9194 to
d2722ef
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes #518. Thanks @bam80 for the report.
Problem
The doctor's "Validate fronting domain connectivity" step only opened a bare TCP connection:
A missing, expired, untrusted, or wrong-host certificate still produced a green check — exactly the misleading result @bam80 described.
Change
After the existing TCP dial,
checkFrontingDomainnow performs a defaultcrypto/tlsclient handshake against the fronting endpoint withServerNameset to the secret host. Standard verification (InsecureSkipVerify=false) validates everything in one shot:An expired / untrusted / wrong-host certificate surfaces as a descriptive
x509:error (certificate has expired,signed by unknown authority,certificate is valid for …, not …).The dial target still honors the
domain-fronting.hostoverride while the SNI stays the secret host — matching what domain fronting actually puts on the wire (in thecontrib/sni-routersetup the dial host is an internal name likeweb, but the cert is issued for the secret domain).New output:
No new config, no new dependencies,
internal/cli/doctor.goonly (plus tests).v1 scope / deferred PROXY-protocol path
When
proxy-protocolis enabled, the fronting listener expects a PROXY v2 header before the TLS ClientHello (Caddy'sproxy_protocollistener wrapper in the recommended sni-router setup). A bare handshake would hang or be rejected and report a false negative — re-creating the very misleading-result class this issue is about. Rather than hand-write and ship an untested client PROXY header, v1 skips the certificate probe in that mode with a clear note:Adding a real PROXY-v2-aware probe is a sensible follow-up, but it is near-untestable without the live sni-router/Caddy stack, so I deliberately split it out to keep this PR verifiable.
Reviewer decision — hard ❌ vs soft⚠️ on expiry
An expired cert fails the handshake hard (red ❌), not as a soft⚠️ warning. I think hard-fail is correct: it matches the issue's intent ("misleading if the cert is invalid"), and Caddy/certbot auto-renew well in advance, so a live expired cert is already a real outage. If you'd rather expiry be a separate warning tier (distinct from chain/SAN failures), that needs explicit out-of-band expiry parsing — say the word and I'll split it out.
Tests
The probe is factored into a testable
probeFrontingTLS(ctx, dialer, dialAddress, sniHost, rootCAs)so the cases run against in-process self-signed TLS listeners — no real network:go build ./...andgo test ./internal/cli/... -raceare green.