relay: add DNS-01 cert acquisition via Cloudflare API #37
Merged
Multi-IP RU round-robin DNS makes the existing webroot HTTP-01 challenge unreliable: LE may pick whichever apex A-record IP it likes, and only one of the round-robin RUs has the challenge file. The second RU's cert ends up rsynced from the first, and that copy goes stale after 90 days unless we run a cron-based sync, which is fragile.

DNS-01 via Cloudflare API works regardless of where DNS resolves because the challenge is a TXT record, not an HTTP file. Each RU auto-renews independently against its own LE account; no inter-RU coordination needed.

Changes:

* defaults/main.yml: relay_certbot_method (default 'webroot' for backwards compat) and relay_certbot_dns_propagation_seconds.
* defaults/secrets.yml.example: documents the relay_cloudflare_api_token vault var with the CF token-creation recipe (Zone:DNS:Edit on apex).
* tasks/install.yml: snap-installs the certbot-dns-cloudflare plugin and connects it via the snap interface, gated on method=dns-cloudflare. trust-plugin-with-root must be set explicitly because the plugin needs root to write into /etc/letsencrypt.
* tasks/certbot.yml: validates token presence, deploys /etc/letsencrypt/cloudflare.ini (mode 600, no_log), and branches the certbot certonly command on method. The existing webroot path is retained unchanged for hosts that don't opt in.

Idempotency bug fixed in the same commit:

The pre-existing 'cert already covers domain' check parsed `certbot certificates ... | grep 'Domains:'`, but snap certbot 3.x renamed that line to 'Identifiers:'. The grep returned empty, the 'cert doesn't exist' branch fired on every run, and certbot tried to re-issue. LE's small dedup window masked it for a few minutes, but a sustained re-run loop would burn rate-limit budget. Updated the grep to accept both labels:

`grep -E '^[[:space:]]*(Domains|Identifiers):'`

Both certbot 2.x (apt distro) and 3.x+ (snap) parse correctly now.

Tested:

* Manual DNS-01 setup on vm_my_ru and vm_my_ru2 from an earlier session.
* Re-ran the role with --tags relay_install,relay_certbot: the first run reported changed=2 due to the broken grep (false positive); after the fix, an idempotent re-run reports changed=0 on both hosts.
* certbot renew --cert-name zirgate.com --dry-run on both hosts: "Congratulations, all simulated renewals succeeded".

Signed-off-by: findias <findias@gmail.com>
Why
Multi-IP RU round-robin DNS made the existing webroot HTTP-01 challenge unreliable: LE picks one apex A-record IP, and only one RU has the challenge file. As a workaround, the second RU received certs rsynced from the first, but that copy goes stale after 90 days unless we run a cron sync. DNS-01 via the Cloudflare API is the standard fix: it works regardless of where DNS resolves because the challenge is a TXT record.
After this PR, each RU auto-renews independently against its own LE account; no inter-RU coordination needed.
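For reference, the TXT record behind a DNS-01 challenge (RFC 8555, section 8.4) lives at `_acme-challenge.<domain>`, and its value is the unpadded base64url SHA-256 digest of the key authorization (`<token>.<account-key thumbprint>`). Because the CA reads it from the zone, it does not matter which A-record IP it would have connected to. The POSIX sh sketch below only illustrates that mapping; it assumes `openssl` and `base64` are installed, and the token and thumbprint are placeholders. certbot-dns-cloudflare computes and publishes this record itself.

```shell
#!/bin/sh
# Sketch: how an ACME DNS-01 challenge maps to a TXT record (RFC 8555, 8.4).
# Token and thumbprint are placeholders; certbot-dns-cloudflare does this
# computation and the Cloudflare API calls on its own.
set -eu

# base64url encoding without '=' padding, as ACME requires
b64url() {
  base64 | tr '/+' '_-' | tr -d '=\n'
}

# Name of the TXT record the CA queries for a given domain
challenge_record_name() {
  printf '_acme-challenge.%s\n' "$1"
}

# TXT value: base64url(SHA-256(key authorization)),
# where key authorization = "<token>.<account key thumbprint>"
challenge_record_value() {
  printf '%s.%s' "$1" "$2" | openssl dgst -sha256 -binary | b64url
  printf '\n'
}

challenge_record_name zirgate.com
challenge_record_value "placeholder-token" "placeholder-thumbprint"
```

A SHA-256 digest is 32 bytes, so the record value is always 43 base64url characters once the padding is stripped.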
Changes
* `defaults/main.yml`: `relay_certbot_method` (default `webroot` for backwards compat) + `relay_certbot_dns_propagation_seconds`.
* `defaults/secrets.yml.example`: documents the `relay_cloudflare_api_token` vault var with the CF token-creation recipe (Zone:DNS:Edit on apex).
* `tasks/install.yml`: snap-installs the `certbot-dns-cloudflare` plugin and connects it, gated on method=dns-cloudflare. `trust-plugin-with-root` is set explicitly (the plugin needs root to write into `/etc/letsencrypt`).
* `tasks/certbot.yml`: validates token presence, deploys `/etc/letsencrypt/cloudflare.ini` (mode 600, no_log), and branches the certbot certonly command on method. The existing webroot path is retained unchanged for hosts that don't opt in.

Idempotency bug fixed in same commit
The pre-existing 'cert already covers domain' check parsed `certbot certificates ... | grep 'Domains:'`, but snap certbot 3.x renamed that field to `Identifiers:`. The grep returned empty → the 'cert doesn't exist' branch fired on every run → certbot tried to re-issue. LE dedup masked it for a few minutes, but sustained loops burn rate-limit budget.

Fix:

`grep -E '^[[:space:]]*(Domains|Identifiers):'`

This handles both certbot 2.x (apt) and 3.x+ (snap).
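A minimal POSIX sh sketch of the fixed check, assuming the `certbot certificates` output arrives on stdin. The function name `cert_covers_domain`, the exact-match step after the grep, and the trimmed sample outputs are illustrative assumptions; only the regex comes from this PR. It also shows why the old `grep 'Domains:'` missed 3.x output.

```shell
#!/bin/sh
# Sketch of the "cert already covers domain" check. Reads `certbot certificates`
# output on stdin; accepts the certbot 2.x label (Domains:) and the 3.x label
# (Identifiers:). The function name and the exact-match step are assumptions.

cert_covers_domain() {
  local want="$1"
  # 1) keep only the domain-list line(s), using the regex from the fix;
  # 2) succeed only if one of the names after the label equals $want exactly.
  grep -E '^[[:space:]]*(Domains|Identifiers):' |
    awk -v want="$want" '
      { for (i = 2; i <= NF; i++) if ($i == want) found = 1 }
      END { exit !found }
    '
}

# Trimmed, illustrative `certbot certificates` fragments (other fields omitted).
certbot2_output='  Certificate Name: zirgate.com
    Domains: zirgate.com'
certbot3_output='  Certificate Name: zirgate.com
    Identifiers: zirgate.com'

# The old pattern finds nothing in 3.x output, which sent every run down the
# "cert doesn't exist" branch.
if printf '%s\n' "$certbot3_output" | grep -q 'Domains:'; then
  echo "old grep: match"
else
  echo "old grep: no match"
fi

for out in "$certbot2_output" "$certbot3_output"; do
  if printf '%s\n' "$out" | cert_covers_domain zirgate.com; then
    echo "new check: covered, skip certbot certonly"
  fi
done
```

Matching whole names (rather than substrings) means a cert that only lists `www.zirgate.com` is not treated as covering `zirgate.com`; whether the role's task compares names this strictly is not shown in the PR.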
Test plan
* Manual DNS-01 setup on `vm_my_ru` and `vm_my_ru2` (out-of-band, before this PR).
* Re-ran the role with `--tags relay_install,relay_certbot`. First run reported `changed=2` due to the broken grep (false positive).
* After the fix, an idempotent re-run reports `changed=0` on both hosts.
* `certbot renew --cert-name zirgate.com --dry-run` on both hosts: "Congratulations, all simulated renewals succeeded".
* Hosts opt in via `relay_certbot_method: dns-cloudflare` in inventory or group_vars; default behaviour for any non-AlchemyLink consumer of this role is unchanged.

Operator note
To migrate an existing host from webroot to dns-cloudflare:
1. Add `relay_cloudflare_api_token` to `roles/relay/defaults/secrets.yml`.
2. Set `relay_certbot_method: dns-cloudflare` for the host.
3. Run the role with `--tags relay_install,relay_certbot`. The plugin installs, `cloudflare.ini` deploys, and the existing cert is re-used (the idempotency check skips re-issuance).
4. Run `certbot certonly --dns-cloudflare --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini --force-renewal --cert-name <domain> -d <domain>` once to switch the renewal config from webroot to dns-cloudflare. Future auto-renewals pick up the new method.
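To make the migration steps concrete, here is a rough POSIX sh rendering of what the dns-cloudflare path touches on a host. It is not the role's Ansible tasks: the function names, the webroot path `/var/www/html`, and the propagation value `30` in the demo are hypothetical. The `dns_cloudflare_api_token` key and the `--dns-cloudflare-credentials` / `--dns-cloudflare-propagation-seconds` flags are the ones documented for certbot-dns-cloudflare, and `authenticator` is the key certbot writes into `/etc/letsencrypt/renewal/<cert-name>.conf`.

```shell
#!/bin/sh
# Sketch only: a shell rendering of the dns-cloudflare path, not the role's
# Ansible tasks. Function names, the webroot path and the propagation value
# in the demo are hypothetical.
set -eu

# Refuse an empty token, then write the certbot-dns-cloudflare credentials
# file readable by its owner only (mode 600). The token is never echoed.
write_cloudflare_ini() {
  local token="$1" path="$2"
  if [ -z "$token" ]; then
    echo "relay_cloudflare_api_token is empty; not writing $path" >&2
    return 1
  fi
  # umask in a subshell so a newly created file is never group/world-readable
  ( umask 077; printf 'dns_cloudflare_api_token = %s\n' "$token" > "$path" )
  chmod 600 "$path"   # also tightens a pre-existing file
}

# Print the certbot certonly command for the chosen relay_certbot_method.
certonly_cmd() {
  local method="$1" domain="$2" webroot="$3" propagation="$4"
  case "$method" in
    webroot)
      set -- --webroot -w "$webroot" ;;
    dns-cloudflare)
      set -- --dns-cloudflare \
        --dns-cloudflare-credentials /etc/letsencrypt/cloudflare.ini \
        --dns-cloudflare-propagation-seconds "$propagation" ;;
    *)
      echo "unknown relay_certbot_method: $method" >&2
      return 1 ;;
  esac
  echo certbot certonly --non-interactive --cert-name "$domain" -d "$domain" "$@"
}

# Print the authenticator recorded in a renewal config file.
renewal_authenticator() {
  awk -F' *= *' '$1 == "authenticator" { print $2; exit }' "$1"
}

certonly_cmd dns-cloudflare zirgate.com /var/www/html 30
```

After step 4, `renewal_authenticator /etc/letsencrypt/renewal/zirgate.com.conf` should print `dns-cloudflare`; if it still prints `webroot`, the one-off force-renewal has not taken effect and future renewals will keep using HTTP-01.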