fix(namecheap): follow ReplacedBy chain + parse SSLGetInfoResult Status #40

Merged: albedosehen merged 1 commit into main from fix/namecheap-replaced-by-chain on May 9, 2026

@albedosehen
Contributor

Why

The renewal pipeline hung indefinitely on aur0 against oneiric.dev. The rota log showed in_progress for 12+ minutes, with WARN lines reading status= (blank), even though Namecheap had clearly accepted the reissue and was actively processing it.

Two bugs:

  1. get_info parsed the wrong XML location for Status. Namecheap returns Status as an attribute on <SSLGetInfoResult ...>. rota was looking for <SSLStatus Status="..."> (an element that doesn't exist) and <Status> text (also doesn't exist). Result: unwrap_or_default() returned empty string, blocking is_issued().

  2. rota polled the wrong SSL ID. namecheap.ssl.reissue creates a NEW SSL ID under the same subscription line — the parent flips to Status="replaced" with a ReplacedBy pointer to the child. rota kept polling the parent forever.

Fix

  • NamecheapCa now holds initial_ssl_id (immutable, from config) + active_ssl_id: AtomicU64 (mutable per renewal). submit() extracts <SSLReissueResult ID="..."> and promotes it to active.

  • get_info reads Status from SSLGetInfoResult's attribute (with fallbacks to legacy shapes). Adds replaced_by: Option<u64> to NamecheapCertInfo.

  • await_issuance adds a chain-follow branch on status == "replaced": swap active_ssl_id to ReplacedBy and continue immediately (no sleep).

Without this fix, every Sectigo CSR-hash renewal would hang for the full 30-min POLL_DEADLINE before erroring with "timed out waiting for namecheap issuance".
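
For readers unfamiliar with the response shape, here is a minimal sketch of the attribute-first lookup described above. It is not rota's parser: `attr_value`, `attr_in_tag`, `parse_info`, and the `CertInfo` struct are hypothetical names, the scan is a naive string search rather than a real XML parser, and the sample XML is made up apart from the `Status` attribute on `SSLGetInfoResult`. Placing `ReplacedBy` on the same element is also an assumption.

```rust
/// Hypothetical stand-in for the parsed fields discussed in this PR.
#[derive(Debug, PartialEq)]
struct CertInfo {
    status: String,
    replaced_by: Option<u64>,
}

/// Hypothetical helper: value of `attr` on the first `<element ...>` start tag.
/// Naive scan for illustration only (e.g. it does not handle `>` inside values).
fn attr_value<'a>(xml: &'a str, element: &str, attr: &str) -> Option<&'a str> {
    let open = format!("<{element}");
    let mut from = 0;
    while let Some(rel) = xml[from..].find(open.as_str()) {
        let after_name = from + rel + open.len();
        let rest = &xml[after_name..];
        // Require a delimiter after the name so a longer element name is not matched.
        if rest.starts_with(|c: char| c.is_whitespace() || c == '>' || c == '/') {
            let tag = &rest[..rest.find('>')?];
            return attr_in_tag(tag, attr);
        }
        from = after_name;
    }
    None
}

/// Find `attr="value"` inside the attribute portion of a start tag.
fn attr_in_tag<'a>(tag: &'a str, attr: &str) -> Option<&'a str> {
    let needle = format!("{attr}=\"");
    let mut from = 0;
    while let Some(rel) = tag[from..].find(needle.as_str()) {
        let pos = from + rel;
        let value_start = pos + needle.len();
        // The name must follow whitespace, so `Status` is not matched inside `SSLStatus`.
        if tag[..pos].ends_with(char::is_whitespace) {
            let len = tag[value_start..].find('"')?;
            return Some(&tag[value_start..value_start + len]);
        }
        from = value_start;
    }
    None
}

/// Read Status from SSLGetInfoResult first, then fall back to the legacy
/// SSLStatus attribute; an absent status becomes the empty string.
fn parse_info(xml: &str) -> CertInfo {
    let status = attr_value(xml, "SSLGetInfoResult", "Status")
        .or_else(|| attr_value(xml, "SSLStatus", "Status"))
        .unwrap_or_default()
        .to_string();
    // Assumption: ReplacedBy sits on the same element as Status.
    let replaced_by = attr_value(xml, "SSLGetInfoResult", "ReplacedBy")
        .and_then(|v| v.parse::<u64>().ok());
    CertInfo { status, replaced_by }
}

fn main() {
    // Made-up sample response; the IDs are placeholders.
    let xml = r#"<ApiResponse><CommandResponse>
        <SSLGetInfoResult Status="replaced" ReplacedBy="2002"></SSLGetInfoResult>
    </CommandResponse></ApiResponse>"#;
    println!("{:?}", parse_info(xml));
}
```

The empty-string default mirrors the failure mode in this PR: when no lookup matches, the status is blank and nothing downstream can treat the cert as issued.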

Verified

  • cargo fmt --all --check: clean
  • cargo clippy --workspace --all-targets -- -D warnings: clean
  • cargo test --workspace --locked: 108 daemon tests pass

Two bugs in rota's namecheap CA polling, both surfaced when the
in-flight oneiric.dev renewal hung indefinitely on aur0 with
empty status logs:

* `get_info` looked for `<SSLStatus Status="...">` and `<Status>`
  text, neither of which match Namecheap's actual response. The
  Status attribute lives on `<SSLGetInfoResult ...>`. Result:
  `unwrap_or_default()` returned empty string, so the WARN log
  always read `status=` (blank), and `is_issued()` could never
  match "active" / "issued".

* Namecheap's `ssl.reissue` doesn't reissue the same SSL ID — it
  creates a NEW one under the same subscription line and marks the
  parent as `Status="replaced"` with a `ReplacedBy` pointer. rota
  was polling the parent forever, never seeing the new cert at the
  child ID.

Fix:

* `NamecheapCa` now holds two IDs: `initial_ssl_id` (immutable, from
  rota.yaml) and `active_ssl_id` (AtomicU64, mutable per renewal).
  `submit()` extracts `<SSLReissueResult ID="...">` from the reissue
  response and promotes it to active. `get_info` reads the active
  ID. Subsequent renewal cycles still call `ssl.reissue` against the
  initial subscription ID (the operator's purchase line) but poll
  whatever child Namecheap creates each time.

* `get_info` reads Status as an attribute on `SSLGetInfoResult`
  (with fallbacks to the legacy `SSLStatus` attr + `<Status>` text
  for older response shapes). Captures `ReplacedBy` as `Option<u64>`
  on `NamecheapCertInfo`.

* `await_issuance` adds a chain-following branch: when `status ==
  "replaced"` and a `ReplacedBy` is present, swap the active ID and
  `continue` immediately (no 30s sleep) so polling resumes against
  the right cert in the next iteration. Defensive log if `replaced`
  is reported with no `ReplacedBy`.

Without this fix every Sectigo CSR-hash renewal hung for the full
30-min POLL_DEADLINE before erroring with `timed out waiting for
namecheap issuance`. With it, rota detects issuance in real time
once Sectigo finishes validating the DCV CNAME.
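
The chain-following branch can be sketched roughly as below. This is an illustration under stated assumptions, not the code merged in this PR: the `Ca` and `CertInfo` types, the closure-based `get_info`, and the `max_polls` bound are hypothetical stand-ins (the real loop uses a 30-min deadline and a 30s sleep, and `is_issued()` checks more than the status string).

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical stand-in for the fields of NamecheapCertInfo used here.
struct CertInfo {
    status: String,
    replaced_by: Option<u64>,
}

/// Hypothetical stand-in for NamecheapCa: one fixed ID, one mutable ID.
struct Ca {
    initial_ssl_id: u64,
    active_ssl_id: AtomicU64,
}

impl Ca {
    fn new(initial: u64) -> Self {
        Ca { initial_ssl_id: initial, active_ssl_id: AtomicU64::new(initial) }
    }
}

/// Poll the active ID; on "replaced" with a ReplacedBy pointer, swap the
/// active ID and poll again immediately. `max_polls` replaces the real deadline.
fn await_issuance<F>(ca: &Ca, max_polls: usize, mut get_info: F) -> Result<u64, String>
where
    F: FnMut(u64) -> CertInfo,
{
    for _ in 0..max_polls {
        let id = ca.active_ssl_id.load(Ordering::SeqCst);
        let info = get_info(id);
        match (info.status.as_str(), info.replaced_by) {
            ("active" | "issued", _) => return Ok(id),
            ("replaced", Some(child)) => {
                ca.active_ssl_id.store(child, Ordering::SeqCst);
                continue; // follow the chain without sleeping
            }
            ("replaced", None) => eprintln!("ssl id {id} is replaced but has no ReplacedBy"),
            _ => {}
        }
        // A real loop would sleep between polls here; omitted in this sketch.
    }
    Err("timed out waiting for namecheap issuance".to_string())
}

fn main() {
    let ca = Ca::new(100); // placeholder IDs
    let mut child_polls = 0;
    let result = await_issuance(&ca, 10, |id| match id {
        100 => CertInfo { status: "replaced".into(), replaced_by: Some(200) },
        _ => {
            child_polls += 1;
            let status = if child_polls < 2 { "in_progress" } else { "active" };
            CertInfo { status: status.into(), replaced_by: None }
        }
    });
    println!("initial={} result={:?}", ca.initial_ssl_id, result);
}
```

Passing `get_info` as a closure is only a convenience for the sketch; it also hints at how a multi-step mock could drive the state machine in tests.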

Tests: 108 daemon tests still pass; the chain-follow + status-attr
fixes are exercised by the live deploy on aur0 (tests for the
multi-step state machine would require a more involved mock
NamecheapClient than the one-shot fixture tests use today).

albedosehen merged commit 5a31254 into main on May 9, 2026 (1 check passed)
albedosehen deleted the fix/namecheap-replaced-by-chain branch on May 9, 2026 19:31
albedosehen added a commit that referenced this pull request on May 9, 2026:

rota's `get_info` was looking for `<CertificateReturned>` element
text and `<CACertificate>` element text. Neither matches Namecheap's
actual `ssl.getInfo&Returncertificate=true` response, which carries
`CertificateReturned` as an ATTRIBUTE on `<Certificates>` and packs
PEMs in nested `<Certificate>` elements:

  <Certificates CertificateReturned="true" ReturnType="INDIVIDUAL">
    <Certificate><![CDATA[LEAF_PEM]]></Certificate>
    <CaCertificates>
      <Certificate Type="INTERMEDIATE">
        <Certificate><![CDATA[INTERMEDIATE_1_PEM]]></Certificate>
      </Certificate>
      ...
    </CaCertificates>
  </Certificates>

Result: cert_pem and chain_pem both empty, `is_issued()` false,
polling never terminates even when status==active. So PR #40's
chain-follow lands on the right SSL ID but `await_issuance` still
hangs at the extraction step. Found by extracting the cert manually
out of band when `getInfo` returned status=active for oneiric.dev's
in-flight order: rota's parser yielded empty strings even though
the PEMs were sitting right there in the response.

Fix: new `ApiResponse::pem_blocks(label)` method scans the raw
response for `-----BEGIN <label>-----`...`-----END <label>-----`
armor and returns each block in document order. `get_info` calls
`pem_blocks("CERTIFICATE")`; first block is the leaf, rest are the
chain (concatenated with newlines). The CSR present in the same
response is safely skipped because its label is "CERTIFICATE
REQUEST" and `BEGIN CERTIFICATE-----` doesn't substring-match
`BEGIN CERTIFICATE REQUEST-----`.
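
A minimal sketch of the armor scan described above, written as a free function rather than the `ApiResponse::pem_blocks` method (whose actual signature is not shown here). `split_leaf_and_chain` is a hypothetical helper name, and the PEM bodies in `main` are placeholders, not real certificates.

```rust
/// Return every `-----BEGIN <label>-----` ... `-----END <label>-----` block
/// found in `raw`, in document order. Unterminated blocks are ignored.
fn pem_blocks<'a>(raw: &'a str, label: &str) -> Vec<&'a str> {
    let begin = format!("-----BEGIN {label}-----");
    let end = format!("-----END {label}-----");
    let mut blocks = Vec::new();
    let mut rest = raw;
    while let Some(start) = rest.find(begin.as_str()) {
        let from_begin = &rest[start..];
        match from_begin.find(end.as_str()) {
            Some(end_rel) => {
                let stop = end_rel + end.len();
                blocks.push(&from_begin[..stop]);
                rest = &from_begin[stop..];
            }
            None => break,
        }
    }
    blocks
}

/// Hypothetical helper: first CERTIFICATE block is the leaf, the rest are
/// the chain, joined with newlines.
fn split_leaf_and_chain(raw: &str) -> Option<(String, String)> {
    let blocks = pem_blocks(raw, "CERTIFICATE");
    let (leaf, chain) = blocks.split_first()?;
    Some(((*leaf).to_string(), chain.join("\n")))
}

fn main() {
    // Placeholder bodies; a real response carries base64 inside CDATA.
    let raw = "\
-----BEGIN CERTIFICATE REQUEST-----\nCSR\n-----END CERTIFICATE REQUEST-----\n\
<Certificate><![CDATA[-----BEGIN CERTIFICATE-----\nLEAF\n-----END CERTIFICATE-----]]></Certificate>\n\
<Certificate><![CDATA[-----BEGIN CERTIFICATE-----\nINTERMEDIATE\n-----END CERTIFICATE-----]]></Certificate>";
    if let Some((leaf, chain)) = split_leaf_and_chain(raw) {
        println!("leaf:\n{leaf}\n\nchain:\n{chain}");
    }
}
```

Because the search string includes the trailing dashes, the CSR block is never matched when the label is `CERTIFICATE`, which is the skip rule the commit message relies on.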

This is the 6th and (hopefully) final layer in the rota+Namecheap
end-to-end renewal pipeline, after PRs #36 (reverted), #37 (CDATA
unwrap), #38 (DnsCname variant), #39 (lowercase HostName), #40
(ReplacedBy chain + Status XML path). Tests: 3 new in xml::tests
covering the leaf+chain extraction, the CSR-skip rule, and the
empty-input edge case. Total daemon test count 111 (was 108).