CNTRLPLANE-2793: Cache MCS TLS certificate generation in ignition provider#7841
Conversation
Certificate generation involves RSA key generation (2048-bit) which is computationally expensive. Previously, a new self-signed certificate was generated on every GetPayload call. This change caches the PEM-encoded certificate and key across calls, regenerating only when the cached certificate is about to expire (within a 1-hour safety margin of its 24-hour validity). The existing mutex (p.lock) already serializes all GetPayload calls, so no additional synchronization is needed for the cache fields. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add unit tests verifying the certificate caching behavior: - New certificate generation when cache is empty - Cache reuse for subsequent calls with valid certificates - Regeneration when cached certificate has expired - Regeneration when cached certificate is within the refresh margin - Multiple consecutive calls return identical cached results Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Address code review findings: - Change setupProvider to accept testing.TB to avoid capturing outer t - Add t.Parallel() to test functions and subtests - Add log line when regenerating MCS TLS certificate - Add boundary test for expiry exactly at refresh margin - Add assertion verifying certificate validity duration (~24h) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
/auto-cc |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-2793 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-2793 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
WalkthroughAdds TLS certificate caching to LocalIgnitionProvider: new cached PEM/key fields, expiry and refresh margin, plus getOrGenerateMCSCert() for lazy generation and renewal. GetPayload() now uses the cache and writes Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
Hi @hypershift-jira-solve-ci[bot]. Thanks for your PR. I'm waiting for a openshift member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-2793 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
ignition-server/controllers/local_ignitionprovider.go (1)
103-106: Harden the cache-hit guard to require both cert and key payloads.At Line 104, cache reuse only checks
mcsCertPEM+ expiry. If cache state is partially populated, this can return an unusable key and later write an emptytls.key.Proposed change
func (p *LocalIgnitionProvider) getOrGenerateMCSCert() (certPEM []byte, keyPEM []byte, err error) { - if p.mcsCertPEM != nil && time.Now().Add(mcsCertRefreshMargin).Before(p.mcsCertExpiry) { + if len(p.mcsCertPEM) > 0 && + len(p.mcsKeyPEM) > 0 && + !p.mcsCertExpiry.IsZero() && + time.Now().Add(mcsCertRefreshMargin).Before(p.mcsCertExpiry) { return p.mcsCertPEM, p.mcsKeyPEM, nil }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@ignition-server/controllers/local_ignitionprovider.go` around lines 103 - 106, The cache-hit guard in getOrGenerateMCSCert currently only checks p.mcsCertPEM and expiry, which can return a cert without a key; change the conditional to require both p.mcsCertPEM and p.mcsKeyPEM be non-nil/non-empty and that time.Now().Add(mcsCertRefreshMargin).Before(p.mcsCertExpiry) before returning; update any nil/empty checks to use length checks (len(...) > 0) as appropriate so you never return a cert without its key (which leads to writing an empty tls.key) and fall back to generating a fresh pair when either payload is missing or expired.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@ignition-server/controllers/local_ignitionprovider.go`:
- Around line 103-106: The cache-hit guard in getOrGenerateMCSCert currently
only checks p.mcsCertPEM and expiry, which can return a cert without a key;
change the conditional to require both p.mcsCertPEM and p.mcsKeyPEM be
non-nil/non-empty and that
time.Now().Add(mcsCertRefreshMargin).Before(p.mcsCertExpiry) before returning;
update any nil/empty checks to use length checks (len(...) > 0) as appropriate
so you never return a cert without its key (which leads to writing an empty
tls.key) and fall back to generating a fresh pair when either payload is missing
or expired.
ℹ️ Review info
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
ignition-server/controllers/local_ignitionprovider.goignition-server/controllers/local_ignitionprovider_test.go
|
@coderabbitai Good catch on the cache-hit guard — applied the AI-assisted response via Claude Code |
|
hypershift-jira-solve-ci - This still needs fixed because no code was pushed - #7841 (comment) |
|
/test address-review-comments |
Add nil check for mcsKeyPEM in the cache-hit guard to prevent returning a cached certificate without its corresponding key if state is partially populated. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@bryan-cox Applied the cache-hit guard fix: added AI-assisted response via Claude Code |
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-2793 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@ignition-server/controllers/local_ignitionprovider.go`:
- Around line 650-654: The tls private key is being written with overly
permissive mode 0644; update the write that uses
os.WriteFile(filepath.Join(mcsBaseDir, "tls.key"), keyPEM, 0644) so the file is
created with owner-only permissions (e.g., 0600) instead of 0644; locate the
os.WriteFile call that writes "tls.key" in the same block as the "tls.crt" write
and change the file mode constant to 0600 to restrict private key access to the
owner.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3c7dc417-bcc6-47aa-a179-df44c6320931
📒 Files selected for processing (1)
ignition-server/controllers/local_ignitionprovider.go
The tls.key file was being written with 0644 permissions, making the private key readable by group and others. Changed to 0600 for owner-only access. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@hypershift-jira-solve-ci[bot]: This pull request references CNTRLPLANE-2793 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/ok-to-test |
|
/test e2e-aws |
|
/test e2e-aws |
Test Resultse2e-aws
e2e-aks
Failed TestsTotal failed tests: 4
|
|
Tests from second stage were triggered manually. Pipeline can be controlled only manually, until HEAD changes. Use command to trigger second stage. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bryan-cox, enxebre, hypershift-jira-solve-ci[bot] The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/verified by e2e |
|
@bryan-cox: This PR has been marked as verified by DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/pipeline required |
|
Scheduling tests matching the |
|
/test e2e-aks |
|
/override ci/prow/e2e-aks Failure is a known flake not related to this PR |
|
@bryan-cox: Overrode contexts on behalf of bryan-cox: ci/prow/e2e-aks DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
6328e20
into
openshift:main
What this PR does / why we need it:
Caches the MCS TLS certificate and key across
GetPayloadcalls in the ignition server'sLocalIgnitionProvider, avoiding redundant RSA 2048-bit key generation on every payload request.Previously, a new self-signed certificate was generated on every
GetPayloadcall, which is computationally expensive due to RSA key generation. This change introduces agetOrGenerateMCSCert()method that:p.lockmutex that already serializes allGetPayloadcalls, so no additional synchronization is neededWhich issue(s) this PR fixes:
Fixes https://issues.redhat.com/browse/CNTRLPLANE-2793
Special notes for your reviewer:
The cached certificate files (
tls.crtandtls.key) are still written to disk on everyGetPayloadcall (since the MCS process reads them from disk), but the expensive key generation only happens when the cache is empty or the certificate is about to expire.Checklist:
Always review AI generated responses prior to use.
Generated with Claude Code via
/jira:solve [CNTRLPLANE-2793](https://issues.redhat.com/browse/CNTRLPLANE-2793)Summary by CodeRabbit
New Features
Tests