Add enhancement spec for centralized TLS configuration and enforcement#1910
Conversation
|
Skipping CI for Draft Pull Request. |
|
|
||
| 1. **Fetch TLS configuration** from the appropriate central source: | ||
| - **API Server configuration** (`apiserver.config.openshift.io/cluster`) - For most components that should match the API server TLS profile | ||
| - **Kubelet configuration** - For components running on nodes that need to match node-level TLS settings |
There was a problem hiding this comment.
Can we get more details on how one should pick this flavor + some concrete examples?
Case in point: monitoring deploys the node-exporter to all (Linux) nodes via a DaemonSet to collect node metrics. Right now we rely on the API server configuration settings to configure TLS, should we rather switch to Kubelet settings?
Corollary question: what's the use case for different API and kubelet TLS settings?
There was a problem hiding this comment.
@simonpasquier did you ever manage to figure this out? In your case, I think it makes sense to follow apiservers since monitoring isn't directly aligned with the kubelet component itself.
|
|
||
| **Rejected because**: This adds significant complexity and goes against the goal of centralized configuration. The three existing configuration points (API Server, Kubelet, Ingress) already provide sufficient granularity. | ||
|
|
||
| ## Open Questions |
There was a problem hiding this comment.
From what I can tell, the proposal focuses on TLS server configuration. I assume that TLS client configuration is out of scope. Asking the question because Prometheus scrapes metrics from operators/operands via HTTPS. Is it ok to consider that the scraped endpoints apply the centralized TLS configuration and we don't need to enforce anything at the client (Prometheus) level?
We also a few components which users can configure to talk to external systems:
- Prometheus can send metrics via remote-write to HTTPS servers.
- Alertmanager can send notifications to HTTPS based API services (as well as emails via STARTTLS).
Again is it safe to assume that the users are responsible to properly configure these interfaces?
There was a problem hiding this comment.
Right now, the focus is on servers, yes. If we find clients that we ship are using old or constrained TLS implementations that don't offer to negotiate using TLS 1.3 + ML-KEM, then we will likely ask for those clients to be updated because the end result of that client/server combo will not be TLS 1.3 + ML-KEM.
e639b43 to
826d177
Compare
|
Updated this with a revised draft with up to date information from internal discussions. |
| - type: Applied | ||
| status: "True" | ||
| reason: AllComponentsUpdated | ||
| message: "All components have applied the TLS configuration" |
There was a problem hiding this comment.
who's going to update this condition?
|
Reworked this EP. Removed the proposal of a new CRD for centralizing TLS configuration. The new plan is to use the existing apiserver config for all components. This reduces the complexity of adding a new CRD. |
|
Open question for discussion: What should components do if they are unable to comply with the tls settings? Is there a pattern for bubbling up configuration errors? Should the use the degraded flag? |
|
|
||
| The new `tlsAdherence` field controls how strictly the TLS configuration is enforced by components: | ||
|
|
||
| **`legacy` (default):** Provides backward-compatible behavior. Components will attempt to honor the configured TLS profile but may fall back to their individual defaults if conflicts arise. This mode is intended for clusters that need to maintain compatibility with existing configurations during migration. |
There was a problem hiding this comment.
From the user's perspective, if I set this, where do I go next to find out which components are falling back to legacy behavior?
There was a problem hiding this comment.
To make sure I'm recalling correctly, this mode being configured means "do what you've always done" and is meant to mitigate breaking changes from automatically migrating clusters from "some components respect this API for TLS configuration" to "all components respect this API for TLS configuration".
Should the guidance here instead be something along the lines of:
- IF
tlsAdherence: Legacy- IF
${component}already respects TLS profile configuration, continue to do so. - ELSE do what
${component}has always done, which is presumably that they have not respected the TLS profile and therefore should not
- IF
| **Components With Override Capability:** | ||
| - **Ingress Controller:** Retains its existing capability to define a specific `tlsSecurityProfile`. This allows Ingress to support legacy external clients (e.g., using Intermediate or Old profiles) even if the cluster internal communication is set to Modern. | ||
| - **Routes:** Individual routes may specify TLS settings that differ from the cluster-wide default for specific application requirements. | ||
| - **Layered Products:** Layered products are expected to inherit the cluster default. Products unable to support the default (e.g., due to version incompatibility) must document their deviation and provide a justification. |
There was a problem hiding this comment.
How should a component let the user know it's not going to use the cluster default set in the API server configuration?
| tlsAdherence: strict | ||
| ``` | ||
|
|
||
| #### TLS 1.3 with Custom Ciphers (Not Recommended) |
There was a problem hiding this comment.
Just to make sure my understanding is correct - the best a cluster administrator could do if we land this enhancement without #1894 (assuming we implement consistent enforcement) would be to configure the cluster to use the Modern profile and hope clients prefer the hybrid curves (assuming the cluster admin is prioritizing defense against harvest now, decrypt later attacks).
If we land this enhancement along with #1894 - then they at least have the option force clients to use PQC-safe curves with:
apiVersion: config.openshift.io/v1
kind: APIServer
metadata:
name: cluster
spec:
tlsSecurityProfile:
type: Custom
custom:
# ciphers:
# - TLS_AES_128_GCM_SHA256 # This will be IGNORED - TLS 1.3 ciphers are hardcoded
minTLSVersion: VersionTLS13
curves:
- X25519MLKEM512
tlsAdherence: strictBut that requires 1894 to turn off the classical curves.
|
reposting my comment from #1910 (comment) I'm assuming that the proposal applies to the server part of managed components and we don't need to enforce any client TLS settings. |
|
99926c2 to
8ff168c
Compare
|
@richardsonnick can this be written down in the proposal? Maybe in the non-goals section? |
|
/approve /hold
@richardsonnick feel free to unhold whenever you are ready |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: joelanford The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
rhmdnd
left a comment
There was a problem hiding this comment.
/lgtm
I have a question about the Microshift part, and overall agree with Joe's comments.
|
|
||
| **Single-node OpenShift (SNO):** The enhancement applies fully. Resource consumption impact is minimal as the APIServer configuration is an existing lightweight resource watched by existing operators. | ||
|
|
||
| **MicroShift:** **TBD** |
There was a problem hiding this comment.
Did we close the loop on this out-of-band from the review?
| # TLS 1.3 cipher suites are hardcoded by the Go runtime. | ||
| ``` | ||
|
|
||
| **Note on Existing Validation:** The APIServer already validates that cipher suites cannot be configured with TLS 1.3 via an [admission plugin in openshift-kube-apiserver](https://github.com/openshift/kubernetes/blob/9d521311f5fb67dc43f49eeb728ee2c80976835a/openshift-kube-apiserver/admission/customresourcevalidation/apiserver/validate_apiserver.go#L214-L219). This enhancement will add CEL validation expressions to match this existing behavior. The CEL validation will use ratcheting to ensure that existing resources with this configuration are not immediately invalidated upon upgrade. |
There was a problem hiding this comment.
Just a note that this validation would currently only take place on the apiservers resource and would not enforce this on something like the ingress override type.
Adding a CEL based validation will propagate to all existing resources that rely on the existing type. For components that should not be subject to this new validation, they should use a bifurcated type for overrides.
everettraven
left a comment
There was a problem hiding this comment.
Aside from the associated openshift/api PR needing an update to match updates here, this LGTM.
Approved from an API review perspective.
…the same tlsAdherence semantics
|
/unhold |
|
/lgtm |
|
@richardsonnick: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Centralized and Enforced TLS Configuration Enhancement
This enhancement proposes a mechanism to ensure all OpenShift components honor the centralized TLS security profile configuration, addressing a gap in TLS policy enforcement across the platform.
Summary
Currently, many OpenShift components hardcode TLS settings or rely on library defaults rather than respecting cluster-wide TLS configuration. This creates security vulnerabilities and blocks Post-Quantum Cryptography (PQC) readiness.
This proposal introduces a
tlsAdherencefield that allows gradual adoption of strict TLS enforcement, providing a safe upgrade path for customers.Key Features
tlsAdherencefield withLegacy(default) andStrictmodesTLSAdherenceStrictfeature gate during transitionWhy This Matters
Related Work