Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
95 changes: 95 additions & 0 deletions docs/prd/prd-dynamic-tls-kube-rbac-proxy.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,95 @@
# PRD: Dynamic TLS Configuration for kube-rbac-proxy

**Bug**: OCPBUGS-78774
**Date**: 2026-04-23

## Problem Statement

The `kube-rbac-proxy` container in the `insights-runtime-extractor` DaemonSet (`manifests/10-insights-runtime-extractor.yaml`) uses default values for the TLS Cipher suites and the minimum version.
OpenShift clusters allow administrators to configure a cluster-wide TLS security profile via `apiservers.config.openshift.io/cluster`. The kube-rbac-proxy must honor this central configuration by specifying its `--tls-cipher-suites` and `--tls-min-version` arguments instead of relying on default values.

## Current State

1. **Static manifest**: The DaemonSet is defined in `manifests/10-insights-runtime-extractor.yaml` and deployed as part of the release payload. The operator does not manage it programmatically.

2. **Default TLS seetings**: `--tls-cipher-suites` & `--tls-min-version` are not set and do not honor OpenShift central configuration.

3. **Existing TLS helpers**: `pkg/insights/insightsclient/apiserver_config.go` already fetches the TLS profile from `apiservers.config.openshift.io/cluster` and converts it to a `crypto/tls.Config` using `library-go/pkg/crypto`. This is used for the operator's own outbound HTTP connections.

4. **RBAC**: The operator's ClusterRole already has `get`, `list`, `watch` on `apiservers` in the `config.openshift.io` API group.

## Implementation

Follows the pattern established by [cluster-dns-operator PR #466](https://github.com/openshift/cluster-dns-operator/pull/466/).

### 1. Exported TLS helpers (`pkg/insights/insightsclient/apiserver_config.go`)

Two existing helpers were exported for reuse:

- `GetTLSSecurityProfile(configClient)` — fetches `apiservers.config.openshift.io/cluster` and reads `Spec.TLSSecurityProfile`, falling back to `TLSProfileIntermediateType` if unset.
- `GetTLSProfileSpec(profile)` — resolves a profile type (Old/Intermediate/Modern/Custom) to a concrete `TLSProfileSpec` using `configv1.TLSProfiles`.

These are also used by the existing `GetTLSConfigFromAPIServer` for the operator's own outbound HTTP connections.

### 2. kube-rbac-proxy TLS arg builder (`pkg/controller/runtime_extractor.go`)

`buildKubeRBACProxyArgs(profile)` converts a `TLSSecurityProfile` into kube-rbac-proxy CLI arguments:

- Resolves the profile to a `TLSProfileSpec` via `insightsclient.GetTLSProfileSpec`.
- For TLS 1.3 (Modern profile): only emits `--tls-min-version` since TLS 1.3 cipher suites are fixed by the protocol and not configurable by kube-rbac-proxy.
- For other profiles: converts OpenSSL cipher names to IANA names using `crypto.OpenSSLToIANACipherSuites()`. If all cipher names are unrecognized, falls back to the Intermediate profile.
- Returns args: `--tls-cipher-suites=<comma-separated IANA names>` and `--tls-min-version=<version>`.

### 3. DaemonSet reconciliation (`pkg/controller/runtime_extractor.go`)

The operator manages the `insights-runtime-extractor` DaemonSet lifecycle instead of relying solely on the static manifest:

- **Embeds the base manifest** (`manifests/10-insights-runtime-extractor.yaml`) in the operator binary using `//go:embed` (copied to `pkg/controller/manifests/`).
- `reconcileRuntimeExtractorDaemonSet` performs create-or-update reconciliation:
1. Parses the embedded DaemonSet manifest.
2. Fetches the current TLS profile via `insightsclient.GetTLSSecurityProfile`.
3. Builds kube-rbac-proxy args via `buildKubeRBACProxyArgs`.
4. Patches the kube-rbac-proxy container args (strips existing `--tls-cipher-suites` / `--tls-min-version` and appends new ones).
5. Creates the DaemonSet if not found, or updates it if the spec has changed (skips update if unchanged).

### 4. Watch for TLS configuration changes (`pkg/controller/operator.go`)

- On startup, performs an initial DaemonSet reconciliation.
- Registers a `tlsReconcileHandler` on the `configInformers.Config().V1().APIServers()` informer, which triggers reconciliation on add/update events for `apiservers.config.openshift.io/cluster`.

## Cipher Suite Mapping

The mapping between OpenShift TLS profiles and kube-rbac-proxy arguments uses `github.com/openshift/library-go/pkg/crypto`:

| Step | Function | Purpose |
|------|----------|---------|
| 1 | `configv1.TLSProfiles[profileType]` | Resolve profile type to `TLSProfileSpec` (ciphers + min version) |
| 2 | `crypto.OpenSSLToIANACipherSuites(spec.Ciphers)` | Convert OpenSSL names (e.g., `ECDHE-RSA-AES128-GCM-SHA256`) to IANA names (e.g., `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`) |
| 3 | Format as `--tls-cipher-suites=<joined>` | Pass to kube-rbac-proxy CLI |
| 4 | `spec.MinTLSVersion` (e.g., `VersionTLS12`) | Pass as `--tls-min-version=VersionTLS12` |

## Acceptance Criteria

- [ ] kube-rbac-proxy parameters (`--tls-cipher-suites` and `--tls-min-version`) are based on `apiservers.config.openshift.io/cluster` TLS security profile.
- [ ] There is a mapping between the cipher suites returned by OpenShift and the arguments passed to kube-rbac-proxy using `github.com/openshift/library-go/pkg/crypto`.
- [ ] When the TLS security profile is not configured, the Intermediate profile is used as the default.
- [ ] When the TLS security profile changes on the cluster, the DaemonSet is reconciled with the updated args.
- [ ] Unit tests cover the TLS arg builder, including all profile types (Old, Intermediate, Modern, Custom) and edge cases (nil profile, unrecognized ciphers).

## Key Files

| File | Role |
|------|------|
| `manifests/10-insights-runtime-extractor.yaml` | Base DaemonSet manifest (source of truth, copied into `pkg/controller/manifests/` for embedding) |
| `pkg/controller/manifests/10-insights-runtime-extractor.yaml` | Embedded copy of the base manifest used at runtime |
| `pkg/controller/runtime_extractor.go` | DaemonSet reconciliation, TLS arg builder, manifest parsing, informer event handler |
| `pkg/controller/runtime_extractor_test.go` | Unit tests for arg builder, patching, reconciliation |
| `pkg/insights/insightsclient/apiserver_config.go` | Exported `GetTLSSecurityProfile` and `GetTLSProfileSpec` helpers |
| `pkg/controller/operator.go` | Initial reconciliation call and APIServer informer watch registration |
| `manifests/03-clusterrole.yaml` | RBAC (already has apiserver read permissions) |

## Reference

- [cluster-dns-operator PR #466](https://github.com/openshift/cluster-dns-operator/pull/466/) - Reference implementation of the same pattern.
- `configv1.TLSProfiles` - Profile type to spec mapping from `github.com/openshift/api/config/v1`.
- `crypto.OpenSSLToIANACipherSuites` - Cipher name conversion from `github.com/openshift/library-go/pkg/crypto`.
1 change: 1 addition & 0 deletions manifests/10-insights-runtime-extractor.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -44,6 +44,7 @@ spec:
- '--config-file=/etc/kube-rbac-proxy/config.yaml'
- '--tls-cert-file=/etc/tls/private/tls.crt'
- '--tls-private-key-file=/etc/tls/private/tls.key'
- '--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256'
Comment thread
jmesnil marked this conversation as resolved.
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/tls/private
Expand Down
133 changes: 133 additions & 0 deletions pkg/controller/manifests/10-insights-runtime-extractor.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: insights-runtime-extractor
namespace: openshift-insights
annotations:
include.release.openshift.io/hypershift: "true"
include.release.openshift.io/ibm-cloud-managed: "true"
include.release.openshift.io/self-managed-high-availability: "true"
include.release.openshift.io/single-node-developer: "true"
capability.openshift.io/name: Insights
labels:
app.kubernetes.io/name: insights-runtime-extractor
spec:
selector:
matchLabels:
app.kubernetes.io/name: insights-runtime-extractor
updateStrategy:
rollingUpdate:
maxSurge: 0
maxUnavailable: 33%
type: RollingUpdate
template:
metadata:
labels:
app.kubernetes.io/name: insights-runtime-extractor
annotations:
openshift.io/required-scc: insights-runtime-extractor-scc
target.workload.openshift.io/management: '{"effect": "PreferredDuringScheduling"}'
kubectl.kubernetes.io/default-container: extractor
spec:
serviceAccountName: insights-runtime-extractor-sa
hostPID: true
# Deploy the insights-runtime-extractor only on Linux worker nodes
nodeSelector:
kubernetes.io/os: linux
priorityClassName: openshift-user-critical
containers:

- name: kube-rbac-proxy
image: quay.io/openshift/origin-kube-rbac-proxy:latest
Comment on lines +40 to +41
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check how images are populated/substituted in the existing runtime-extractor manifest and operator image refs.
fd -t f '10-insights-runtime-extractor.yaml'
rg -nP --type=yaml '(image:|IMAGE|origin-kube-rbac-proxy|origin-insights-runtime)' -C2
rg -nP 'image-references|release.openshift.io/image-references' -C2

Repository: openshift/insights-operator

Length of output: 50384


🏁 Script executed:

# Compare the two manifest files to see if they're identical
echo "=== Checking if manifests are identical ==="
diff -u manifests/10-insights-runtime-extractor.yaml pkg/controller/manifests/10-insights-runtime-extractor.yaml | head -50

echo -e "\n=== File sizes ==="
wc -l manifests/10-insights-runtime-extractor.yaml pkg/controller/manifests/10-insights-runtime-extractor.yaml

Repository: openshift/insights-operator

Length of output: 1589


🏁 Script executed:

# Search for image substitution/patching logic in the operator
echo "=== Searching for image substitution mechanisms ==="
rg -i "image.*substitut|release.*image|image-references" --type go -A 3 -B 1 | head -100

echo -e "\n=== Searching for kube-rbac-proxy handling ==="
rg "kube-rbac-proxy|image.*patch|manifest.*patch" --type go -A 2 | head -80

Repository: openshift/insights-operator

Length of output: 18307


🏁 Script executed:

# Check how the manifest is used in the operator code
echo "=== Finding where the manifest file is loaded/processed ==="
rg "10-insights-runtime-extractor" --type go -B 2 -A 5 | head -100

Repository: openshift/insights-operator

Length of output: 610


🏁 Script executed:

# Check what the embed directive actually uses and if there's any runtime image patching
echo "=== Looking at runtime_extractor.go for manifest loading and image handling ==="
sed -n '1,100p' pkg/controller/runtime_extractor.go

echo -e "\n=== Check if there's any image tag replacement/patching logic ==="
rg "origin-kube-rbac-proxy|origin-insights-runtime|image.*latest" pkg/controller/ --type go -B 3 -A 3

Repository: openshift/insights-operator

Length of output: 3418


🏁 Script executed:

# Check git history to see if pkg/controller version is new
echo "=== Check creation/modification of pkg/controller/manifests file ==="
git log --oneline --follow -- pkg/controller/manifests/10-insights-runtime-extractor.yaml | head -10

echo -e "\n=== Check if both files are referenced differently ==="
rg "pkg/controller/manifests|manifests.*10-insights-runtime" --type go

Repository: openshift/insights-operator

Length of output: 339


🏁 Script executed:

# Look at the actual runtime_extractor test to understand how the manifest is used
echo "=== Runtime extractor test context ==="
head -150 pkg/controller/runtime_extractor_test.go | tail -100

Repository: openshift/insights-operator

Length of output: 2991


Remove :latest image tags from the embedded release manifest.

Embedding a manifest that references quay.io/openshift/origin-kube-rbac-proxy:latest, origin-insights-runtime-exporter:latest, and origin-insights-runtime-extractor:latest (lines 41, 73, 90) ties the operator binary to mutable image tags. The buildDesiredDaemonSet() function in pkg/controller/runtime_extractor.go only patches TLS arguments at runtime; it does not perform image tag substitution. For a release-payload component, pull by digest or configure the release image substitution mechanism to replace these references, ensuring deployed nodes don't float to an unpinned image version.

Given the commit message states "WIP: Hard-code kube-rbac-proxy for testing", resolve these before merging out of WIP.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/manifests/10-insights-runtime-extractor.yaml` around lines 40
- 41, The embedded manifest references mutable :latest tags for images
(kube-rbac-proxy, origin-insights-runtime-exporter,
origin-insights-runtime-extractor) which ties releases to floating images;
update the manifest entries in
pkg/controller/manifests/10-insights-runtime-extractor.yaml to use immutable
image references (digest-pinned images) or remove hard-coded images so the
release image substitution mechanism can supply the correct images at deploy
time; ensure this aligns with the runtime code path in buildDesiredDaemonSet()
in pkg/controller/runtime_extractor.go which only patches TLS args and does not
substitute image tags.

# tls-cipher-suites & tls-min-version arguments are set at runtime by the operator based on the
# centralized TLS security profile from apiservers.config.openshift.io/cluster
args:
- '--secure-listen-address=:8443'
- '--upstream=http://127.0.0.1:8000'
- '--config-file=/etc/kube-rbac-proxy/config.yaml'
- '--tls-cert-file=/etc/tls/private/tls.crt'
- '--tls-private-key-file=/etc/tls/private/tls.key'
terminationMessagePolicy: FallbackToLogsOnError
volumeMounts:
- mountPath: /etc/tls/private
name: insights-runtime-extractor-tls
- mountPath: /etc/kube-rbac-proxy
name: kube-rbac-proxy-cm
securityContext:
privileged: false
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
runAsNonRoot: true
ports:
- name: https
containerPort: 8443
protocol: TCP
resources:
requests:
cpu: 10m
memory: 100Mi
- name: exporter
image: quay.io/openshift/origin-insights-runtime-exporter:latest
imagePullPolicy: Always
volumeMounts:
- mountPath: /data
name: data-volume
securityContext:
privileged: false
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
resources:
requests:
cpu: 10m
memory: 200Mi
- name: extractor
image: quay.io/openshift/origin-insights-runtime-extractor:latest
imagePullPolicy: Always
env:
- name: CONTAINER_RUNTIME_ENDPOINT
value: unix:///crio.sock
livenessProbe:
exec:
command:
- crictl
- --timeout
- 10s
- info
periodSeconds: 30
timeoutSeconds: 10
resources:
requests:
cpu: 10m
memory: 200Mi
securityContext:
privileged: true
readOnlyRootFilesystem: true
capabilities:
drop:
- ALL
add:
- CAP_SYS_ADMIN
volumeMounts:
- mountPath: /crio.sock
name: crio-socket
- mountPath: /data
name: data-volume
volumes:
- name: crio-socket
hostPath:
path: /run/crio/crio.sock
type: Socket
- name: data-volume
emptyDir: {}
- name: kube-rbac-proxy-cm
configMap:
name: kube-rbac-proxy
- name: insights-runtime-extractor-tls
secret:
secretName: insights-runtime-extractor-tls
Comment on lines +1 to +133
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major

Avoid duplicating the DaemonSet manifest between CVO payload and operator binary.

The same manifest now lives at both manifests/10-insights-runtime-extractor.yaml (CVO-managed) and pkg/controller/manifests/10-insights-runtime-extractor.yaml (operator-embedded). As noted in the PR discussion, having both CVO and the operator reconcile the DaemonSet risks a reconciliation war. Please confirm the end-state plan — either move sole ownership to the operator (drop CVO annotations / remove the CVO payload copy), or have only the operator patch specific fields while CVO owns the rest — and keep a single source of truth to avoid drift between the two copies over time.

🧰 Tools
🪛 Checkov (3.2.524)

[medium] 1-133: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[high] 1-133: Container should not be privileged

(CKV_K8S_16)


[medium] 1-133: Minimize the admission of root containers

(CKV_K8S_23)


[medium] 1-133: Containers should not share the host process ID namespace

(CKV_K8S_17)

🪛 Trivy (0.69.3)

[error] 14-131: Access to host PID

DaemonSet 'insights-runtime-extractor' should not set 'spec.template.spec.hostPID' to true

Rule: KSV-0010

Learn more

(IaC/Kubernetes)


[error] 87-118: Privileged

Container 'extractor' of DaemonSet 'insights-runtime-extractor' should set 'securityContext.privileged' to false

Rule: KSV-0017

Learn more

(IaC/Kubernetes)


[error] 31-131: Default security context configured

daemonset insights-runtime-extractor in openshift-insights namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)


[error] 14-133: Access to host PID

DaemonSet 'insights-runtime-extractor' should not set 'spec.template.spec.hostPID' to true

Rule: KSV-0010

Learn more

(IaC/Kubernetes)


[error] 89-120: Privileged

Container 'extractor' of DaemonSet 'insights-runtime-extractor' should set 'securityContext.privileged' to false

Rule: KSV-0017

Learn more

(IaC/Kubernetes)


[error] 31-133: Default security context configured

daemonset insights-runtime-extractor in openshift-insights namespace is using the default security context, which allows root privileges

Rule: KSV-0118

Learn more

(IaC/Kubernetes)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/manifests/10-insights-runtime-extractor.yaml` around lines 1 -
133, The DaemonSet "insights-runtime-extractor" is duplicated between the CVO
payload copy and the operator-embedded manifest, causing potential
reconciliation conflicts; decide a single owner and remove the duplicate: either
keep CVO as owner (delete the operator-embedded manifest entry for the DaemonSet
and ensure the operator no longer creates/reconciles it) or transfer full
ownership to the operator (remove CVO payload copy and drop CVO-only annotations
such as include.release.openshift.io/* from the manifest the operator will
manage), and if you choose a split-responsibility model, implement the operator
to only patch explicit fields (documented) while leaving the CVO-managed
manifest unchanged so there is one clear source of truth for the
"insights-runtime-extractor" DaemonSet.

8 changes: 8 additions & 0 deletions pkg/controller/operator.go
Original file line number Diff line number Diff line change
Expand Up @@ -320,6 +320,14 @@ func (s *Operator) Run(ctx context.Context, controller *controllercmd.Controller
promRulesController := insights.NewPrometheusRulesController(configAggregator, controller.KubeConfig)
go promRulesController.Start(ctx)

// reconcile the insights-runtime-extractor DaemonSet with TLS configuration
// from apiservers.config.openshift.io/cluster
if err := reconcileRuntimeExtractorDaemonSet(ctx, kubeClient, configClient); err != nil {
klog.Errorf("Failed initial runtime extractor DaemonSet reconciliation: %v", err)
}
apiserverInformer := configInformers.Config().V1().APIServers().Informer()
apiserverInformer.AddEventHandler(newTLSReconcileHandler(ctx, kubeClient, configClient)) //nolint:errcheck
Comment on lines +323 to +329
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm that SharedInformerFactory.Start does not start informers registered after the initial Start().
# Also confirm AddEventHandler signature returning (registration, error).
rg -nP -C5 '\bfunc \(f \*sharedInformerFactory\) Start\(' --type=go
rg -nP -C3 'AddEventHandler\(handler ResourceEventHandler\)' --type=go

Repository: openshift/insights-operator

Length of output: 5722


🏁 Script executed:

# Check the actual code context in operator.go around the mentioned lines
sed -n '130,340p' pkg/controller/operator.go | cat -n

Repository: openshift/insights-operator

Length of output: 10583


🏁 Script executed:

# Look at the full Start() implementation
sed -n '143,165p' vendor/k8s.io/client-go/informers/factory.go | cat -n

Repository: openshift/insights-operator

Length of output: 763


🏁 Script executed:

# Check the factory initialization and informer registration before line 136
sed -n '100,150p' pkg/controller/operator.go | cat -n

Repository: openshift/insights-operator

Length of output: 2246


Critical: APIServer informer is never started, so TLS config changes won't trigger reconciliation.

At line 136, configInformers.Start(ctx.Done()) is invoked. However, the APIServers informer is only instantiated here (line 199) via .Informer() — after the factory has already been started. The SharedInformerFactory.Start() implementation iterates only over informers that exist in the factory at the time of the call; any informers added after Start() returns are not automatically started. Consequently, the handler registered at line 200 will never be invoked when TLS config changes occur, and only the initial synchronous reconcile at line 196 will run.

Additionally, AddEventHandler returns (ResourceEventHandlerRegistration, error), but the error is being silenced via //nolint:errcheck. Registration errors should be handled.

To fix: call configInformers.Start(ctx.Done()) again after registering the handler to ensure the lazily-instantiated APIServers informer is actually started, and handle the AddEventHandler error.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/operator.go` around lines 323 - 329, The APIServers informer
created via apiserverInformer :=
configInformers.Config().V1().APIServers().Informer() is instantiated after the
factory was started so it won't be running; change the flow in the operator
reconcile/init code so that after you create apiserverInformer and call
apiserverInformer.AddEventHandler(newTLSReconcileHandler(ctx, kubeClient,
configClient)) you check and handle the returned error from AddEventHandler, and
then call configInformers.Start(ctx.Done()) (or
configInformers.Start/WaitForCacheSync as appropriate) again to ensure the
newly-created APIServers informer is started; reference
reconcileRuntimeExtractorDaemonSet, apiserverInformer, newTLSReconcileHandler
and configInformers.Start to locate where to add the AddEventHandler error
handling and the second Start call.


// support logLevelController
logLevelController := loglevel.NewClusterOperatorLoggingController(opClient, controller.EventRecorder)
operatorConfigInformers.Start(ctx.Done())
Expand Down
Loading