CCXDEV-15790: extractor controller by opokornyy · Pull Request #1280 · openshift/insights-operator

opokornyy · 2026-04-29T13:53:03Z

Implement controller that manages runtime-extractor resources.

Sample Archive

None

Documentation

pkg/controller/runtimeextractor/README.md

Unit Tests

pkg/controller/runtimeextractor/informer_test.go
pkg/controller/runtimeextractor/resources/daemonset_test.go
pkg/controller/runtimeextractor/resources/retry_test.go

Privacy

Yes. There are no sensitive data in the newly collected information.

Changelog

None

Breaking Changes

No

References

https://redhat.atlassian.net/browse/CCXDEV-15790

Summary by CodeRabbit

New Features
- Added a runtime-extractor controller with informer-driven reconciliation and resource management for the runtime-extractor deployment.
- Enhanced operator status behavior to detect version changes and trigger reconciliations.
Configuration
- Expanded RBAC permissions for daemonset and event management.
- Deployment now includes new environment variables to pin related component images.
Documentation
- Added detailed runtime-extractor controller documentation and usage guidance.
Tests
- Added unit tests for informer, resource management, retry-on-conflict, and daemonset behavior.

openshift-ci-robot · 2026-04-29T13:53:07Z

@opokornyy: This pull request references CCXDEV-15790 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Implement controller that manages runtime-extractor resources.

Categories

Bugfix

Data Enhancement

Feature

Backporting

Others (CI, Infrastructure, Documentation)

Sample Archive

None

Documentation

pkg/controller/runtimeextractor/README.md

Unit Tests

pkg/controller/runtimeextractor/informer_test.go

pkg/controller/runtimeextractor/resources/daemonset_test.go

pkg/controller/runtimeextractor/resources/retry_test.go

Privacy

Yes. There are no sensitive data in the newly collected information.

Changelog

None

Breaking Changes

No

References

https://redhat.atlassian.net/browse/CCXDEV-15790

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

coderabbitai · 2026-04-29T13:54:39Z

📝 Walkthrough

Walkthrough

Adds a runtime-extractor controller, informer, and ResourceManager; embeds and manages a DaemonSet with env-pinned images and retry-on-conflict; wires operator startup and status to signal updates; updates RBAC and Deployment envs/indentation; and adds tests and documentation.

Changes

Runtime Extractor feature

Layer / File(s)	Summary
RBAC & Deployment edits `manifests/03-clusterrole.yaml`, `manifests/06-deployment.yaml`	Adds ClusterRole rule for `apps/daemonsets`, expands `openshift-insights` Role `events` verbs, fixes Deployment YAML indentation, and adds `RELATED_IMAGE_INSIGHTS_RUNTIME_EXTRACTOR`, `RELATED_IMAGE_INSIGHTS_RUNTIME_EXPORTER`, `RELATED_IMAGE_KUBE_RBAC_PROXY` env vars.
Operator initialization & wiring `pkg/controller/operator.go`	Moves config-observer startup earlier, creates buffered `updateCh`, starts namespace-scoped informer factory and runtime-extractor controller, and passes `updateCh` into status controller.
Status controller changes & tests `pkg/controller/status/controller.go`, `pkg/controller/status/controller_test.go`	Adds `updateCh` to status controller, replaces prior semver helper with `checkVersionChanges`, signals `updateCh` on any version change, and refactors tests to assert both version-change flags.
RuntimeExtractor controller core & package doc `pkg/controller/runtimeextractor/controller.go`, `pkg/controller/runtimeextractor/doc.go`	Adds ConfigNotifier/ResourceManager/ResourceInformer interfaces and runtimeExtractorController with a Run loop reconciling on config changes, cluster-version updates, and informer-detected resource drift.
Resource informer & tests `pkg/controller/runtimeextractor/informer.go`, `pkg/controller/runtimeextractor/informer_test.go`	Adds informer targeting `insights-runtime-extractor` DaemonSet in `openshift-insights`, emitting non-blocking notifications on generation changes and deletions; adds tests for update, deletion, and namespace isolation.
DaemonSet ResourceManager, manifest & tests `pkg/controller/runtimeextractor/resources/.go`, `pkg/controller/runtimeextractor/resources/manifests/runtime-extractor-daemonset.yaml`, `pkg/controller/runtimeextractor/resources/_test.go`	Embeds DaemonSet YAML, substitutes container images from env vars (with defaults), applies via server-side apply with retry-on-conflict, provides delete/get/exists helpers, exposes ResourceManager facade, and adds comprehensive unit tests including retry-on-conflict verification.
Documentation & README `pkg/controller/runtimeextractor/README.md`, `pkg/controller/runtimeextractor/doc.go`	Adds README and package doc describing controller architecture, reconciliation triggers, drift handling, server-side apply behavior, usage examples, and tests.

Sequence Diagram(s)

sequenceDiagram
    participant Op as Operator.Run()
    participant RC as RuntimeExtractorController
    participant CN as ConfigNotifier
    participant RM as ResourceManager
    participant RI as ResourceInformer
    participant K8s as Kubernetes API

    Op->>RC: NewRuntimeExtractorController(configNotifier, updateCh, ...)
    Op->>RC: Run(ctx)

    activate RC
    rect rgba(100, 150, 200, 0.5)
    Note over RC: Initial Config Check
    RC->>CN: Config()
    CN-->>RC: InsightsConfiguration
    alt DisableRuntimeExtractor = false
        RC->>RM: ApplyRuntimeExtractorResources(ctx)
        RM->>K8s: Apply DaemonSet
        K8s-->>RM: Success
    end
    end

    rect rgba(150, 100, 200, 0.5)
    Note over RC: Reconciliation Triggers
    par Config Change Event
        CN->>CN: ConfigChanged() <- signal
        CN-->>RC: Config change notification
        RC->>CN: Config()
        CN-->>RC: Updated InsightsConfiguration
        alt DisableRuntimeExtractor = true
            RC->>RM: DeleteRuntimeExtractorResources(ctx)
        else
            RC->>RM: ApplyRuntimeExtractorResources(ctx)
        end
    and Cluster Version Update
        Op->>RC: Send on updateCh
        RC->>CN: Config()
        CN-->>RC: InsightsConfiguration
        alt DisableRuntimeExtractor = false
            RC->>RM: ApplyRuntimeExtractorResources(ctx)
        end
    and External Resource Drift
        K8s->>RI: DaemonSet updated/deleted
        RI->>RC: ResourceModified()
        RC->>CN: Config()
        CN-->>RC: InsightsConfiguration
        alt DisableRuntimeExtractor = false
            RC->>RM: ApplyRuntimeExtractorResources(ctx)
        else
            RC->>RM: DeleteRuntimeExtractorResources(ctx)
        end
    end
    end

    deactivate RC

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~55 minutes

🚥 Pre-merge checks | ✅ 10 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 35.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Test Structure And Quality	⚠️ Warning	WaitForCacheSync return values unchecked at lines 68, 115, 159. Assertion messages missing throughout daemonset_test.go, inconsistent with codebase.	Check WaitForCacheSync return and fail if false; add meaningful messages to assertions matching controller_test.go pattern.

✅ Passed checks (10 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'CCXDEV-15790: extractor controller' directly references the Jira ticket and clearly identifies the main change: implementing an extractor controller.
Description check	✅ Passed	The description follows the repository template structure with all major sections completed, including feature category selection, documentation, unit tests, privacy considerations, and references.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names	✅ Passed	This PR uses standard Go testing (testing.T), not Ginkgo. The custom check is specific to Ginkgo test names and is not applicable to this codebase. Standard Go test names are stable and deterministic.
Microshift Test Compatibility	✅ Passed	No Ginkgo e2e tests found in PR. All tests are standard Go unit tests using testing.T, which are outside the scope of this MicroShift compatibility check.
Single Node Openshift (Sno) Test Compatibility	✅ Passed	No Ginkgo e2e tests added; only standard Go unit tests (testing package) are included. SNO compatibility check applies only to Ginkgo e2e tests.
Topology-Aware Scheduling Compatibility	✅ Passed	Runtime-extractor DaemonSet uses only linux nodeSelector with no topology-breaking constraints. DaemonSet scales appropriately for all OpenShift topologies (HA, SNO, TNF, TNA, HyperShift).
Ote Binary Stdout Contract	✅ Passed	No OTE stdout violations found. All logging uses klog configured to stderr via logs.InitLogs() and alsologtostderr=true. No fmt.Print/Println/os.Stdout writes in non-test code.
Ipv6 And Disconnected Network Test Compatibility	✅ Passed	No Ginkgo e2e tests were added in this PR. All test files are standard Go unit tests (using testing.T) with fake Kubernetes clients - no network calls or IPv4 assumptions are made.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-04-29T13:55:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: opokornyy

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [opokornyy]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 7

🧹 Nitpick comments (1)

pkg/controller/runtimeextractor/informer_test.go (1)

68-68: Assert informer cache sync success to reduce flakiness.

WaitForCacheSync return values are ignored; if sync fails, tests may fail nondeterministically later. Assert all returned values are true before proceeding.

✅ Minimal hardening pattern

-	informerFactory.WaitForCacheSync(ctx.Done())
+	for informerType, synced := range informerFactory.WaitForCacheSync(ctx.Done()) {
+		if !synced {
+			t.Fatalf("Informer cache failed to sync: %v", informerType)
+		}
+	}

Also applies to: 115-115, 159-159

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@pkg/controller/runtimeextractor/informer_test.go` at line 68, The call to
informerFactory.WaitForCacheSync currently ignores its boolean return values,
causing flaky tests; change the test to capture the returned map (e.g., synced
:= informerFactory.WaitForCacheSync(ctx.Done())) and assert that all values in
that map are true (fail the test if any are false) before proceeding—apply the
same change for other occurrences of WaitForCacheSync in this test (lines
referencing informerFactory.WaitForCacheSync at the other noted locations).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@manifests/03-clusterrole.yaml`:
- Around line 129-140: The ClusterRole currently grants create/delete/update on
apiGroups "apps" resources "daemonsets" which is overbroad; remove the mutating
verbs (create, delete, update) for apps/daemonsets from the ClusterRole and
leave only read verbs (get, list, watch) there, then add those mutating verbs to
the existing namespaced Role that targets the openshift-insights namespace (the
namespaced Role that the runtime-extractor controller uses), and update/confirm
the RoleBinding so the controller's ServiceAccount is bound to that namespaced
Role (ensuring the controller retains get/list/watch at cluster or namespace
scope as intended and only gains create/delete/update within
openshift-insights).

In `@pkg/controller/runtimeextractor/controller.go`:
- Around line 147-179: The createDeployment, deleteDeployment and
updateDeployment functions currently log errors from
resourceManager.ApplyRuntimeExtractorResources / DeleteRuntimeExtractorResources
and swallow them; change each to return an error (e.g., func (re
*runtimeExtractorController) createDeployment(ctx context.Context) error) and
propagate the error to the caller so the reconciler can requeue/retry, or if you
prefer keep void signatures then explicitly trigger a reschedule (e.g.,
requeue/reconcile helper) on failure; also return early on isCreated checks as
before but ensure deleteDeployment returns nil when nothing to delete and
non-nil when DeleteRuntimeExtractorResources fails, and updateDeployment returns
the Apply error so transient API failures bubble up rather than being silently
logged.

In `@pkg/controller/runtimeextractor/informer.go`:
- Around line 103-121: In handleResourceDeletion (type resourceInformer) detect
and unwrap cache.DeletedFinalStateUnknown tombstones before calling
meta.Accessor: if obj is a tombstone, extract the tombstone.Obj (or underlying
runtime.Object) and reassign it to obj so meta.Accessor succeeds; then continue
the existing namespace/name checks and call ri.notifyModification() when
runtimeExtractorName is deleted. Ensure the tombstone handling uses the
cache.DeletedFinalStateUnknown type assertion and preserves existing logging
behavior.

In `@pkg/controller/runtimeextractor/README.md`:
- Around line 54-141: Update the README fenced code blocks that lack language
tags by adding appropriate languages: change the ASCII flow diagrams under the
"Reconciliation Flow", "Initial Deployment", "Configuration Change", and
"Resource Drift" headings to use a language tag such as text or mermaid (e.g.,
replace ``` with ```text or ```mermaid), keep the "Watch patterns" Go block as
```go and ensure the resource labels block remains ```yaml; this will satisfy
markdownlint MD040 without altering the content.

In `@pkg/controller/runtimeextractor/resources/daemonset_test.go`:
- Around line 104-107: The test currently only asserts images changed; replace
the NotEqual checks with exact equality checks against the expected env-derived
image strings so the test fails on incorrect mappings: in
pkg/controller/runtimeextractor/resources/daemonset_test.go update the
assertions for ds.Spec.Template.Spec.Containers[0].Image, [1].Image and
[2].Image to assert Equal against the precise expected image values (either
compute them from the same environment variables used by the code under test or
hard-code the exact expected strings used in the test setup) instead of using
assert.NotEqual.

In `@pkg/controller/runtimeextractor/resources/daemonset.go`:
- Around line 151-162: daemonSetExists currently swallows all get errors and
returns false, which treats API/auth/transport failures as "not found"; change
the signature to return (bool, error) (e.g., daemonSetExists(ctx
context.Context) (bool, error)), call rm.getDaemonSet(ctx) and if
apierrors.IsNotFound(err) return (false, nil), if err != nil return (false,
err), otherwise return (true, nil); update all callers of daemonSetExists to
handle the error return and only treat NotFound as absence while surfacing other
errors so reconciliation isn't skipped on temporary API failures.

In `@pkg/controller/status/controller.go`:
- Around line 205-208: The direct blocking send to c.updateCh in the status
reconciliation path can deadlock merge() if updateCh is nil or full; change it
to a nil-safe, non-blocking signal: check c.updateCh != nil and then use a
non-blocking send (select with a case c.updateCh <- struct{}{} and a default) so
you only attempt to signal when the channel exists and won't block if it's full.

---

Nitpick comments:
In `@pkg/controller/runtimeextractor/informer_test.go`:
- Line 68: The call to informerFactory.WaitForCacheSync currently ignores its
boolean return values, causing flaky tests; change the test to capture the
returned map (e.g., synced := informerFactory.WaitForCacheSync(ctx.Done())) and
assert that all values in that map are true (fail the test if any are false)
before proceeding—apply the same change for other occurrences of
WaitForCacheSync in this test (lines referencing
informerFactory.WaitForCacheSync at the other noted locations).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 5cc38019-b1ed-410e-8cff-a4fd86c65c10

📥 Commits

Reviewing files that changed from the base of the PR and between 50525ba and b2636bb.

📒 Files selected for processing (15)

manifests/03-clusterrole.yaml
manifests/06-deployment.yaml
pkg/controller/operator.go
pkg/controller/runtimeextractor/README.md
pkg/controller/runtimeextractor/controller.go
pkg/controller/runtimeextractor/doc.go
pkg/controller/runtimeextractor/informer.go
pkg/controller/runtimeextractor/informer_test.go
pkg/controller/runtimeextractor/resources/daemonset.go
pkg/controller/runtimeextractor/resources/daemonset_test.go
pkg/controller/runtimeextractor/resources/manager.go
pkg/controller/runtimeextractor/resources/manifests/runtime-extractor-daemonset.yaml
pkg/controller/runtimeextractor/resources/retry_test.go
pkg/controller/status/controller.go
pkg/controller/status/controller_test.go

coderabbitai · 2026-04-29T14:12:15Z

+  - apiGroups:
+      - "apps"
+    resources:
+      - daemonsets
+    verbs:
+      - get
+      - list
+      - watch
+      - create
+      - delete
+      - update
 ---


⚠️ Potential issue | 🟠 Major

Scope DaemonSet write permissions to namespace-level Role (least privilege).

Granting create/delete/update on apps/daemonsets in a ClusterRole allows mutating DaemonSets cluster-wide. The runtime-extractor controller manages a namespaced DaemonSet (openshift-insights), so this should be moved to the existing namespaced Role in openshift-insights.

🔧 Suggested RBAC tightening

- - apiGroups: - - "apps" - resources: - - daemonsets - verbs: - - get - - list - - watch - - create - - delete - - update

# In Role `insights-operator` (namespace: openshift-insights), add: + - apiGroups: + - apps + resources: + - daemonsets + verbs: + - get + - list + - watch + - create + - delete + - update

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@manifests/03-clusterrole.yaml` around lines 129 - 140, The ClusterRole currently grants create/delete/update on apiGroups "apps" resources "daemonsets" which is overbroad; remove the mutating verbs (create, delete, update) for apps/daemonsets from the ClusterRole and leave only read verbs (get, list, watch) there, then add those mutating verbs to the existing namespaced Role that targets the openshift-insights namespace (the namespaced Role that the runtime-extractor controller uses), and update/confirm the RoleBinding so the controller's ServiceAccount is bound to that namespaced Role (ensuring the controller retains get/list/watch at cluster or namespace scope as intended and only gains create/delete/update within openshift-insights).

coderabbitai · 2026-04-29T14:12:15Z

+func (re *runtimeExtractorController) createDeployment(ctx context.Context) {
+	klog.Info("Creating runtime extractor resources")
+
+	if err := re.resourceManager.ApplyRuntimeExtractorResources(ctx); err != nil {
+		klog.Errorf("Failed to apply runtime extractor resources: %v", err)
+	}
+}
+
+func (re *runtimeExtractorController) deleteDeployment(ctx context.Context) {
+	klog.Info("Deleting runtime extractor resources")
+
+	if !re.isCreated(ctx) {
+		klog.Info("Runtime extractor resources do not exist, nothing to delete")
+		return
+	}
+
+	if err := re.resourceManager.DeleteRuntimeExtractorResources(ctx); err != nil {
+		klog.Errorf("Failed to delete runtime extractor resources: %v", err)
+	}
+}
+
+func (re *runtimeExtractorController) updateDeployment(ctx context.Context) {
+	klog.Info("Updating runtime extractor resources")
+
+	// Avoid creating it when the cluster version is updated
+	if !re.isCreated(ctx) {
+		klog.Info("Runtime extractor resources not found, skipping update")
+		return
+	}
+
+	if err := re.resourceManager.ApplyRuntimeExtractorResources(ctx); err != nil {
+		klog.Errorf("Failed to apply runtime extractor resources: %v", err)
+	}


⚠️ Potential issue | 🟠 Major

Surface reconcile failures instead of logging them away.

These helpers swallow apply/delete errors after logging. Because the controller is event-driven, a transient API failure can leave the runtime-extractor resources out of sync indefinitely if no new event arrives.

Please bubble the error into a retry/requeue path, or explicitly reschedule reconciliation.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/controller/runtimeextractor/controller.go` around lines 147 - 179, The createDeployment, deleteDeployment and updateDeployment functions currently log errors from resourceManager.ApplyRuntimeExtractorResources / DeleteRuntimeExtractorResources and swallow them; change each to return an error (e.g., func (re *runtimeExtractorController) createDeployment(ctx context.Context) error) and propagate the error to the caller so the reconciler can requeue/retry, or if you prefer keep void signatures then explicitly trigger a reschedule (e.g., requeue/reconcile helper) on failure; also return early on isCreated checks as before but ensure deleteDeployment returns nil when nothing to delete and non-nil when DeleteRuntimeExtractorResources fails, and updateDeployment returns the Apply error so transient API failures bubble up rather than being silently logged.

coderabbitai · 2026-04-29T14:12:15Z

+// handleResourceDeletion processes resource deletion events
+func (ri *resourceInformer) handleResourceDeletion(obj interface{}, resourceType string) {
+	metadata, err := meta.Accessor(obj)
+	if err != nil {
+		klog.Errorf("Failed to get metadata from deleted runtime extractor %s: %v", resourceType, err)
+		return
+	}
+
+	// Only care about our specific resources
+	if metadata.GetNamespace() != runtimeExtractorNamespace {
+		return
+	}
+
+	name := metadata.GetName()
+	if name == runtimeExtractorName {
+		klog.Infof("Runtime extractor %s %s/%s was deleted externally, triggering reconciliation",
+			resourceType, metadata.GetNamespace(), name)
+		ri.notifyModification()
+	}


⚠️ Potential issue | 🟠 Major

Handle tombstones in delete events.

DeleteFunc can receive cache.DeletedFinalStateUnknown tombstones from client-go. meta.Accessor(obj) will fail on that wrapper, so deletions of the target DaemonSet can be missed instead of triggering reconciliation.

Please unwrap the tombstone before filtering.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/controller/runtimeextractor/informer.go` around lines 103 - 121, In handleResourceDeletion (type resourceInformer) detect and unwrap cache.DeletedFinalStateUnknown tombstones before calling meta.Accessor: if obj is a tombstone, extract the tombstone.Obj (or underlying runtime.Object) and reassign it to obj so meta.Accessor succeeds; then continue the existing namespace/name checks and call ri.notifyModification() when runtimeExtractorName is deleted. Ensure the tombstone handling uses the cache.DeletedFinalStateUnknown type assertion and preserves existing logging behavior.

coderabbitai · 2026-04-29T14:12:15Z

+```go
+// Watch patterns
+DaemonSet:  openshift-insights/insights-runtime-extractor
+```
+
+### Detection Criteria
+
+The informer uses **generation-based filtering** to minimize unnecessary reconciliations:
+
+**DaemonSet**:
+- Checks if `Generation` changed (only increments when spec changes, not status)
+- Filters out ~90% of update events (status updates, controller metadata changes)
+- Deletion events always trigger reconciliation
+
+**Why generation-based?**
+- **Simple**: Single field check instead of deep comparison
+- **Efficient**: Filters out 90% of noise (status updates, reconciliation loops)
+- **Reliable**: Kubernetes guarantees generation increments on spec changes
+- **Performant**: `resourceapply` handles detailed comparison and actual updates
+
+### Reconciliation Flow
+
+```
+External Change → Informer Detects → Notification Sent → Controller Reconciles → State Restored
+```
+
+## Resource Protection
+
+### Server-Side Apply
+The controller uses `resourceapply` functions from `library-go`, which provide:
+- **Three-way merge**: Preserves fields managed by other controllers
+- **Generation tracking**: Detects meaningful changes
+- **Conflict resolution**: Handles concurrent modifications gracefully
+
+### Retry Logic
+All resource apply operations use `retry.RetryOnConflict` with exponential backoff:
+- **Automatic retry**: Conflicts are automatically retried (up to 5 attempts by default)
+- **Exponential backoff**: Delays increase between retries (10ms, 20ms, 40ms, etc.)
+- **Jitter**: Random delay variation to prevent thundering herd
+- **Success guarantee**: Eventually consistent even under concurrent modifications
+
+### Ownership
+Resources are labeled with:
+```yaml
+labels:
+  app.kubernetes.io/managed-by: insights-operator
+  app.kubernetes.io/name: insights-runtime-extractor
+```
+
+## Event Flow
+
+### Initial Deployment
+```
+Operator Starts
+    ↓
+Read Configuration
+    ↓
+DisableRuntimeExtractor == false?
+    ↓ (yes)
+Create/Update Resources
+    ↓
+Monitor for Changes
+```
+
+### Configuration Change
+```
+Config Updated
+    ↓
+ConfigNotifier Sends Event
+    ↓
+Controller Handles Config Change
+    ↓
+Enable: Apply Resources
+Disable: Delete Resources
+```
+
+### Resource Drift
+```
+User/Process Modifies Resource
+    ↓
+Informer Detects Change
+    ↓
+Notification Sent to modifiedCh
+    ↓
+Controller Reapplies Desired State
+    ↓
+Resource Restored
+```


⚠️ Potential issue | 🟡 Minor

Add language tags to the fenced blocks.

The plain fences in this section are triggering markdownlint MD040. Please mark them as text, mermaid, or an equivalent language so the docs pass lint cleanly.

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 76-76: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 106-106: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 119-119: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

[warning] 131-131: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/controller/runtimeextractor/README.md` around lines 54 - 141, Update the README fenced code blocks that lack language tags by adding appropriate languages: change the ASCII flow diagrams under the "Reconciliation Flow", "Initial Deployment", "Configuration Change", and "Resource Drift" headings to use a language tag such as text or mermaid (e.g., replace ``` with ```text or ```mermaid), keep the "Watch patterns" Go block as ```go and ensure the resource labels block remains ```yaml; this will satisfy markdownlint MD040 without altering the content.

coderabbitai · 2026-04-29T14:12:15Z

+	// Verify images were updated from environment variables
+	assert.NotEqual(t, "old-image", ds.Spec.Template.Spec.Containers[0].Image)
+	assert.NotEqual(t, "old-image", ds.Spec.Template.Spec.Containers[1].Image)
+	assert.NotEqual(t, "old-image", ds.Spec.Template.Spec.Containers[2].Image)


⚠️ Potential issue | 🟡 Minor

Assert the expected images, not just a change.

These checks would still pass if the wrong image were loaded. Compare against the exact env-derived values so the test catches mapping regressions.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/controller/runtimeextractor/resources/daemonset_test.go` around lines 104 - 107, The test currently only asserts images changed; replace the NotEqual checks with exact equality checks against the expected env-derived image strings so the test fails on incorrect mappings: in pkg/controller/runtimeextractor/resources/daemonset_test.go update the assertions for ds.Spec.Template.Spec.Containers[0].Image, [1].Image and [2].Image to assert Equal against the precise expected image values (either compute them from the same environment variables used by the code under test or hard-code the exact expected strings used in the test setup) instead of using assert.NotEqual.

coderabbitai · 2026-04-29T14:12:15Z

+// daemonSetExists checks if the runtime extractor DaemonSet exists
+func (rm *ResourceManager) daemonSetExists(ctx context.Context) bool {
+	_, err := rm.getDaemonSet(ctx)
+	if err != nil {
+		if apierrors.IsNotFound(err) {
+			return false
+		}
+		klog.Errorf("Failed to get runtime extractor DaemonSet %s/%s: %v", daemonSetNamespace, daemonSetName, err)
+		return false
+	}
+	return true
+}


⚠️ Potential issue | 🟠 Major

Don't treat API failures as resource absence.

daemonSetExists() returns false for every get error, not just NotFound. That makes the controller skip reconciliation during temporary API/auth/transport failures, which can leave the runtime-extractor state stale.

Please return the error separately and only map NotFound to false.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/controller/runtimeextractor/resources/daemonset.go` around lines 151 - 162, daemonSetExists currently swallows all get errors and returns false, which treats API/auth/transport failures as "not found"; change the signature to return (bool, error) (e.g., daemonSetExists(ctx context.Context) (bool, error)), call rm.getDaemonSet(ctx) and if apierrors.IsNotFound(err) return (false, nil), if err != nil return (false, err), otherwise return (true, nil); update all callers of daemonSetExists to handle the error return and only treat NotFound as absence while surfacing other errors so reconciliation isn't skipped on temporary API failures.

coderabbitai · 2026-04-29T14:12:15Z

+		// Update runtime-extractor images on any version change
+		if versionChanged {
+			c.updateCh <- struct{}{}
+		}


⚠️ Potential issue | 🔴 Critical

Avoid blocking send on updateCh in status reconciliation path.

The direct send can deadlock merge() if updateCh is nil or already full. Use a non-blocking send (and nil-safe guard) to keep status updates resilient.

🔒 Non-blocking, nil-safe signal send

- if versionChanged { - c.updateCh <- struct{}{} - } + if versionChanged && c.updateCh != nil { + select { + case c.updateCh <- struct{}{}: + default: + // signal already pending; avoid blocking status reconciliation + } + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// Update runtime-extractor images on any version change

if versionChanged {

c.updateCh <- struct{}{}

}

// Update runtime-extractor images on any version change

if versionChanged && c.updateCh != nil {

select {

case c.updateCh <- struct{}{}:

default:

// signal already pending; avoid blocking status reconciliation

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@pkg/controller/status/controller.go` around lines 205 - 208, The direct blocking send to c.updateCh in the status reconciliation path can deadlock merge() if updateCh is nil or full; change it to a nil-safe, non-blocking signal: check c.updateCh != nil and then use a non-blocking send (select with a case c.updateCh <- struct{}{} and a default) so you only attempt to signal when the channel exists and won't block if it's full.

opokornyy · 2026-05-04T11:21:06Z

/retest

Move runtime-extractor manifests to dedicated directory and update clusterrole permissions for controller resource management. Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

Implement controller that manages runtime-extractor resources: - ConfigMap for extractor configuration - DaemonSet for node-level data collection - Service for pod communication - Resource lifecycle management Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

Add comprehensive tests for runtime-extractor resource management: - ConfigMap creation and updates - DaemonSet lifecycle and configuration - Service resource handling - Common test utilities Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

opokornyy · 2026-05-18T12:25:09Z

/test tls-scanner

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

pkg/controller/runtimeextractor/informer_test.go (1)
17-17: ⚡ Quick win

Consider upgrading to fake.NewClientset() now that apply configurations are available.

fake.NewSimpleClientset is deprecated. The codebase has apply configurations available in the vendored k8s.io/client-go library, making an upgrade to fake.NewClientset feasible with minimal effort.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/runtimeextractor/informer_test.go` at line 17, The test uses
the deprecated fake.NewSimpleClientset; replace its usage by calling
fake.NewClientset() when constructing fakeClient (update the instantiation where
fakeClient := fake.NewSimpleClientset() appears) to leverage apply-configuration
support; ensure any test setup that relied on SimpleClientset behavior still
works with the new fake.NewClientset and adjust imports or apply-config usage if
necessary.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controller/runtimeextractor/resources/retry_test.go`:
- Around line 19-84: Convert Test_applyDaemonSet_RetryOnConflict into a
table-driven test: create a test cases slice (even if it contains one case) with
fields like name, initialDaemonSet, reactorBehavior (to simulate conflict then
success), and wantErr; iterate over cases using t.Run(case.name, func(t
*testing.T) { ... }); inside each subtest set up the fakeClient, recorder,
updateAttempts counter and PrependReactor as before, construct the
ResourceManager via NewResourceManager and call applyDaemonSet, then assert
results against case.wantErr and verify updateAttempts; keep the existing logic
but move per-case setup into the loop and reference
Test_applyDaemonSet_RetryOnConflict, NewResourceManager and applyDaemonSet to
locate where to change.

---

Nitpick comments:
In `@pkg/controller/runtimeextractor/informer_test.go`:
- Line 17: The test uses the deprecated fake.NewSimpleClientset; replace its
usage by calling fake.NewClientset() when constructing fakeClient (update the
instantiation where fakeClient := fake.NewSimpleClientset() appears) to leverage
apply-configuration support; ensure any test setup that relied on
SimpleClientset behavior still works with the new fake.NewClientset and adjust
imports or apply-config usage if necessary.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 636e3849-9a58-4315-a611-71b7b299921b

📥 Commits

Reviewing files that changed from the base of the PR and between b2636bb and 25ff9a1.

📒 Files selected for processing (15)

manifests/03-clusterrole.yaml
manifests/06-deployment.yaml
pkg/controller/operator.go
pkg/controller/runtimeextractor/README.md
pkg/controller/runtimeextractor/controller.go
pkg/controller/runtimeextractor/doc.go
pkg/controller/runtimeextractor/informer.go
pkg/controller/runtimeextractor/informer_test.go
pkg/controller/runtimeextractor/resources/daemonset.go
pkg/controller/runtimeextractor/resources/daemonset_test.go
pkg/controller/runtimeextractor/resources/manager.go
pkg/controller/runtimeextractor/resources/manifests/runtime-extractor-daemonset.yaml
pkg/controller/runtimeextractor/resources/retry_test.go
pkg/controller/status/controller.go
pkg/controller/status/controller_test.go

✅ Files skipped from review due to trivial changes (1)

pkg/controller/runtimeextractor/doc.go

🚧 Files skipped from review as they are similar to previous changes (9)

pkg/controller/runtimeextractor/resources/manager.go
pkg/controller/status/controller_test.go
pkg/controller/status/controller.go
manifests/06-deployment.yaml
pkg/controller/runtimeextractor/resources/daemonset_test.go
manifests/03-clusterrole.yaml
pkg/controller/runtimeextractor/controller.go
pkg/controller/operator.go
pkg/controller/runtimeextractor/resources/daemonset.go

coderabbitai · 2026-05-18T12:32:37Z

+func Test_applyDaemonSet_RetryOnConflict(t *testing.T) {
+	ctx := context.Background()
+
+	// Create initial DaemonSet
+	initialDS := &appsv1.DaemonSet{
+		ObjectMeta: metav1.ObjectMeta{
+			Name:            "insights-runtime-extractor",
+			Namespace:       "openshift-insights",
+			ResourceVersion: "1",
+		},
+		Spec: appsv1.DaemonSetSpec{
+			Selector: &metav1.LabelSelector{
+				MatchLabels: map[string]string{"app": "test"},
+			},
+			Template: corev1.PodTemplateSpec{
+				ObjectMeta: metav1.ObjectMeta{
+					Labels: map[string]string{"app": "test"},
+				},
+				Spec: corev1.PodSpec{
+					Containers: []corev1.Container{
+						{Name: "extractor", Image: "old-image:v1"},
+						{Name: "exporter", Image: "old-image:v1"},
+						{Name: "kube-rbac-proxy", Image: "old-image:v1"},
+					},
+				},
+			},
+		},
+	}
+
+	fakeClient := fake.NewSimpleClientset(initialDS)
+	recorder := events.NewInMemoryRecorder("test", clock.RealClock{})
+
+	// Track update attempts
+	updateAttempts := 0
+
+	// Add reactor to simulate conflict on first attempt, then succeed
+	fakeClient.PrependReactor("update", "daemonsets", func(action k8stesting.Action) (handled bool, ret runtime.Object, err error) {
+		updateAttempts++
+		if updateAttempts == 1 {
+			// First attempt: return conflict error
+			return true, nil, apierrors.NewConflict(
+				appsv1.Resource("daemonsets"),
+				"insights-runtime-extractor",
+				nil,
+			)
+		}
+		// Second attempt: succeed with default behavior
+		return false, nil, nil
+	})
+
+	rm := NewResourceManager(
+		fakeClient.AppsV1(),
+		recorder,
+	)
+
+	// This should succeed after retry
+	_, err := rm.applyDaemonSet(ctx)
+	if err != nil {
+		t.Fatalf("Expected applyDaemonSet to succeed after retry, got error: %v", err)
+	}
+
+	// Verify it retried (should have 2 attempts: 1 conflict + 1 success)
+	if updateAttempts < 2 {
+		t.Errorf("Expected at least 2 update attempts (conflict + retry), got %d", updateAttempts)
+	}
+}


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Convert this to a table-driven test to match repo test standards.

This test is currently single-case; please wrap it in a table-driven structure (even with one case) for consistency and easier expansion.

Proposed refactor sketch

-func Test_applyDaemonSet_RetryOnConflict(t *testing.T) { - ctx := context.Background() - ... -} +func Test_applyDaemonSet_RetryOnConflict(t *testing.T) { + tests := []struct { + name string + }{ + {name: "retries once on conflict and then succeeds"}, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + ctx := context.Background() + // existing setup/assertions + }) + } +}

As per coding guidelines, Use table-driven tests in test implementations.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/resources/retry_test.go` around lines 19 - 84, Convert Test_applyDaemonSet_RetryOnConflict into a table-driven test: create a test cases slice (even if it contains one case) with fields like name, initialDaemonSet, reactorBehavior (to simulate conflict then success), and wantErr; iterate over cases using t.Run(case.name, func(t *testing.T) { ... }); inside each subtest set up the fakeClient, recorder, updateAttempts counter and PrependReactor as before, construct the ResourceManager via NewResourceManager and call applyDaemonSet, then assert results against case.wantErr and verify updateAttempts; keep the existing logic but move per-case setup into the loop and reference Test_applyDaemonSet_RetryOnConflict, NewResourceManager and applyDaemonSet to locate where to change.

opokornyy · 2026-05-18T12:38:36Z

/test tls-scanner

coderabbitai

♻️ Duplicate comments (1)

pkg/controller/runtimeextractor/README.md (1)
76-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language tags to fenced code blocks.

The four plain-text flow diagrams (lines 76, 106, 119, 131) are still missing language identifiers, which causes markdownlint MD040 warnings. Add text tags to each fence.
📝 Proposed fix
-```
+```text
 External Change → Informer Detects → Notification Sent → Controller Reconciles → State Restored
Apply the same pattern to the fences at lines 106, 119, and 131.

</details>


Also applies to: 106-106, 119-119, 131-131

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @pkg/controller/runtimeextractor/README.md at line 76, Update the fenced code
blocks that contain the plain-text flow diagrams so they include the language
tag "text" instead of a bare fence; for example change totext for the
fence wrapping the flow "External Change → Informer Detects → Notification Sent
→ Controller Reconciles → State Restored" and apply the same change to the other
three flow-diagram fences in the README (the ones around the other two/three
plain-text diagrams). This fixes the MD040 markdownlint warnings by explicitly
marking those blocks as text.
</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In @pkg/controller/runtimeextractor/README.md:

Line 76: Update the fenced code blocks that contain the plain-text flow
diagrams so they include the language tag "text" instead of a bare fence; for
example change totext for the fence wrapping the flow "External Change →
Informer Detects → Notification Sent → Controller Reconciles → State Restored"
and apply the same change to the other three flow-diagram fences in the README
(the ones around the other two/three plain-text diagrams). This fixes the MD040
markdownlint warnings by explicitly marking those blocks as text.
</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository YAML (base), Central YAML (inherited)

**Review profile**: CHILL

**Plan**: Enterprise

**Run ID**: `b3544c28-21f2-41f9-80f2-7f5edc2bf16b`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between 25ff9a17854af50c1dc15cb80a125fa7d5059607 and 8518be5d65cd51f39c83f8734d2e81d1af205579.

</details>

<details>
<summary>📒 Files selected for processing (10)</summary>

* `pkg/controller/operator.go`
* `pkg/controller/runtimeextractor/README.md`
* `pkg/controller/runtimeextractor/controller.go`
* `pkg/controller/runtimeextractor/informer.go`
* `pkg/controller/runtimeextractor/informer_test.go`
* `pkg/controller/runtimeextractor/resources/daemonset.go`
* `pkg/controller/runtimeextractor/resources/daemonset_test.go`
* `pkg/controller/runtimeextractor/resources/manager.go`
* `pkg/controller/runtimeextractor/resources/retry_test.go`
* `pkg/controller/status/controller.go`

</details>

<details>
<summary>🚧 Files skipped from review as they are similar to previous changes (9)</summary>

* pkg/controller/runtimeextractor/resources/retry_test.go
* pkg/controller/runtimeextractor/resources/manager.go
* pkg/controller/runtimeextractor/informer_test.go
* pkg/controller/runtimeextractor/resources/daemonset_test.go
* pkg/controller/runtimeextractor/informer.go
* pkg/controller/runtimeextractor/controller.go
* pkg/controller/operator.go
* pkg/controller/runtimeextractor/resources/daemonset.go
* pkg/controller/status/controller.go

</details>

</details>

opokornyy · 2026-05-19T06:16:44Z

/retest

opokornyy · 2026-05-20T08:17:55Z

/test tls-scanner

openshift-ci · 2026-05-20T09:38:57Z

@opokornyy: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name	Commit	Details	Required	Rerun command
ci/prow/tls-scanner	`8518be5`	link	false	`/test tls-scanner`

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

This commit is implementing the Kubernetes informer-based drift detection and reconciliation for runtime-extractor resources (DaemonSet, Service, ConfigMap) Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (5)

pkg/controller/status/controller.go (1)

205-208: ⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Use a non-blocking, nil-safe send for updateCh in reconciliation.

Line 207 performs a blocking send on the status merge path; if the channel is full (or unset in a caller), reconciliation can stall.

🔒 Suggested fix

-		if versionChanged {
-			c.updateCh <- struct{}{}
-		}
+		if versionChanged && c.updateCh != nil {
+			select {
+			case c.updateCh <- struct{}{}:
+			default:
+				// signal already pending; avoid blocking reconciliation
+			}
+		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/status/controller.go` around lines 205 - 208, The current
blocking send to c.updateCh when versionChanged can stall reconciliation; change
it to a nil-safe, non-blocking send by checking c.updateCh != nil and using a
select with a default case (e.g., select { case c.updateCh <- struct{}{}:
default: }) so the reconcile path never blocks if the channel is full or unset;
update the code around the versionChanged branch in the status merge/reconcile
function to perform this check and non-blocking send.

pkg/controller/runtimeextractor/informer.go (1)

96-101: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle tombstones in delete events.

Delete handlers may receive cache.DeletedFinalStateUnknown; without unwrapping it first, metadata extraction can fail and external deletions may not trigger reconciliation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/runtimeextractor/informer.go` around lines 96 - 101, The
delete handler in resourceInformer.handleResourceDeletion must handle
cache.DeletedFinalStateUnknown tombstones before calling meta.Accessor: detect
if obj is a cache.DeletedFinalStateUnknown, extract the actual object via
tombstone.Obj (or log+return if unwrap fails), then proceed to call
meta.Accessor; update the function to unwrap tombstones and handle unwrap
failures so external deletions correctly trigger reconciliation.

pkg/controller/runtimeextractor/resources/daemonset_test.go (1)

104-107: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Assert exact image values, not just “changed from old-image.”

These checks can still pass if images are mapped incorrectly. Compare each container image against the expected env-derived value.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/runtimeextractor/resources/daemonset_test.go` around lines 104
- 107, The test currently uses assert.NotEqual on
ds.Spec.Template.Spec.Containers[0..2].Image which only verifies they changed
from "old-image"; replace these with exact comparisons to the expected
environment-derived image strings (the values your test derives from env vars or
helper like expectedImageX) by asserting equality against
ds.Spec.Template.Spec.Containers[0].Image, [1].Image and [2].Image respectively;
locate the expected values where the test sets/reads the envs and use those
variables (or a helper function used in the test) to assert assert.Equal(t,
expectedImage0, ds.Spec.Template.Spec.Containers[0].Image) etc.

pkg/controller/runtimeextractor/controller.go (1)

151-183: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Propagate reconcile errors instead of only logging them.

createDeployment, deleteDeployment, and updateDeployment log failures but always continue. That can leave runtime-extractor resources out of sync indefinitely if no new event arrives.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/runtimeextractor/controller.go` around lines 151 - 183, The
createDeployment, deleteDeployment, and updateDeployment functions currently
only log failures and swallow errors; change each function (createDeployment,
deleteDeployment, updateDeployment) to return error instead of void, propagate
and return the error from resourceManager.ApplyRuntimeExtractorResources /
DeleteRuntimeExtractorResources (and any isCreated checks as needed) rather than
only klog.Errorf, and update the caller(s) (the reconcile loop that invokes
these methods) to handle returned errors (retry/requeue or surface them to
controller-runtime) so reconcile failures are not silently ignored.

pkg/controller/runtimeextractor/resources/daemonset.go (1)

151-160: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not treat all Get failures as “does not exist.”

daemonSetExists returns false for non-NotFound errors too, which can hide API/auth/transport failures and skip needed reconciliation paths.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/runtimeextractor/resources/daemonset.go` around lines 151 -
160, daemonSetExists currently swallows all getDaemonSet errors and returns
false for non-NotFound failures; change its behavior to surface
API/auth/transport errors by converting its signature from daemonSetExists(ctx
context.Context) bool to daemonSetExists(ctx context.Context) (bool, error) and
implement: call getDaemonSet(ctx), if apierrors.IsNotFound(err) return false,
nil; if err != nil return false, err; otherwise return true, nil. Update callers
of daemonSetExists to handle the (bool, error) return and avoid treating
non-NotFound errors as “does not exist.” Ensure references to
ResourceManager.daemonSetExists and getDaemonSet are updated accordingly.

🧹 Nitpick comments (1)

pkg/controller/runtimeextractor/README.md (1)
228-228: 💤 Low value

Clarify the distinction between existing and planned exponential backoff.

Lines 89-93 describe exponential backoff as already implemented for retry-on-conflict. This future enhancement suggests implementing exponential backoff for "reconciliation failures." If this refers to a different backoff mechanism (e.g., between full reconciliation attempts rather than within retry-on-conflict), please clarify the distinction to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/controller/runtimeextractor/README.md` at line 228, The checklist item is
ambiguous about whether it duplicates the existing retry-on-conflict backoff;
update the README entry to clearly distinguish the two: state that
retry.OnConflict (used inside the Reconciler/reconcile loop for update
conflicts) already has exponential backoff, and change the TODO to explicitly
target implementing exponential backoff between full reconciliation attempts
(e.g., backoff on Reconciler errors/failures that abort a reconcile cycle),
mentioning the terms "retry.OnConflict" and "Reconciler/reconcile loop" so
readers can locate the existing implementation and understand this is a separate
mechanism.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/controller/runtimeextractor/informer_test.go`:
- Around line 68-69: The three informerFactory.WaitForCacheSync calls in
informer_test.go currently ignore their boolean return; update each call (the
ones that invoke informerFactory.WaitForCacheSync(ctx.Done())) to capture the
returned map/boolean and assert it succeeded—e.g., if sync returns false fail
the test (t.Fatalf or require.True) with a clear message so tests fail fast when
caches didn't initialize before further actions like updates/deletes; locate the
calls by the WaitForCacheSync invocations in the test and add the boolean check
and test failure handling.

In `@pkg/controller/runtimeextractor/resources/daemonset.go`:
- Around line 25-33: The current defaults extractorDefaultImage,
exporterDefaultImage and proxyDefaultImage use :latest which is unsafe; remove
the :latest fallbacks and change the image resolution logic (the code that reads
exporterImageEnv/proxyImageEnv and the extractor env lookup around where
envImageErrMsg is used and also in the block referenced at lines ~109-125) to
fail fast: when os.LookupEnv for RELATED_IMAGE_* returns missing or empty,
return/propagate an error or log a clear fatal and do not substitute a hardcoded
:latest; update envImageErrMsg to indicate a missing RELATED_IMAGE and include
the env var name so callers receive a deterministically failing error instead of
silently using :latest.

---

Duplicate comments:
In `@pkg/controller/runtimeextractor/controller.go`:
- Around line 151-183: The createDeployment, deleteDeployment, and
updateDeployment functions currently only log failures and swallow errors;
change each function (createDeployment, deleteDeployment, updateDeployment) to
return error instead of void, propagate and return the error from
resourceManager.ApplyRuntimeExtractorResources / DeleteRuntimeExtractorResources
(and any isCreated checks as needed) rather than only klog.Errorf, and update
the caller(s) (the reconcile loop that invokes these methods) to handle returned
errors (retry/requeue or surface them to controller-runtime) so reconcile
failures are not silently ignored.

In `@pkg/controller/runtimeextractor/informer.go`:
- Around line 96-101: The delete handler in
resourceInformer.handleResourceDeletion must handle
cache.DeletedFinalStateUnknown tombstones before calling meta.Accessor: detect
if obj is a cache.DeletedFinalStateUnknown, extract the actual object via
tombstone.Obj (or log+return if unwrap fails), then proceed to call
meta.Accessor; update the function to unwrap tombstones and handle unwrap
failures so external deletions correctly trigger reconciliation.

In `@pkg/controller/runtimeextractor/resources/daemonset_test.go`:
- Around line 104-107: The test currently uses assert.NotEqual on
ds.Spec.Template.Spec.Containers[0..2].Image which only verifies they changed
from "old-image"; replace these with exact comparisons to the expected
environment-derived image strings (the values your test derives from env vars or
helper like expectedImageX) by asserting equality against
ds.Spec.Template.Spec.Containers[0].Image, [1].Image and [2].Image respectively;
locate the expected values where the test sets/reads the envs and use those
variables (or a helper function used in the test) to assert assert.Equal(t,
expectedImage0, ds.Spec.Template.Spec.Containers[0].Image) etc.

In `@pkg/controller/runtimeextractor/resources/daemonset.go`:
- Around line 151-160: daemonSetExists currently swallows all getDaemonSet
errors and returns false for non-NotFound failures; change its behavior to
surface API/auth/transport errors by converting its signature from
daemonSetExists(ctx context.Context) bool to daemonSetExists(ctx
context.Context) (bool, error) and implement: call getDaemonSet(ctx), if
apierrors.IsNotFound(err) return false, nil; if err != nil return false, err;
otherwise return true, nil. Update callers of daemonSetExists to handle the
(bool, error) return and avoid treating non-NotFound errors as “does not exist.”
Ensure references to ResourceManager.daemonSetExists and getDaemonSet are
updated accordingly.

In `@pkg/controller/status/controller.go`:
- Around line 205-208: The current blocking send to c.updateCh when
versionChanged can stall reconciliation; change it to a nil-safe, non-blocking
send by checking c.updateCh != nil and using a select with a default case (e.g.,
select { case c.updateCh <- struct{}{}: default: }) so the reconcile path never
blocks if the channel is full or unset; update the code around the
versionChanged branch in the status merge/reconcile function to perform this
check and non-blocking send.

---

Nitpick comments:
In `@pkg/controller/runtimeextractor/README.md`:
- Line 228: The checklist item is ambiguous about whether it duplicates the
existing retry-on-conflict backoff; update the README entry to clearly
distinguish the two: state that retry.OnConflict (used inside the
Reconciler/reconcile loop for update conflicts) already has exponential backoff,
and change the TODO to explicitly target implementing exponential backoff
between full reconciliation attempts (e.g., backoff on Reconciler
errors/failures that abort a reconcile cycle), mentioning the terms
"retry.OnConflict" and "Reconciler/reconcile loop" so readers can locate the
existing implementation and understand this is a separate mechanism.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 4c938da1-bd84-4417-8026-99d91354aa1a

📥 Commits

Reviewing files that changed from the base of the PR and between 8518be5 and 847d7e6.

📒 Files selected for processing (10)

pkg/controller/operator.go
pkg/controller/runtimeextractor/README.md
pkg/controller/runtimeextractor/controller.go
pkg/controller/runtimeextractor/informer.go
pkg/controller/runtimeextractor/informer_test.go
pkg/controller/runtimeextractor/resources/daemonset.go
pkg/controller/runtimeextractor/resources/daemonset_test.go
pkg/controller/runtimeextractor/resources/manager.go
pkg/controller/runtimeextractor/resources/retry_test.go
pkg/controller/status/controller.go

coderabbitai · 2026-05-20T12:00:53Z

+	informerFactory.WaitForCacheSync(ctx.Done())
+


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

# First, locate and examine the file fd -t f "informer_test.go" | grep "runtimeextractor"

Repository: openshift/insights-operator

Length of output: 120

🏁 Script executed:

# Read the file to check the specific lines cat -n pkg/controller/runtimeextractor/informer_test.go

Repository: openshift/insights-operator

Length of output: 6462

Check WaitForCacheSync return value to validate cache initialization.

These three WaitForCacheSync calls ignore the boolean return value. If cache sync fails (returns false), the informer won't have synced state before the test proceeds with updates/deletes, causing confusing test timeouts instead of explicit failures. Check the return value and fail the test if sync incomplete.

Affects: lines 68, 115, 159

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/informer_test.go` around lines 68 - 69, The three informerFactory.WaitForCacheSync calls in informer_test.go currently ignore their boolean return; update each call (the ones that invoke informerFactory.WaitForCacheSync(ctx.Done())) to capture the returned map/boolean and assert it succeeded—e.g., if sync returns false fail the test (t.Fatalf or require.True) with a clear message so tests fail fast when caches didn't initialize before further actions like updates/deletes; locate the calls by the WaitForCacheSync invocations in the test and add the boolean check and test failure handling.

coderabbitai · 2026-05-20T12:00:53Z

+	extractorDefaultImage = "quay.io/openshift/origin-insights-runtime-extractor:latest"
+
+	exporterImageEnv     = "RELATED_IMAGE_INSIGHTS_RUNTIME_EXPORTER"
+	exporterDefaultImage = "quay.io/openshift/origin-insights-runtime-exporter:latest"
+
+	proxyImageEnv     = "RELATED_IMAGE_KUBE_RBAC_PROXY"
+	proxyDefaultImage = "quay.io/openshift/origin-kube-rbac-proxy:latest"
+
+	envImageErrMsg = "Failed to get image from environment variable %s, using default image %s"


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid :latest fallback for missing RELATED_IMAGE env vars.

Falling back to hardcoded :latest images makes deployments non-deterministic and risky in disconnected/pinned-image environments. Prefer failing fast and surfacing a clear error.

Also applies to: 109-125

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/resources/daemonset.go` around lines 25 - 33, The current defaults extractorDefaultImage, exporterDefaultImage and proxyDefaultImage use :latest which is unsafe; remove the :latest fallbacks and change the image resolution logic (the code that reads exporterImageEnv/proxyImageEnv and the extractor env lookup around where envImageErrMsg is used and also in the block referenced at lines ~109-125) to fail fast: when os.LookupEnv for RELATED_IMAGE_* returns missing or empty, return/propagate an error or log a clear fatal and do not substitute a hardcoded :latest; update envImageErrMsg to indicate a missing RELATED_IMAGE and include the env var name so callers receive a deterministically failing error instead of silently using :latest.

openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Apr 29, 2026

openshift-ci Bot requested review from katushiik11 and ncaak April 29, 2026 13:55

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2026

coderabbitai Bot reviewed Apr 29, 2026

View reviewed changes

opokornyy added 3 commits May 18, 2026 10:17

feat(manifests): reorganize runtime-extractor manifests

f632258

Move runtime-extractor manifests to dedicated directory and update clusterrole permissions for controller resource management. Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

opokornyy force-pushed the CCXDEV-15790-extractor-controller branch from b2636bb to 25ff9a1 Compare May 18, 2026 12:24

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

opokornyy force-pushed the CCXDEV-15790-extractor-controller branch from 25ff9a1 to 8518be5 Compare May 18, 2026 12:34

coderabbitai Bot reviewed May 18, 2026

View reviewed changes

opokornyy added 2 commits May 20, 2026 13:45

feat: implement reconciler of runtime-extractor

6674b77

This commit is implementing the Kubernetes informer-based drift detection and reconciliation for runtime-extractor resources (DaemonSet, Service, ConfigMap) Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

feat: remove interface from constructor

847d7e6

Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>

opokornyy force-pushed the CCXDEV-15790-extractor-controller branch from 8518be5 to 847d7e6 Compare May 20, 2026 11:45

coderabbitai Bot reviewed May 20, 2026

View reviewed changes

Conversation

opokornyy commented Apr 29, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Categories

Sample Archive

Documentation

Unit Tests

Privacy

Changelog

Breaking Changes

References

Summary by CodeRabbit

Uh oh!

openshift-ci-robot commented Apr 29, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Categories

Sample Archive

Documentation

Unit Tests

Privacy

Changelog

Breaking Changes

References

Uh oh!

coderabbitai Bot commented Apr 29, 2026 • edited by openshift-ci Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

openshift-ci Bot commented Apr 29, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

opokornyy commented May 4, 2026

Uh oh!

opokornyy commented May 18, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 18, 2026

Choose a reason for hiding this comment

Uh oh!

opokornyy commented May 18, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

opokornyy commented May 19, 2026

Uh oh!

opokornyy commented May 20, 2026

Uh oh!

openshift-ci Bot commented May 20, 2026

Uh oh!

coderabbitai Bot left a comment

opokornyy commented Apr 29, 2026 •

edited by coderabbitai Bot

Loading

openshift-ci-robot commented Apr 29, 2026 •

edited by openshift-ci Bot

Loading

coderabbitai Bot commented Apr 29, 2026 •

edited by openshift-ci Bot

Loading