CCXDEV-15790: extractor controller#1280
Conversation
|
@opokornyy: This pull request references CCXDEV-15790 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughAdds a runtime-extractor controller, informer, and ResourceManager; embeds and manages a DaemonSet with env-pinned images and retry-on-conflict; wires operator startup and status to signal updates; updates RBAC and Deployment envs/indentation; and adds tests and documentation. ChangesRuntime Extractor feature
Sequence Diagram(s)sequenceDiagram
participant Op as Operator.Run()
participant RC as RuntimeExtractorController
participant CN as ConfigNotifier
participant RM as ResourceManager
participant RI as ResourceInformer
participant K8s as Kubernetes API
Op->>RC: NewRuntimeExtractorController(configNotifier, updateCh, ...)
Op->>RC: Run(ctx)
activate RC
rect rgba(100, 150, 200, 0.5)
Note over RC: Initial Config Check
RC->>CN: Config()
CN-->>RC: InsightsConfiguration
alt DisableRuntimeExtractor = false
RC->>RM: ApplyRuntimeExtractorResources(ctx)
RM->>K8s: Apply DaemonSet
K8s-->>RM: Success
end
end
rect rgba(150, 100, 200, 0.5)
Note over RC: Reconciliation Triggers
par Config Change Event
CN->>CN: ConfigChanged() <- signal
CN-->>RC: Config change notification
RC->>CN: Config()
CN-->>RC: Updated InsightsConfiguration
alt DisableRuntimeExtractor = true
RC->>RM: DeleteRuntimeExtractorResources(ctx)
else
RC->>RM: ApplyRuntimeExtractorResources(ctx)
end
and Cluster Version Update
Op->>RC: Send on updateCh
RC->>CN: Config()
CN-->>RC: InsightsConfiguration
alt DisableRuntimeExtractor = false
RC->>RM: ApplyRuntimeExtractorResources(ctx)
end
and External Resource Drift
K8s->>RI: DaemonSet updated/deleted
RI->>RC: ResourceModified()
RC->>CN: Config()
CN-->>RC: InsightsConfiguration
alt DisableRuntimeExtractor = false
RC->>RM: ApplyRuntimeExtractorResources(ctx)
else
RC->>RM: DeleteRuntimeExtractorResources(ctx)
end
end
end
deactivate RC
Estimated code review effort🎯 4 (Complex) | ⏱️ ~55 minutes 🚥 Pre-merge checks | ✅ 10 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (10 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: opokornyy The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 7
🧹 Nitpick comments (1)
pkg/controller/runtimeextractor/informer_test.go (1)
68-68: Assert informer cache sync success to reduce flakiness.
WaitForCacheSyncreturn values are ignored; if sync fails, tests may fail nondeterministically later. Assert all returned values aretruebefore proceeding.✅ Minimal hardening pattern
- informerFactory.WaitForCacheSync(ctx.Done()) + for informerType, synced := range informerFactory.WaitForCacheSync(ctx.Done()) { + if !synced { + t.Fatalf("Informer cache failed to sync: %v", informerType) + } + }Also applies to: 115-115, 159-159
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/controller/runtimeextractor/informer_test.go` at line 68, The call to informerFactory.WaitForCacheSync currently ignores its boolean return values, causing flaky tests; change the test to capture the returned map (e.g., synced := informerFactory.WaitForCacheSync(ctx.Done())) and assert that all values in that map are true (fail the test if any are false) before proceeding—apply the same change for other occurrences of WaitForCacheSync in this test (lines referencing informerFactory.WaitForCacheSync at the other noted locations).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@manifests/03-clusterrole.yaml`:
- Around line 129-140: The ClusterRole currently grants create/delete/update on
apiGroups "apps" resources "daemonsets" which is overbroad; remove the mutating
verbs (create, delete, update) for apps/daemonsets from the ClusterRole and
leave only read verbs (get, list, watch) there, then add those mutating verbs to
the existing namespaced Role that targets the openshift-insights namespace (the
namespaced Role that the runtime-extractor controller uses), and update/confirm
the RoleBinding so the controller's ServiceAccount is bound to that namespaced
Role (ensuring the controller retains get/list/watch at cluster or namespace
scope as intended and only gains create/delete/update within
openshift-insights).
In `@pkg/controller/runtimeextractor/controller.go`:
- Around line 147-179: The createDeployment, deleteDeployment and
updateDeployment functions currently log errors from
resourceManager.ApplyRuntimeExtractorResources / DeleteRuntimeExtractorResources
and swallow them; change each to return an error (e.g., func (re
*runtimeExtractorController) createDeployment(ctx context.Context) error) and
propagate the error to the caller so the reconciler can requeue/retry, or if you
prefer keep void signatures then explicitly trigger a reschedule (e.g.,
requeue/reconcile helper) on failure; also return early on isCreated checks as
before but ensure deleteDeployment returns nil when nothing to delete and
non-nil when DeleteRuntimeExtractorResources fails, and updateDeployment returns
the Apply error so transient API failures bubble up rather than being silently
logged.
In `@pkg/controller/runtimeextractor/informer.go`:
- Around line 103-121: In handleResourceDeletion (type resourceInformer) detect
and unwrap cache.DeletedFinalStateUnknown tombstones before calling
meta.Accessor: if obj is a tombstone, extract the tombstone.Obj (or underlying
runtime.Object) and reassign it to obj so meta.Accessor succeeds; then continue
the existing namespace/name checks and call ri.notifyModification() when
runtimeExtractorName is deleted. Ensure the tombstone handling uses the
cache.DeletedFinalStateUnknown type assertion and preserves existing logging
behavior.
In `@pkg/controller/runtimeextractor/README.md`:
- Around line 54-141: Update the README fenced code blocks that lack language
tags by adding appropriate languages: change the ASCII flow diagrams under the
"Reconciliation Flow", "Initial Deployment", "Configuration Change", and
"Resource Drift" headings to use a language tag such as text or mermaid (e.g.,
replace ``` with ```text or ```mermaid), keep the "Watch patterns" Go block as
```go and ensure the resource labels block remains ```yaml; this will satisfy
markdownlint MD040 without altering the content.
In `@pkg/controller/runtimeextractor/resources/daemonset_test.go`:
- Around line 104-107: The test currently only asserts images changed; replace
the NotEqual checks with exact equality checks against the expected env-derived
image strings so the test fails on incorrect mappings: in
pkg/controller/runtimeextractor/resources/daemonset_test.go update the
assertions for ds.Spec.Template.Spec.Containers[0].Image, [1].Image and
[2].Image to assert Equal against the precise expected image values (either
compute them from the same environment variables used by the code under test or
hard-code the exact expected strings used in the test setup) instead of using
assert.NotEqual.
In `@pkg/controller/runtimeextractor/resources/daemonset.go`:
- Around line 151-162: daemonSetExists currently swallows all get errors and
returns false, which treats API/auth/transport failures as "not found"; change
the signature to return (bool, error) (e.g., daemonSetExists(ctx
context.Context) (bool, error)), call rm.getDaemonSet(ctx) and if
apierrors.IsNotFound(err) return (false, nil), if err != nil return (false,
err), otherwise return (true, nil); update all callers of daemonSetExists to
handle the error return and only treat NotFound as absence while surfacing other
errors so reconciliation isn't skipped on temporary API failures.
In `@pkg/controller/status/controller.go`:
- Around line 205-208: The direct blocking send to c.updateCh in the status
reconciliation path can deadlock merge() if updateCh is nil or full; change it
to a nil-safe, non-blocking signal: check c.updateCh != nil and then use a
non-blocking send (select with a case c.updateCh <- struct{}{} and a default) so
you only attempt to signal when the channel exists and won't block if it's full.
---
Nitpick comments:
In `@pkg/controller/runtimeextractor/informer_test.go`:
- Line 68: The call to informerFactory.WaitForCacheSync currently ignores its
boolean return values, causing flaky tests; change the test to capture the
returned map (e.g., synced := informerFactory.WaitForCacheSync(ctx.Done())) and
assert that all values in that map are true (fail the test if any are false)
before proceeding—apply the same change for other occurrences of
WaitForCacheSync in this test (lines referencing
informerFactory.WaitForCacheSync at the other noted locations).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 5cc38019-b1ed-410e-8cff-a4fd86c65c10
📒 Files selected for processing (15)
manifests/03-clusterrole.yamlmanifests/06-deployment.yamlpkg/controller/operator.gopkg/controller/runtimeextractor/README.mdpkg/controller/runtimeextractor/controller.gopkg/controller/runtimeextractor/doc.gopkg/controller/runtimeextractor/informer.gopkg/controller/runtimeextractor/informer_test.gopkg/controller/runtimeextractor/resources/daemonset.gopkg/controller/runtimeextractor/resources/daemonset_test.gopkg/controller/runtimeextractor/resources/manager.gopkg/controller/runtimeextractor/resources/manifests/runtime-extractor-daemonset.yamlpkg/controller/runtimeextractor/resources/retry_test.gopkg/controller/status/controller.gopkg/controller/status/controller_test.go
| - apiGroups: | ||
| - "apps" | ||
| resources: | ||
| - daemonsets | ||
| verbs: | ||
| - get | ||
| - list | ||
| - watch | ||
| - create | ||
| - delete | ||
| - update | ||
| --- |
There was a problem hiding this comment.
Scope DaemonSet write permissions to namespace-level Role (least privilege).
Granting create/delete/update on apps/daemonsets in a ClusterRole allows mutating DaemonSets cluster-wide. The runtime-extractor controller manages a namespaced DaemonSet (openshift-insights), so this should be moved to the existing namespaced Role in openshift-insights.
🔧 Suggested RBAC tightening
- - apiGroups:
- - "apps"
- resources:
- - daemonsets
- verbs:
- - get
- - list
- - watch
- - create
- - delete
- - update # In Role `insights-operator` (namespace: openshift-insights), add:
+ - apiGroups:
+ - apps
+ resources:
+ - daemonsets
+ verbs:
+ - get
+ - list
+ - watch
+ - create
+ - delete
+ - update🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@manifests/03-clusterrole.yaml` around lines 129 - 140, The ClusterRole
currently grants create/delete/update on apiGroups "apps" resources "daemonsets"
which is overbroad; remove the mutating verbs (create, delete, update) for
apps/daemonsets from the ClusterRole and leave only read verbs (get, list,
watch) there, then add those mutating verbs to the existing namespaced Role that
targets the openshift-insights namespace (the namespaced Role that the
runtime-extractor controller uses), and update/confirm the RoleBinding so the
controller's ServiceAccount is bound to that namespaced Role (ensuring the
controller retains get/list/watch at cluster or namespace scope as intended and
only gains create/delete/update within openshift-insights).
| func (re *runtimeExtractorController) createDeployment(ctx context.Context) { | ||
| klog.Info("Creating runtime extractor resources") | ||
|
|
||
| if err := re.resourceManager.ApplyRuntimeExtractorResources(ctx); err != nil { | ||
| klog.Errorf("Failed to apply runtime extractor resources: %v", err) | ||
| } | ||
| } | ||
|
|
||
| func (re *runtimeExtractorController) deleteDeployment(ctx context.Context) { | ||
| klog.Info("Deleting runtime extractor resources") | ||
|
|
||
| if !re.isCreated(ctx) { | ||
| klog.Info("Runtime extractor resources do not exist, nothing to delete") | ||
| return | ||
| } | ||
|
|
||
| if err := re.resourceManager.DeleteRuntimeExtractorResources(ctx); err != nil { | ||
| klog.Errorf("Failed to delete runtime extractor resources: %v", err) | ||
| } | ||
| } | ||
|
|
||
| func (re *runtimeExtractorController) updateDeployment(ctx context.Context) { | ||
| klog.Info("Updating runtime extractor resources") | ||
|
|
||
| // Avoid creating it when the cluster version is updated | ||
| if !re.isCreated(ctx) { | ||
| klog.Info("Runtime extractor resources not found, skipping update") | ||
| return | ||
| } | ||
|
|
||
| if err := re.resourceManager.ApplyRuntimeExtractorResources(ctx); err != nil { | ||
| klog.Errorf("Failed to apply runtime extractor resources: %v", err) | ||
| } |
There was a problem hiding this comment.
Surface reconcile failures instead of logging them away.
These helpers swallow apply/delete errors after logging. Because the controller is event-driven, a transient API failure can leave the runtime-extractor resources out of sync indefinitely if no new event arrives.
Please bubble the error into a retry/requeue path, or explicitly reschedule reconciliation.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/runtimeextractor/controller.go` around lines 147 - 179, The
createDeployment, deleteDeployment and updateDeployment functions currently log
errors from resourceManager.ApplyRuntimeExtractorResources /
DeleteRuntimeExtractorResources and swallow them; change each to return an error
(e.g., func (re *runtimeExtractorController) createDeployment(ctx
context.Context) error) and propagate the error to the caller so the reconciler
can requeue/retry, or if you prefer keep void signatures then explicitly trigger
a reschedule (e.g., requeue/reconcile helper) on failure; also return early on
isCreated checks as before but ensure deleteDeployment returns nil when nothing
to delete and non-nil when DeleteRuntimeExtractorResources fails, and
updateDeployment returns the Apply error so transient API failures bubble up
rather than being silently logged.
| // handleResourceDeletion processes resource deletion events | ||
| func (ri *resourceInformer) handleResourceDeletion(obj interface{}, resourceType string) { | ||
| metadata, err := meta.Accessor(obj) | ||
| if err != nil { | ||
| klog.Errorf("Failed to get metadata from deleted runtime extractor %s: %v", resourceType, err) | ||
| return | ||
| } | ||
|
|
||
| // Only care about our specific resources | ||
| if metadata.GetNamespace() != runtimeExtractorNamespace { | ||
| return | ||
| } | ||
|
|
||
| name := metadata.GetName() | ||
| if name == runtimeExtractorName { | ||
| klog.Infof("Runtime extractor %s %s/%s was deleted externally, triggering reconciliation", | ||
| resourceType, metadata.GetNamespace(), name) | ||
| ri.notifyModification() | ||
| } |
There was a problem hiding this comment.
Handle tombstones in delete events.
DeleteFunc can receive cache.DeletedFinalStateUnknown tombstones from client-go. meta.Accessor(obj) will fail on that wrapper, so deletions of the target DaemonSet can be missed instead of triggering reconciliation.
Please unwrap the tombstone before filtering.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/runtimeextractor/informer.go` around lines 103 - 121, In
handleResourceDeletion (type resourceInformer) detect and unwrap
cache.DeletedFinalStateUnknown tombstones before calling meta.Accessor: if obj
is a tombstone, extract the tombstone.Obj (or underlying runtime.Object) and
reassign it to obj so meta.Accessor succeeds; then continue the existing
namespace/name checks and call ri.notifyModification() when runtimeExtractorName
is deleted. Ensure the tombstone handling uses the
cache.DeletedFinalStateUnknown type assertion and preserves existing logging
behavior.
| ```go | ||
| // Watch patterns | ||
| DaemonSet: openshift-insights/insights-runtime-extractor | ||
| ``` | ||
|
|
||
| ### Detection Criteria | ||
|
|
||
| The informer uses **generation-based filtering** to minimize unnecessary reconciliations: | ||
|
|
||
| **DaemonSet**: | ||
| - Checks if `Generation` changed (only increments when spec changes, not status) | ||
| - Filters out ~90% of update events (status updates, controller metadata changes) | ||
| - Deletion events always trigger reconciliation | ||
|
|
||
| **Why generation-based?** | ||
| - **Simple**: Single field check instead of deep comparison | ||
| - **Efficient**: Filters out 90% of noise (status updates, reconciliation loops) | ||
| - **Reliable**: Kubernetes guarantees generation increments on spec changes | ||
| - **Performant**: `resourceapply` handles detailed comparison and actual updates | ||
|
|
||
| ### Reconciliation Flow | ||
|
|
||
| ``` | ||
| External Change → Informer Detects → Notification Sent → Controller Reconciles → State Restored | ||
| ``` | ||
|
|
||
| ## Resource Protection | ||
|
|
||
| ### Server-Side Apply | ||
| The controller uses `resourceapply` functions from `library-go`, which provide: | ||
| - **Three-way merge**: Preserves fields managed by other controllers | ||
| - **Generation tracking**: Detects meaningful changes | ||
| - **Conflict resolution**: Handles concurrent modifications gracefully | ||
|
|
||
| ### Retry Logic | ||
| All resource apply operations use `retry.RetryOnConflict` with exponential backoff: | ||
| - **Automatic retry**: Conflicts are automatically retried (up to 5 attempts by default) | ||
| - **Exponential backoff**: Delays increase between retries (10ms, 20ms, 40ms, etc.) | ||
| - **Jitter**: Random delay variation to prevent thundering herd | ||
| - **Success guarantee**: Eventually consistent even under concurrent modifications | ||
|
|
||
| ### Ownership | ||
| Resources are labeled with: | ||
| ```yaml | ||
| labels: | ||
| app.kubernetes.io/managed-by: insights-operator | ||
| app.kubernetes.io/name: insights-runtime-extractor | ||
| ``` | ||
|
|
||
| ## Event Flow | ||
|
|
||
| ### Initial Deployment | ||
| ``` | ||
| Operator Starts | ||
| ↓ | ||
| Read Configuration | ||
| ↓ | ||
| DisableRuntimeExtractor == false? | ||
| ↓ (yes) | ||
| Create/Update Resources | ||
| ↓ | ||
| Monitor for Changes | ||
| ``` | ||
|
|
||
| ### Configuration Change | ||
| ``` | ||
| Config Updated | ||
| ↓ | ||
| ConfigNotifier Sends Event | ||
| ↓ | ||
| Controller Handles Config Change | ||
| ↓ | ||
| Enable: Apply Resources | ||
| Disable: Delete Resources | ||
| ``` | ||
|
|
||
| ### Resource Drift | ||
| ``` | ||
| User/Process Modifies Resource | ||
| ↓ | ||
| Informer Detects Change | ||
| ↓ | ||
| Notification Sent to modifiedCh | ||
| ↓ | ||
| Controller Reapplies Desired State | ||
| ↓ | ||
| Resource Restored | ||
| ``` |
There was a problem hiding this comment.
Add language tags to the fenced blocks.
The plain fences in this section are triggering markdownlint MD040. Please mark them as text, mermaid, or an equivalent language so the docs pass lint cleanly.
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 76-76: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 106-106: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 119-119: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
[warning] 131-131: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/runtimeextractor/README.md` around lines 54 - 141, Update the
README fenced code blocks that lack language tags by adding appropriate
languages: change the ASCII flow diagrams under the "Reconciliation Flow",
"Initial Deployment", "Configuration Change", and "Resource Drift" headings to
use a language tag such as text or mermaid (e.g., replace ``` with ```text or
```mermaid), keep the "Watch patterns" Go block as ```go and ensure the resource
labels block remains ```yaml; this will satisfy markdownlint MD040 without
altering the content.
| // Verify images were updated from environment variables | ||
| assert.NotEqual(t, "old-image", ds.Spec.Template.Spec.Containers[0].Image) | ||
| assert.NotEqual(t, "old-image", ds.Spec.Template.Spec.Containers[1].Image) | ||
| assert.NotEqual(t, "old-image", ds.Spec.Template.Spec.Containers[2].Image) |
There was a problem hiding this comment.
Assert the expected images, not just a change.
These checks would still pass if the wrong image were loaded. Compare against the exact env-derived values so the test catches mapping regressions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/runtimeextractor/resources/daemonset_test.go` around lines 104
- 107, The test currently only asserts images changed; replace the NotEqual
checks with exact equality checks against the expected env-derived image strings
so the test fails on incorrect mappings: in
pkg/controller/runtimeextractor/resources/daemonset_test.go update the
assertions for ds.Spec.Template.Spec.Containers[0].Image, [1].Image and
[2].Image to assert Equal against the precise expected image values (either
compute them from the same environment variables used by the code under test or
hard-code the exact expected strings used in the test setup) instead of using
assert.NotEqual.
| // daemonSetExists checks if the runtime extractor DaemonSet exists | ||
| func (rm *ResourceManager) daemonSetExists(ctx context.Context) bool { | ||
| _, err := rm.getDaemonSet(ctx) | ||
| if err != nil { | ||
| if apierrors.IsNotFound(err) { | ||
| return false | ||
| } | ||
| klog.Errorf("Failed to get runtime extractor DaemonSet %s/%s: %v", daemonSetNamespace, daemonSetName, err) | ||
| return false | ||
| } | ||
| return true | ||
| } |
There was a problem hiding this comment.
Don't treat API failures as resource absence.
daemonSetExists() returns false for every get error, not just NotFound. That makes the controller skip reconciliation during temporary API/auth/transport failures, which can leave the runtime-extractor state stale.
Please return the error separately and only map NotFound to false.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/runtimeextractor/resources/daemonset.go` around lines 151 -
162, daemonSetExists currently swallows all get errors and returns false, which
treats API/auth/transport failures as "not found"; change the signature to
return (bool, error) (e.g., daemonSetExists(ctx context.Context) (bool, error)),
call rm.getDaemonSet(ctx) and if apierrors.IsNotFound(err) return (false, nil),
if err != nil return (false, err), otherwise return (true, nil); update all
callers of daemonSetExists to handle the error return and only treat NotFound as
absence while surfacing other errors so reconciliation isn't skipped on
temporary API failures.
| // Update runtime-extractor images on any version change | ||
| if versionChanged { | ||
| c.updateCh <- struct{}{} | ||
| } |
There was a problem hiding this comment.
Avoid blocking send on updateCh in status reconciliation path.
The direct send can deadlock merge() if updateCh is nil or already full. Use a non-blocking send (and nil-safe guard) to keep status updates resilient.
🔒 Non-blocking, nil-safe signal send
- if versionChanged {
- c.updateCh <- struct{}{}
- }
+ if versionChanged && c.updateCh != nil {
+ select {
+ case c.updateCh <- struct{}{}:
+ default:
+ // signal already pending; avoid blocking status reconciliation
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Update runtime-extractor images on any version change | |
| if versionChanged { | |
| c.updateCh <- struct{}{} | |
| } | |
| // Update runtime-extractor images on any version change | |
| if versionChanged && c.updateCh != nil { | |
| select { | |
| case c.updateCh <- struct{}{}: | |
| default: | |
| // signal already pending; avoid blocking status reconciliation | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/controller/status/controller.go` around lines 205 - 208, The direct
blocking send to c.updateCh in the status reconciliation path can deadlock
merge() if updateCh is nil or full; change it to a nil-safe, non-blocking
signal: check c.updateCh != nil and then use a non-blocking send (select with a
case c.updateCh <- struct{}{} and a default) so you only attempt to signal when
the channel exists and won't block if it's full.
|
/retest |
Move runtime-extractor manifests to dedicated directory and update clusterrole permissions for controller resource management. Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>
Implement controller that manages runtime-extractor resources: - ConfigMap for extractor configuration - DaemonSet for node-level data collection - Service for pod communication - Resource lifecycle management Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>
Add comprehensive tests for runtime-extractor resource management: - ConfigMap creation and updates - DaemonSet lifecycle and configuration - Service resource handling - Common test utilities Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>
b2636bb to
25ff9a1
Compare
|
/test tls-scanner |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/controller/runtimeextractor/informer_test.go (1)
17-17: ⚡ Quick winConsider upgrading to
fake.NewClientset()now that apply configurations are available.
fake.NewSimpleClientsetis deprecated. The codebase has apply configurations available in the vendored k8s.io/client-go library, making an upgrade tofake.NewClientsetfeasible with minimal effort.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/informer_test.go` at line 17, The test uses the deprecated fake.NewSimpleClientset; replace its usage by calling fake.NewClientset() when constructing fakeClient (update the instantiation where fakeClient := fake.NewSimpleClientset() appears) to leverage apply-configuration support; ensure any test setup that relied on SimpleClientset behavior still works with the new fake.NewClientset and adjust imports or apply-config usage if necessary.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/runtimeextractor/resources/retry_test.go`:
- Around line 19-84: Convert Test_applyDaemonSet_RetryOnConflict into a
table-driven test: create a test cases slice (even if it contains one case) with
fields like name, initialDaemonSet, reactorBehavior (to simulate conflict then
success), and wantErr; iterate over cases using t.Run(case.name, func(t
*testing.T) { ... }); inside each subtest set up the fakeClient, recorder,
updateAttempts counter and PrependReactor as before, construct the
ResourceManager via NewResourceManager and call applyDaemonSet, then assert
results against case.wantErr and verify updateAttempts; keep the existing logic
but move per-case setup into the loop and reference
Test_applyDaemonSet_RetryOnConflict, NewResourceManager and applyDaemonSet to
locate where to change.
---
Nitpick comments:
In `@pkg/controller/runtimeextractor/informer_test.go`:
- Line 17: The test uses the deprecated fake.NewSimpleClientset; replace its
usage by calling fake.NewClientset() when constructing fakeClient (update the
instantiation where fakeClient := fake.NewSimpleClientset() appears) to leverage
apply-configuration support; ensure any test setup that relied on
SimpleClientset behavior still works with the new fake.NewClientset and adjust
imports or apply-config usage if necessary.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 636e3849-9a58-4315-a611-71b7b299921b
📒 Files selected for processing (15)
manifests/03-clusterrole.yamlmanifests/06-deployment.yamlpkg/controller/operator.gopkg/controller/runtimeextractor/README.mdpkg/controller/runtimeextractor/controller.gopkg/controller/runtimeextractor/doc.gopkg/controller/runtimeextractor/informer.gopkg/controller/runtimeextractor/informer_test.gopkg/controller/runtimeextractor/resources/daemonset.gopkg/controller/runtimeextractor/resources/daemonset_test.gopkg/controller/runtimeextractor/resources/manager.gopkg/controller/runtimeextractor/resources/manifests/runtime-extractor-daemonset.yamlpkg/controller/runtimeextractor/resources/retry_test.gopkg/controller/status/controller.gopkg/controller/status/controller_test.go
✅ Files skipped from review due to trivial changes (1)
- pkg/controller/runtimeextractor/doc.go
🚧 Files skipped from review as they are similar to previous changes (9)
- pkg/controller/runtimeextractor/resources/manager.go
- pkg/controller/status/controller_test.go
- pkg/controller/status/controller.go
- manifests/06-deployment.yaml
- pkg/controller/runtimeextractor/resources/daemonset_test.go
- manifests/03-clusterrole.yaml
- pkg/controller/runtimeextractor/controller.go
- pkg/controller/operator.go
- pkg/controller/runtimeextractor/resources/daemonset.go
| func Test_applyDaemonSet_RetryOnConflict(t *testing.T) { | ||
| ctx := context.Background() | ||
|
|
||
| // Create initial DaemonSet | ||
| initialDS := &appsv1.DaemonSet{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "insights-runtime-extractor", | ||
| Namespace: "openshift-insights", | ||
| ResourceVersion: "1", | ||
| }, | ||
| Spec: appsv1.DaemonSetSpec{ | ||
| Selector: &metav1.LabelSelector{ | ||
| MatchLabels: map[string]string{"app": "test"}, | ||
| }, | ||
| Template: corev1.PodTemplateSpec{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Labels: map[string]string{"app": "test"}, | ||
| }, | ||
| Spec: corev1.PodSpec{ | ||
| Containers: []corev1.Container{ | ||
| {Name: "extractor", Image: "old-image:v1"}, | ||
| {Name: "exporter", Image: "old-image:v1"}, | ||
| {Name: "kube-rbac-proxy", Image: "old-image:v1"}, | ||
| }, | ||
| }, | ||
| }, | ||
| }, | ||
| } | ||
|
|
||
| fakeClient := fake.NewSimpleClientset(initialDS) | ||
| recorder := events.NewInMemoryRecorder("test", clock.RealClock{}) | ||
|
|
||
| // Track update attempts | ||
| updateAttempts := 0 | ||
|
|
||
| // Add reactor to simulate conflict on first attempt, then succeed | ||
| fakeClient.PrependReactor("update", "daemonsets", func(action k8stesting.Action) (handled bool, ret runtime.Object, err error) { | ||
| updateAttempts++ | ||
| if updateAttempts == 1 { | ||
| // First attempt: return conflict error | ||
| return true, nil, apierrors.NewConflict( | ||
| appsv1.Resource("daemonsets"), | ||
| "insights-runtime-extractor", | ||
| nil, | ||
| ) | ||
| } | ||
| // Second attempt: succeed with default behavior | ||
| return false, nil, nil | ||
| }) | ||
|
|
||
| rm := NewResourceManager( | ||
| fakeClient.AppsV1(), | ||
| recorder, | ||
| ) | ||
|
|
||
| // This should succeed after retry | ||
| _, err := rm.applyDaemonSet(ctx) | ||
| if err != nil { | ||
| t.Fatalf("Expected applyDaemonSet to succeed after retry, got error: %v", err) | ||
| } | ||
|
|
||
| // Verify it retried (should have 2 attempts: 1 conflict + 1 success) | ||
| if updateAttempts < 2 { | ||
| t.Errorf("Expected at least 2 update attempts (conflict + retry), got %d", updateAttempts) | ||
| } | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Convert this to a table-driven test to match repo test standards.
This test is currently single-case; please wrap it in a table-driven structure (even with one case) for consistency and easier expansion.
Proposed refactor sketch
-func Test_applyDaemonSet_RetryOnConflict(t *testing.T) {
- ctx := context.Background()
- ...
-}
+func Test_applyDaemonSet_RetryOnConflict(t *testing.T) {
+ tests := []struct {
+ name string
+ }{
+ {name: "retries once on conflict and then succeeds"},
+ }
+
+ for _, tt := range tests {
+ t.Run(tt.name, func(t *testing.T) {
+ ctx := context.Background()
+ // existing setup/assertions
+ })
+ }
+}As per coding guidelines, Use table-driven tests in test implementations.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/controller/runtimeextractor/resources/retry_test.go` around lines 19 -
84, Convert Test_applyDaemonSet_RetryOnConflict into a table-driven test: create
a test cases slice (even if it contains one case) with fields like name,
initialDaemonSet, reactorBehavior (to simulate conflict then success), and
wantErr; iterate over cases using t.Run(case.name, func(t *testing.T) { ... });
inside each subtest set up the fakeClient, recorder, updateAttempts counter and
PrependReactor as before, construct the ResourceManager via NewResourceManager
and call applyDaemonSet, then assert results against case.wantErr and verify
updateAttempts; keep the existing logic but move per-case setup into the loop
and reference Test_applyDaemonSet_RetryOnConflict, NewResourceManager and
applyDaemonSet to locate where to change.
25ff9a1 to
8518be5
Compare
|
/test tls-scanner |
There was a problem hiding this comment.
♻️ Duplicate comments (1)
pkg/controller/runtimeextractor/README.md (1)
76-76:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAdd language tags to fenced code blocks.
The four plain-text flow diagrams (lines 76, 106, 119, 131) are still missing language identifiers, which causes markdownlint MD040 warnings. Add
texttags to each fence.📝 Proposed fix
-``` +```text External Change → Informer Detects → Notification Sent → Controller Reconciles → State RestoredApply the same pattern to the fences at lines 106, 119, and 131. </details> Also applies to: 106-106, 119-119, 131-131 <details> <summary>🤖 Prompt for AI Agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.In
@pkg/controller/runtimeextractor/README.mdat line 76, Update the fenced code
blocks that contain the plain-text flow diagrams so they include the language
tag "text" instead of a bare fence; for example changetotext for the
fence wrapping the flow "External Change → Informer Detects → Notification Sent
→ Controller Reconciles → State Restored" and apply the same change to the other
three flow-diagram fences in the README (the ones around the other two/three
plain-text diagrams). This fixes the MD040 markdownlint warnings by explicitly
marking those blocks as text.</details> </blockquote></details> </blockquote></details> <details> <summary>🤖 Prompt for all review comments with AI agents</summary>Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.Duplicate comments:
In@pkg/controller/runtimeextractor/README.md:
- Line 76: Update the fenced code blocks that contain the plain-text flow
diagrams so they include the language tag "text" instead of a bare fence; for
example changetotext for the fence wrapping the flow "External Change →
Informer Detects → Notification Sent → Controller Reconciles → State Restored"
and apply the same change to the other three flow-diagram fences in the README
(the ones around the other two/three plain-text diagrams). This fixes the MD040
markdownlint warnings by explicitly marking those blocks as text.</details> --- <details> <summary>ℹ️ Review info</summary> <details> <summary>⚙️ Run configuration</summary> **Configuration used**: Repository YAML (base), Central YAML (inherited) **Review profile**: CHILL **Plan**: Enterprise **Run ID**: `b3544c28-21f2-41f9-80f2-7f5edc2bf16b` </details> <details> <summary>📥 Commits</summary> Reviewing files that changed from the base of the PR and between 25ff9a17854af50c1dc15cb80a125fa7d5059607 and 8518be5d65cd51f39c83f8734d2e81d1af205579. </details> <details> <summary>📒 Files selected for processing (10)</summary> * `pkg/controller/operator.go` * `pkg/controller/runtimeextractor/README.md` * `pkg/controller/runtimeextractor/controller.go` * `pkg/controller/runtimeextractor/informer.go` * `pkg/controller/runtimeextractor/informer_test.go` * `pkg/controller/runtimeextractor/resources/daemonset.go` * `pkg/controller/runtimeextractor/resources/daemonset_test.go` * `pkg/controller/runtimeextractor/resources/manager.go` * `pkg/controller/runtimeextractor/resources/retry_test.go` * `pkg/controller/status/controller.go` </details> <details> <summary>🚧 Files skipped from review as they are similar to previous changes (9)</summary> * pkg/controller/runtimeextractor/resources/retry_test.go * pkg/controller/runtimeextractor/resources/manager.go * pkg/controller/runtimeextractor/informer_test.go * pkg/controller/runtimeextractor/resources/daemonset_test.go * pkg/controller/runtimeextractor/informer.go * pkg/controller/runtimeextractor/controller.go * pkg/controller/operator.go * pkg/controller/runtimeextractor/resources/daemonset.go * pkg/controller/status/controller.go </details> </details> <!-- This is an auto-generated comment by CodeRabbit for review status -->
|
/retest |
|
/test tls-scanner |
|
@opokornyy: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
This commit is implementing the Kubernetes informer-based drift detection and reconciliation for runtime-extractor resources (DaemonSet, Service, ConfigMap) Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>
Signed-off-by: Ondrej Pokorny <opokorny@redhat.com>
8518be5 to
847d7e6
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (5)
pkg/controller/status/controller.go (1)
205-208:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winUse a non-blocking, nil-safe send for
updateChin reconciliation.Line 207 performs a blocking send on the status merge path; if the channel is full (or unset in a caller), reconciliation can stall.
🔒 Suggested fix
- if versionChanged { - c.updateCh <- struct{}{} - } + if versionChanged && c.updateCh != nil { + select { + case c.updateCh <- struct{}{}: + default: + // signal already pending; avoid blocking reconciliation + } + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/status/controller.go` around lines 205 - 208, The current blocking send to c.updateCh when versionChanged can stall reconciliation; change it to a nil-safe, non-blocking send by checking c.updateCh != nil and using a select with a default case (e.g., select { case c.updateCh <- struct{}{}: default: }) so the reconcile path never blocks if the channel is full or unset; update the code around the versionChanged branch in the status merge/reconcile function to perform this check and non-blocking send.pkg/controller/runtimeextractor/informer.go (1)
96-101:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHandle tombstones in delete events.
Delete handlers may receive
cache.DeletedFinalStateUnknown; without unwrapping it first, metadata extraction can fail and external deletions may not trigger reconciliation.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/informer.go` around lines 96 - 101, The delete handler in resourceInformer.handleResourceDeletion must handle cache.DeletedFinalStateUnknown tombstones before calling meta.Accessor: detect if obj is a cache.DeletedFinalStateUnknown, extract the actual object via tombstone.Obj (or log+return if unwrap fails), then proceed to call meta.Accessor; update the function to unwrap tombstones and handle unwrap failures so external deletions correctly trigger reconciliation.pkg/controller/runtimeextractor/resources/daemonset_test.go (1)
104-107:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winAssert exact image values, not just “changed from old-image.”
These checks can still pass if images are mapped incorrectly. Compare each container image against the expected env-derived value.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/resources/daemonset_test.go` around lines 104 - 107, The test currently uses assert.NotEqual on ds.Spec.Template.Spec.Containers[0..2].Image which only verifies they changed from "old-image"; replace these with exact comparisons to the expected environment-derived image strings (the values your test derives from env vars or helper like expectedImageX) by asserting equality against ds.Spec.Template.Spec.Containers[0].Image, [1].Image and [2].Image respectively; locate the expected values where the test sets/reads the envs and use those variables (or a helper function used in the test) to assert assert.Equal(t, expectedImage0, ds.Spec.Template.Spec.Containers[0].Image) etc.pkg/controller/runtimeextractor/controller.go (1)
151-183:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPropagate reconcile errors instead of only logging them.
createDeployment,deleteDeployment, andupdateDeploymentlog failures but always continue. That can leave runtime-extractor resources out of sync indefinitely if no new event arrives.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/controller.go` around lines 151 - 183, The createDeployment, deleteDeployment, and updateDeployment functions currently only log failures and swallow errors; change each function (createDeployment, deleteDeployment, updateDeployment) to return error instead of void, propagate and return the error from resourceManager.ApplyRuntimeExtractorResources / DeleteRuntimeExtractorResources (and any isCreated checks as needed) rather than only klog.Errorf, and update the caller(s) (the reconcile loop that invokes these methods) to handle returned errors (retry/requeue or surface them to controller-runtime) so reconcile failures are not silently ignored.pkg/controller/runtimeextractor/resources/daemonset.go (1)
151-160:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not treat all
Getfailures as “does not exist.”
daemonSetExistsreturnsfalsefor non-NotFound errors too, which can hide API/auth/transport failures and skip needed reconciliation paths.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/resources/daemonset.go` around lines 151 - 160, daemonSetExists currently swallows all getDaemonSet errors and returns false for non-NotFound failures; change its behavior to surface API/auth/transport errors by converting its signature from daemonSetExists(ctx context.Context) bool to daemonSetExists(ctx context.Context) (bool, error) and implement: call getDaemonSet(ctx), if apierrors.IsNotFound(err) return false, nil; if err != nil return false, err; otherwise return true, nil. Update callers of daemonSetExists to handle the (bool, error) return and avoid treating non-NotFound errors as “does not exist.” Ensure references to ResourceManager.daemonSetExists and getDaemonSet are updated accordingly.
🧹 Nitpick comments (1)
pkg/controller/runtimeextractor/README.md (1)
228-228: 💤 Low valueClarify the distinction between existing and planned exponential backoff.
Lines 89-93 describe exponential backoff as already implemented for retry-on-conflict. This future enhancement suggests implementing exponential backoff for "reconciliation failures." If this refers to a different backoff mechanism (e.g., between full reconciliation attempts rather than within retry-on-conflict), please clarify the distinction to avoid confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/controller/runtimeextractor/README.md` at line 228, The checklist item is ambiguous about whether it duplicates the existing retry-on-conflict backoff; update the README entry to clearly distinguish the two: state that retry.OnConflict (used inside the Reconciler/reconcile loop for update conflicts) already has exponential backoff, and change the TODO to explicitly target implementing exponential backoff between full reconciliation attempts (e.g., backoff on Reconciler errors/failures that abort a reconcile cycle), mentioning the terms "retry.OnConflict" and "Reconciler/reconcile loop" so readers can locate the existing implementation and understand this is a separate mechanism.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/controller/runtimeextractor/informer_test.go`:
- Around line 68-69: The three informerFactory.WaitForCacheSync calls in
informer_test.go currently ignore their boolean return; update each call (the
ones that invoke informerFactory.WaitForCacheSync(ctx.Done())) to capture the
returned map/boolean and assert it succeeded—e.g., if sync returns false fail
the test (t.Fatalf or require.True) with a clear message so tests fail fast when
caches didn't initialize before further actions like updates/deletes; locate the
calls by the WaitForCacheSync invocations in the test and add the boolean check
and test failure handling.
In `@pkg/controller/runtimeextractor/resources/daemonset.go`:
- Around line 25-33: The current defaults extractorDefaultImage,
exporterDefaultImage and proxyDefaultImage use :latest which is unsafe; remove
the :latest fallbacks and change the image resolution logic (the code that reads
exporterImageEnv/proxyImageEnv and the extractor env lookup around where
envImageErrMsg is used and also in the block referenced at lines ~109-125) to
fail fast: when os.LookupEnv for RELATED_IMAGE_* returns missing or empty,
return/propagate an error or log a clear fatal and do not substitute a hardcoded
:latest; update envImageErrMsg to indicate a missing RELATED_IMAGE and include
the env var name so callers receive a deterministically failing error instead of
silently using :latest.
---
Duplicate comments:
In `@pkg/controller/runtimeextractor/controller.go`:
- Around line 151-183: The createDeployment, deleteDeployment, and
updateDeployment functions currently only log failures and swallow errors;
change each function (createDeployment, deleteDeployment, updateDeployment) to
return error instead of void, propagate and return the error from
resourceManager.ApplyRuntimeExtractorResources / DeleteRuntimeExtractorResources
(and any isCreated checks as needed) rather than only klog.Errorf, and update
the caller(s) (the reconcile loop that invokes these methods) to handle returned
errors (retry/requeue or surface them to controller-runtime) so reconcile
failures are not silently ignored.
In `@pkg/controller/runtimeextractor/informer.go`:
- Around line 96-101: The delete handler in
resourceInformer.handleResourceDeletion must handle
cache.DeletedFinalStateUnknown tombstones before calling meta.Accessor: detect
if obj is a cache.DeletedFinalStateUnknown, extract the actual object via
tombstone.Obj (or log+return if unwrap fails), then proceed to call
meta.Accessor; update the function to unwrap tombstones and handle unwrap
failures so external deletions correctly trigger reconciliation.
In `@pkg/controller/runtimeextractor/resources/daemonset_test.go`:
- Around line 104-107: The test currently uses assert.NotEqual on
ds.Spec.Template.Spec.Containers[0..2].Image which only verifies they changed
from "old-image"; replace these with exact comparisons to the expected
environment-derived image strings (the values your test derives from env vars or
helper like expectedImageX) by asserting equality against
ds.Spec.Template.Spec.Containers[0].Image, [1].Image and [2].Image respectively;
locate the expected values where the test sets/reads the envs and use those
variables (or a helper function used in the test) to assert assert.Equal(t,
expectedImage0, ds.Spec.Template.Spec.Containers[0].Image) etc.
In `@pkg/controller/runtimeextractor/resources/daemonset.go`:
- Around line 151-160: daemonSetExists currently swallows all getDaemonSet
errors and returns false for non-NotFound failures; change its behavior to
surface API/auth/transport errors by converting its signature from
daemonSetExists(ctx context.Context) bool to daemonSetExists(ctx
context.Context) (bool, error) and implement: call getDaemonSet(ctx), if
apierrors.IsNotFound(err) return false, nil; if err != nil return false, err;
otherwise return true, nil. Update callers of daemonSetExists to handle the
(bool, error) return and avoid treating non-NotFound errors as “does not exist.”
Ensure references to ResourceManager.daemonSetExists and getDaemonSet are
updated accordingly.
In `@pkg/controller/status/controller.go`:
- Around line 205-208: The current blocking send to c.updateCh when
versionChanged can stall reconciliation; change it to a nil-safe, non-blocking
send by checking c.updateCh != nil and using a select with a default case (e.g.,
select { case c.updateCh <- struct{}{}: default: }) so the reconcile path never
blocks if the channel is full or unset; update the code around the
versionChanged branch in the status merge/reconcile function to perform this
check and non-blocking send.
---
Nitpick comments:
In `@pkg/controller/runtimeextractor/README.md`:
- Line 228: The checklist item is ambiguous about whether it duplicates the
existing retry-on-conflict backoff; update the README entry to clearly
distinguish the two: state that retry.OnConflict (used inside the
Reconciler/reconcile loop for update conflicts) already has exponential backoff,
and change the TODO to explicitly target implementing exponential backoff
between full reconciliation attempts (e.g., backoff on Reconciler
errors/failures that abort a reconcile cycle), mentioning the terms
"retry.OnConflict" and "Reconciler/reconcile loop" so readers can locate the
existing implementation and understand this is a separate mechanism.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 4c938da1-bd84-4417-8026-99d91354aa1a
📒 Files selected for processing (10)
pkg/controller/operator.gopkg/controller/runtimeextractor/README.mdpkg/controller/runtimeextractor/controller.gopkg/controller/runtimeextractor/informer.gopkg/controller/runtimeextractor/informer_test.gopkg/controller/runtimeextractor/resources/daemonset.gopkg/controller/runtimeextractor/resources/daemonset_test.gopkg/controller/runtimeextractor/resources/manager.gopkg/controller/runtimeextractor/resources/retry_test.gopkg/controller/status/controller.go
| informerFactory.WaitForCacheSync(ctx.Done()) | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and examine the file
fd -t f "informer_test.go" | grep "runtimeextractor"Repository: openshift/insights-operator
Length of output: 120
🏁 Script executed:
# Read the file to check the specific lines
cat -n pkg/controller/runtimeextractor/informer_test.goRepository: openshift/insights-operator
Length of output: 6462
Check WaitForCacheSync return value to validate cache initialization.
These three WaitForCacheSync calls ignore the boolean return value. If cache sync fails (returns false), the informer won't have synced state before the test proceeds with updates/deletes, causing confusing test timeouts instead of explicit failures. Check the return value and fail the test if sync incomplete.
Affects: lines 68, 115, 159
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/controller/runtimeextractor/informer_test.go` around lines 68 - 69, The
three informerFactory.WaitForCacheSync calls in informer_test.go currently
ignore their boolean return; update each call (the ones that invoke
informerFactory.WaitForCacheSync(ctx.Done())) to capture the returned
map/boolean and assert it succeeded—e.g., if sync returns false fail the test
(t.Fatalf or require.True) with a clear message so tests fail fast when caches
didn't initialize before further actions like updates/deletes; locate the calls
by the WaitForCacheSync invocations in the test and add the boolean check and
test failure handling.
| extractorDefaultImage = "quay.io/openshift/origin-insights-runtime-extractor:latest" | ||
|
|
||
| exporterImageEnv = "RELATED_IMAGE_INSIGHTS_RUNTIME_EXPORTER" | ||
| exporterDefaultImage = "quay.io/openshift/origin-insights-runtime-exporter:latest" | ||
|
|
||
| proxyImageEnv = "RELATED_IMAGE_KUBE_RBAC_PROXY" | ||
| proxyDefaultImage = "quay.io/openshift/origin-kube-rbac-proxy:latest" | ||
|
|
||
| envImageErrMsg = "Failed to get image from environment variable %s, using default image %s" |
There was a problem hiding this comment.
Avoid :latest fallback for missing RELATED_IMAGE env vars.
Falling back to hardcoded :latest images makes deployments non-deterministic and risky in disconnected/pinned-image environments. Prefer failing fast and surfacing a clear error.
Also applies to: 109-125
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/controller/runtimeextractor/resources/daemonset.go` around lines 25 - 33,
The current defaults extractorDefaultImage, exporterDefaultImage and
proxyDefaultImage use :latest which is unsafe; remove the :latest fallbacks and
change the image resolution logic (the code that reads
exporterImageEnv/proxyImageEnv and the extractor env lookup around where
envImageErrMsg is used and also in the block referenced at lines ~109-125) to
fail fast: when os.LookupEnv for RELATED_IMAGE_* returns missing or empty,
return/propagate an error or log a clear fatal and do not substitute a hardcoded
:latest; update envImageErrMsg to indicate a missing RELATED_IMAGE and include
the env var name so callers receive a deterministically failing error instead of
silently using :latest.
Implement controller that manages runtime-extractor resources.
Categories
Sample Archive
NoneDocumentation
pkg/controller/runtimeextractor/README.mdUnit Tests
pkg/controller/runtimeextractor/informer_test.gopkg/controller/runtimeextractor/resources/daemonset_test.gopkg/controller/runtimeextractor/resources/retry_test.goPrivacy
Yes. There are no sensitive data in the newly collected information.
Changelog
NoneBreaking Changes
No
References
https://redhat.atlassian.net/browse/CCXDEV-15790
Summary by CodeRabbit
New Features
Configuration
Documentation
Tests