feat(notification): skip waitFor notifications from stale generations#3028
feat(notification): skip waitFor notifications from stale generations#3028adityathebe wants to merge 1 commit into
Conversation
Pending waitFor notifications can outlive the Kubernetes rollout generation that created them. If a later rollout re-enters the same health state, the older pending row may send against the new resource state. Store the config metadata.generation in the notification payload and compare it during pending processing. When the current generation differs, mark the pending history skipped instead of sending it.
WalkthroughThe PR introduces generation-based staleness detection for notifications. It adds logic to extract Changes
Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
✨ Simplify code
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
notification/job.go (1)
440-452: Check generation before the wait-for re-evaluation path.This runs after
shouldSkipNotificationDueToHealth, so Kubernetes-scraped configs can still trigger one more incremental scrape/requeue before the stale row is skipped. If the intent is to drop stale waitFor rows as soon as they become eligible, move the generation comparison ahead of that helper or fold it into that helper’s early-exit logic.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@notification/job.go` around lines 440 - 452, The generation check is happening after the wait-for re-evaluation path (shouldSkipNotificationDueToHealth), allowing one extra scrape; move the generation comparison earlier so stale waitFor rows are dropped immediately: call or inline shouldSkipDueToGeneration (and use generationChangedMessage for the reason) before invoking shouldSkipNotificationDueToHealth, or fold the generation logic into shouldSkipNotificationDueToHealth so it returns an early-exit when generation changed; ensure the same DB update and traceLog behavior (using notif.ID, currentHistory.ID, payload.EventName, payload.ResourceID) is preserved when marking status as NotificationStatusSkipped and setting the error reason.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@notification/job.go`:
- Around line 440-452: The generation check is happening after the wait-for
re-evaluation path (shouldSkipNotificationDueToHealth), allowing one extra
scrape; move the generation comparison earlier so stale waitFor rows are dropped
immediately: call or inline shouldSkipDueToGeneration (and use
generationChangedMessage for the reason) before invoking
shouldSkipNotificationDueToHealth, or fold the generation logic into
shouldSkipNotificationDueToHealth so it returns an early-exit when generation
changed; ensure the same DB update and traceLog behavior (using notif.ID,
currentHistory.ID, payload.EventName, payload.ResourceID) is preserved when
marking status as NotificationStatusSkipped and setting the error reason.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 941d9341-670d-4e17-8cf5-35e72e63fc39
📒 Files selected for processing (4)
notification/generation.gonotification/job.gonotification/notification_test.gonotification/send.go
Pending
waitFornotifications can remain pending while a Kubernetes resource moves on to a newer rollout generation.Store the config
metadata.generationin the notification payload when the pending notification is created. During pending processing, compare it with the current resource generation.If the generation changed, treat the pending notification as stale and mark it skipped instead of sending it for the newer rollout.
Summary by CodeRabbit