Skip to content

fix: drop stale subscription updates#1264

Draft
thlorenz wants to merge 4 commits into
masterfrom
thlorenz/fix-recreate-evicted-accounts
Draft

fix: drop stale subscription updates#1264
thlorenz wants to merge 4 commits into
masterfrom
thlorenz/fix-recreate-evicted-accounts

Conversation

@thlorenz
Copy link
Copy Markdown
Collaborator

@thlorenz thlorenz commented May 29, 2026

Summary

Prevent stale forwarded subscription updates from recreating local bank state for accounts that are no longer watched by the remote account provider.

Details

magicblock-chainlink

  • Drop subscription updates in the fetch cloner when the provider no longer watches the account, while preserving the existing delegated-account discovery path.
  • If a stale update arrives for an account that is still present locally, route cleanup through the existing account-removal listener instead of evicting directly from the subscription-update task.
  • Serialize the final watched-account check and eviction submission for the same pubkey so a concurrent re-subscription cannot race with stale-account cleanup.
  • Keep unsubscribe behavior and the SubMux forwarder hot path unchanged.
  • Add regression coverage for stale updates, removal enqueueing, watched-again skip behavior, and same-pubkey eviction/subscription serialization.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved account removal handling to prevent race conditions between subscription updates and account eviction.
    • Stale subscription updates for unwatched accounts are now dropped early and queued for removal notifications.
  • Tests

    • Added comprehensive test coverage for subscription-update scenarios with unwatched and absent accounts.

Review Change Stack

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 29, 2026

📝 Walkthrough

Walkthrough

This pull request implements a race-condition mitigation for account lifecycle management in the Chainlink validator. It introduces a per-pubkey subscription lock mechanism that prevents concurrent eviction and subscription acquisition for the same account. The pattern flows through three layers: a new evict_unwatched_with_subscription_lock helper in RemoteAccountProvider, early stale update filtering in FetchCloner, and coordination in the Chainlink removal notification path. The changes include updates to 11 existing tests to establish correct subscription ownership preconditions and 5 new tests validating the race-safe eviction behavior.

Suggested reviewers

  • GabrielePicco
  • bmuddha
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch thlorenz/fix-recreate-evicted-accounts

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@thlorenz thlorenz requested a review from GabrielePicco May 29, 2026 11:19
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs`:
- Around line 1666-1673: The test currently breaks out immediately because
clone_request_count() can be 0 at baseline; change the test to first observe
that a clone request was created (wait for
ctx.fetch_cloner.cloner().clone_request_count() to become > 0 or use a
deterministic post-send signal from the code path) then wait for it to drop back
to 0; in other words, gate the zero-check behind a confirmed prior increment (or
await a dedicated processing-complete signal/oneshot you add to the async drop
path) so the test actually exercises the async drop processing in tests.rs where
ctx.fetch_cloner.cloner() and clone_request_count() are used.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6fd6c77b-e2fa-466c-a66d-497b39471bb0

📥 Commits

Reviewing files that changed from the base of the PR and between a49abaa and 7e89661.

📒 Files selected for processing (4)
  • magicblock-chainlink/src/chainlink/fetch_cloner/mod.rs
  • magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs
  • magicblock-chainlink/src/chainlink/mod.rs
  • magicblock-chainlink/src/remote_account_provider/mod.rs

Comment on lines +1666 to +1673
tokio::time::timeout(Duration::from_secs(1), async {
loop {
tokio::time::sleep(Duration::from_millis(10)).await;
if ctx.fetch_cloner.cloner().clone_request_count() == 0 {
break;
}
}
})
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Test can pass before the stale update is actually processed

At Line 1669, clone_request_count() == 0 is true at baseline, so this loop can exit immediately and the test may pass without exercising the async drop path at all. Please gate on a post-send processing signal (or at least a deterministic barrier) before asserting zero clone requests.

Suggested adjustment
-    tokio::time::timeout(Duration::from_secs(1), async {
-        loop {
-            tokio::time::sleep(Duration::from_millis(10)).await;
-            if ctx.fetch_cloner.cloner().clone_request_count() == 0 {
-                break;
-            }
-        }
-    })
-    .await
-    .expect("timed out waiting for stale update to be dropped");
+    // Give the subscription task a chance to process the forwarded update.
+    tokio::time::sleep(Duration::from_millis(50)).await;
+    assert_eq!(ctx.fetch_cloner.cloner().clone_request_count(), 0);
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@magicblock-chainlink/src/chainlink/fetch_cloner/tests.rs` around lines 1666 -
1673, The test currently breaks out immediately because clone_request_count()
can be 0 at baseline; change the test to first observe that a clone request was
created (wait for ctx.fetch_cloner.cloner().clone_request_count() to become > 0
or use a deterministic post-send signal from the code path) then wait for it to
drop back to 0; in other words, gate the zero-check behind a confirmed prior
increment (or await a dedicated processing-complete signal/oneshot you add to
the async drop path) so the test actually exercises the async drop processing in
tests.rs where ctx.fetch_cloner.cloner() and clone_request_count() are used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant