
fix: Adding lock to init relayer instances #622

Merged
collins-w merged 21 commits into main from improvements-to-relayers-initialization on Mar 13, 2026

Conversation

@NicoMolinaOZ NicoMolinaOZ (Contributor) commented Jan 20, 2026

Summary

  • Adding lock to init relayer instances

Testing Process

Checklist

  • Add a reference to related issues in the PR description.
  • Add unit tests if applicable.

Note

If you are using Relayer in your stack, consider adding your team or organization to our list of Relayer Users in the Wild!

Summary by CodeRabbit

  • New Features

    • Distributed locking for relayer initialization and config processing to coordinate work across instances
    • Global staleness checks and metadata tracking for relayer last-sync and global-init to avoid redundant work
    • In-memory fallback when persistent storage is unavailable; wait/polling behavior to observe completion from other instances
    • New generic polling utility to support timed wait/retry patterns
  • Tests

    • Expanded unit and integration tests covering locking, polling, metadata, and initialization outcomes


@coderabbitai coderabbitai (bot) commented Jan 20, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a64cb625-7b54-4530-87c4-06534c319c9e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

This PR introduces distributed locking coordination for relayer initialization across multiple instances. It adds per-relayer staleness checks, implements locking-based initialization in persistent storage mode, and falls back to direct initialization in in-memory mode. Redis sync metadata utilities track initialization timestamps to prevent redundant syncs, while a new repository API exposes storage connection details.

Changes

Repository Connection Exposure
src/repositories/relayer/mod.rs, src/repositories/relayer/relayer_redis.rs, src/repositories/relayer/relayer_in_memory.rs
Added public connection_info() method to RelayerRepository trait returning storage connection details; implemented for RedisRelayerRepository to expose underlying ConnectionManager and key prefix; in-memory variant returns None via default trait implementation. Includes tests validating in-memory behavior.
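The default-trait-method pattern the summary describes (Redis exposes details, in-memory falls back to None) can be sketched as follows; `ConnectionInfo` and its fields are illustrative stand-ins here, not the crate's actual types:

```rust
// Hypothetical sketch of a connection_info() trait method with a None default.
// ConnectionInfo and the field names are assumptions for illustration only.

#[derive(Debug, Clone, PartialEq)]
struct ConnectionInfo {
    key_prefix: String, // the real method also exposes a ConnectionManager
}

trait RelayerRepository {
    /// Storage-backed repositories override this; the in-memory variant
    /// inherits the default and reports no connection details.
    fn connection_info(&self) -> Option<ConnectionInfo> {
        None
    }
}

struct InMemoryRelayerRepository;
impl RelayerRepository for InMemoryRelayerRepository {}

struct RedisRelayerRepository {
    key_prefix: String,
}
impl RelayerRepository for RedisRelayerRepository {
    fn connection_info(&self) -> Option<ConnectionInfo> {
        Some(ConnectionInfo { key_prefix: self.key_prefix.clone() })
    }
}

fn main() {
    // In-memory: no connection details, so callers take the non-locking path.
    assert!(InMemoryRelayerRepository.connection_info().is_none());
    // Redis-backed: details available, enabling the distributed-lock path.
    let redis = RedisRelayerRepository { key_prefix: "relayer:".into() };
    assert_eq!(redis.connection_info().unwrap().key_prefix, "relayer:");
    println!("ok");
}
```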
Relayer Initialization with Locking
src/bootstrap/initialize_relayers.rs
Branched initialization logic between persistent (distributed locking) and in-memory (direct init) modes. Introduced RelayerInitResult enum tracking outcomes (Initialized, SkippedRecentSync, SkippedLockHeld, Failed). Added per-relayer lock acquisition, recent-sync staleness checks, and aggregated failure reporting. New helper functions: initialize_relayers_with_locking, initialize_single_relayer_with_lock, initialize_relayers_without_locking, count_results, initialize_relayer_with_service.
Redis Sync Metadata Utilities
src/utils/redis.rs
New functions for tracking relayer last-sync timestamps: set_relayer_last_sync(), get_relayer_last_sync(), is_relayer_recently_synced(). Includes Redis hash operations and comprehensive tests validating set/get behavior and staleness thresholds.
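The staleness logic behind these utilities can be illustrated with an in-memory stand-in for the Redis hash, so it runs without a server; the struct, method names, and threshold handling below are assumptions, not the actual implementation:

```rust
// In-memory stand-in for the Redis hash used by set_relayer_last_sync() /
// is_relayer_recently_synced(). Illustrative only: the real code talks to Redis.

use std::collections::HashMap;
use std::time::{SystemTime, UNIX_EPOCH};

struct SyncMetadata {
    last_sync: HashMap<String, u64>, // relayer_id -> unix seconds of last sync
}

impl SyncMetadata {
    fn set_relayer_last_sync(&mut self, relayer_id: &str, now_secs: u64) {
        self.last_sync.insert(relayer_id.to_string(), now_secs);
    }

    fn is_relayer_recently_synced(&self, relayer_id: &str, now_secs: u64, threshold_secs: u64) -> bool {
        match self.last_sync.get(relayer_id) {
            Some(&t) => now_secs.saturating_sub(t) < threshold_secs,
            None => false, // never synced counts as stale
        }
    }
}

fn main() {
    let now = SystemTime::now().duration_since(UNIX_EPOCH).unwrap().as_secs();
    let mut meta = SyncMetadata { last_sync: HashMap::new() };

    // No record yet: stale, so initialization should proceed.
    assert!(!meta.is_relayer_recently_synced("relayer-1", now, 60));
    meta.set_relayer_last_sync("relayer-1", now);
    // Just synced: recent, so this instance can skip (SkippedRecentSync).
    assert!(meta.is_relayer_recently_synced("relayer-1", now, 60));
    // A sync 120s ago is stale under a 60s threshold.
    meta.set_relayer_last_sync("relayer-2", now - 120);
    assert!(!meta.is_relayer_recently_synced("relayer-2", now, 60));
    println!("ok");
}
```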

Sequence Diagram(s)

sequenceDiagram
    participant Init as RelayerInitialization
    participant Repo as RelayerRepository
    participant Redis as Redis (Distributed Lock & Metadata)
    participant Service as RelayerService

    rect rgba(100, 150, 200, 0.5)
    note over Init,Service: Persistent Storage Mode (with Locking)
    Init->>Repo: connection_info()
    Repo-->>Init: Some((client, prefix))
    
    loop For each relayer
        Init->>Redis: is_relayer_recently_synced(prefix, relayer_id)
        Redis-->>Init: bool (recent sync check)
        
        alt Not Recently Synced
            Init->>Redis: Acquire per-relayer lock (TTL-based)
            alt Lock Acquired
                Redis-->>Init: Lock acquired
                Init->>Service: initialize(relayer_id)
                Service-->>Init: Result
                Init->>Redis: set_relayer_last_sync(prefix, relayer_id)
                Init->>Redis: Release lock
            else Lock Held by Other Instance
                Redis-->>Init: Lock contention
                Init->>Init: Skip (SkippedLockHeld)
            end
        else Recently Synced
            Init->>Init: Skip (SkippedRecentSync)
        end
    end
    end

    rect rgba(150, 100, 200, 0.5)
    note over Init,Service: In-Memory Mode (no Locking)
    Init->>Repo: connection_info()
    Repo-->>Init: None
    
    loop For each relayer
        Init->>Service: initialize(relayer_id)
        Service-->>Init: Result
    end
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • PR #618: Adds distributed locking and repository connection_info() support with modifications to the same Redis utilities for cross-instance coordination.

Suggested reviewers

  • collins-w
  • dylankilkenny
  • tirumerla

Poem

🐰 Locks and latches, timestamps too,
Relayers sync—once, not twice through!
Redis holds the wisdom of the herd,
Cross-instance harmony, every word.

🚥 Pre-merge checks: 3 passed, 2 failed

❌ Failed checks (1 warning, 1 inconclusive)
  • Linked Issues check ⚠️ Warning: The PR description explicitly states that the checklist item 'Add a reference to related issues in the PR description' is unchecked, and no linked issues or issue numbers are mentioned anywhere in the description. Resolution: add a reference to the related issues or feature requests that motivated this locking mechanism, or link them via GitHub.
  • Description check ❓ Inconclusive: The PR description uses the correct template structure but is largely incomplete. It includes the required section headings (Summary, Testing Process, Checklist), but the Summary merely repeats the title and the Testing Process section is empty. Resolution: expand the Summary with what the lock solves and why it is needed, fill in the Testing Process section, and reference related issues or pull requests if applicable.

✅ Passed checks (3 passed)
  • Title check ✅ Passed: The title 'fix: Adding lock to init relayer instances' is directly related to the main changes in the PR, which introduce distributed locking coordination for relayer initialization.
  • Out of Scope Changes check ✅ Passed: All changes are focused on adding distributed locking coordination for relayer initialization, including supporting infrastructure (connection info retrieval, Redis metadata tracking, and result handling).
  • Docstring Coverage ✅ Passed: Docstring coverage is 100.00%, above the required 80.00% threshold.



@NicoMolinaOZ NicoMolinaOZ marked this pull request as ready for review January 20, 2026 20:21
@NicoMolinaOZ NicoMolinaOZ requested a review from a team as a code owner January 20, 2026 20:21
@codecov codecov (bot) commented Jan 20, 2026

Codecov Report

❌ Patch coverage is 32.50975% with 1038 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.27%. Comparing base (f720040) to head (46959b9).
⚠️ Report is 1 commit behind head on main.

Files with missing lines (patch % | lines missing):
  • src/utils/redis.rs: 0.00% | 545 ⚠️
  • src/bootstrap/initialize_relayers.rs: 56.59% | 247 ⚠️
  • src/bootstrap/config_processor.rs: 13.45% | 238 ⚠️
  • src/repositories/relayer/mod.rs: 78.78% | 7 ⚠️
  • src/utils/polling.rs: 99.13% | 1 ⚠️
Additional details and impacted files
Flag coverage Δ:
  • ai: 0.27% <0.91%> (+<0.01%) ⬆️
  • dev: 90.26% <32.50%> (-0.71%) ⬇️
  • properties: 0.01% <0.00%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

@@            Coverage Diff             @@
##             main     #622      +/-   ##
==========================================
- Coverage   90.97%   90.27%   -0.71%     
==========================================
  Files         288      289       +1     
  Lines      118548   120108    +1560     
==========================================
+ Hits       107852   108428     +576     
- Misses      10696    11680     +984     
Files with missing lines (coverage Δ):
  • src/repositories/relayer/relayer_in_memory.rs: 82.47% <ø> (ø)
  • src/utils/polling.rs: 99.13% <99.13%> (ø)
  • src/repositories/relayer/mod.rs: 81.65% <78.78%> (-1.25%) ⬇️
  • src/bootstrap/config_processor.rs: 82.95% <13.45%> (-15.60%) ⬇️
  • src/bootstrap/initialize_relayers.rs: 66.21% <56.59%> (-4.91%) ⬇️
  • src/utils/redis.rs: 13.60% <0.00%> (-15.68%) ⬇️

... and 8 files with indirect coverage changes


Comment thread src/bootstrap/initialize_relayers.rs Outdated

@coderabbitai coderabbitai (bot) left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@src/bootstrap/config_processor.rs`:
- Around line 529-537: The current wait_for_config_processing_complete call
ignores poll_until's timeout result and always returns Ok(()), allowing startup
to continue without config; update wait_for_config_processing_complete to
propagate poll_until errors (or convert the timeout into an Err) instead of
unconditionally returning Ok so the process fails fast when is_redis_populated
polling times out; locate the poll_until invocation in
wait_for_config_processing_complete and return the poll_until result (or map its
timeout into a meaningful error) so the caller cannot proceed with empty
repositories.

In `@src/utils/time.rs`:
- Around line 24-28: The docstring for poll_until is incorrect: it claims the
function can return Err from the check closure, but poll_until logs errors from
the check closure and continues polling, only returning Ok(true) if condition
met or Ok(false) on timeout; update the documentation of poll_until to remove
the `Err(_)` return variant, explicitly state that errors from the `check`
closure are logged and ignored (do not stop polling), and clearly document the
actual return values (Ok(true) when condition met, Ok(false) on timeout).
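A synchronous stand-in for the documented behavior might look like the following; the real `poll_until` is presumably async, and the signature, log text, and label argument here are invented for illustration:

```rust
// Sketch of the poll_until semantics described above: errors from the check
// closure are logged and ignored (polling continues), and the call resolves to
// true when the condition is met or false on timeout. Synchronous std-only
// stand-in so it runs as-is; the real utility is async.

use std::time::{Duration, Instant};

fn poll_until<F>(mut check: F, max_wait: Duration, poll_interval: Duration, what: &str) -> bool
where
    F: FnMut() -> Result<bool, String>,
{
    let deadline = Instant::now() + max_wait;
    loop {
        match check() {
            Ok(true) => return true, // condition met
            Ok(false) => {}          // not yet, keep polling
            Err(e) => eprintln!("poll_until({what}): check failed, retrying: {e}"),
        }
        if Instant::now() >= deadline {
            return false; // timeout is a normal outcome, not an Err
        }
        std::thread::sleep(poll_interval);
    }
}

fn main() {
    // Errors from the closure do not stop polling; the condition is
    // eventually met on the third call.
    let mut calls = 0;
    let done = poll_until(
        || {
            calls += 1;
            if calls < 3 { Err("transient".into()) } else { Ok(true) }
        },
        Duration::from_secs(1),
        Duration::from_millis(10),
        "initialization",
    );
    assert!(done);
    // A condition that never holds yields false at the deadline.
    assert!(!poll_until(|| Ok(false), Duration::from_millis(50), Duration::from_millis(10), "never"));
    println!("ok");
}
```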
🧹 Nitpick comments (3)
src/bootstrap/initialize_relayers.rs (3)

169-196: Consider lock expiration during long-running initialization.

If initialization takes longer than BOOTSTRAP_LOCK_TTL_SECS, the lock could expire while initialization is still in progress, allowing another instance to start initializing concurrently. This is a trade-off: a longer TTL risks lock starvation if the holder crashes, while a shorter TTL risks concurrent initialization.

The current approach with graceful degradation (proceeding on lock errors) provides resilience, but you may want to document this behavior or consider implementing lock renewal for very large deployments.
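The TTL trade-off described above can be demonstrated with an in-memory stand-in for a Redis `SET NX EX`-style lock; the key names, owners, and fake clock below are illustrative, not the project's actual lock code:

```rust
// In-memory stand-in for a TTL-based distributed lock, showing the trade-off:
// once the TTL elapses, a second instance can acquire the lock even though the
// first holder never released it. All names here are illustrative.

use std::collections::HashMap;

struct LockTable {
    // lock key -> (owner, expiry on a fake clock, in seconds)
    locks: HashMap<String, (String, u64)>,
}

impl LockTable {
    /// Mimics SET key owner NX EX ttl: succeeds only if the key is absent
    /// or its previous TTL has expired.
    fn try_acquire(&mut self, key: &str, owner: &str, now: u64, ttl_secs: u64) -> bool {
        match self.locks.get(key) {
            Some((_, expiry)) if *expiry > now => false, // held and still live
            _ => {
                self.locks.insert(key.to_string(), (owner.to_string(), now + ttl_secs));
                true
            }
        }
    }
}

fn main() {
    let mut table = LockTable { locks: HashMap::new() };

    // Instance A acquires the per-relayer bootstrap lock with a 30s TTL.
    assert!(table.try_acquire("bootstrap:relayer-1", "instance-a", 0, 30));
    // While the TTL is live, instance B is refused (SkippedLockHeld).
    assert!(!table.try_acquire("bootstrap:relayer-1", "instance-b", 10, 30));
    // If initialization outlives the TTL, the lock silently changes hands,
    // allowing concurrent initialization: the risk discussed above.
    assert!(table.try_acquire("bootstrap:relayer-1", "instance-b", 31, 30));
    println!("ok");
}
```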


257-267: Consider adding concurrency limits for large deployments.

Using join_all runs all relayer initializations concurrently without bounds. For deployments with many relayers, this could overwhelm Redis connections or external services.

Consider using futures::stream::iter(...).buffer_unordered(n) to limit concurrent initializations:

♻️ Optional refactor with bounded concurrency
+use futures::stream::{self, StreamExt};
+
+const MAX_CONCURRENT_INITS: usize = 10;
+
 async fn run_initialization_batch<...>(...) -> Result<()> {
-    let futures = relayers.iter().map(|relayer| {
+    let results: Vec<_> = stream::iter(relayers.iter().map(|relayer| {
         let app_state = app_state.clone();
         let relayer_id = relayer.id.clone();
 
         async move {
             let result = initialize_relayer(relayer_id.clone(), app_state).await;
             (relayer_id, result)
         }
-    });
-
-    let results = futures::future::join_all(futures).await;
+    }))
+    .buffer_unordered(MAX_CONCURRENT_INITS)
+    .collect()
+    .await;

672-683: Consider using environment variable for test Redis URL.

The Redis URL is hardcoded to 127.0.0.1:6379. For CI/CD environments or developers using different Redis configurations, consider using an environment variable with a fallback:

♻️ Suggested improvement
 async fn create_test_redis_pool() -> Option<Arc<Pool>> {
-    let cfg = deadpool_redis::Config::from_url("redis://127.0.0.1:6379");
+    let redis_url = std::env::var("TEST_REDIS_URL")
+        .unwrap_or_else(|_| "redis://127.0.0.1:6379".to_string());
+    let cfg = deadpool_redis::Config::from_url(&redis_url);

Comment thread src/bootstrap/config_processor.rs Outdated
Comment thread src/utils/time.rs Outdated

@coderabbitai coderabbitai (bot) left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/bootstrap/config_processor.rs`:
- Around line 507-545: The current wait_for_config_processing_complete uses
is_redis_populated to decide completion but that ignores API keys so minimal
configs can time out; update the completion check in
wait_for_config_processing_complete (or extend is_redis_populated) to also query
the ApiKeyRepositoryTrait (AKR) from the provided app_state to detect whether
any API keys exist (e.g., call a count/list method on the API key repo) and
treat presence of at least one API key as “populated”/complete; keep the
existing polling and timeout behaviour but return completed=true when API keys
are present to avoid false timeouts for empty relayer/signer/network/plugin
sets.
🧹 Nitpick comments (3)
src/bootstrap/initialize_relayers.rs (3)

222-239: Consider adding debug logging when wait completes successfully.

The function discards the boolean result from poll_until, which means successful completion and timeout both proceed silently at this level. While poll_until logs warnings on timeout, adding a debug log on success would improve observability.

♻️ Suggested improvement
-    poll_until(
+    let completed = poll_until(
         || is_global_init_recently_completed(&conn, &prefix, INIT_STALENESS_THRESHOLD_SECS),
         max_wait,
         poll_interval,
         "initialization",
     )
     .await?;
 
+    if completed {
+        debug!("Another instance completed initialization, proceeding");
+    }
+
     Ok(())

257-267: Consider bounded concurrency for large relayer counts.

Using join_all spawns all initialization tasks concurrently without limit. For deployments with many relayers, this could overwhelm connection pools or external services. Consider using buffer_unordered with a reasonable limit if large-scale deployments are expected.

♻️ Suggested approach (if needed in future)
use futures::stream::{self, StreamExt};

const MAX_CONCURRENT_INIT: usize = 10;

let results: Vec<_> = stream::iter(futures)
    .buffer_unordered(MAX_CONCURRENT_INIT)
    .collect()
    .await;

672-683: Consider using an environment variable for Redis URL.

The hardcoded redis://127.0.0.1:6379 works for local testing but could be made configurable via environment variable for flexibility in different test environments.

♻️ Suggested improvement
 async fn create_test_redis_pool() -> Option<Arc<Pool>> {
-    let cfg = deadpool_redis::Config::from_url("redis://127.0.0.1:6379");
+    let url = std::env::var("TEST_REDIS_URL").unwrap_or_else(|_| "redis://127.0.0.1:6379".to_string());
+    let cfg = deadpool_redis::Config::from_url(&url);

Comment thread src/bootstrap/config_processor.rs
@tirumerla tirumerla (Contributor) left a comment

LGTM, thanks. Added one comment.

Comment thread src/bootstrap/initialize_relayers.rs Outdated
@zeljkoX zeljkoX (Collaborator) left a comment

Just tested it locally with 2 instances. Looks good.

Sai made a good comment. Let's see how to address it.

Comment thread src/utils/time.rs Outdated
Comment thread src/bootstrap/initialize_relayers.rs Outdated
@zeljkoX zeljkoX (Collaborator) commented Feb 23, 2026

Hey @NicoMolinaOZ

I have merged the latest main and added changes to reuse the new env var DISTRIBUTED_MODE so the sync logic is only used when the flag is set.

Same approach is used at other places where locks are used.
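A minimal sketch of such a flag check, assuming the variable is read straight from the environment (the helper name and the set of accepted truthy values are guesses, not the project's actual parsing):

```rust
// Illustrative gate for the DISTRIBUTED_MODE env var mentioned above: the
// locking/sync path runs only when the flag is set. Helper name and accepted
// values are assumptions.

fn distributed_mode_enabled() -> bool {
    std::env::var("DISTRIBUTED_MODE")
        .map(|v| matches!(v.trim().to_ascii_lowercase().as_str(), "1" | "true" | "yes"))
        .unwrap_or(false) // unset or unreadable -> single-instance behavior
}

fn main() {
    std::env::remove_var("DISTRIBUTED_MODE");
    assert!(!distributed_mode_enabled()); // default: no locking path

    std::env::set_var("DISTRIBUTED_MODE", "true");
    assert!(distributed_mode_enabled()); // flag set: take the locking path

    std::env::set_var("DISTRIBUTED_MODE", "0");
    assert!(!distributed_mode_enabled());
    println!("ok");
}
```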

Comment thread src/bootstrap/config_processor.rs Dismissed
@collins-w collins-w merged commit 51df3c5 into main Mar 13, 2026
24 of 26 checks passed
@collins-w collins-w deleted the improvements-to-relayers-initialization branch March 13, 2026 12:03
@github-actions github-actions Bot locked and limited conversation to collaborators Mar 13, 2026


6 participants