perf(graph): replace 50ms sleep-poll in wait_for_service with condvar #193

Open
YuanYuYuan wants to merge 3 commits into main from dev/graph-notify

@YuanYuYuan
Collaborator

Summary

  • Replace the 50ms sleep-poll in wait_for_service with a condvar notification from Graph, so callers wake the instant a matching service server appears rather than up to 50ms later.

Key Changes

  • crates/hiroz/src/graph.rs: Add change_signal: Arc<(StdMutex<u64>, Condvar)> to Graph. The liveliness subscriber callback releases the graph.data lock before acquiring this mutex, then increments an epoch counter and calls notify_all(). The waiter takes the condvar mutex and then graph.data, while the notifier never holds both locks at once, so the two paths cannot deadlock.
  • crates/hiroz/src/ffi/service.rs: Replace std::thread::sleep(50ms) poll loop in RawServiceClient::wait_for_service with a condvar wait on graph.change_signal. The waiter holds the condvar mutex across the condition check and the wait_timeout call, so no signal fired between the check and the wait can be missed.
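
The two changes above can be sketched with the standard library alone. This is a minimal illustration, not the code in the PR: the `Graph` type here is a hypothetical stand-in whose `data` is a plain set of service names, and `on_service_appeared`, `has_service`, and `demo_wakes_early` are names invented for the example.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical, stripped-down stand-in for the real `Graph`.
pub struct Graph {
    /// Stand-in for `graph.data`: names of known service servers.
    data: Mutex<HashSet<String>>,
    /// Epoch counter plus condvar, shaped like the `change_signal` field.
    change_signal: Arc<(Mutex<u64>, Condvar)>,
}

impl Graph {
    pub fn new() -> Self {
        Graph {
            data: Mutex::new(HashSet::new()),
            change_signal: Arc::new((Mutex::new(0), Condvar::new())),
        }
    }

    /// Notifier side: roughly what a liveliness callback would do.
    pub fn on_service_appeared(&self, name: &str) {
        {
            let mut data = self.data.lock().unwrap();
            data.insert(name.to_string());
        } // The `data` guard is dropped here, before the condvar mutex is taken.

        let (lock, cvar) = &*self.change_signal;
        let mut epoch = lock.lock().unwrap();
        *epoch += 1;
        cvar.notify_all();
    }

    fn has_service(&self, name: &str) -> bool {
        self.data.lock().unwrap().contains(name)
    }

    /// Waiter side: blocks until `name` is present or `timeout` elapses.
    pub fn wait_for_service(&self, name: &str, timeout: Duration) -> bool {
        let deadline = Instant::now() + timeout;
        let (lock, cvar) = &*self.change_signal;
        let mut guard = lock.lock().unwrap();
        loop {
            // Checked while holding the condvar mutex, so the notifier cannot
            // fire notify_all() between this check and the wait below.
            if self.has_service(name) {
                return true;
            }
            let now = Instant::now();
            if now >= deadline {
                return false;
            }
            // Releases the mutex while parked and reacquires it on wakeup.
            let (reacquired, _timed_out) = cvar.wait_timeout(guard, deadline - now).unwrap();
            guard = reacquired;
        }
    }
}

/// Usage example: a service that appears after 20ms wakes the waiter long
/// before the 5s timeout.
pub fn demo_wakes_early() -> bool {
    let graph = Arc::new(Graph::new());
    let notifier = Arc::clone(&graph);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        notifier.on_service_appeared("add_two_ints");
    });
    let start = Instant::now();
    let found = graph.wait_for_service("add_two_ints", Duration::from_secs(5));
    handle.join().unwrap();
    found && start.elapsed() < Duration::from_secs(5)
}
```

The loop re-checks the condition after every wakeup, so spurious wakeups and notifications for unrelated graph changes are harmless.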

Scope

The async paths (wait_for_subscription, wait_for_publisher, wait_for_server) already use graph.change_notify (tokio Notify) with the correct TOCTOU-safe pattern and are unchanged. This PR only fixes the sync FFI path.

Breaking Changes

None

Add `change_signal: Arc<(StdMutex<u64>, Condvar)>` to `Graph`. The
liveliness callback releases graph.data before acquiring the condvar mutex,
then increments an epoch counter and calls notify_all(). The waiter locks
condvar then data, and the notifier never holds both locks at once, so
the two paths cannot deadlock.

Replace the sleep-poll loop in `RawServiceClient::wait_for_service`
(ffi/service.rs) with a condvar wait. The waiter holds the condvar mutex
across both the condition check and wait_timeout so no signal fired
between the two can be missed.

The async paths (wait_for_subscription, wait_for_publisher,
wait_for_server) already use tokio Notify with correct TOCTOU handling
and are unchanged.
…gnal

The epoch counter was written on every graph change but never read by
the waiter — the condvar mechanism itself guarantees no missed signals
when the waiter holds the mutex across the condition check and
wait_timeout. Remove the dead u64 and use StdMutex<()> instead.
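
A minimal sketch of the same wait with a unit mutex, assuming the condition lives outside the mutex. `ChangeSignal`, `service_ready`, `publish`, and `demo_signal` are hypothetical names for illustration, and the `AtomicBool` stands in for whatever state the real waiter inspects in graph.data. In this sketch the notifier briefly takes and releases the mutex before notifying, which is one way to order its update against the waiter's check.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical signal with no payload: the mutex guards `()`.
pub struct ChangeSignal {
    /// Stand-in for the state the waiter checks (assumption).
    service_ready: AtomicBool,
    lock: Mutex<()>,
    cvar: Condvar,
}

impl ChangeSignal {
    pub fn new() -> Self {
        ChangeSignal {
            service_ready: AtomicBool::new(false),
            lock: Mutex::new(()),
            cvar: Condvar::new(),
        }
    }

    /// Notifier: update the state, then wake every waiter.
    pub fn publish(&self) {
        self.service_ready.store(true, Ordering::SeqCst);
        // Take and release the mutex so a waiter is either still before its
        // check (and will see the new state) or already parked in the wait.
        drop(self.lock.lock().unwrap());
        self.cvar.notify_all();
    }

    /// Waiter: returns true if the state became ready within `timeout`.
    pub fn wait(&self, timeout: Duration) -> bool {
        let guard = self.lock.lock().unwrap();
        // wait_timeout_while re-evaluates the closure under the mutex on
        // every wakeup, so no epoch counter is needed to detect changes.
        let (_guard, result) = self
            .cvar
            .wait_timeout_while(guard, timeout, |_| {
                !self.service_ready.load(Ordering::SeqCst)
            })
            .unwrap();
        !result.timed_out()
    }
}

/// Usage example: publish from another thread after 20ms.
pub fn demo_signal() -> bool {
    let signal = Arc::new(ChangeSignal::new());
    let publisher = Arc::clone(&signal);
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(20));
        publisher.publish();
    });
    let woke = signal.wait(Duration::from_secs(5));
    handle.join().unwrap();
    woke
}
```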

`let _ = mutex.lock()` drops the guard immediately, and rustc rejects it
(the deny-by-default `let_underscore_lock` lint) as a non-binding let on a
synchronization lock. The notify_all() call does not require the mutex to
be held on the notifier side.
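
To illustrate the guard lifetimes involved, here is a small sketch that avoids the rejected `let _ = ...` form. `guard_lifetimes` is a hypothetical helper written for this example. It compares a named guard, which holds the mutex until the end of its scope, with an explicit `drop`, which releases it at once.

```rust
use std::sync::Mutex;

/// Returns (mutex held while a named guard is alive,
///          mutex held after an explicit drop of the guard).
pub fn guard_lifetimes() -> (bool, bool) {
    let m = Mutex::new(());

    let held_with_named_guard = {
        // `_guard` is a real binding, so the lock is held until scope end.
        let _guard = m.lock().unwrap();
        // try_lock fails because `_guard` still owns the mutex.
        m.try_lock().is_err()
    };

    // Explicitly dropping the guard releases the mutex immediately. This
    // spells out the behavior that `let _ = m.lock()` would have silently.
    drop(m.lock().unwrap());
    let held_after_drop = m.try_lock().is_err();

    (held_with_named_guard, held_after_drop)
}
```

Calling `guard_lifetimes()` returns `(true, false)`: the named guard keeps the mutex locked, and the explicit drop leaves it free.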