-
-
Notifications
You must be signed in to change notification settings - Fork 107
fix: Resolve race conditions and improve operation atomicity #2233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…tions in subscriptions
… import in ping_client.rs
|
@claude review it |
|
Claude finished @netsirius's task —— View job Code Review Complete ✅I've reviewed the PR addressing race conditions and improving operation atomicity. Overall, this is a well-designed solution to a challenging concurrency problem. The changes demonstrate solid understanding of async Rust and distributed systems patterns. 🎯 SummaryThe PR successfully addresses random test failures by fixing race conditions between PUT and Subscribe operations through a channel-based notification system. The core approach is sound and the implementation is thorough. Key Strengths:
Issues Found:
🔴 Critical Issues1. Race condition in
|
|
@claude the improvements from the last review have been implemented, you can review it again now |
|
Claude finished @netsirius's task —— View job Re-review Complete ✅All critical issues from the previous review have been successfully addressed. The implementation shows excellent understanding of the race conditions and proper async Rust patterns. Summary of Fixes✅ Critical Issue #1: Race Condition in
|
|
Claude finished @iduartgomez's task —— View job Review: Impact of PR #2229 mergeI've analyzed the recently merged PR #2229 and its interaction with this PR #2233. Todo List
Summary: No Changes Needed ✅Good news: This PR is already based on PR #2229 (commit Detailed Analysis1. Branch RelationshipLooking at the commit history: This PR already includes PR #2229 - the branch was rebased on top of it via commit 2. Architectural CompatibilityPR #2229 introduced an important architectural change:
This PR (2233) builds correctly on that foundation: Local Subscriptions (own WebSocket clients)// crates/core/src/operations/subscribe.rs:332-356
async fn complete_local_subscription(
op_manager: &OpManager,
id: Transaction,
key: ContractKey,
) -> Result<(), OpError> {
// ✅ Correctly does NOT call add_subscriber
// ✅ Notifies via NodeEvent::LocalSubscribeComplete
// ✅ Follows PR #2229's architecture
op_manager
.notify_node_event(crate::message::NodeEvent::LocalSubscribeComplete {
tx: id,
key,
subscribed: true,
})
.await?;
op_manager.completed(id);
Ok(())
}Network Subscriptions (remote peers)// crates/core/src/operations/subscribe.rs:515-518
// When handling RequestSub from REMOTE peer:
if op_manager
.ring
.add_subscriber(key, subscriber.clone(), None) // ✅ Correct for network subs
.is_err()The distinction is correct:
3. New Functionality IntegrationThis PR adds race condition handling that complements PR #2229: wait_for_contract_with_timeout (NEW in this PR)// Handles race where Subscribe arrives before PUT completes
async fn wait_for_contract_with_timeout(
op_manager: &OpManager,
key: ContractKey,
timeout_ms: u64,
) -> Result<bool, OpError> {
// Fast path check → Register waiter → Double-check → Wait → Verify
// ...
}This works seamlessly with PR #2229's changes because:
4. Files ModifiedComparing the two PRs: PR #2229 touched:
This PR touches:
Overlap: Only Verification: No Merge Conflicts$ git status
On branch fix/2106-ensure-op-atomicity
Your branch is up to date with 'origin/fix/2106-ensure-op-atomicity'.
nothing to commit, working tree cleanThe branch is clean and based on the current main (which includes #2229). Conclusion✅ No changes needed. This PR:
Recommendation: This PR is ready to merge without modifications related to PR #2229. Additional NotesThe changes in this PR actually strengthen the architectural improvements from PR #2229 by:
The two PRs work together as a coherent solution to subscription reliability issues. |
Two separate issues: 1. test_macro_example flaky tests: The duplicate keypair detection used Display impl of TransportPublicKey, which only captured 12 bytes of DER encoding. Analysis showed: - First 6 bytes: DER headers (identical for all 2048-bit RSA keys) - Last 6 bytes: Contains exponent 65537 (identical for all keys) - Only ~1 byte of actual entropy Fix: Use full public key comparison (via Hash impl) for uniqueness Also improved Display impl to use bytes from the modulus (actual random data) 2. transport_perf benchmark clippy errors (from merge): - explicit_counter_loop: Use for loop with range instead of manual counter - while_let_loop: Convert loop+match to while let 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
…omicity # Conflicts: # crates/core/src/node/op_state_manager.rs
Summary
This PR fixes random test failures in the blocked peers test suite caused by race conditions between operations and missing subscription handling. The changes ensure proper operation atomicity and reliable state propagation across the network.
Problems Solved
1. Race Condition: PUT vs Subscribe
Problem: Subscription requests arrive at remote peers before the contract has propagated via PUT, causing
NoCachingPeerserrors and random test failures.Solution: Channel-based notification system in
OpManager:wait_for_contract(key)registers a waiternotify_contract_stored(key)notifies when PUT completestokio::select!for notification or timeout2. Missing Subscription Listeners in GET Paths
Problem: GET operations with
subscribe=trueweren't registering listeners in network and legacy code paths, so clients missed updates.Solution: Added
register_subscription_listener()helper and called it in:3. StatePushed Treated as Error
Problem:
OpError::StatePushedsignals async re-queuing but was logged as error.Solution: Handle gracefully:
client_events/mod.rs:Ok(()) | Err(OpError::StatePushed) => {}node/mod.rs: Early return without error logging4. Test Infrastructure Issues
Problem: Global RNG caused cross-test interference; concurrent contract compilation caused race conditions.
Solution:
COMPILE_LOCKmutex for contract compilationRngtrait bounds5. WebSocket Backpressure
Problem: Long operations caused WebSocket message buildup and test timeouts.
Solution:
check_intervalparameter for periodic message draining during waits.Files Changed
crates/core/src/node/op_state_manager.rscrates/core/src/operations/subscribe.rscrates/core/src/client_events/mod.rscrates/core/src/operations/put.rsstart_subscription_requestcrates/core/src/node/mod.rsapps/freenet-ping/app/tests/common/mod.rsapps/freenet-ping/app/tests/run_app_blocked_peers.rsapps/freenet-ping/**/Cargo.tomlTesting
Closes #2106