apollo_network_benchmark: add message index detection mechanism#11557

Closed
sirandreww-starkware wants to merge 1 commit into 01-08-apollo_network_benchmark_add_messageindextracker_struct from
01-08-apollo_network_benchmark_add_message_index_detection_mechanism

Conversation


@sirandreww-starkware (Contributor) commented Jan 8, 2026

Note

Medium Risk
Introduces new concurrent task wiring and unbounded-channel usage in the hot receive path; misuse of peer indexing or unwrap() could cause panics or inaccurate metrics under load.

Overview
Adds an async message-index tracking path to the broadcast stress test receiver: received messages now forward (sender_id, message_index) over an unbounded channel to a new record_indexed_message task which maintains per-peer MessageIndexTrackers and updates a new RECEIVE_MESSAGE_PENDING_COUNT gauge.

Wires this into BroadcastNetworkStressTestNode by splitting the receiver into two tasks (receiver + tracker) and removes the dead_code allowance from message_index_detector.rs now that it’s used.
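
The receiver/tracker split described above can be sketched as follows. This is a hypothetical reconstruction, not the PR's code: the fields of MessageIndexTracker and its pending-count formula are assumptions inferred from the review comments below, and a std channel plus a thread stand in for the tokio unbounded channel and async task used in the actual change.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the PR's MessageIndexTracker; the real struct
// lives in message_index_detector.rs and may differ.
#[derive(Default, Clone)]
struct MessageIndexTracker {
    min: Option<u64>,
    max: Option<u64>,
    seen_messages_count: u64,
}

impl MessageIndexTracker {
    fn seen_message(&mut self, index: u64) {
        self.min = Some(self.min.map_or(index, |m| m.min(index)));
        self.max = Some(self.max.map_or(index, |m| m.max(index)));
        self.seen_messages_count += 1;
    }

    // Messages still missing inside the observed [min, max] index range.
    fn pending_messages_count(&self) -> u64 {
        match (self.min, self.max) {
            (Some(lo), Some(hi)) => hi - lo + 1 - self.seen_messages_count,
            _ => 0,
        }
    }
}

fn main() {
    // The PR wires a tokio unbounded_channel between two tasks; a std
    // channel and a thread illustrate the same receiver/tracker split.
    let (tx, rx) = mpsc::channel::<(usize, u64)>();
    let num_peers = 2;

    let tracker_task = thread::spawn(move || {
        let mut index_tracker = vec![MessageIndexTracker::default(); num_peers];
        let mut all_pending: u64 = 0;
        while let Ok((peer_id, index)) = rx.recv() {
            let old_pending = index_tracker[peer_id].pending_messages_count();
            index_tracker[peer_id].seen_message(index);
            let new_pending = index_tracker[peer_id].pending_messages_count();
            all_pending = all_pending - old_pending + new_pending;
            // Real code updates the gauge here:
            // RECEIVE_MESSAGE_PENDING_COUNT.set(all_pending.into_f64());
        }
        all_pending
    });

    // Simulated receiver: peer 0 delivers indices 0 and 2, so index 1 is pending.
    tx.send((0, 0)).unwrap();
    tx.send((0, 2)).unwrap();
    drop(tx);
    println!("pending = {}", tracker_task.join().unwrap()); // pending = 1
}
```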

Written by Cursor Bugbot for commit a85e9be. This will update automatically on new commits.

@reviewable-StarkWare

This change is Reviewable

Contributor Author

sirandreww-starkware commented Jan 8, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite. Learn more about stacking.


github-actions Bot commented Feb 8, 2026

There hasn't been any activity on this pull request recently, and in order to prioritize active work, it has been marked as stale.
This PR will be closed and locked in 7 days if no further activity occurs.
Thank you for your contributions!

@github-actions Bot added the stale label Feb 8, 2026
@github-actions Bot closed this Feb 16, 2026
let mut index_tracker = vec![MessageIndexTracker::default(); num_peers];
let mut all_pending = 0;
while let Some((peer_id, index)) = rx.recv().await {
let old_pending = index_tracker[peer_id].pending_messages_count();
Sender ID indexing can panic receiver

High Severity

record_indexed_message indexes index_tracker by sender_id, but the vector is sized with bootstrap.len(). This assumes sender IDs are dense zero-based indices, which NodeArgs.runner.id does not enforce. Valid deployments with sparse or non-zero-based IDs can trigger out-of-bounds access and crash the receiver path.

Additional Locations (1)

}
.boxed()
let (tx, rx) = tokio::sync::mpsc::unbounded_channel();
let num_peers = self.args.runner.bootstrap.len();

Index tracker sized by bootstrap peers, indexed by sender_id

High Severity

num_peers is set to self.args.runner.bootstrap.len(), which is the number of other peers (N−1 for N nodes). But record_indexed_message uses this to size index_tracker and indexes it by sender_id (which is runner.id, ranging from 0 to N−1). For the node with the highest ID, index_tracker[N-1] is out of bounds on a vec of length N−1, causing a panic at runtime.

Additional Locations (1)

@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_message_index_detection_mechanism branch from f10f148 to 6bed8fb on February 19, 2026 08:04
@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_messageindextracker_struct branch from b05e636 to 41a1832 on February 19, 2026 08:04
@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_messageindextracker_struct branch from 41a1832 to 3093d1d on March 16, 2026 15:13
@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_message_index_detection_mechanism branch from 6bed8fb to 2186fdd on March 16, 2026 15:13
@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_messageindextracker_struct branch from 3093d1d to e8ac9a4 on March 22, 2026 17:09
@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_message_index_detection_mechanism branch from 2186fdd to 7672cb4 on March 22, 2026 17:09
@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_messageindextracker_struct branch from e8ac9a4 to b32f987 on March 30, 2026 11:08
@sirandreww-starkware force-pushed the 01-08-apollo_network_benchmark_add_message_index_detection_mechanism branch from 7672cb4 to a85e9be on March 30, 2026 11:08

@cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).


Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

all_pending -= old_pending;
all_pending += new_pending;

RECEIVE_MESSAGE_PENDING_COUNT.set(all_pending.into_f64());
Pending count underflows on duplicate message indices

Medium Severity

MessageIndexTracker::pending_messages_count() computes max - min + 1 - seen_messages_count. Since seen_message() always increments seen_messages_count without checking for duplicates, receiving the same message_index twice makes seen_messages_count exceed the actual index range, causing a u64 subtraction underflow (panic in debug, wrap in release). The new record_indexed_message function is the first caller of this code path, activating this latent issue.

Additional Locations (1)
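
One duplicate-safe variant (a sketch based on the reviewer's description of the max - min + 1 - seen_messages_count formula, not the PR's actual fix) stores the indices in a set, so repeated deliveries cannot inflate the seen count past the index range:

```rust
use std::collections::BTreeSet;

#[derive(Default)]
struct MessageIndexTracker {
    seen: BTreeSet<u64>,
}

impl MessageIndexTracker {
    fn seen_message(&mut self, index: u64) {
        // Inserting into a set makes a duplicate index a no-op, so the
        // seen count can never exceed the size of the [min, max] range.
        self.seen.insert(index);
    }

    fn pending_messages_count(&self) -> u64 {
        match (self.seen.first(), self.seen.last()) {
            (Some(&lo), Some(&hi)) => hi - lo + 1 - self.seen.len() as u64,
            _ => 0,
        }
    }
}

fn main() {
    let mut t = MessageIndexTracker::default();
    t.seen_message(0);
    t.seen_message(2);
    t.seen_message(2); // duplicate: a raw counter would underflow here
    assert_eq!(t.pending_messages_count(), 1);
    println!("ok");
}
```

The set costs memory proportional to the number of distinct indices, so a production fix might instead deduplicate with a sliding window or a bitmap; the point is only that duplicates must not increment the counter.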

@github-actions Bot locked and limited conversation to collaborators Apr 28, 2026

2 participants