apollo_network_benchmark: add message index detection mechanism (#11557)
Conversation
There hasn't been any activity on this pull request recently, and in order to prioritize active work, it has been marked as stale.
```rust
let mut index_tracker = vec![MessageIndexTracker::default(); num_peers];
let mut all_pending = 0;
while let Some((peer_id, index)) = rx.recv().await {
    let old_pending = index_tracker[peer_id].pending_messages_count();
```
Sender ID indexing can panic receiver
High Severity
`record_indexed_message` indexes `index_tracker` by `sender_id`, but the vector is sized with `bootstrap.len()`. This assumes sender IDs are dense, zero-based indices, which `NodeArgs.runner.id` does not enforce. Valid deployments with sparse or non-zero-based IDs can trigger an out-of-bounds access and crash the receiver path.
```rust
    }
    .boxed()
let (tx, rx) = tokio::sync::mpsc::unbounded_channel();
let num_peers = self.args.runner.bootstrap.len();
```
Index tracker sized by bootstrap peers, indexed by sender_id
High Severity
`num_peers` is set to `self.args.runner.bootstrap.len()`, which is the number of *other* peers (N−1 for N nodes). But `record_indexed_message` uses this to size `index_tracker` and indexes it by `sender_id` (which is `runner.id`, ranging from 0 to N−1). For the node with the highest ID, `index_tracker[N-1]` is out of bounds on a vec of length N−1, causing a panic at runtime.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
```rust
all_pending -= old_pending;
all_pending += new_pending;

RECEIVE_MESSAGE_PENDING_COUNT.set(all_pending.into_f64());
```
Pending count underflows on duplicate message indices
Medium Severity
`MessageIndexTracker::pending_messages_count()` computes `max - min + 1 - seen_messages_count`. Since `seen_message()` always increments `seen_messages_count` without checking for duplicates, receiving the same `message_index` twice makes `seen_messages_count` exceed the actual index range, causing a u64 subtraction underflow (a panic in debug builds, a silent wrap in release). The new `record_indexed_message` function is the first caller of this code path, activating this latent issue.



Note
Medium Risk
Introduces new concurrent task wiring and unbounded channel usage in the hot receive path; misuse of peer indexing or `unwrap()` could cause panics or inaccurate metrics under load.

Overview
Adds an async message-index tracking path to the broadcast stress test receiver: received messages now forward `(sender_id, message_index)` over an unbounded channel to a new `record_indexed_message` task, which maintains per-peer `MessageIndexTracker`s and updates a new `RECEIVE_MESSAGE_PENDING_COUNT` gauge.

Wires this into `BroadcastNetworkStressTestNode` by splitting the receiver into two tasks (receiver + tracker) and removes the `dead_code` allowance from `message_index_detector.rs` now that it's used.

Written by Cursor Bugbot for commit a85e9be. This will update automatically on new commits.