bgpd: speed up peer teardown with per-peer route indexes#21186
nick-bouliane wants to merge 1 commit into FRRouting:master
Conversation
Track adj-in and path_info entries per peer so clear operations avoid full RIB/tree scans during neighbor teardown. Deduplicate queued bgp_dest work items so AddPath peers do not enqueue the same destination repeatedly.

Signed-off-by: Nick Bouliane <nbouliane@coreweave.com>
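The per-peer index idea can be sketched as an intrusive doubly linked list (hypothetical, simplified field and function names; FRR's actual code uses its own list macros and larger structs): each path entry embeds its own links, so a peer's paths can be enumerated and unlinked in time proportional to that peer's path count rather than the size of the whole RIB.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for FRR's structs. */
struct path_info {
	struct path_info *peer_next;   /* next path owned by the same peer */
	struct path_info **peer_prev;  /* address of the link pointing here */
	int afi, safi;
};

struct peer {
	struct path_info *paths_head;  /* per-peer index: avoids RIB walk */
};

/* O(1): push a new path onto the owning peer's list. */
static void peer_path_add(struct peer *p, struct path_info *pi)
{
	pi->peer_next = p->paths_head;
	if (p->paths_head)
		p->paths_head->peer_prev = &pi->peer_next;
	pi->peer_prev = &p->paths_head;
	p->paths_head = pi;
}

/* O(1): unlink a path from its peer's list, no table scan needed. */
static void peer_path_del(struct path_info *pi)
{
	*pi->peer_prev = pi->peer_next;
	if (pi->peer_next)
		pi->peer_next->peer_prev = pi->peer_prev;
	pi->peer_next = NULL;
	pi->peer_prev = NULL;
}

/* Walk cost is bounded by this peer's routes only. */
static int peer_path_count(const struct peer *p)
{
	int n = 0;

	for (const struct path_info *pi = p->paths_head; pi;
	     pi = pi->peer_next)
		n++;
	return n;
}
```

With links embedded in the entries themselves, teardown never has to search for an entry's position: removal is pointer surgery on the entry it already holds.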
Greptile Summary: This PR adds per-peer indexes (linked lists) to track adj-in and path_info entries, so that clear operations during neighbor teardown avoid full RIB/tree scans.
Confidence Score: 4/5
Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[bgp_clear_route] --> B[bgp_clear_route_table]
    B --> C[Walk peer->adj_in_head list]
    C --> D{afi/safi match?}
    D -- Yes --> E[bgp_adj_in_remove]
    D -- No --> C
    E --> C
    B --> F[Walk peer->paths list via LIST_FOREACH_SAFE]
    F --> G{afi/safi match?}
    G -- No --> F
    G -- Yes --> H{force?}
    H -- Yes --> I[bgp_path_info_reap]
    I --> F
    H -- No --> J{dest in queued_dests hash?}
    J -- Yes/skip --> F
    J -- No --> K[Add to queued_dests hash]
    K --> L[Queue dest to clear_node_queue]
    L --> F
    L -.-> M[bgp_clear_route_node worker]
    M --> N{BGP_PATH_REMOVED?}
    N -- Yes/skip --> M
    N -- No --> O[bgp_rib_remove / mark stale]
```
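The dedupe step in the flowchart can be sketched with a small pointer set (hypothetical helper names; the PR keys a hash on the bgp_dest pointer): a destination is enqueued only the first time it is seen, so an AddPath peer with many paths per prefix still queues each dest once.

```c
#include <stdbool.h>
#include <stddef.h>

#define SEEN_SLOTS 1024		/* power of two; toy fixed capacity */

/* Toy open-addressing set of bgp_dest pointers (no delete, no resize). */
struct seen_set {
	const void *slot[SEEN_SLOTS];
};

/* Returns true the first time a dest is seen, false for duplicates. */
static bool seen_insert(struct seen_set *s, const void *dest)
{
	size_t h = ((size_t)dest >> 4) & (SEEN_SLOTS - 1);

	while (s->slot[h]) {
		if (s->slot[h] == dest)
			return false;	/* already queued: skip */
		h = (h + 1) & (SEEN_SLOTS - 1);	/* linear probe */
	}
	s->slot[h] = dest;
	return true;
}
```

In the teardown walk, the per-peer path loop would call something like `seen_insert()` on each `pi->net` and push the dest onto the clear queue only on a true return.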
Last reviewed commit: f199de7
mjstapp left a comment:
We have a batching mechanism that should be helping when multiple peer connections are closing. That mechanism also does yield/resume, so it tries not to occupy the main pthread. Can you say more about why that mechanism isn't being used in your situation? I'd like to see whether we can bring more peer processing into that path if we need to.
Good question. We are using the existing clear work-queue path (clear_node_queue / bgp_clear_route_node) already.
```c
/* workqueues */
struct work_queue *clear_node_queue;

/* Per-peer path list for fast teardown (avoids full table walk) */
```
Why can't we use the bnc and its list of paths to do this work, instead of adding an entirely new structure (and its associated cost of higher memory usage)?
Technically the BNC could be part of the solution, but for a peer clear it is a nexthop-oriented index, not a peer-ownership index. To use it for this workflow we would still need to:

1. iterate all BNC entries (or add/maintain a peer->BNC mapping),
2. iterate each bnc->paths,
3. filter pi->peer == target_peer,
4. dedupe pi->net (bgp_dest) before queue/remove, and
5. run separate adj-in cleanup anyway (BNC does not index adj-in).

The per-peer lists are more direct for this specific operation: they answer "what belongs to this peer?" in bounded time and cover both path and adj-in cleanup.
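Under those assumptions, the BNC-based walk would look roughly like the sketch below (hypothetical, heavily simplified stand-in structs; the real bgp_nexthop_cache differs). The point is that steps 1-3 visit every cache entry and every path, filtering most of them away, and steps 4-5 still remain:

```c
#include <stddef.h>

/* Simplified stand-ins (hypothetical): a nexthop cache entry groups
 * paths by nexthop, not by owning peer, so a peer clear must filter. */
struct path_info {
	const void *peer;		/* owning peer */
	struct path_info *nh_next;	/* next path sharing this nexthop */
};

struct bnc {
	struct path_info *paths;	/* paths resolving via this nexthop */
	struct bnc *next;		/* next cache entry */
};

/* Count the paths a BNC walk would yield for one peer; every cache
 * entry and every path is visited regardless of the match rate. */
static int bnc_paths_for_peer(const struct bnc *head, const void *target)
{
	int n = 0;

	for (const struct bnc *b = head; b; b = b->next)	/* step 1 */
		for (const struct path_info *pi = b->paths; pi;
		     pi = pi->nh_next)				/* step 2 */
			if (pi->peer == target)			/* step 3 */
				n++;	/* steps 4-5 still pending */
	return n;
}
```

A per-peer list inverts this: the walk touches only the target peer's entries, and the peer-equality filter disappears.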
Rough incremental memory estimate (64-bit): about +40 MB for 1 peer with 1M routes, and about +4.2 MB for 10k peers with 10 routes each.
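Those figures are consistent with roughly 40 bytes of link overhead per route entry plus a small fixed cost per peer for the list heads. The per-entry breakdown below is an assumption for illustration, not a measured value:

```c
/* Assumed overheads (illustrative, not measured, 64-bit pointers):
 *   ~40 B per route entry (embedded prev/next links in both the
 *   path_info and the adj-in structures), plus
 *   ~20 B per peer for the list heads themselves. */
static long estimate_bytes(long peers, long routes_per_peer)
{
	const long per_entry = 40, per_peer = 20;

	return peers * routes_per_peer * per_entry + peers * per_peer;
}
```

Plugging in the two scenarios: 1 peer x 1M routes gives ~40,000,020 B (~40 MB), and 10k peers x 10 routes gives 4,000,000 + 200,000 = 4,200,000 B (~4.2 MB).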
I've got some experimental code that uses the bnc for speed up. I will need a chance to get it working, hopefully in the next day or so, so that we can compare the two solutions.
@donaldsharp did you have time to play with this one? Also, if you have your code in a repo, I can have a look and play with it. Thanks!