
bgpd: speed up peer teardown with per-peer route indexes#21186

Open
nick-bouliane wants to merge 1 commit into FRRouting:master from nick-bouliane:master

Conversation

@nick-bouliane
Contributor

Track adj-in and path_info entries per peer so clear operations avoid full RIB/tree scans during neighbor teardown. Deduplicate queued bgp_dest work items so AddPath peers do not enqueue the same destination repeatedly.

Signed-off-by: Nick Bouliane <nbouliane@coreweave.com>

@greptile-apps

greptile-apps bot commented Mar 16, 2026

Greptile Summary

This PR adds per-peer indexes (linked lists) to track bgp_adj_in and bgp_path_info entries, enabling O(peer_routes) instead of O(table_size) clearing during neighbor teardown. It also deduplicates work queue items for AddPath peers using a hash table, and skips already-removed paths in the clear worker.

  • bgp_adj_in gains a per-peer doubly-linked list (peer_adj_prev/peer_adj_next) and a dest back-pointer; bgp_path_info gains a LIST_ENTRY for per-peer tracking via peer->paths
  • bgp_clear_route_table is rewritten to walk per-peer lists instead of scanning the full RIB, eliminating the need for special VPN/EVPN/ENCAP nested table iteration in bgp_clear_route
  • A hash is used to deduplicate destination queue entries when multiple AddPath routes share the same destination
  • bgp_clear_route_node gains a BGP_PATH_REMOVED check to avoid double-processing already-deleted paths
  • Memory cost: two pointers per bgp_adj_in and one LIST_ENTRY per bgp_path_info, plus one pointer per peer — proportional to existing route count

Confidence Score: 4/5

  • This PR is well-structured and the core logic is sound, with properly paired list add/remove operations and correct safe-iteration patterns.
  • Score of 4 reflects that the algorithmic change is correct and well-reasoned: all per-peer list insertions and removals are properly paired across add/reap paths, the safe iteration patterns are correctly used, VPN/EVPN sub-table handling works because bgp_dest_table() returns the correct sub-table with matching afi/safi, and the batch clearing path (walk_batch_table_helper) also correctly maintains the per-peer lists through bgp_adj_in_remove and bgp_path_info_reap. The dedup hash and BGP_PATH_REMOVED skip are good defensive additions. Deducting 1 point because this is a significant data structure change in critical teardown paths that would benefit from thorough integration testing with AddPath, VPN, EVPN, and graceful restart scenarios.
  • bgpd/bgp_route.c deserves the most attention as it contains the core algorithmic change to bgp_clear_route_table and the new dedup hash logic.

Important Files Changed

| Filename | Overview |
| --- | --- |
| bgpd/bgp_advertise.h | Adds per-peer doubly-linked list pointers and a back-pointer to dest in bgp_adj_in. Straightforward struct extension, no issues. |
| bgpd/bgp_advertise.c | Links/unlinks adj-in entries into the per-peer list on set/remove. Doubly-linked list operations are correctly implemented with proper NULL checks. |
| bgpd/bgp_route.h | Adds LIST_ENTRY(bgp_path_info) peer_thread to bgp_path_info for per-peer path list linkage. Clean struct extension. |
| bgpd/bgp_route.c | Core logic change: bgp_clear_route_table now walks per-peer lists instead of the full table. Adds a dedup hash for AddPath destinations. Adds a BGP_PATH_REMOVED skip in the worker. Eliminates VPN/EVPN nested table iteration by leveraging per-peer indexing. Logic is sound but the hash key function is simplistic. |
| bgpd/bgpd.h | Adds a per-peer path list head (LIST_HEAD) and adj_in_head pointer to struct peer. Clean additions. |
| bgpd/bgpd.c | Initializes the per-peer path list in peer_new. Single-line change, correct placement. |

Flowchart

```mermaid
%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[bgp_clear_route] --> B[bgp_clear_route_table]
    B --> C[Walk peer->adj_in_head list]
    C --> D{afi/safi match?}
    D -- Yes --> E[bgp_adj_in_remove]
    D -- No --> C
    E --> C
    B --> F[Walk peer->paths list via LIST_FOREACH_SAFE]
    F --> G{afi/safi match?}
    G -- No --> F
    G -- Yes --> H{force?}
    H -- Yes --> I[bgp_path_info_reap]
    I --> F
    H -- No --> J{dest in queued_dests hash?}
    J -- Yes/skip --> F
    J -- No --> K[Add to queued_dests hash]
    K --> L[Queue dest to clear_node_queue]
    L --> F
    L -.-> M[bgp_clear_route_node worker]
    M --> N{BGP_PATH_REMOVED?}
    N -- Yes/skip --> M
    N -- No --> O[bgp_rib_remove / mark stale]
```

Last reviewed commit: f199de7

Contributor

@mjstapp mjstapp left a comment


we have a batching mechanism that should be helping when multiple peer connections are closing. that mechanism also does yield/resume, so it tries not to occupy the main pthread. can you say more about why that mechanism isn't being used in your situation? I'd like to see if we can bring more peer processing into that path if we need to.

@nick-bouliane
Contributor Author

> we have a batching mechanism that should be helping when multiple peer connections are closing. that mechanism also does yield/resume, so it tries not to occupy the main pthread. can you say more about why that mechanism isn't being used in your situation? I'd like to see if we can bring more peer processing into that path if we need to.

Good question. We are using the existing clear work-queue path (clear_node_queue / bgp_clear_route_node) already.
The bottleneck I measured was mainly in the synchronous pre-queue discovery phase (bgp_clear_route*) where each clearing peer walked large table structures to find relevant routes. That work happens on the main thread before items are batched/yielded.
The patch focuses on that part: it replaces full scans with per-peer indexed iteration and deduped bgp_dest queueing, so the existing yield/resume worker has much less to do and main-thread stalls drop significantly (in my stress test, multi-second vtysh stalls became near-instant).

Comment thread bgpd/bgpd.h
```c
/* workqueues */
struct work_queue *clear_node_queue;

/* Per-peer path list for fast teardown (avoids full table walk) */
```
Member


Why can't we use the bnc and its list of paths to do this work, instead of adding an entirely new structure (and its associated cost of higher memory usage)?

Contributor Author


Technically BNC could be part of the solution, but for peer clear it is a nexthop-oriented index, not a peer-ownership index. To use it for this workflow we would still need to:

1. iterate all BNC entries (or add/maintain a peer->BNC mapping),
2. iterate each bnc->paths,
3. filter pi->peer == target_peer,
4. dedupe pi->net (bgp_dest) before queue/remove, and
5. run separate adj-in cleanup anyway (BNC does not index adj-in).

The per-peer lists are more direct for this specific operation: they answer “what belongs to this peer?” in bounded time and cover both path and adj-in cleanup.

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rough incremental memory estimate (64-bit): about +40 MB for 1 peer with 1M routes, and about +4.2 MB for 10k peers with 10 routes each.

@donaldsharp
Member

I've got some experimental code that uses the bnc for speed up. I will need a chance to get it working. Hopefully in the next day or so, so that we can compare the two solutions.

@nick-bouliane
Contributor Author

> I've got some experimental code that uses the bnc for speed up. I will need a chance to get it working. Hopefully in the next day or so, so that we can compare the two solutions.

@donaldsharp did you have time to play with this one? Also, if you have your code in a repo, I can have a look and play with it. Thanks!
