bgpd: limit GR-stale NHT-unreach delete to GR helper context #21942

Open
karthikeyav wants to merge 1 commit into FRRouting:master from karthikeyav:bgpd-gate-gr-stale-delete-on-nsf-wait

@karthikeyav

PR #21742 added deletion of stale paths in evaluate_paths() when the nexthop becomes unreachable, gated on BGP_PATH_STALE. However, BGP_PATH_STALE is also set during Enhanced Route Refresh (RFC 7313) in bgp_clear_route_node(), clearing_clear_one_pi(), and bgp_set_stale_route(), so the existing gate could in principle act on stales set via the enhanced-refresh path.

Mirror the GR-specific half of the setter condition (PEER_STATUS_NSF_WAIT && peer->nsf[afi][safi]) on the deletion side so the new code only fires when acting as GR helper for this AFI/SAFI, not on stales set by enhanced route refresh.
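The narrowed gate can be sketched as a standalone predicate. This is a minimal illustration, not the code in bgpd/bgp_nht.c: the `sketch_*` types, the `SKETCH_*` flag values, and the array bounds are hypothetical stand-ins for bgpd's path and peer structures, BGP_PATH_STALE, and PEER_STATUS_NSF_WAIT.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; values and layouts are NOT bgpd's. */
#define SKETCH_PATH_STALE      (1u << 0) /* models BGP_PATH_STALE */
#define SKETCH_STATUS_NSF_WAIT (1u << 0) /* models PEER_STATUS_NSF_WAIT */
#define SKETCH_AFI_MAX  3
#define SKETCH_SAFI_MAX 4

struct sketch_peer {
	uint32_t sflags;                         /* peer status flags */
	bool nsf[SKETCH_AFI_MAX][SKETCH_SAFI_MAX]; /* models peer->nsf[afi][safi] */
};

struct sketch_path {
	uint32_t flags;                 /* per-path flags */
	const struct sketch_peer *peer; /* peer the path was learned from */
};

/*
 * Returns true only when an unreachable-nexthop stale path should be
 * deleted: the path is stale AND the peer is in GR helper state for
 * this AFI/SAFI. A path marked stale by enhanced route refresh fails
 * the NSF_WAIT check and is left alone.
 */
static bool sketch_should_delete_stale(const struct sketch_path *pi, int afi,
				       int safi, bool nexthop_reachable)
{
	if (nexthop_reachable)
		return false;
	if (!(pi->flags & SKETCH_PATH_STALE))
		return false;
	return (pi->peer->sflags & SKETCH_STATUS_NSF_WAIT) &&
	       pi->peer->nsf[afi][safi];
}

/* Small self-check covering the GR helper and enhanced-refresh cases. */
static void sketch_self_check(void)
{
	struct sketch_peer gr = { .sflags = SKETCH_STATUS_NSF_WAIT };
	struct sketch_peer refresh = { .sflags = 0 };
	gr.nsf[1][1] = true;

	struct sketch_path gr_stale = { .flags = SKETCH_PATH_STALE, .peer = &gr };
	struct sketch_path err_stale = { .flags = SKETCH_PATH_STALE, .peer = &refresh };

	assert(sketch_should_delete_stale(&gr_stale, 1, 1, false));
	assert(!sketch_should_delete_stale(&err_stale, 1, 1, false));
}
```

Both new conditions are needed in this model: a peer in NSF wait for one AFI/SAFI does not qualify paths in another AFI/SAFI for deletion.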

Signed-off-by: Karthikeya Venkat Muppalla <kmuppalla@nvidia.com>
@karthikeyav

@ton31337 This PR is a follow-up to your comment on #21742 (comment). Can you please take a look?

@greptile-apps

greptile-apps Bot commented May 13, 2026

Greptile Summary

This PR narrows the stale-path deletion gate in evaluate_paths() so it only fires when acting as a GR helper, preventing Enhanced Route Refresh (RFC 7313) stale routes from being prematurely deleted when a nexthop becomes unreachable.

  • Root cause fix: BGP_PATH_STALE is shared between GR and Enhanced Route Refresh codepaths; without the extra gate the NHT-triggered delete could have incorrectly removed ERR-stale paths while the peer session was still active.
  • Matching conditions: The two new predicates (PEER_STATUS_NSF_WAIT and peer->nsf[afi][safi]) exactly mirror the GR half of both setter sites in bgp_route.c (lines 7108 and 7368), making the deletion symmetric with the marking logic.
  • Debug log updated: The log message now explicitly says (GR helper), making it easier to distinguish the deletion reason during troubleshooting.

Confidence Score: 5/5

Safe to merge. The change is a one-file, three-line guard addition that tightens an existing deletion gate without touching any control paths outside the nexthop-unreachable branch.

The two new predicates (PEER_STATUS_NSF_WAIT and peer->nsf[afi][safi]) are the exact GR-specific half of the compound condition used by both setter sites in bgp_route.c. The LLGR and GR timer expiry paths both clear stale routes before unsetting PEER_STATUS_NSF_WAIT, so there is no window where a genuinely GR-stale route with an unreachable nexthop would be silently skipped by the new check. Enhanced Route Refresh stales are correctly excluded because PEER_STATUS_NSF_WAIT is not set during that flow.
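The ordering argument above can be illustrated with a toy model. This is only a sketch of the claim as stated in this review, not bgpd's timer code: `toy_peer`, `toy_path`, and both functions are hypothetical, and the real expiry paths are considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state; not bgpd's data structures. */
struct toy_peer {
	bool nsf_wait; /* models PEER_STATUS_NSF_WAIT */
	bool nsf;      /* models peer->nsf[afi][safi] for one AFI/SAFI */
};

struct toy_path {
	bool stale;   /* models BGP_PATH_STALE */
	bool present; /* false once the path has been removed */
};

/*
 * Models timer expiry in the order the review describes: stale paths
 * are cleared first, and only then is the GR helper state dropped.
 */
static void toy_gr_timer_expire(struct toy_peer *peer, struct toy_path *paths,
				size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (paths[i].present && paths[i].stale)
			paths[i].present = false;
	}
	peer->nsf = false;
	peer->nsf_wait = false;
}

/*
 * Counts stale paths that still exist while the peer is no longer in
 * GR helper state, i.e. paths the narrowed gate would skip.
 */
static size_t toy_count_orphaned_stale(const struct toy_peer *peer,
				       const struct toy_path *paths, size_t n)
{
	size_t count = 0;

	if (peer->nsf_wait && peer->nsf)
		return 0;
	for (size_t i = 0; i < n; i++) {
		if (paths[i].present && paths[i].stale)
			count++;
	}
	return count;
}
```

In this model, `toy_count_orphaned_stale()` is zero both before and after `toy_gr_timer_expire()` runs, which is the "no window" property the review relies on. Reversing the two steps inside the expiry function would briefly leave stale paths that the gate no longer matches.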

No files require special attention. bgpd/bgp_nht.c is the only changed file and the modification is self-contained.

Important Files Changed

| Filename | Overview |
| --- | --- |
| bgpd/bgp_nht.c | Adds PEER_STATUS_NSF_WAIT and peer->nsf[afi][safi] guards to the stale-path deletion block in evaluate_paths(); the two new conditions exactly mirror the GR-specific half of the BGP_PATH_STALE setter sites in bgp_route.c, and the debug log is updated to match. |

Sequence Diagram

sequenceDiagram
    participant Peer as BGP Peer
    participant FSM as BGP FSM
    participant Route as bgp_route.c
    participant NHT as bgp_nht.c (evaluate_paths)
    participant Zebra as Zebra/NHT

    note over Peer,Zebra: GR Helper Path (PR target scenario)
    Peer->>FSM: Session down (GR notification)
    FSM->>Peer: SET PEER_STATUS_NSF_WAIT
    FSM->>Peer: SET nsf[afi][safi]
    Route->>Route: bgp_clear_route_node() marks paths BGP_PATH_STALE
    Zebra->>NHT: Nexthop becomes unreachable
    NHT->>NHT: CHECK BGP_PATH_STALE ✓
    NHT->>NHT: CHECK PEER_STATUS_NSF_WAIT ✓
    NHT->>NHT: CHECK nsf[afi][safi] ✓
    NHT->>Route: bgp_rib_remove() — stale path deleted immediately

    note over Peer,Zebra: Enhanced Route Refresh Path (should NOT delete)
    Peer->>Route: Route Refresh request
    Route->>Route: SET PEER_STATUS_ENHANCED_REFRESH
    Route->>Route: marks paths BGP_PATH_STALE
    Zebra->>NHT: Nexthop becomes unreachable
    NHT->>NHT: CHECK BGP_PATH_STALE ✓
    NHT->>NHT: CHECK PEER_STATUS_NSF_WAIT ✗ (not set)
    NHT->>NHT: condition fails — path NOT deleted ✓

Reviews (1): Last reviewed commit: "bgpd: limit GR-stale NHT-unreach delete ..."
