zebra: Allow connected routes NHG sent to dplane #21125
Greptile Summary
This PR fixes a bug where connected routes' nexthop groups (NHGs) were not being sent to the dataplane when […]. The fix is achieved through two consistent, minimal changes: […]
The two changes are logically paired and correct. The only minor issue is a stale comment in […].
Confidence Score: 4/5
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[rib_add_multipath called] --> B{Route type?}
B -- "ZEBRA_ROUTE_KERNEL\nor ZEBRA_ROUTE_LOCAL" --> C[SET NEXTHOP_GROUP_INITIAL_DELAY_INSTALL]
B -- "ZEBRA_ROUTE_CONNECT\n(after fix)" --> D[Flag NOT set]
B -- "Other routes\ne.g. BGP/OSPF" --> D
C --> E[zebra_nhg_install_kernel\ntype = KERNEL or LOCAL]
D --> F[zebra_nhg_install_kernel\ntype = CONNECT or other]
E --> G{type == LOCAL\nor KERNEL?}
G -- "Yes" --> H[Flag preserved\nDelay optimization kept]
G -- "No\ne.g. ZEBRA_ROUTE_MAX" --> I[Flag cleared\nNHG marked uninstalled]
F --> J{type != LOCAL\nand != KERNEL?}
J -- "Yes (CONNECT,\nROUTE_MAX, etc.)" --> K{Flag set?}
K -- "No (normal)" --> L[Proceed to install check]
K -- "Yes (edge case)" --> I
J -- "No" --> H
H --> M[dplane_nexthop_add\nFlag set → fake SUCCESS\nNHG marked INSTALLED\nNo dplane op sent]
L --> N{NHG VALID\nand not INSTALLED?}
I --> N
N -- "Yes" --> O[dplane_nexthop_add\nReal op enqueued\nNHG marked QUEUED]
N -- "No" --> P[Skip install]
O --> Q[FPM/Kernel receives NHG]
M --> R[NHG skipped in FPM]
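The flowchart can be read as a small state machine. The following is a minimal sketch of that reading, not the zebra source: the enum, the flag bits, and the `nhg_model_*` names are all hypothetical stand-ins, and only the branching follows the chart (flag set on add for KERNEL/LOCAL, flag cleared when a non-KERNEL/LOCAL type installs, fake success while the flag is set, real enqueue otherwise).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for zebra's route types; values are illustrative. */
enum route_type { ROUTE_KERNEL, ROUTE_CONNECT, ROUTE_LOCAL, ROUTE_BGP, ROUTE_MAX };

/* Illustrative flag bits; these are NOT the real FRR definitions. */
#define NHG_VALID         (1u << 0)
#define NHG_INSTALLED     (1u << 1)
#define NHG_QUEUED        (1u << 2)
#define NHG_DELAY_INSTALL (1u << 3)

enum install_result { SKIPPED, FAKE_SUCCESS, ENQUEUED };

struct nhg_model {
	unsigned int flags;
};

/* After the fix, only KERNEL and LOCAL routes use the delay optimization. */
static bool is_delay_type(enum route_type type)
{
	return type == ROUTE_KERNEL || type == ROUTE_LOCAL;
}

/* Models the top of the chart: the flag is set when the route is added. */
static void nhg_model_route_add(struct nhg_model *nhg, enum route_type type)
{
	assert(type <= ROUTE_MAX);
	if (is_delay_type(type))
		nhg->flags |= NHG_DELAY_INSTALL;
}

/* Models the install path in the lower half of the chart. */
static enum install_result nhg_model_install(struct nhg_model *nhg,
					     enum route_type type)
{
	/* Edge case: another protocol now uses an NHG that was delayed. */
	if (!is_delay_type(type) && (nhg->flags & NHG_DELAY_INSTALL))
		nhg->flags &= ~(NHG_DELAY_INSTALL | NHG_INSTALLED);

	/* Flag still set: report success without sending a dplane op. */
	if (nhg->flags & NHG_DELAY_INSTALL) {
		nhg->flags |= NHG_INSTALLED;
		return FAKE_SUCCESS;
	}

	/* Normal path: only valid, not-yet-installed NHGs are enqueued. */
	if (!(nhg->flags & NHG_VALID) || (nhg->flags & NHG_INSTALLED))
		return SKIPPED;

	nhg->flags |= NHG_QUEUED;
	return ENQUEUED;
}
```

In this model a connected route's NHG takes the normal path and is enqueued, while a kernel route's NHG is only marked installed until some other protocol starts sharing it.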
e978a11 to 7695923
-	if ((type != ZEBRA_ROUTE_CONNECT && type != ZEBRA_ROUTE_LOCAL &&
-	     type != ZEBRA_ROUTE_KERNEL) &&
+	if ((type != ZEBRA_ROUTE_LOCAL && type != ZEBRA_ROUTE_KERNEL) &&
This is a straight up NAK from me. This is not an appropriate approach to the problem. It completely recreates the problem that was being fixed here. I spoke w/ Eddie about a completely different approach, and that approach was not taken.
At the very least, a topotest is needed here that shows we do not install the NHG to the kernel while it is still forwarded to the FPM.
7695923 to 9c4a370
Thanks @donaldsharp, really appreciate your feedback.
I'm on vacation this week and will be able to look again next week.
85dbec2 to fb03ca1
	if (IS_ZEBRA_DEBUG_NHG_DETAIL)
		zlog_debug("%s: NHG %pNG delayed-install optimization (flags 0x%x)",
			   __func__, nhe, nhe->flags);
} else {
I think this should still be there. Why are you removing it?
Thanks, added back.
	 * Clear QUEUED flag for non-system routes to allow re-installation
	 * when the NHG is shared between multiple protocols.
	 */
	if (type != ZEBRA_ROUTE_CONNECT && type != ZEBRA_ROUTE_LOCAL && type != ZEBRA_ROUTE_KERNEL)
My alternate proposal is #21893 |
When "fpm use-next-hop-groups" is enabled, NHGs for system routes (connected, local, kernel) carry the NEXTHOP_GROUP_INITIAL_DELAY_INSTALL flag, which was originally designed to prevent them from being programmed into the kernel. However, this causes routes that resolve their nexthop through a connected route to fail insertion in the FPM.

Fix this by enqueuing INITIAL_DELAY NHGs into the dplane as normal, but use dplane_ctx_set_skip_kernel() for these NHGs. This allows the FPM to receive and program them while skipping kernel programming.

Signed-off-by: Yuqing Zhao <galadriel.zyq@alibaba-inc.com>
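The approach in this commit message can be sketched as follows. This is a simplified model under stated assumptions, not FRR code: `struct ctx_model`, the two `*_provider_model` functions, and `nhg_enqueue_model` are hypothetical, and the `skip_kernel` field only stands in for whatever dplane_ctx_set_skip_kernel() records on the real context. The point it illustrates is that every NHG is enqueued, and the kernel step alone is skipped for INITIAL_DELAY NHGs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified dataplane context; not FRR's real context type. */
struct ctx_model {
	bool skip_kernel;       /* stand-in for the skip-kernel marking */
	bool kernel_programmed; /* set if the kernel step acted on the NHG */
	bool fpm_sent;          /* set if the FPM step saw the NHG */
};

/* Kernel step: honours the skip marking and leaves the kernel untouched. */
static void kernel_provider_model(struct ctx_model *ctx)
{
	if (ctx->skip_kernel)
		return;
	ctx->kernel_programmed = true;
}

/* FPM step: receives every context that was enqueued. */
static void fpm_provider_model(struct ctx_model *ctx)
{
	ctx->fpm_sent = true;
}

/*
 * Enqueue an NHG as normal. NHGs carrying the initial-delay flag are
 * marked skip-kernel instead of being dropped before the dataplane.
 */
static struct ctx_model nhg_enqueue_model(bool initial_delay_install)
{
	struct ctx_model ctx = { .skip_kernel = initial_delay_install };

	kernel_provider_model(&ctx);
	fpm_provider_model(&ctx);

	/* Whatever the kernel decision was, the FPM must have seen it. */
	assert(ctx.fpm_sent);
	return ctx;
}
```

Calling `nhg_enqueue_model(true)` models a system-route NHG: the FPM step runs and the kernel step does nothing, which is the behaviour the requested topotest would need to demonstrate against the real daemon.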
…t skipping kernel

Signed-off-by: Yuqing Zhao <galadriel.zyq@alibaba-inc.com>
fb03ca1 to ff7f6d0
When "fpm use-next-hop-groups" is enabled, connected routes' NHGs are not sent to the dplane.
This causes routes that resolve their nexthop through a connected route to fail insertion in the FPM.
Fix this by restricting the NEXTHOP_GROUP_INITIAL_DELAY_INSTALL flag to KERNEL and LOCAL routes only,
allowing connected routes' NHGs to be properly downloaded to the dplane.
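To make the scope of that restriction concrete, here is a minimal before/after sketch of the predicate that decides which route types get the flag. The enum and function names are hypothetical; only the sets of route types come from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical route-type enum for illustration; not zebra's definitions. */
enum route_type_model { RT_KERNEL, RT_CONNECT, RT_LOCAL, RT_OTHER };

/* Before: all system routes (connected, local, kernel) got the flag. */
static bool delay_install_before(enum route_type_model t)
{
	return t == RT_KERNEL || t == RT_CONNECT || t == RT_LOCAL;
}

/* After: the flag is restricted to KERNEL and LOCAL routes. */
static bool delay_install_after(enum route_type_model t)
{
	return t == RT_KERNEL || t == RT_LOCAL;
}

/* True for exactly the route types whose behaviour the change alters. */
static bool behaviour_changed(enum route_type_model t)
{
	assert(t <= RT_OTHER);
	return delay_install_before(t) != delay_install_after(t);
}
```

In this sketch `behaviour_changed` is true only for `RT_CONNECT`, so KERNEL and LOCAL routes keep the delay optimization and other protocols are unaffected.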