When advertise-all-vni is enabled, zebra reads FDB and neighbor tables directly from the kernel via netlink, bypassing all dplane providers. This makes EVPN unusable with non-kernel dplane plugins.
There are two related problems.
1. Direct kernel reads on EVPN enable
zebra_vxlan_advertise_all_vni() performs bulk netlink dumps to populate its EVPN tables. These send RTM_GETNEIGH dumps directly to the kernel netlink socket. A dplane provider has no way to supply its own FDB/neighbor state. With a non-kernel dataplane (e.g. DPDK), the kernel has no relevant entries, so zebra's EVPN tables remain empty.

    /* frr/zebra/zebra_vxlan.c, lines 6020 to 6024 in bf2a8cf */
    /* Read the MAC FDB */
    ns_walk_func(macfdb_read_ns, NULL, NULL);

    /* Read neighbors */
    ns_walk_func(neigh_read_ns, NULL, NULL);

The same pattern appears in other EVPN code paths:
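
    /* frr/zebra/zebra_evpn.c, line 949 in bf2a8cf */
    macfdb_read_for_bridge(zns, ifp, zif->brslave_info.br_if,

    /* frr/zebra/zebra_evpn.c, line 954 in bf2a8cf */
    macfdb_read_mcast_entry_for_vni(zns, ifp, zevpn->vni);

    /* frr/zebra/zebra_evpn.c, line 1617 in bf2a8cf */
    macfdb_read_specific_mac(zns, zif->brslave_info.br_if,

    /* frr/zebra/zebra_evpn_neigh.c, line 2325 in bf2a8cf */
    neigh_read_specific_ip(ipaddr, vlan_if);
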
2. Race between dplane provider sync and EVPN enablement
zebra_neigh_macfdb_update() discards all upward FDB notifications from dplane providers before EVPN is enabled:

    /* frr/zebra/zebra_neigh.c, lines 605 to 607 in bf2a8cf */
    /* We only process macfdb notifications if EVPN is enabled */
    if (!is_evpn_enabled())
        return;

A dplane provider that connects at zebra startup and syncs its FDB state (via DPLANE_OP_NEIGH_INSTALL contexts enqueued toward zebra) will have all its notifications silently dropped if BGP hasn't yet sent ZEBRA_ADVERTISE_ALL_VNI. There is no mechanism to re-trigger the sync after EVPN is enabled.
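
To make the ordering problem concrete, here is a toy model of the race. It is only a sketch: every name prefixed with `toy_` and both counters are hypothetical stand-ins, not zebra's real functions or data structures. It models just the early return quoted above and a provider that syncs once at connect time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not zebra's real state. */
static bool evpn_enabled;          /* models is_evpn_enabled() */
static size_t fdb_entries_learned; /* models the size of zebra's EVPN MAC table */

/* Models the early return in zebra_neigh_macfdb_update(). */
static void toy_macfdb_update(void)
{
	if (!evpn_enabled)
		return; /* notification is dropped; nothing is queued */
	fdb_entries_learned++;
}

/* Models a provider pushing its whole FDB once, right after connecting. */
static void toy_provider_initial_sync(size_t n)
{
	for (size_t i = 0; i < n; i++)
		toy_macfdb_update();
}

/* Models BGP sending ZEBRA_ADVERTISE_ALL_VNI at some later point. */
static void toy_advertise_all_vni(void)
{
	evpn_enabled = true;
	/* Nothing here asks the provider to send its FDB again. */
}
```

If `toy_provider_initial_sync()` runs before `toy_advertise_all_vni()`, the table stays empty after EVPN is enabled. Only notifications that arrive afterwards are counted.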
Impact
Any non-kernel dplane provider attempting to support EVPN faces: no initial FDB/neighbor population (kernel reads return empty results), lost FDB notifications due to the is_evpn_enabled() race, and broken duplicate address detection (kernel queries return nothing).
Possible approaches
- Add a dplane callback or new dplane operation that allows providers to supply FDB/neighbor state when requested (replacing the direct macfdb_read() / neigh_read() calls).
- When advertise_all_vni transitions from 0 to 1, trigger dplane providers to re-sync their FDB state (e.g. a new DPLANE_OP_EVPN_SYNC_REQUEST operation or a provider hook).
- Buffer dplane FDB notifications received before EVPN is enabled and replay them once advertise_all_vni is set.
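
As a rough illustration of the third option (buffer and replay), here is a minimal standalone sketch. It is not a patch against zebra: `struct pending_fdb`, `fdb_notify()`, `evpn_enable()`, `process_fdb()` and the `PENDING_MAX` cap are all hypothetical names made up for this example, and real code would hook into the existing dplane context handling instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PENDING_MAX 1024 /* arbitrary cap chosen for this sketch */

/* Hypothetical record for a notification held back until EVPN is enabled. */
struct pending_fdb {
	uint8_t mac[6];
	uint32_t vni;
};

static struct pending_fdb pending[PENDING_MAX];
static size_t pending_cnt;
static bool evpn_on;     /* stands in for is_evpn_enabled() */
static size_t processed; /* counts entries handed to EVPN processing */

/* Placeholder for the real EVPN MAC handling. */
static void process_fdb(const struct pending_fdb *e)
{
	(void)e;
	processed++;
}

/* Called for every upward FDB notification from a provider.
 * Returns false only when the buffer is full and the entry is lost. */
static bool fdb_notify(const uint8_t mac[6], uint32_t vni)
{
	struct pending_fdb e;

	memcpy(e.mac, mac, sizeof(e.mac));
	e.vni = vni;

	if (evpn_on) {
		process_fdb(&e);
		return true;
	}
	if (pending_cnt == PENDING_MAX)
		return false;
	pending[pending_cnt++] = e; /* hold until EVPN is enabled */
	return true;
}

/* Called when advertise_all_vni goes from 0 to 1. */
static void evpn_enable(void)
{
	evpn_on = true;
	for (size_t i = 0; i < pending_cnt; i++)
		process_fdb(&pending[i]); /* replay in arrival order */
	pending_cnt = 0;
}
```

The sketch ignores deletes and updates that arrive before enablement (a real buffer would need to coalesce them per MAC) and it needs a bound on memory, which is one reason the re-sync request in the second option may be the simpler route.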
Let me know if I missed anything or if there are alternative approaches.
Cheers.