
zebra: EVPN FDB/neighbor state sync bypasses dplane providers #21190

@rjarry


When advertise-all-vni is enabled, zebra reads FDB and neighbor tables directly from the kernel via netlink, bypassing all dplane providers. This makes EVPN unusable with non-kernel dplane plugins.

There are two related problems.

1. Direct kernel reads on EVPN enable

zebra_vxlan_advertise_all_vni() performs bulk netlink dumps to populate its EVPN tables. These send RTM_GETNEIGH dumps directly to the kernel netlink socket. A dplane provider has no way to supply its own FDB/neighbor state. With a non-kernel dataplane (e.g. DPDK), the kernel has no relevant entries, so zebra's EVPN tables remain empty.

frr/zebra/zebra_vxlan.c

Lines 6020 to 6024 in bf2a8cf

/* Read the MAC FDB */
ns_walk_func(macfdb_read_ns, NULL, NULL);
/* Read neighbors */
ns_walk_func(neigh_read_ns, NULL, NULL);

The same pattern appears in other EVPN code paths:

macfdb_read_for_bridge(zns, ifp, zif->brslave_info.br_if,

macfdb_read_mcast_entry_for_vni(zns, ifp, zevpn->vni);

macfdb_read_specific_mac(zns, zif->brslave_info.br_if,

neigh_read_specific_ip(ipaddr, vlan_if);

2. Race between dplane provider sync and EVPN enablement

zebra_neigh_macfdb_update() discards all upward FDB notifications from dplane providers before EVPN is enabled:

frr/zebra/zebra_neigh.c

Lines 605 to 607 in bf2a8cf

/* We only process macfdb notifications if EVPN is enabled */
if (!is_evpn_enabled())
	return;

A dplane provider that connects at zebra startup and syncs its FDB state (via DPLANE_OP_NEIGH_INSTALL contexts enqueued toward zebra) will have all its notifications silently dropped if BGP hasn't yet sent ZEBRA_ADVERTISE_ALL_VNI. There is no mechanism to re-trigger the sync after EVPN is enabled.
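To make the ordering problem concrete, here is a toy model of the race. It is only a sketch: every `toy_*` name is hypothetical and none of it is FRR code. It mimics the early return in zebra_neigh_macfdb_update() and a provider that pushes its state exactly once, at connect time.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; none of these are FRR symbols. */
bool toy_evpn_enabled;   /* models the result of is_evpn_enabled() */
size_t toy_fdb_count;    /* notifications accepted into the EVPN tables */
size_t toy_fdb_dropped;  /* notifications discarded by the guard */

/* Models the early return in zebra_neigh_macfdb_update(). */
void toy_macfdb_update(void)
{
	if (!toy_evpn_enabled) {
		toy_fdb_dropped++;
		return;
	}
	toy_fdb_count++;
}

/* Models a provider pushing its whole FDB once, when it connects. */
void toy_provider_initial_sync(size_t n_entries)
{
	for (size_t i = 0; i < n_entries; i++)
		toy_macfdb_update();
}

/* Models BGP sending ZEBRA_ADVERTISE_ALL_VNI some time later. */
void toy_enable_evpn(void)
{
	toy_evpn_enabled = true;
	/* Nothing here asks the provider to send its state again. */
}
```

Calling `toy_provider_initial_sync(3)` and then `toy_enable_evpn()` leaves `toy_fdb_count` at 0 and `toy_fdb_dropped` at 3. Only notifications sent after the flag flips are counted.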

Impact

Any non-kernel dplane provider attempting to support EVPN faces: no initial FDB/neighbor population (kernel reads return empty results), lost FDB notifications due to the is_evpn_enabled() race, and broken duplicate address detection (kernel queries return nothing).

Possible approaches

  • Add a dplane callback or new dplane operation that allows providers to supply FDB/neighbor state when requested (replacing the direct macfdb_read() / neigh_read() calls).
  • When advertise_all_vni transitions from 0 to 1, trigger dplane providers to re-sync their FDB state (e.g. a new DPLANE_OP_EVPN_SYNC_REQUEST operation or a provider hook).
  • Buffer dplane FDB notifications received before EVPN is enabled and replay them once advertise_all_vni is set.
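As a rough illustration of the first two ideas combined, the sketch below lets a provider register a re-sync hook that zebra would call when advertise_all_vni goes from 0 to 1. This is a standalone toy, not a patch: all `sketch_*` names, the entry layout and the fixed-size tables are assumptions made for the example, and a real implementation would go through dplane contexts instead of direct function calls.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_PROVIDERS 4
#define SKETCH_MAX_FDB 16

/* Hypothetical, simplified FDB entry. */
struct sketch_fdb_entry {
	uint8_t mac[6];
	uint32_t vni;
};

/* Sink handed to a provider; the provider calls it once per entry. */
typedef void (*sketch_fdb_sink)(const struct sketch_fdb_entry *e);
/* Hypothetical provider hook: replay all current FDB state into sink. */
typedef void (*sketch_fdb_resync)(sketch_fdb_sink sink);

sketch_fdb_resync sketch_providers[SKETCH_MAX_PROVIDERS];
size_t sketch_n_providers;
struct sketch_fdb_entry sketch_evpn_fdb[SKETCH_MAX_FDB];
size_t sketch_evpn_fdb_len;
bool sketch_advertise_all_vni;

bool sketch_provider_register(sketch_fdb_resync hook)
{
	if (sketch_n_providers == SKETCH_MAX_PROVIDERS)
		return false;
	sketch_providers[sketch_n_providers++] = hook;
	return true;
}

/* Normal update path, still guarded as it is today. */
void sketch_macfdb_update(const struct sketch_fdb_entry *e)
{
	if (!sketch_advertise_all_vni || sketch_evpn_fdb_len == SKETCH_MAX_FDB)
		return;
	sketch_evpn_fdb[sketch_evpn_fdb_len++] = *e;
}

void sketch_set_advertise_all_vni(bool enable)
{
	bool was_enabled = sketch_advertise_all_vni;

	sketch_advertise_all_vni = enable;
	if (!enable) {
		sketch_evpn_fdb_len = 0; /* toy choice: forget state on disable */
		return;
	}
	if (was_enabled)
		return; /* only the 0 -> 1 transition triggers a re-sync */
	/* Ask every provider for its state instead of dumping the kernel. */
	for (size_t i = 0; i < sketch_n_providers; i++)
		sketch_providers[i](sketch_macfdb_update);
}

/* Example non-kernel provider holding two made-up entries. */
static const struct sketch_fdb_entry sketch_dpdk_fdb[] = {
	{ { 0x02, 0x00, 0x00, 0x00, 0x00, 0x01 }, 100 },
	{ { 0x02, 0x00, 0x00, 0x00, 0x00, 0x02 }, 100 },
};

void sketch_dpdk_resync(sketch_fdb_sink sink)
{
	size_t n = sizeof(sketch_dpdk_fdb) / sizeof(sketch_dpdk_fdb[0]);

	for (size_t i = 0; i < n; i++)
		sink(&sketch_dpdk_fdb[i]);
}
```

With `sketch_dpdk_resync` registered, an update sent before enablement is still dropped, but `sketch_set_advertise_all_vni(true)` pulls both provider entries through the normal update path. The buffering approach would avoid the extra hook at the cost of holding notifications in zebra for an unbounded time.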

Let me know if I missed anything or if there are alternative approaches.

Cheers.
