Description
After upgrading FRR to 10.5.2 (previously working on 10.1.4), interface-based eBGP unnumbered with RFC5549 next-hops (IPv4 routes with IPv6 link-local next-hops) becomes unstable during netlink churn. When cilium-agent is restarted on a Kubernetes node, zebra withdraws the kernel default route (proto bgp). In the BGP table, 0.0.0.0/0 remains present, but both IPv6 LL next-hops become (inaccessible), so the route becomes invalid with no best path. Connectivity to the LL next-hops is still fine from the kernel (ping works, neighbors are REACHABLE). A link flap (shutdown/no shutdown) on ONE uplink interface immediately brings the default route back without restarting FRR.
This looks like a zebra nexthop-tracking / netlink sync regression: an unrelated interface deletion (the Cilium health veth) causes LL next-hop resolution to get stuck.
- BGP: eBGP unnumbered over 2 uplinks:
  - enp161s0f0np0
  - enp161s0f1np1
- Default route learned from ToR via RFC5549:
  - 0.0.0.0/0 with next-hops:
    - fe80::b2cf:eff:fe0c:fb70 dev enp161s0f0np0
    - fe80::b2cf:eff:fe0c:f970 dev enp161s0f1np1
- FRR runs on the host (not in a pod)
Version
- FRR: 10.5.2 (regression vs 10.1.4)
- OS: Ubuntu 24.04.1 - 6.8.0-62-generic
- Platform: Kubernetes node (bare metal/VM) running Cilium
- Cilium: 1.18.2 (cilium-agent restart triggers netlink churn)
How to reproduce
- Ensure FRR is running on the node with eBGP unnumbered over the uplink interfaces and receiving 0.0.0.0/0 via RFC5549 IPv6 link-local next-hops.
- Verify the default route is installed in the kernel: ip route show default shows proto bgp with two next-hops via fe80::... on the two uplinks.
- On the same node, restart the Cilium agent pod:
  kubectl -n kube-system delete pod -l k8s-app=cilium -o name --field-selector spec.nodeName=<NODE>
- Observe: the default route is removed from the kernel and FRR marks the BGP next-hops inaccessible.
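The withdrawal, and the link-flap workaround mentioned above, can be observed with standard iproute2 and vtysh show commands (run as root on the affected node; interface names are the ones from this setup):

```shell
# Kernel view: the BGP default route disappears after the cilium-agent restart.
ip route show default

# FRR view: both link-local next-hops show as (inaccessible), no best path.
vtysh -c 'show bgp ipv4 unicast 0.0.0.0/0'

# Workaround from this report: flapping ONE uplink restores the route.
vtysh -c 'conf t' -c 'interface enp161s0f0np0' -c 'shutdown'
vtysh -c 'conf t' -c 'interface enp161s0f0np0' -c 'no shutdown'
```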
Expected behavior
- Netlink churn caused by unrelated veth/interface changes (e.g. the Cilium health interface) should not break zebra nexthop tracking for existing IPv6 LL next-hops on the uplink interfaces.
- 0.0.0.0/0 should remain valid and installed in the kernel as long as the LL neighbors remain reachable.
Actual behavior
- During the Cilium restart, zebra withdraws the default route from the kernel:
  Deleted default ... proto bgp ... nexthop via inet6 fe80::... dev enp...
- In FRR, show bgp ipv4 unicast 0.0.0.0/0 shows the next-hops as (inaccessible) and the route becomes invalid, no best path.
- In the kernel at the same time, the neighbors remain reachable:
  - ip -6 neigh show dev enp... shows REACHABLE
  - ping -I enp161s0f0np0 fe80::b2cf:eff:fe0c:fb70 continues to respond
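To confirm the discrepancy is between the kernel and zebra rather than an actual reachability loss, the two views can be compared side by side at the moment of failure (a diagnostic sketch; neighbor and interface names are the ones from this report):

```shell
# Kernel side: neighbor entries and link-local reachability are fine.
ip -6 neigh show dev enp161s0f0np0
ping -c 3 -I enp161s0f0np0 fe80::b2cf:eff:fe0c:fb70

# Zebra side: does it still know the uplink interface and its LL address,
# and what does nexthop tracking say about the BGP next-hops?
vtysh -c 'show interface enp161s0f0np0'
vtysh -c 'show ipv6 nht'
```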
Additional context
Kernel netlink monitor (captured during cilium-agent restart)
(typos possible due to OCR)
ip -ts monitor link route neigh
[2026-03-03T14:48:56.295542] 158: lxc_health@if157: <BROADCAST,MULTICAST> mtu 8900 qdisc noqueue state DOWN group default
    link/ether d2:51:0c:b9:77:7c brd ff:ff:ff:ff:ff:ff link-netnsid 5
[2026-03-03T14:48:56.295755] Deleted 10.0.45.57 dev lxc_health lladdr 36:8e:fd:e0:8e:91 STALE
[2026-03-03T14:48:56.295797] Deleted fe80::/64 dev lxc_health proto kernel metric 256 pref medium
[2026-03-03T14:48:56.295824] Deleted local fe80::d051:cff:feb9:777c dev lxc_health table local proto kernel metric 0 pref medium
[2026-03-03T14:48:56.295925] Deleted multicast ff00::/8 dev lxc_health table local proto kernel metric 256 pref medium
[2026-03-03T14:48:56.295946] Deleted ff02::16 dev lxc_health lladdr 33:33:00:00:00:16 NOARP
[2026-03-03T14:48:56.295969] Deleted ff02::1:ffb9:777c dev lxc_health lladdr 33:33:ff:b9:77:7c NOARP
[2026-03-03T14:48:56.295989] Deleted ff02::2 dev lxc_health lladdr 33:33:00:00:00:02 NOARP
[2026-03-03T14:48:56.362898] Deleted 158: lxc_health@NONE: <BROADCAST,MULTICAST> mtu 8900 qdisc noop state DOWN group default
    link/ether d2:51:0c:b9:77:7c brd ff:ff:ff:ff:ff:ff
[2026-03-03T14:48:56.363672] Deleted default nhid 43 proto bgp src 10.138.208.63 metric 20
    nexthop via inet6 fe80::b2cf:eff:fe0c:f970 dev enp161s0f1np1 weight 1
    nexthop via inet6 fe80::b2cf:eff:fe0c:fb70 dev enp161s0f0np0 weight 1
- Cilium deletes its lxc_health interface and routes. Immediately after, the BGP default route is deleted.
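If the trigger really is the unrelated veth deletion, the churn may be reproducible without Cilium at all. A minimal sketch (hypothetical: the veth names are arbitrary, and whether this alone is enough to trigger the bug is unverified):

```shell
# Create a throwaway veth pair, bring it up, then delete it to generate the
# same kind of RTM_DELLINK / RTM_DELROUTE / RTM_DELNEIGH netlink churn that
# cilium-agent produces when it recreates lxc_health.
ip link add veth-bugtest type veth peer name veth-bugtest-p
ip link set veth-bugtest up
ip link set veth-bugtest-p up
sleep 2
ip link del veth-bugtest

# Check whether the BGP default route survived the churn.
ip route show default
vtysh -c 'show bgp ipv4 unicast 0.0.0.0/0'
```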
Checklist