Regarding the issue of Zebra's inability to recognize interface recovery #21193

@opennw-rui

Description

Hello, developer:

I often use FRRouting on Linux to connect to BGP networks at network edge nodes in data centers. During daily operation and maintenance, I found an issue with Zebra, which manifests as follows:

When a physical interface goes down and comes back up for any reason (a manual restart from bash, the upstream router shutting down and later restoring the interconnect interface, or the physical link flapping), Zebra does not recognize that the interface has recovered, and the IP routing table continues to show the interface's connected route as "inactive".

For example:

SoftRouting# do show ip route connected
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF default:
C>* 10.10.17.0/24 is directly connected, vpls-wg0, weight 1, 1d06h22m
C>* 192.168.2.1/32 is directly connected, lo, weight 1, 2d08h34m
C>* 192.168.10.0/24 [0/425] is directly connected, lan, weight 1, 1d08h27m
C   192.168.71.0/24 [0/102] is directly connected, eth0 inactive, weight 1, 1d08h28m
SoftRouting# quit
root@SoftRouting:~# ip addr show eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:65 qdisc mq state UP group default qlen 1000
    link/ether 60:be:b4:02:13:60 brd ff:ff:ff:ff:ff:ff
    altname enp2s0
    altname enx60beb4021360
    inet 192.168.71.100/24 brd 192.168.71.255 scope global noprefixroute eth0
       valid_lft forever preferred_lft forever
    inet6 240e:xxxx:2xxx:xxxx:xxxx:bxxx:fxxx2:1360/64 scope global dynamic noprefixroute
       valid_lft 2152792056sec preferred_lft 86136sec
    inet6 fe80::62be:b4ff:fe02:1360/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
root@SoftRouting:~#
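
The symptom in the output above can be spotted mechanically. The following is a minimal sketch, not part of FRR, that scans the text of `show ip route connected` for connected routes flagged `inactive`. The function name `inactive_connected_routes` and the regular expression are assumptions made for illustration; the pattern is based only on the IPv4 line format shown in this report and may need adjusting for other FRR versions or for IPv6.

```python
import re

# Assumed line format, taken from the output in this report, e.g.:
# "C   192.168.71.0/24 [0/102] is directly connected, eth0 inactive, weight 1, 1d08h28m"
ROUTE_RE = re.compile(
    r"^C[>*\s]*(?P<prefix>\d+\.\d+\.\d+\.\d+/\d+)"
    r".*?is directly connected,\s+(?P<iface>[^,\s]+)"
    r"(?P<inactive>\s+inactive)?\s*,"
)


def inactive_connected_routes(route_output):
    """Return (prefix, interface) pairs for connected routes marked inactive.

    `route_output` is the captured text of `show ip route connected`.
    Only IPv4 prefixes are handled in this sketch.
    """
    stuck = []
    for line in route_output.splitlines():
        match = ROUTE_RE.match(line.strip())
        if match and match.group("inactive"):
            stuck.append((match.group("prefix"), match.group("iface")))
    return stuck
```

The text could be captured with something like `vtysh -c "show ip route connected"` and passed to the function; a non-empty result for an interface that the kernel reports as up matches the behavior described here.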

This can have serious consequences. If the physical interface flaps and its connected route is not restored, BGP, OSPF, and other routes are affected: for example, routes are still learned from neighbors, but because the interface's connected route stays inactive, they cannot be installed properly in the IP routing table.

The only workaround at present is to restart FRR, which may tear down and re-establish every BGP and OSPF session, causing a prolonged network interruption.

Could you please take some time out of your busy schedule to fix this bug? Thank you.

Version

root@SoftRouting:~# vtysh

Hello, this is FRRouting (version 10.5.3).
Copyright 1996-2005 Kunihiro Ishiguro, et al.

SoftRouting# show version
FRRouting 10.5.3 (SoftRouting) on Linux(6.18.14-x64v2-xanmod1).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--sbindir=/usr/lib/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--disable-grpc' '--disable-address-sanitizer' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-pcre2posix' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'
SoftRouting#

How to reproduce

To reproduce this bug, simply restart the interface from a Linux shell, for example:

nmcli con up eth0
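
After bouncing the interface, the bug is confirmed when the kernel reports the link as up while Zebra still marks the connected route inactive. The sketch below checks that condition from captured command output. The helper names (`link_is_up`, `zebra_marks_inactive`, `reproduces_bug`) are hypothetical, and the parsing assumes the `ip addr show` and `show ip route connected` formats that appear in this report.

```python
import re


def link_is_up(ip_addr_output):
    """True if the first <...> flag list contains both UP and LOWER_UP."""
    match = re.search(r"<([^>]*)>", ip_addr_output)
    if not match:
        return False
    flags = set(match.group(1).split(","))
    return {"UP", "LOWER_UP"} <= flags


def zebra_marks_inactive(route_output, iface):
    """True if a connected route on `iface` is shown as inactive."""
    pattern = (
        r"is directly connected,\s+" + re.escape(iface) + r"\s+inactive\b"
    )
    return re.search(pattern, route_output) is not None


def reproduces_bug(ip_addr_output, route_output, iface):
    """Kernel says the link is up, but Zebra still shows the route inactive."""
    return link_is_up(ip_addr_output) and zebra_marks_inactive(
        route_output, iface
    )
```

Feeding it the output of `ip addr show eth0` and `vtysh -c "show ip route connected"` taken a few seconds after the `nmcli` command gives a simple yes/no answer for whether the problem reproduced.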

Expected behavior

After the physical interface comes back up, its connected route should recover from the 'inactive' state and become active again.

Actual behavior

SoftRouting# do show ip route connected
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

IPv4 unicast VRF default:
C>* 10.10.17.0/24 is directly connected, vpls-wg0, weight 1, 1d06h22m
C>* 192.168.2.1/32 is directly connected, lo, weight 1, 2d08h34m
C>* 192.168.10.0/24 [0/425] is directly connected, lan, weight 1, 1d08h27m
C   192.168.71.0/24 [0/102] is directly connected, eth0 inactive, weight 1, 1d08h28m

Additional context

No response

Checklist

  • I have searched the open issues for this bug.
  • I have not included sensitive information in this report.

Metadata

    Labels

    triage: Needs further investigation
