Summary
The tc_tunnel/udp_mpls subtest fails intermittently on aarch64 with EINPROGRESS (errno 115) because the 1000ms connect() timeout is too short for the FOU+MPLS+ipip tunnel path to converge under QEMU emulation. Raising the timeout to 2000ms and using the TIMEOUT_MS macro for both the client and server timeouts fixes the issue.
Root Cause Analysis
The tc_tunnel test suite validates BPF-based tunnel encapsulation/decapsulation. The udp_mpls subtest has the most complex kernel decapsulation path, requiring:
FOU (Foo-over-UDP) rx port registration (ip fou add port 6635 ipproto 137)
A tunnel device configured with mode any ttl 255
MPLS input enabled on testtun0
After configure_kernel_decapsulation() completes all setup commands and brings testtun0 up, the test immediately tries to establish a TCP connection through the tunnel path. On aarch64 QEMU, the kernel needs more time after the interface comes up for the full data path (BPF encap → FOU/MPLS encap → veth → FOU decap → MPLS routing → server) to become operational.
The client connect() call in connect_client_to_server() (test_tc_tunnel.c:171) has SO_SNDTIMEO set to 1000ms. When the tunnel path isn't ready within this window, the blocking connect() fails with errno set to EINPROGRESS and the test fails.
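The failure mode described above can be sketched in isolation. This is a minimal illustration, not the selftest's code: `connect_with_timeout()` and `connect_timed_out()` are hypothetical helpers, and the only behavior relied on is the one documented in socket(7), where a blocking connect() that outlives SO_SNDTIMEO fails with EINPROGRESS.

```c
#include <errno.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper (not from test_tc_tunnel.c): set a send timeout on
 * the socket, then attempt a blocking connect().
 * Returns 0 on success, or the errno value of the call that failed. */
int connect_with_timeout(int fd, const struct sockaddr *addr,
                         socklen_t addrlen, int timeout_ms)
{
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };

    if (setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0)
        return errno;
    if (connect(fd, addr, addrlen) < 0)
        return errno;
    return 0;
}

/* On Linux, expiry of SO_SNDTIMEO during a blocking connect() is reported
 * as EINPROGRESS rather than ETIMEDOUT. */
bool connect_timed_out(int err)
{
    return err == EINPROGRESS;
}
```

A caller that gets EINPROGRESS back from a blocking socket can therefore treat it as "the timeout expired before the handshake completed", which matches the errno 115 seen in the failing runs.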
Contributing factors:
Commit 2790db208b44 increased the timeout from 500ms to 1000ms, but the change was tested only on x86_64, not on aarch64 QEMU
connect_client_to_server() used a hardcoded 1000 instead of the TIMEOUT_MS macro, meaning the timeout wasn't consolidated with the server-side timeout
udp_mpls is the only subtest combining all three of: FOU, MPLS, and kernel decapsulation (other MPLS variants either skip kernel decap via expect_kern_decap_failure or don't use FOU)
Proposed Fix
The patch (0001-selftests-bpf-Fix-tc_tunnel-udp_mpls-timeout-on-aarc.patch) makes two changes:
Increase TIMEOUT_MS from 1000 to 2000 — provides sufficient margin for tunnel path convergence on emulated architectures while still detecting genuine connectivity failures promptly
Use TIMEOUT_MS in connect_client_to_server() instead of the hardcoded 1000 — ensures server and client timeouts are consistent and controlled by a single macro
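The two changes amount to deriving every socket timeout from one macro. The sketch below illustrates that pattern only and is not the patch itself: `timeout_from_ms()` and `set_test_timeout()` are hypothetical names, and which socket option each side of the test actually sets is an assumption left to the caller.

```c
#include <sys/socket.h>
#include <sys/time.h>

#define TIMEOUT_MS 2000 /* value proposed in this report */

/* Hypothetical helper: convert milliseconds into the struct timeval that
 * SO_SNDTIMEO and SO_RCVTIMEO expect. */
struct timeval timeout_from_ms(int ms)
{
    struct timeval tv = {
        .tv_sec = ms / 1000,
        .tv_usec = (ms % 1000) * 1000,
    };
    return tv;
}

/* Hypothetical helper: apply TIMEOUT_MS to the given socket-level timeout
 * option (SO_SNDTIMEO or SO_RCVTIMEO).
 * Returns 0 on success, -1 with errno set on failure. */
int set_test_timeout(int fd, int optname)
{
    struct timeval tv = timeout_from_ms(TIMEOUT_MS);

    return setsockopt(fd, SOL_SOCKET, optname, &tv, sizeof(tv));
}
```

With both sides calling the same helper, a later adjustment to TIMEOUT_MS changes the client and server timeouts together, so a hardcoded literal cannot drift out of sync again.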
Impact
When this test fails, it is the sole failure in the CI run (1 FAILED out of 6202 subtests), causing the entire aarch64 test_progs job to report failure. This creates noise for unrelated patch submissions and wastes reviewer time investigating spurious failures.
References
Prior timeout fix: 2790db208b44 ("selftests/bpf: Improve tc_tunnel test reliability") — Jiayuan Chen, Mar 2026
Failure Details
tc_tunnel/udp_mpls (subtest #15 of tc_tunnel)
connect() returns EINPROGRESS (errno 115) due to SO_SNDTIMEO expiry