Summary
The check_fd_array_cnt__referenced_btfs subtest has a polling loop that waits for a BTF object to be freed after closing the owning BPF program. The loop comment states "max ~1 second", but the loop starts at tries = 100 with 1ms sleeps, so it actually waits only about 100ms. This causes flaky failures under CI load when async BPF program freeing (RCU + workqueue) takes longer than 100ms.
Root Cause Analysis
In tools/testing/selftests/bpf/prog_tests/fd_array.c:330:
/* The program is freed by a workqueue, so no reliable
 * way to sync, so just wait a bit (max ~1 second). */
for (tries = 100; tries >= 0; tries--) {
	usleep(1000);
	...
}
After close(prog_fd), the BPF program is not freed synchronously: the free path involves at least one RCU grace period plus workqueue scheduling. Under load, RCU grace periods alone can exceed 100ms, so the 100ms total timeout is insufficient.
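The bounded polling pattern used by the test can be sketched in isolation as follows. This is a minimal illustration, not the selftest's code: poll_until, cond_fn, struct countdown and countdown_done are hypothetical names, and the countdown is a toy stand-in for "the BTF object has been freed". It shows that when the awaited event needs more checks than the loop allows, the poll reports a timeout even though the event would have happened shortly afterwards.

```c
#define _DEFAULT_SOURCE /* for usleep() under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical condition callback: returns true once the awaited
 * asynchronous event (e.g. an object being freed) has happened. */
typedef bool (*cond_fn)(void *ctx);

/* Poll cond up to tries + 1 times, sleeping sleep_us before each check,
 * mirroring the shape "for (tries = N; tries >= 0; tries--)".
 * Returns true as soon as cond succeeds, false if the budget runs out. */
static bool poll_until(cond_fn cond, void *ctx, int tries, unsigned int sleep_us)
{
	assert(tries >= 0);

	for (; tries >= 0; tries--) {
		usleep(sleep_us);
		if (cond(ctx))
			return true;
	}
	return false;
}

/* Toy stand-in for "object freed": becomes true on the Nth check. */
struct countdown {
	int remaining;
};

static bool countdown_done(void *ctx)
{
	struct countdown *c = ctx;

	return --c->remaining <= 0;
}
```

With a countdown that needs 10 checks, poll_until(countdown_done, &c, 4, 1) gives up after 5 checks and returns false, while a budget of 20 tries returns true on the 10th check. The success path never pays for the unused budget, because the loop returns as soon as the condition holds.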
The comment says "max ~1 second" which matches the pattern in prog_tests/exe_ctx.c:45:
usleep(1000); /* Wait 1ms per iteration, up to 1 sec total */
...where the loop runs 1000 iterations (not 100).
Proposed Fix
Change tries = 100 to tries = 1000 so the actual maximum wait matches the documented intent of ~1 second. This is a 10x increase in maximum wait time, which costs nothing in the success path (the loop breaks immediately once BTF is freed) but prevents false failures under load.
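The arithmetic behind the two loop bounds can be checked with a small helper. This is only a sketch: max_sleep_us is a hypothetical name, and it counts requested sleep time only. Real elapsed time will be somewhat larger, since usleep may oversleep and each check takes time.

```c
/* Upper bound, in microseconds, on the time requested from usleep() by a
 * loop of the form
 *     for (tries = N; tries >= 0; tries--) usleep(sleep_us);
 * The ">= 0" condition makes the body run N + 1 times. */
static long max_sleep_us(long tries, long sleep_us)
{
	return (tries + 1) * sleep_us;
}
```

For the current code, max_sleep_us(100, 1000) is 101000 microseconds, roughly 100ms. With the proposed change, max_sleep_us(1000, 1000) is 1001000 microseconds, roughly the 1 second that the comment documents.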
Impact
Without this fix, the test will continue to flake under CI load. While currently rare, the failure rate will increase as CI workloads grow or KASAN/debug options slow the kernel. The test provides important coverage for fd_array BTF reference counting behavior, so it should be reliable rather than denylisted.
References
tools/testing/selftests/bpf/prog_tests/fd_array.c:330 (the bug)
tools/testing/selftests/bpf/prog_tests/exe_ctx.c:45 (correct pattern for comparison)
Commit 1c593d7402b1 ("selftests/bpf: Add tests for fd_array_cnt") — original introduction
kernel/bpf/syscall.c:2404 (__bpf_prog_put — the async free path)
Failure Details
fd_array_cnt/referenced_btfs (test #115/5 in test_progs)
The BPF program free path after close(prog_fd) is:
bpf_prog_put → __bpf_prog_put → (potentially schedule_work)
__bpf_prog_put_noref → call_rcu_tasks_trace or call_rcu
__bpf_prog_put_rcu → schedule_work
bpf_prog_free_deferred → bpf_free_used_btfs
Patch file: 0001-selftests-bpf-Fix-insufficient-wait-timeout-in-fd_a.patch