[bpf-ci-bot] selftests/bpf: test_task_work_array_map fails intermittently in parallel mode #471

@kernel-patches-review-bot

Description

One-line summary

test_task_work/test_task_work_array_map fails in parallel test mode because of a race between repeated task work scheduling by the array map program and the teardown order in do_exit().

Test name(s) and mode(s)

  • test_task_work/test_task_work_array_map in test_progs_parallel and test_progs_no_alu32_parallel

Failure signature

expected 'hello world', found '{U' in arrmap map
expected 'hello world', found '' in arrmap map

CI run references

Root cause

The oncpu_array_map BPF program in tools/testing/selftests/bpf/progs/task_work.c lacks a guard against re-scheduling task work after the callback has already completed. Unlike the hash map handler (protected by BPF_NOEXIST) and the LRU map handler (which checks work->data[0]), the array map handler unconditionally re-schedules on every perf event.
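The asymmetry between the three handlers comes down to one question per perf event: may this event schedule task work? The following userspace C sketch models that decision for each map type. It is not BPF code and calls no kernel or libbpf API. `struct elem_model`, `DATA_LEN` and the `*_allows` functions are hypothetical names chosen for illustration, and the hash map case is simplified to a boolean "key already present".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the selftest's map value; only the data buffer
 * matters for the guards. DATA_LEN is an assumption, not the real size. */
#define DATA_LEN 16

struct elem_model {
	char data[DATA_LEN];
};

/* Hash map handler: the value is created with BPF_NOEXIST, so only the event
 * that inserts the key gets past the update; later updates fail. */
static bool hash_guard_allows(bool key_already_present)
{
	return !key_already_present;
}

/* LRU map handler: skip once the callback has written data. */
static bool lru_guard_allows(const struct elem_model *work)
{
	return work != NULL && work->data[0] == '\0';
}

/* Array map handler before the fix: any existing element is re-scheduled.
 * Array map lookups with a valid index never return NULL, so in practice
 * this is true on every perf event. */
static bool array_guard_allows_unfixed(const struct elem_model *work)
{
	return work != NULL;
}
```

Calling `array_guard_allows_unfixed()` on an element whose data already reads "hello world" still returns true, while `lru_guard_allows()` returns false for the same element. That missing check is what the race window below depends on.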

The race window is:

  1. The task work callback runs successfully and writes "hello world" to work->data; the entry's state returns to BPF_TW_STANDBY.
  2. A subsequent perf event fires and re-schedules the same task work entry.
  3. The child process calls exit(0). In do_exit() (kernel/exit.c), the kernel calls, in this order:
    • perf_event_exit_task() at line 956, which stops generating perf events
    • exit_mm() at line 964, which sets current->mm = NULL
    • exit_task_work() at line 976, which runs any remaining queued task work
  4. The re-queued task work callback executes after exit_mm() has set mm = NULL.
  5. bpf_copy_from_user_str() calls strncpy_from_user(), which fails because the address space is gone, leaving the map data corrupted or empty.
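The race window can be replayed deterministically in a small single-threaded userspace C model. The model assumes the losing interleaving (steps 1 to 5 in order) rather than reproducing real timing. `struct task_model`, `run_callback()` and `replay_unguarded()` are hypothetical names; the two flags only stand in for the effects of perf_event_exit_task() and exit_mm(). The model also assumes that the failed copy leaves the buffer empty, as in the '' failure signature.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model; none of these names are kernel or libbpf APIs. */
#define DATA_LEN 16

struct elem_model {
	char data[DATA_LEN];
	bool queued;            /* a task work callback is pending */
};

struct task_model {
	bool has_mm;            /* cleared by the exit_mm() step */
	bool perf_enabled;      /* cleared by the perf_event_exit_task() step */
};

/* Stand-in for the callback's bpf_copy_from_user_str() call. Assumption:
 * a failed copy leaves the buffer empty, matching the '' signature. */
static void run_callback(const struct task_model *t, struct elem_model *w)
{
	memset(w->data, 0, DATA_LEN);
	if (t->has_mm)
		memcpy(w->data, "hello world", sizeof("hello world"));
	w->queued = false;
}

/* Replays steps 1-5 in the losing order and returns the final data. */
static const char *replay_unguarded(struct elem_model *w)
{
	struct task_model t = { .has_mm = true, .perf_enabled = true };

	w->queued = true;               /* first perf event schedules work */
	run_callback(&t, w);            /* step 1: succeeds, writes data */

	if (t.perf_enabled)             /* step 2: no guard, so re-queue */
		w->queued = true;

	t.perf_enabled = false;         /* step 3: perf_event_exit_task() */
	t.has_mm = false;               /*         exit_mm() */
	if (w->queued)                  /*         exit_task_work() */
		run_callback(&t, w);    /* steps 4-5: no mm, copy fails */

	return w->data;
}
```

With a zero-initialized element, `replay_unguarded()` returns an empty string, reproducing the `found ''` signature. The `'{U'` variant is not modeled.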

In parallel mode (-j), CPU contention makes this race more likely by widening the window between the last perf event and process exit.

Fix

Add a work->data[0] check to oncpu_array_map(), matching the guard used by oncpu_lru_map(). Once the callback has successfully written data, further re-scheduling is skipped.

-	if (!work)
+	if (!work || work->data[0])
 		return 0;
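The following sketch illustrates why the one-line guard closes the window. It applies the patched condition to the sequence from the root cause: one perf event and a successful callback, followed by further perf events before exit. It is a hypothetical userspace C model with illustrative names, not the selftest code, and it covers only the re-scheduling path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model; names are illustrative, not kernel or libbpf APIs. */
#define DATA_LEN 16

struct elem_model {
	char data[DATA_LEN];
	bool queued;            /* a task work callback is pending */
};

/* Mirrors the patched check: `if (!work || work->data[0]) return 0;` */
static bool fixed_array_guard_allows(const struct elem_model *work)
{
	return work != NULL && work->data[0] == '\0';
}

/* One perf event while the task still has its mm, the callback completing,
 * then `later_events` more perf events before exit. Returns true if a
 * callback would still be queued when exit_task_work() runs after exit_mm(). */
static bool queued_at_exit(struct elem_model *w, int later_events)
{
	if (fixed_array_guard_allows(w))
		w->queued = true;

	if (w->queued) {                /* callback runs with a valid mm */
		memset(w->data, 0, DATA_LEN);
		memcpy(w->data, "hello world", sizeof("hello world"));
		w->queued = false;
	}

	for (int i = 0; i < later_events; i++)
		if (fixed_array_guard_allows(w))
			w->queued = true;   /* skipped once data[0] is set */

	return w->queued;
}
```

For a zero-initialized element, `queued_at_exit(&w, 5)` returns false and `w.data` keeps "hello world": once data[0] is set, later events no longer queue a callback that could run after exit_mm().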

See attached patch: 0001-selftests-bpf-fix-task_work-array-map-race-in-parallel-mode.patch

Severity

Low — affects only parallel test mode; the test passes reliably in serial mode. The underlying bpf_task_work kernel infrastructure is correct; this is a test-only issue.
