B4/timer nolock#10629
Closed
mykyta5 wants to merge 9 commits into kernel-patches:bpf-next_base
This series reworks the implementation of the BPF timer and workqueue APIs.
The goal is to make both timers and wq non-blocking, enabling their use
in NMI context.

Today this code relies on a bpf_spin_lock embedded in the map element to
serialize:
 * init of the async object,
 * setting/changing the callback and bpf_prog,
 * starting/cancelling the timer/work,
 * tearing down when the map element is deleted or the map's user ref
   is dropped.

The basic design approach in this series:
 * Use irq_work to offload all blocking work from NMI.
 * Introduce a refcount to guarantee the lifetime of the bpf_async_cb
   structs deferred to potentially multiple irq_work callbacks.
 * Keep objects under RCU protection to make sure they are not freed
   while kfuncs/helpers access them. (We can't use the refcnt for this,
   as the refcnt itself is part of the bpf_async_cb struct.)

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v8:
- Return -EBUSY in bpf_async_read_op() if setting last_seq fails
- In bpf_async_cancel_and_free(), drop the bpf_async_cb ref after calling
  bpf_async_process()
- Link to v7: https://lore.kernel.org/r/20260122-timer_nolock-v7-0-04a45c55c2e2@meta.com

Changes in v7:
- Addressed Andrii's review points from the previous version; nothing
  very significant.
- Added NMI stress tests for bpf_timer; hit a few failing verifier checks
  and removed them.
- Addressed a sparse warning in bpf_async_update_prog_callback()
- Link to v6: https://lore.kernel.org/r/20260120-timer_nolock-v6-0-670ffdd787b4@meta.com

Changes in v6:
- Reworked destruction and refcnt use:
  - On cancel_and_free(), set last_seq to the BPF_ASYNC_DESTROY value and
    drop the map's reference
  - In the irq_work callback, atomically switch DESTROY to DESTROYED and
    cancel the timer/wq
  - Free bpf_async_cb when the refcnt goes to 0
- Link to v5: https://lore.kernel.org/r/20260115-timer_nolock-v5-0-15e3aef2703d@meta.com

Changes in v5:
- Extracted the lock-free algorithm for updating cb->prog and
  cb->callback_fn into a function, bpf_async_update_prog_callback(), and
  added a new commit that introduces this function and uses it in
  __bpf_async_set_callback(), bpf_timer_cancel() and
  bpf_async_cancel_and_free(). This allows moving the change into a
  separate commit without breaking correctness.
- Handle NULL prog in bpf_async_update_prog_callback().
- Link to v4: https://lore.kernel.org/r/20260114-timer_nolock-v4-0-fa6355f51fa7@meta.com

Changes in v4:
- Handle irq_work_queue failures in both the schedule and cancel_and_free
  paths: introduced bpf_async_refcnt_dec_cleanup(), which decrements the
  refcnt and makes sure that, if the last reference is put, at least one
  irq_work is scheduled to execute the final cleanup.
- Additional refcnt inc/dec in set_callback() plus rcu lock to make sure
  cleanup is not running at the same time as set_callback().
- Added READ_ONCE where it was needed.
- Squashed the 'bpf: Refactor __bpf_async_set_callback()' commit into
  'bpf: Add lock-free cell for NMI-safe async operations'
- Removed mpmc_cell; use seqcount_latch_t instead.
- Link to v3: https://lore.kernel.org/r/20260107-timer_nolock-v3-0-740d3ec3e5f9@meta.com

Changes in v3:
- Major rework
- Introduce mpmc_cell, allowing concurrent writes and reads
- Implement irq_work deferring
- Add selftests
- Introduce the bpf_timer_cancel_async kfunc
- Link to v2: https://lore.kernel.org/r/20251105-timer_nolock-v2-0-32698db08bfa@meta.com

Changes in v2:
- Move refcnt initialization and put (from cancel_and_free()) from
  patch 5 into patch 4, so that patch 4 has a clearer and more complete
  implementation and use of the refcnt
- Link to v1: https://lore.kernel.org/r/20251031-timer_nolock-v1-0-b064ae403bfb@meta.com

--- b4-submit-tracking ---
{
  "series": {
    "revision": 8,
    "change-id": "20251028-timer_nolock-457f5b9daace",
    "prefixes": [
      "bpf-next"
    ],
    "history": {
      "v1": [
        "20251031-timer_nolock-v1-0-b064ae403bfb@meta.com"
      ],
      "v2": [
        "20251105-timer_nolock-v2-0-32698db08bfa@meta.com"
      ],
      "v3": [
        "20260107-timer_nolock-v3-0-740d3ec3e5f9@meta.com"
      ],
      "v4": [
        "20260114-timer_nolock-v4-0-fa6355f51fa7@meta.com"
      ],
      "v5": [
        "20260115-timer_nolock-v5-0-15e3aef2703d@meta.com"
      ],
      "v6": [
        "20260120-timer_nolock-v6-0-670ffdd787b4@meta.com"
      ],
      "v7": [
        "20260122-timer_nolock-v7-0-04a45c55c2e2@meta.com"
      ]
    }
  }
}
Refactor bpf timer and workqueue helpers to allow calling them from NMI
context by making all operations lock-free and deferring NMI-unsafe
work to irq_work.
Previously, bpf_timer_start() and bpf_wq_start() could not be called
from NMI context because they acquired bpf_spin_lock and called the
hrtimer/schedule_work APIs directly. This patch removes these
limitations.
Key changes:
* Remove bpf_spin_lock from struct bpf_async_kern.
* Initialize/Destroy via setting/unsetting bpf_async_cb pointer
atomically.
* Add per-bpf_async_cb irq_work to defer NMI-unsafe
operations (hrtimer_start, hrtimer_try_to_cancel, schedule_work) from
NMI to irq_work context.
* Use the lock-free seqcount_latch_t to pass operation
commands (start/cancel/free) and parameters
from NMI-safe callers to the irq_work handler.
* Add reference counting to bpf_async_cb to ensure the object stays
alive until all scheduled irq_work completes.
* Move bpf_prog_put() to RCU callback to handle races between
set_callback() and cancel_and_free().
* Modify cancel_and_free() path:
  * Detach bpf_async_cb.
  * Signal destruction to the irq_work side by setting last_seq to
    BPF_ASYNC_DESTROY.
  * On receiving BPF_ASYNC_DESTROY, cancel timer/wq.
* Free bpf_async_cb on refcnt reaching 0, waiting for both rcu and rcu
task trace grace periods before freeing the bpf_async_cb. Removed
unnecessary rcu locks, as a kfunc/helper can always assume that the rcu
or rcu task trace lock is held.
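The command-passing item in the list above can be illustrated with a small user-space model of a sequence latch. This is only a sketch under stated assumptions: it uses sequentially consistent C11 atomics instead of the kernel's seqcount_latch_t and its lighter barriers, it assumes one writer at a time, and every name in it (cmd_latch, model_op, latch_write, latch_read) is hypothetical rather than taken from the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical command encoding; the real layout in the series may differ. */
enum model_op { MODEL_OP_NONE, MODEL_OP_START, MODEL_OP_CANCEL };

/* Plain snapshot handed back to the reader. */
struct model_cmd {
	int op;
	uint64_t nsec;
};

struct cmd_latch {
	atomic_uint seq;		/* low bit selects the copy readers use */
	_Atomic int op[2];
	_Atomic uint64_t nsec[2];
};

static void latch_init(struct cmd_latch *l)
{
	atomic_init(&l->seq, 0);
	for (int i = 0; i < 2; i++) {
		atomic_init(&l->op[i], MODEL_OP_NONE);
		atomic_init(&l->nsec[i], 0);
	}
}

/* Writer side (assumed here to be one writer at a time). */
static void latch_write(struct cmd_latch *l, int op, uint64_t nsec)
{
	atomic_fetch_add(&l->seq, 1);	/* seq odd: readers use copy 1 */
	atomic_store(&l->op[0], op);
	atomic_store(&l->nsec[0], nsec);
	atomic_fetch_add(&l->seq, 1);	/* seq even: readers use copy 0 */
	atomic_store(&l->op[1], op);
	atomic_store(&l->nsec[1], nsec);
}

/* Reader side: takes no lock, retries if seq moved during the read. */
static struct model_cmd latch_read(struct cmd_latch *l)
{
	struct model_cmd cmd;
	unsigned int seq;

	do {
		seq = atomic_load(&l->seq);
		cmd.op = atomic_load(&l->op[seq & 1]);
		cmd.nsec = atomic_load(&l->nsec[seq & 1]);
	} while (atomic_load(&l->seq) != seq);

	assert(cmd.op >= MODEL_OP_NONE && cmd.op <= MODEL_OP_CANCEL);
	return cmd;
}
```

The point of the two copies is that the reader always has one copy the writer is not touching, so neither side ever has to wait on a lock, which is the property an NMI-safe caller needs.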
This enables BPF programs attached to NMI-context hooks (perf
events) to use timers and workqueues for deferred processing.
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
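The lifetime rules in the commit message above (a reference per scheduled deferred callback, a DESTROY marker that exactly one callback acts on, and a free when the count reaches 0) can be modelled in user space roughly as follows. This is a sketch, not the kernel code: the names (async_model, model_stats, model_get, model_put and so on) are hypothetical, the state field only stands in for the BPF_ASYNC_DESTROY value stored in last_seq, the caller queues the deferred callbacks by hand, and the RCU grace periods that precede the real free are not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical states; the series keeps a comparable marker in last_seq. */
enum { MODEL_ACTIVE, MODEL_DESTROY, MODEL_DESTROYED };

/* Lives outside the object so it can be inspected after the free. */
struct model_stats {
	int cancel_calls;
	bool freed;
};

struct async_model {
	atomic_int refcnt;
	atomic_int state;
	struct model_stats *stats;
};

static struct async_model *model_alloc(struct model_stats *stats)
{
	struct async_model *cb = malloc(sizeof(*cb));

	if (!cb)
		return NULL;
	atomic_init(&cb->refcnt, 1);	/* the map's reference */
	atomic_init(&cb->state, MODEL_ACTIVE);
	stats->cancel_calls = 0;
	stats->freed = false;
	cb->stats = stats;
	return cb;
}

/* Take a reference on behalf of one queued deferred callback. */
static void model_get(struct async_model *cb)
{
	int old = atomic_fetch_add(&cb->refcnt, 1);

	assert(old > 0);	/* caller must already hold a reference */
}

static void model_put(struct async_model *cb)
{
	int old = atomic_fetch_sub(&cb->refcnt, 1);

	assert(old > 0);
	if (old == 1) {		/* last reference: release the object */
		cb->stats->freed = true;
		free(cb);
	}
}

/* Teardown request: publish DESTROY, then drop the map's reference. */
static void model_cancel_and_free(struct async_model *cb)
{
	atomic_store(&cb->state, MODEL_DESTROY);
	model_put(cb);
}

/* Deferred callback: only the compare-and-swap winner cancels. */
static void model_deferred_work(struct async_model *cb)
{
	int expected = MODEL_DESTROY;

	if (atomic_compare_exchange_strong(&cb->state, &expected,
					   MODEL_DESTROYED))
		cb->stats->cancel_calls++;	/* stands in for timer/wq cancel */
	model_put(cb);
}
```

With two callbacks queued before teardown, the object survives model_cancel_and_free(), the first callback performs the single cancellation, and the second callback's put releases the memory.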
Extend the verifier to recognize struct bpf_timer as a valid kfunc argument type. Previously, bpf_timer was only supported in BPF helpers. This prepares for adding timer-related kfuncs in subsequent patches.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Introduce the bpf_timer_cancel_async kfunc, which attempts to cancel a timer asynchronously and can therefore be used in NMI context.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Refactor timer selftests, extracting the stress test into a separate test. This makes test failures easier to debug and the tests easier to extend.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Extend the BPF timer selftest to run a stress test for async cancel.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Add a test that verifies that bpf_timer_cancel_async works, i.e. that it can successfully cancel a callback.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Add stress tests for BPF timers that run in NMI context using perf_event programs attached to PERF_COUNT_HW_CPU_CYCLES.
The tests cover three scenarios:
- nmi_race: tests concurrent timer start and async cancel operations
- nmi_update: tests updating a map element (effectively deleting and inserting a new one for an array map) from within a timer callback
- nmi_cancel: tests a timer self-cancellation attempt
A common test_common() helper is used to share timer setup logic across all test modes. The tests spawn multiple threads in a child process to generate perf events, which trigger the BPF programs in NMI context. Hit counters verify that the NMI code paths were actually exercised.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
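For readers unfamiliar with how a perf_event program ends up on the CPU-cycles counter, the user-space side might look roughly like the sketch below. It is not the selftest code from this series: cycles_attr_init() and cycles_event_open() are hypothetical helper names, the sample period is left to the caller, and whether the resulting overflow interrupt is delivered as an NMI depends on the hardware and kernel configuration.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Fill in a sampling event on the hardware CPU-cycles counter. */
static void cycles_attr_init(struct perf_event_attr *attr, __u64 sample_period)
{
	assert(attr != NULL);
	memset(attr, 0, sizeof(*attr));
	attr->size = sizeof(*attr);
	attr->type = PERF_TYPE_HARDWARE;
	attr->config = PERF_COUNT_HW_CPU_CYCLES;
	attr->sample_period = sample_period;	/* overflow every N cycles */
}

/*
 * Open the event for the given pid/cpu pair via perf_event_open(2).
 * Returns a file descriptor, or -1 with errno set (for example when
 * the environment does not allow or does not provide hardware counters).
 */
static int cycles_event_open(pid_t pid, int cpu, __u64 sample_period)
{
	struct perf_event_attr attr;

	cycles_attr_init(&attr, sample_period);
	return (int)syscall(__NR_perf_event_open, &attr, pid, cpu,
			    -1 /* no group leader */, PERF_FLAG_FD_CLOEXEC);
}
```

A file descriptor obtained this way can then be handed to libbpf's bpf_program__attach_perf_event() so that the BPF program runs on each counter overflow.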
bpf_timer can now be used in tracepoints, so these tests are no longer relevant.
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Automatically cleaning up stale PR; feel free to reopen if needed