
net/sched: cls_bpf: prevent unbounded recursion in offload rollback#12157

Open
kernel-patches-daemon-bpf[bot] wants to merge 1 commit into bpf_base from series/1099113=>bpf

@kernel-patches-daemon-bpf

Pull request for series with
subject: net/sched: cls_bpf: prevent unbounded recursion in offload rollback
version: 1
url: https://patchwork.kernel.org/project/netdevbpf/list/?series=1099113

@kernel-patches-daemon-bpf
Author

Upstream branch: 49b1831
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1099113
version: 1

@kernel-patches-daemon-bpf
Author

Upstream branch: 7dd6256
series: https://patchwork.kernel.org/project/netdevbpf/list/?series=1099113
version: 1

Quan Sun reported [1] a stack overflow in cls_bpf_offload_cmd().

Reproducer on netdevsim: add a skip_sw cls_bpf filter, set the
bpf_tc_accept debugfs knob to 0, then run `tc filter replace`. The
replace calls tc_setup_cb_replace(), which fails. cls_bpf_offload_cmd()
then swaps prog/oldprog and recursively calls itself to roll back. But
bpf_tc_accept=0 makes the rollback fail too, which triggers yet another
rollback frame for the same pair of programs, and so on until the stack
is exhausted.

bpf_tc_accept is just a convenient knob for the reproducer. Any driver
that makes tc_setup_cb_replace() fail twice in a row (once for the
replace, once for the rollback) can hit the same loop, so this is not a
netdevsim-only issue.

Two ways to fix it:

  1) Have the rollback call tc_setup_cb_add() on oldprog instead of
     re-entering cls_bpf_offload_cmd().
  2) Mark the rollback frame with a flag and skip a second-level
     rollback from inside it.

Go with (2). It is the smaller change and keeps the original behaviour:
the rollback still goes through tc_setup_cb_replace(), so the driver
gets one real chance to restore its state. If that attempt also fails,
we just return the original error instead of recursing.

[1]: https://lore.kernel.org/bpf/ce5a6005-3c5e-4696-9e05-eba9461dc860@std.uestc.edu.cn/T/#u

Fixes: 102740b ("cls_bpf: fix offload assumptions after callback conversion")
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>