bpf: attach reuseport steering filter exactly once per group #371

Merged: afrind merged 1 commit into main from fix/reuseport-attach-once-per-group on Jun 1, 2026

@afrind afrind commented Jun 1, 2026

SO_ATTACH_REUSEPORT_CBPF is a property of the kernel reuseport group, not of the individual fd: re-attaching replaces and frees the group's single program while sibling sockets' RX softirqs may still be running it. The mvfst_hook_on_socket_create hook attached the filter on every invocation: once per worker listener fd at bind(), and again per accepted connection at runtime (site 2 wraps the worker's same listener fd). That turned the re-attach into a hot, multi-threaded use-after-free.

Deduplicate attachment per reuseport group, keyed by the bound address:port (via getsockname). State is encapsulated in a ReuseportSteering class held in a folly::Indestructible singleton (it is never destructed, so a late hook call from a draining IO thread cannot touch freed state on shutdown):

  • A mutex-guarded F14 set of already-attached SocketAddresses. The lock is held across check -> setsockopt -> insert so concurrent workers binding the same group serialize and exactly one attaches.
  • A per-thread folly::ThreadLocal F14 set of resolved fds as a lock-free, syscall-free fast path for the steady-state per-connection case.
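
The two-level scheme in the bullets above can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `ReuseportSteeringSketch`, `ResolveFn`, and `AttachFn` are hypothetical names, `std::unordered_set` and a plain `thread_local` stand in for the F14 sets and `folly::ThreadLocal`, a `std::string` key stands in for `SocketAddress`, and the two callbacks stand in for `getsockname` and `setsockopt(SO_ATTACH_REUSEPORT_CBPF)`. It also assumes a single process-wide instance, since the `thread_local` cache is shared by all instances on a thread.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_set>
#include <utility>

// Hypothetical stand-in for the PR's ReuseportSteering class.
class ReuseportSteeringSketch {
 public:
  // Stand-in for getsockname: writes the group key, returns false on failure.
  using ResolveFn = std::function<bool(int fd, std::string& key)>;
  // Stand-in for setsockopt(SO_ATTACH_REUSEPORT_CBPF): false on failure.
  using AttachFn = std::function<bool(int fd)>;

  ReuseportSteeringSketch(ResolveFn resolve, AttachFn attach)
      : resolve_(std::move(resolve)), attach_(std::move(attach)) {
    assert(resolve_ && attach_);
  }

  void onSocketCreate(int fd) {
    // Fast path: this thread has already resolved this fd, so no lock and
    // no syscall are needed. Assumes one process-wide instance.
    thread_local std::unordered_set<int> resolvedFds;
    if (resolvedFds.count(fd) != 0) {
      return;
    }

    std::string key;
    if (!resolve_(fd, key)) {
      return;  // Not cached: a later hook call retries.
    }

    {
      // Hold the lock across check -> attach -> insert so that concurrent
      // callers for the same group serialize and exactly one attaches.
      std::lock_guard<std::mutex> guard(mutex_);
      if (attachedGroups_.count(key) == 0) {
        if (!attach_(fd)) {
          return;  // Not cached: a later hook call retries.
        }
        attachedGroups_.insert(key);
      }
    }
    resolvedFds.insert(fd);
  }

 private:
  ResolveFn resolve_;
  AttachFn attach_;
  std::mutex mutex_;
  std::unordered_set<std::string> attachedGroups_;
};
```

In this sketch a failed resolve or attach leaves both sets untouched, so the next call for the same fd goes through the slow path again.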

getsockname/setsockopt failures return without caching so a later hook retries; client sockets (where the option legitimately fails) re-run the rate-limited slow path harmlessly.
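
The getsockname step that yields the address:port key, including its failure path, might look like the sketch below. The helper names `formatGroupKey` and `groupKeyForFd` and the string key format are assumptions made for illustration; the PR itself keys on `SocketAddress` values rather than strings.

```cpp
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

#include <cassert>
#include <string>

// Formats an IPv4 or IPv6 address as "ip:port" ("[ip]:port" for IPv6).
// Returns false for address families this sketch does not handle.
bool formatGroupKey(const sockaddr_storage& addr, std::string& key) {
  char ip[INET6_ADDRSTRLEN] = {0};
  if (addr.ss_family == AF_INET) {
    const auto* v4 = reinterpret_cast<const sockaddr_in*>(&addr);
    if (inet_ntop(AF_INET, &v4->sin_addr, ip, sizeof(ip)) == nullptr) {
      return false;
    }
    key = std::string(ip) + ":" + std::to_string(ntohs(v4->sin_port));
    return true;
  }
  if (addr.ss_family == AF_INET6) {
    const auto* v6 = reinterpret_cast<const sockaddr_in6*>(&addr);
    if (inet_ntop(AF_INET6, &v6->sin6_addr, ip, sizeof(ip)) == nullptr) {
      return false;
    }
    key = "[" + std::string(ip) + "]:" + std::to_string(ntohs(v6->sin6_port));
    return true;
  }
  return false;
}

// Resolves the bound local address of fd into a group key.
// Returns false if getsockname fails, so the caller can skip caching
// and let a later hook call retry.
bool groupKeyForFd(int fd, std::string& key) {
  sockaddr_storage addr{};
  socklen_t len = sizeof(addr);
  if (getsockname(fd, reinterpret_cast<sockaddr*>(&addr), &len) != 0) {
    return false;
  }
  assert(len <= sizeof(addr));  // sockaddr_storage fits any address family.
  return formatGroupKey(addr, key);
}
```

Returning a plain success flag keeps the caller's rule simple: only a successful resolve followed by a successful attach is ever recorded.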


Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@gmarzot gmarzot left a comment

@gmarzot reviewed 1 file and all commit messages.
Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on akash-a-n, michalhosna, mondain, Oxyd, peterchave, suhasHere, and TimEvens).

@afrind afrind merged commit 7b3da1b into main Jun 1, 2026
27 of 28 checks passed
@afrind afrind deleted the fix/reuseport-attach-once-per-group branch June 1, 2026 20:35
2 participants