Skip to content

Prevent bytecode serialization#426

Open
qdeslandes wants to merge 4 commits intofacebook:mainfrom
qdeslandes:prevent_bytecode_serialization
Open

Prevent bytecode serialization#426
qdeslandes wants to merge 4 commits intofacebook:mainfrom
qdeslandes:prevent_bytecode_serialization

Conversation

@qdeslandes
Copy link
Contributor

Summary

  • Extract BPF object references (prog_fd, maps, link) from bf_program into a new bf_handle struct owned by bf_cgen, making bf_program an ephemeral bytecode generator that is discarded after load
  • Move bf_link and bf_map creation to the point of use (create-at-alloc), removing the two-phase init/create lifecycle and mutable setters (bf_map_set_key_size, bf_map_set_n_elems, etc.)
  • Remove bf_program serialization entirely -- only bf_handle (BPF object references) is persisted; bytecode is regenerated on restore

Motivation

bf_program was responsible for both generating BPF bytecode and managing BPF kernel objects (program fd, maps, link). This coupling forced the program to be kept alive after loading just to hold file descriptors, and required serializing bytecode that was never reused on restore (it was regenerated anyway).

By splitting the BPF object references into bf_handle, the codegen becomes stateless: create a bf_program, generate, load into the handle, discard the program. The handle survives at bf_cgen level with only the kernel-side references needed for runtime operation (counters, pinning, link updates).

Changes

The series is structured as four incremental refactoring steps:

  1. bf_link: create-at-alloc -- bf_link_new() now both allocates and attaches the BPF link. bf_link_free() is the detach mechanism. Removes bf_link_attach(), bf_link_detach(), and the hook parameter from bf_link_update() (link now stores its own hook).

  2. bf_map: create-at-alloc -- bf_map_new() creates the actual BPF map immediately. Removes bf_map_create(), bf_map_destroy(), and bf_map_set_*() mutators. Maps are created on demand during bf_program_load() instead of pre-allocated in bf_program_new(). The log map is only created when the chain uses BF_CHAIN_LOG.

  3. Introduce bf_handle -- New struct holding prog_fd, link, cmap, pmap, lmap, and sets. Initially owned by bf_program, with delegation methods for attach/detach/pin/unpin.

  4. Move bf_handle to bf_cgen -- bf_cgen creates and owns the handle. bf_program_new() receives it as a non-owning pointer. Removes bf_program_pack/bf_program_new_from_pack, bf_program_attach/bf_program_detach/bf_program_unload/bf_program_pin/bf_program_unpin, and bf_program_take_handle.

bf_program creates the bf_link object in bf_program_new(), for every new
program. If the program is not attached, then program->link->hookopts is
NULL, and that's how we know.

This approach is not ideal as we might create a bf_link object we don't
need. Additionally, it leaves bf_link in a temporary state as long as we
don't attach the program to the kernel: the bf_link is in the bf_program
object and serialized, but not materialized.

This commit only creates bf_link when the BPF link object is needed.
Calling bf_link_new() will create the BPF link, and fail on error. Doing
so, we know that any bf_link object refers to a program attached to the
kernel.
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the bpfilter codebase to separate BPF object lifecycle management from bytecode generation by introducing a new bf_handle abstraction. The changes make bf_program ephemeral (discarded after load), moving persistent BPF object references (prog_fd, maps, link) to bf_handle which is owned by bf_cgen. This eliminates the need to serialize bytecode—only the handle (BPF object references) is persisted, as bytecode is regenerated on restore.

Changes:

  • Introduced bf_handle struct to manage BPF object references separately from bytecode generation context
  • Refactored bf_link and bf_map to use create-at-alloc lifecycle, removing two-phase init/create pattern
  • Removed bf_program serialization and related functions (bf_program_pack, bf_program_new_from_pack, etc.)

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
src/bpfilter/cgen/handle.h New header defining the bf_handle abstraction for managing BPF object references
src/bpfilter/cgen/handle.c Implementation of bf_handle with pin/unpin, attach/detach, and serialization support
src/bpfilter/cgen/program.h Updated to remove serialization API and accept bf_handle in constructor
src/bpfilter/cgen/program.c Refactored to populate handle during load, removed serialization and lifecycle management
src/bpfilter/cgen/cgen.h Changed to own bf_handle instead of bf_program
src/bpfilter/cgen/cgen.c Updated to create handle and pass to program during generation
src/bpfilter/cgen/prog/map.h Updated documentation, removed mutator methods (set_key_size, set_value_size, etc.)
src/bpfilter/cgen/prog/map.c Implemented create-at-alloc: bf_map_new now creates actual BPF map immediately
src/bpfilter/cgen/prog/link.h Updated bf_link_new to attach immediately, removed bf_link_attach/detach/update hook param
src/bpfilter/cgen/prog/link.c Consolidated link creation and attachment into bf_link_new, updated bf_link_update signature
src/bpfilter/xlate/cli.c Updated to access link/maps through handle instead of program
src/bpfilter/CMakeLists.txt Added handle.h and handle.c to build
tests/e2e/matchers/set.sh Fixed set map name prefix from 'set_' to 'bf_set_'
tests/e2e/matchers/set_empty_index.sh New regression test for empty set index handling
tests/e2e/CMakeLists.txt Added new set_empty_index test

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +7 to +20
# program->sets. If _bf_program_load_sets_maps skips empty sets without
# preserving their index slot, subsequent non-empty sets end up at wrong
# positions in program->sets, causing fixup resolution to fail or resolve the
# wrong map.

. "$(dirname "$0")"/../e2e_test_util.sh

make_sandbox
start_bpfilter

# empty_first: chain->sets[0], empty (skipped during map creation)
# active_second: chain->sets[1], non-empty
# The rule's matcher stores set_index=1 for active_second.
# If program->sets doesn't preserve index correspondence, the fixup for
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment references outdated code structure. Line 7 mentions "program->sets" but after this refactoring, the sets are now stored in "handle->sets" (accessed via program->handle->sets from within a program). Line 20 also references "program->sets". These comments should be updated to reflect the new architecture where sets are managed by the handle, not the program directly.

Copilot uses AI. Check for mistakes.
value = program->handle->pmap->fd;
break;
case BF_FIXUP_TYPE_LOG_MAP_FD:
insn_type = BF_FIXUP_INSN_IMM;
Copy link

Copilot AI Feb 18, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potential NULL pointer dereference: if lmap is NULL (which can happen when BF_CHAIN_LOG flag is not set), accessing program->handle->lmap->fd will cause a crash. While the current code guards both the fixup emission (line 629 in bf_program_generate) and map creation (line 738 in _bf_program_load_log_map) with the same flag check, this creates a fragile dependency. Consider adding a NULL check here: if (!program->handle->lmap) return bf_err_r(-EINVAL, "log map fixup requested but map not created"); for defensive programming.

Suggested change
insn_type = BF_FIXUP_INSN_IMM;
insn_type = BF_FIXUP_INSN_IMM;
if (!program->handle->lmap)
return bf_err_r(-EINVAL,
"log map fixup requested but map not created");

Copilot uses AI. Check for mistakes.
@qdeslandes qdeslandes force-pushed the prevent_bytecode_serialization branch 2 times, most recently from 43f08bf to 73f0e0a Compare February 18, 2026 13:51
Move BPF map creation into bf_map_new/bf_map_new_from_set so that the
kernel map object is created at allocation time, removing the separate
bf_map_create/bf_map_destroy lifecycle and the bf_map_set_* mutators.

bf_program no longer pre-allocates maps in bf_program_new; instead,
each _bf_program_load_*_map helper creates its map on demand during
bf_program_load. This makes cmap, pmap, and lmap nullable throughout
the program's lifetime, with NULL checks added in pack, dump, pin,
unpin, and unload paths.

Empty sets now push a NULL entry into program->sets to preserve index
correspondence with chain->sets, which the fixup resolution relies on.

An e2e test validates that chains with empty sets preceding non-empty
sets load and filter correctly.
Extract BPF object references (prog_fd, maps, link) from bf_program
into a new bf_handle struct, decoupling bytecode generation from
kernel-side object lifetime management.
bf_handle is now created by default in bf_cgen_new and owned by bf_cgen
throughout its lifetime. bf_program_new receives the handle as a
non-owning pointer parameter, removing the need for bf_program to
create or transfer handle ownership.

This removes bf_program serialization, bf_program_take_handle, bf_program_attach,
bf_program_detach, bf_program_unload, bf_program_pin, and
bf_program_unpin, which were thin wrappers forwarding to bf_handle.
Callers in cgen.c and cli.c now operate on cgen->handle directly.
@qdeslandes qdeslandes force-pushed the prevent_bytecode_serialization branch from 73f0e0a to c6686ec Compare February 18, 2026 13:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant

Comments