Skip to content

feat(verify): port custom-section codec (C2)#20

Merged
hyperpolymath merged 1 commit into
mainfrom
feat/typed-wasm-verify-c2-section-parser
May 15, 2026
Merged

feat(verify): port custom-section codec (C2)#20
hyperpolymath merged 1 commit into
mainfrom
feat/typed-wasm-verify-c2-section-parser

Conversation

@hyperpolymath
Copy link
Copy Markdown
Owner

Summary

  • Implements parser + encoder for the affinescript.ownership custom-section wire format.
  • Faithful port of OCaml Tw_verify.parse_ownership_section_payload (including its lenient EOF behaviour, for cross-impl parity).
  • Adds OwnershipEntry, parse_ownership_section_payload, build_ownership_section_payload, plus OwnershipKind::to_byte.

Wire format

Little-endian, byte-aligned:

u32le  count
for each entry:
  u32le  func_idx
  u8     n_params
  u8[n]  param_kinds  (0=Unrestricted, 1=Linear, 2=SharedBorrow, 3=ExclBorrow)
  u8     ret_kind

Parity decisions

Behaviour OCaml This crate Why
Truncated payload Reads zeros past EOF Same C5 (cross-compat suite) needs identical results on every payload OCaml accepts
Unknown kind byte Falls back to Unrestricted Same Same — preserved from C1's OwnershipKind::from_byte
n_params overflow Truncates to u8 Panics on encode (real modules never hit this) Defensive — encoder bug, not data

Stricter validation is a future opt-in (parse_strict), not the default.

Tests

$ cargo test -p typed-wasm-verify
running 11 tests
test section::tests::build_emits_correct_wire_format ... ok
test section::tests::empty_payload_yields_no_entries ... ok
test section::tests::roundtrip_empty ... ok
test section::tests::count_zero_yields_no_entries ... ok
test section::tests::multiple_entries ... ok
test section::tests::roundtrip_realistic ... ok
test section::tests::single_entry_no_params ... ok
test section::tests::single_entry_with_all_kinds ... ok
test section::tests::truncated_payload_reads_zeros_past_end ... ok
test section::tests::unknown_kind_byte_decodes_to_unrestricted ... ok
test tests::ownership_kind_byte_roundtrip ... ok
test result: ok. 11 passed; 0 failed; 0 ignored

Stacking

Stacked on top of #19 (C1 scaffold). Merge order: #19 → this. Once #19 merges, this PR's base will retarget to main automatically.

Follow-up

  • C3 — port count_uses_range + the intra-function verifier (uses the entries decoded here)
  • C4 — port extract_exports + verify_cross_module
  • C5 — cross-compat test against affinescript-emitted modules

Closes #37 once merged.

@hyperpolymath hyperpolymath force-pushed the feat/typed-wasm-verify-rust-port branch 2 times, most recently from 2d63b52 to 1e247de Compare May 15, 2026 05:58
@hyperpolymath hyperpolymath force-pushed the feat/typed-wasm-verify-c2-section-parser branch from 079ff80 to 88fcf34 Compare May 15, 2026 06:00
hyperpolymath added a commit that referenced this pull request May 15, 2026
Ports `Tw_verify.count_uses_range` + `verify_function` + `verify_from_module`
from hyperpolymath/affinescript/lib/tw_verify.ml. After this commit the
C1 `verify_from_module` stub is gone — calling it on a wasm module that
carries an `affinescript.ownership` custom section actually enforces the
L7 (aliasing) and L10 (linearity) constraints.

The algorithm
-------------

OCaml walks an in-memory instruction tree recursively. wasmparser hands
us a flat operator stream with structured-control delimiters, so we run
the same per-path `(min, max)` algorithm with an explicit frame stack:

  Frame::Plain      — Block, Loop, or the implicit body scope
  Frame::IfThen     — `If` before any `Else` is seen
  Frame::IfElse     — `If` after `Else`; then-side totals are frozen

Frame transitions on the structured-control operators:

  Block / Loop      → push Plain
  If                → push IfThen
  Else              → top must be IfThen; transition to IfElse,
                      freezing the then-side totals
  End               → pop, collapse to (m, x), add into parent
                        Plain                 → (min,  max)
                        IfThen, no else seen  → (0,    then_max)
                        IfElse                → (min(t,e), max(t,e))
  LocalGet n        → if n matches the counter, add (1, 1) to the
                      top frame's currently-active side
  Anything else     → no-op

The function body's terminating `End` pops the bottom frame and the
collapse becomes the final result.

The frame state is split out as a private `OpCounter` trait so C4 can
reuse the exact same machinery with a `Call`-based counter for
cross-module verification.

Per-function rules (mirroring OCaml `verify_function`)
------------------------------------------------------

For each param at index `i`, compute `(min_uses, max_uses)` then apply:

  Linear:
    max == 0           → LinearNotUsed
    min == 0, max ≥ 1  → LinearDroppedOnSomePath
    max > 1            → LinearUsedMultiple { count: max }
    (both "drop" and "dup" can fire for the same param if min=0, max>1)

  ExclBorrow:
    max > 1            → ExclBorrowAliased { count: max }

  Unrestricted | SharedBorrow: no constraints

Module-level entry
------------------

`verify_from_module(wasm_bytes)`:

  1. Single wasmparser pass over the module:
     - Tally import_count for the function index space
     - Capture the `affinescript.ownership` custom section (if any)
     - Collect every `Payload::CodeSectionEntry` body in order
  2. If no ownership section: trivially `Ok(())`.
  3. Parse the section (C2's `parse_ownership_section_payload`).
  4. For each entry, translate `func_idx` (global) to a body index by
     subtracting `import_count`. Skip imports (no body) and out-of-
     range entries — matches OCaml's short-circuit behaviour.
  5. Aggregate all per-function violations into one `VerifyError::Ownership`
     vector, or return `Ok(())` if clean.

Tests
-----

29/29 unit tests pass:

  count_uses_range layer (range_in helper synthesises a 1-fn module
  via wasm-encoder and pulls the body out for direct analysis):

    no_uses                              → (0, 0)
    one_use                              → (1, 1)
    two_uses_same_path                   → (2, 2)
    use_in_both_if_branches              → (1, 1)
    use_in_then_only                     → (0, 1)
    use_twice_in_then_once_in_else       → (1, 2)
    use_inside_block_passthrough         → (1, 1)
    use_inside_loop_passthrough          → (1, 1)
    nested_if_use_in_inner_then_only     → (0, 1)

  verify_from_module end-to-end (synthetic modules with ownership
  custom sections, hitting each error variant + the clean paths):

    linear_used_exactly_once_is_clean
    linear_not_used_at_all_errors                 → LinearNotUsed
    linear_dropped_on_some_path_errors            → LinearDroppedOnSomePath
    linear_used_twice_errors                      → LinearUsedMultiple
    excl_borrow_used_twice_errors                 → ExclBorrowAliased
    excl_borrow_used_once_is_clean
    unrestricted_used_arbitrarily_is_clean
    module_without_ownership_section_is_trivially_clean
    empty_module_is_trivially_clean

  $ cargo test -p typed-wasm-verify
  running 29 tests
  ... all pass ...
  test result: ok. 29 passed; 0 failed; 0 ignored

Stacked on top of #20 (C2 section codec). Next: C4 — port the
cross-module boundary verifier (the same `(min, max)` frame stack
applied to `Call` operators against a callee's exported interface).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hyperpolymath hyperpolymath deleted the branch main May 15, 2026 06:01
@hyperpolymath hyperpolymath reopened this May 15, 2026
@hyperpolymath hyperpolymath changed the base branch from feat/typed-wasm-verify-rust-port to main May 15, 2026 06:03
Implements the parser and encoder for the `affinescript.ownership`
custom-section wire format. The parser is a faithful port of OCaml
`Tw_verify.parse_ownership_section_payload`; the encoder mirrors the
matching emitter on the affinescript side (`Codegen.build_ownership_section`)
so downstream emitters in ephapax (C6) and elsewhere can reuse it.

Wire format
-----------

Little-endian, byte-aligned:

  u32le  count
  for each entry:
    u32le  func_idx
    u8     n_params
    u8[n]  param_kinds  (0=Unrestricted, 1=Linear, 2=SharedBorrow, 3=ExclBorrow)
    u8     ret_kind

Cross-impl parity decision
--------------------------

The OCaml parser is lenient: reading past the end of the payload yields
zero bytes (interpreted as `Unrestricted` kinds and `func_idx = 0`).
This Rust port preserves that exactly. A correctly-emitted section
will never be truncated, but matching the leniency means the C5
cross-compat suite sees identical results on every payload the OCaml
side accepts. Stricter validation is a future opt-in (`parse_strict`),
not the default.

The `OwnershipKind::from_byte` fallback (anything outside 0..=3 →
`Unrestricted`) was already in C1 and is reused here.

New API
-------

  - `OwnershipEntry { func_idx, param_kinds, ret_kind }` — named struct
    replacing the OCaml 3-tuple for readability.
  - `parse_ownership_section_payload(&[u8]) -> Vec<OwnershipEntry>` —
    lenient parser (matches OCaml).
  - `build_ownership_section_payload(&[OwnershipEntry]) -> Vec<u8>` —
    inverse encoder; panics if any entry has >255 params (the n_params
    field is u8, but real wasm modules don't hit this limit).
  - `OwnershipKind::to_byte(self) -> u8` — encode side of the existing
    `from_byte`.

Tests
-----

11/11 unit tests, covering:

  - empty payload
  - count=0 with no entries
  - single entry with zero params
  - single entry with all four kind values (encode + decode)
  - multi-entry round-trip
  - unknown kind byte falls back to `Unrestricted` (parity guard)
  - truncated payload reads zeros past EOF (parity guard)
  - empty round-trip (entries → bytes → entries)
  - realistic 2-entry round-trip
  - exact wire-format byte sequence for a known input

  $ cargo test -p typed-wasm-verify
  running 11 tests
  test section::tests::build_emits_correct_wire_format ... ok
  test section::tests::empty_payload_yields_no_entries ... ok
  test section::tests::roundtrip_empty ... ok
  test section::tests::count_zero_yields_no_entries ... ok
  test section::tests::multiple_entries ... ok
  test section::tests::roundtrip_realistic ... ok
  test section::tests::single_entry_no_params ... ok
  test section::tests::single_entry_with_all_kinds ... ok
  test section::tests::truncated_payload_reads_zeros_past_end ... ok
  test section::tests::unknown_kind_byte_decodes_to_unrestricted ... ok
  test tests::ownership_kind_byte_roundtrip ... ok
  test result: ok. 11 passed; 0 failed; 0 ignored

Stacked on top of #19 (C1 scaffold). Next: C3 — port the per-path
use-range analysis and intra-function verifier.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@hyperpolymath hyperpolymath force-pushed the feat/typed-wasm-verify-c2-section-parser branch from 88fcf34 to b598e58 Compare May 15, 2026 06:03
@hyperpolymath hyperpolymath merged commit ee1abb1 into main May 15, 2026
19 of 28 checks passed
@hyperpolymath hyperpolymath deleted the feat/typed-wasm-verify-c2-section-parser branch May 15, 2026 06:04
hyperpolymath added a commit that referenced this pull request May 15, 2026
Ports `Tw_verify.count_uses_range` + `verify_function` + `verify_from_module`
from hyperpolymath/affinescript/lib/tw_verify.ml. After this commit the
C1 `verify_from_module` stub is gone — calling it on a wasm module that
carries an `affinescript.ownership` custom section actually enforces the
L7 (aliasing) and L10 (linearity) constraints.

The algorithm
-------------

OCaml walks an in-memory instruction tree recursively. wasmparser hands
us a flat operator stream with structured-control delimiters, so we run
the same per-path `(min, max)` algorithm with an explicit frame stack:

  Frame::Plain      — Block, Loop, or the implicit body scope
  Frame::IfThen     — `If` before any `Else` is seen
  Frame::IfElse     — `If` after `Else`; then-side totals are frozen

Frame transitions on the structured-control operators:

  Block / Loop      → push Plain
  If                → push IfThen
  Else              → top must be IfThen; transition to IfElse,
                      freezing the then-side totals
  End               → pop, collapse to (m, x), add into parent
                        Plain                 → (min,  max)
                        IfThen, no else seen  → (0,    then_max)
                        IfElse                → (min(t,e), max(t,e))
  LocalGet n        → if n matches the counter, add (1, 1) to the
                      top frame's currently-active side
  Anything else     → no-op

The function body's terminating `End` pops the bottom frame and the
collapse becomes the final result.

The frame state is split out as a private `OpCounter` trait so C4 can
reuse the exact same machinery with a `Call`-based counter for
cross-module verification.

Per-function rules (mirroring OCaml `verify_function`)
------------------------------------------------------

For each param at index `i`, compute `(min_uses, max_uses)` then apply:

  Linear:
    max == 0           → LinearNotUsed
    min == 0, max ≥ 1  → LinearDroppedOnSomePath
    max > 1            → LinearUsedMultiple { count: max }
    (both "drop" and "dup" can fire for the same param if min=0, max>1)

  ExclBorrow:
    max > 1            → ExclBorrowAliased { count: max }

  Unrestricted | SharedBorrow: no constraints

Module-level entry
------------------

`verify_from_module(wasm_bytes)`:

  1. Single wasmparser pass over the module:
     - Tally import_count for the function index space
     - Capture the `affinescript.ownership` custom section (if any)
     - Collect every `Payload::CodeSectionEntry` body in order
  2. If no ownership section: trivially `Ok(())`.
  3. Parse the section (C2's `parse_ownership_section_payload`).
  4. For each entry, translate `func_idx` (global) to a body index by
     subtracting `import_count`. Skip imports (no body) and out-of-
     range entries — matches OCaml's short-circuit behaviour.
  5. Aggregate all per-function violations into one `VerifyError::Ownership`
     vector, or return `Ok(())` if clean.

Tests
-----

29/29 unit tests pass:

  count_uses_range layer (range_in helper synthesises a 1-fn module
  via wasm-encoder and pulls the body out for direct analysis):

    no_uses                              → (0, 0)
    one_use                              → (1, 1)
    two_uses_same_path                   → (2, 2)
    use_in_both_if_branches              → (1, 1)
    use_in_then_only                     → (0, 1)
    use_twice_in_then_once_in_else       → (1, 2)
    use_inside_block_passthrough         → (1, 1)
    use_inside_loop_passthrough          → (1, 1)
    nested_if_use_in_inner_then_only     → (0, 1)

  verify_from_module end-to-end (synthetic modules with ownership
  custom sections, hitting each error variant + the clean paths):

    linear_used_exactly_once_is_clean
    linear_not_used_at_all_errors                 → LinearNotUsed
    linear_dropped_on_some_path_errors            → LinearDroppedOnSomePath
    linear_used_twice_errors                      → LinearUsedMultiple
    excl_borrow_used_twice_errors                 → ExclBorrowAliased
    excl_borrow_used_once_is_clean
    unrestricted_used_arbitrarily_is_clean
    module_without_ownership_section_is_trivially_clean
    empty_module_is_trivially_clean

  $ cargo test -p typed-wasm-verify
  running 29 tests
  ... all pass ...
  test result: ok. 29 passed; 0 failed; 0 ignored

Stacked on top of #20 (C2 section codec). Next: C4 — port the
cross-module boundary verifier (the same `(min, max)` frame stack
applied to `Call` operators against a callee's exported interface).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
hyperpolymath added a commit that referenced this pull request May 15, 2026
Ports `Tw_verify.count_uses_range` + `verify_function` + `verify_from_module`
from hyperpolymath/affinescript/lib/tw_verify.ml. After this commit the
C1 `verify_from_module` stub is gone — calling it on a wasm module that
carries an `affinescript.ownership` custom section actually enforces the
L7 (aliasing) and L10 (linearity) constraints.

The algorithm
-------------

OCaml walks an in-memory instruction tree recursively. wasmparser hands
us a flat operator stream with structured-control delimiters, so we run
the same per-path `(min, max)` algorithm with an explicit frame stack:

  Frame::Plain      — Block, Loop, or the implicit body scope
  Frame::IfThen     — `If` before any `Else` is seen
  Frame::IfElse     — `If` after `Else`; then-side totals are frozen

Frame transitions on the structured-control operators:

  Block / Loop      → push Plain
  If                → push IfThen
  Else              → top must be IfThen; transition to IfElse,
                      freezing the then-side totals
  End               → pop, collapse to (m, x), add into parent
                        Plain                 → (min,  max)
                        IfThen, no else seen  → (0,    then_max)
                        IfElse                → (min(t,e), max(t,e))
  LocalGet n        → if n matches the counter, add (1, 1) to the
                      top frame's currently-active side
  Anything else     → no-op

The function body's terminating `End` pops the bottom frame and the
collapse becomes the final result.

The frame state is split out as a private `OpCounter` trait so C4 can
reuse the exact same machinery with a `Call`-based counter for
cross-module verification.

Per-function rules (mirroring OCaml `verify_function`)
------------------------------------------------------

For each param at index `i`, compute `(min_uses, max_uses)` then apply:

  Linear:
    max == 0           → LinearNotUsed
    min == 0, max ≥ 1  → LinearDroppedOnSomePath
    max > 1            → LinearUsedMultiple { count: max }
    (both "drop" and "dup" can fire for the same param if min=0, max>1)

  ExclBorrow:
    max > 1            → ExclBorrowAliased { count: max }

  Unrestricted | SharedBorrow: no constraints

Module-level entry
------------------

`verify_from_module(wasm_bytes)`:

  1. Single wasmparser pass over the module:
     - Tally import_count for the function index space
     - Capture the `affinescript.ownership` custom section (if any)
     - Collect every `Payload::CodeSectionEntry` body in order
  2. If no ownership section: trivially `Ok(())`.
  3. Parse the section (C2's `parse_ownership_section_payload`).
  4. For each entry, translate `func_idx` (global) to a body index by
     subtracting `import_count`. Skip imports (no body) and out-of-
     range entries — matches OCaml's short-circuit behaviour.
  5. Aggregate all per-function violations into one `VerifyError::Ownership`
     vector, or return `Ok(())` if clean.

Tests
-----

29/29 unit tests pass:

  count_uses_range layer (range_in helper synthesises a 1-fn module
  via wasm-encoder and pulls the body out for direct analysis):

    no_uses                              → (0, 0)
    one_use                              → (1, 1)
    two_uses_same_path                   → (2, 2)
    use_in_both_if_branches              → (1, 1)
    use_in_then_only                     → (0, 1)
    use_twice_in_then_once_in_else       → (1, 2)
    use_inside_block_passthrough         → (1, 1)
    use_inside_loop_passthrough          → (1, 1)
    nested_if_use_in_inner_then_only     → (0, 1)

  verify_from_module end-to-end (synthetic modules with ownership
  custom sections, hitting each error variant + the clean paths):

    linear_used_exactly_once_is_clean
    linear_not_used_at_all_errors                 → LinearNotUsed
    linear_dropped_on_some_path_errors            → LinearDroppedOnSomePath
    linear_used_twice_errors                      → LinearUsedMultiple
    excl_borrow_used_twice_errors                 → ExclBorrowAliased
    excl_borrow_used_once_is_clean
    unrestricted_used_arbitrarily_is_clean
    module_without_ownership_section_is_trivially_clean
    empty_module_is_trivially_clean

  $ cargo test -p typed-wasm-verify
  running 29 tests
  ... all pass ...
  test result: ok. 29 passed; 0 failed; 0 ignored

Stacked on top of #20 (C2 section codec). Next: C4 — port the
cross-module boundary verifier (the same `(min, max)` frame stack
applied to `Call` operators against a callee's exported interface).

Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant