extend Assembler API; add primes workload showcase #835

Open
prodigalwon wants to merge 3 commits into jarchain:master from prodigalwon:feature/assembler-extended-ops

@prodigalwon

Summary

Two commits, both motivated by hand-rolling non-trivial workloads against
grey-transpiler::Assembler:

  1. grey-transpiler: extend the Assembler API with 22 typed helpers
    for opcodes the existing surface didn't cover — multiplication, division,
    remainder, bitwise ALU, set-less-than, and register-register branches.
    The encoding patterns mirror what grey-bench's local emit_branch_lt_u
    helper has been doing via emit_raw — formalizing it as typed methods.
    17 byte-encoding unit tests; 65 → 82 total tests pass.

  2. grey-bench: add grey_primes_blob(n) + polkavm_primes_blob(n), a
    hand-assembled naive trial-division prime counter. Showcase consumer
    for the extended API (uses rem_unsigned_64, set_less_than_unsigned,
    mul_64, branch_less_unsigned). Complements the pre-built sieve blobs;
    different algorithm (O(N²) vs O(N log log N)) and different construction
    path (hand-assembled vs build-script). 2 new tests verifying correctness
    against a native-Rust reference; 21 / 21 grey-bench tests pass.

Verification

  • cargo test -p grey-transpiler --lib — 82 pass (existing 65 + 17 new)
  • cargo test -p grey-bench --lib — 21 pass (existing 19 + 2 new)
  • cargo check -p grey-bench — clean (no downstream regressions)

prodigalwon and others added 2 commits May 10, 2026 14:55
Adds 22 new typed methods to `Assembler` covering the gap that
`grey-bench` has been working around via local `emit_branch_lt_u`
+ raw byte emission. With these helpers, hand-assembled programs
can target the full ALU + control flow surface of JAM PVM without
escaping to `emit_raw`.

Three-register ALU:
- mul_64 (202), div_unsigned_64 (203), div_signed_64 (204),
  rem_unsigned_64 (205), rem_signed_64 (206)
- and (210), xor (211), or (212)
- set_less_than_unsigned (216), set_less_than_signed (217)

Two-register-plus-immediate:
- mul_imm_64 (150)
- and_imm (132), xor_imm (133), or_imm (134)
- set_less_than_unsigned_imm (136), set_less_than_signed_imm (137)

Register-register branches (a new section in the file — these have
their own encoding shape, opcode + (rA|rB<<4) + 4-byte signed
relative offset, distinct from the existing reg-imm branches):
- branch_eq (170), branch_not_eq (171)
- branch_less_unsigned (172), branch_less_signed (173)
- branch_greater_or_equal_unsigned (174),
  branch_greater_or_equal_signed (175)

`branch_less_unsigned` in particular makes `grey-bench`'s
`emit_branch_lt_u` helper obsolete — same opcode, same encoding,
typed instead of raw.
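
The register-register branch shape described above (opcode, then `rA|rB<<4`, then a 4-byte signed relative offset) can be sketched as a standalone function. This is an illustration only, not the method added in this commit: `encode_reg_reg_branch` is a hypothetical name, and the little-endian byte order of the offset is an assumption.

```rust
/// Opcode for `branch_less_unsigned`, as listed in this commit message.
const BRANCH_LESS_UNSIGNED: u8 = 172;

/// Hypothetical helper (not the real `Assembler` method): encode a
/// register-register branch as opcode, packed registers, 4-byte offset.
fn encode_reg_reg_branch(opcode: u8, ra: u8, rb: u8, rel_offset: i32) -> [u8; 6] {
    assert!(ra < 16 && rb < 16, "register index must fit in 4 bits");
    let mut out = [0u8; 6];
    out[0] = opcode;
    // Low nibble holds rA, high nibble holds rB.
    out[1] = ra | (rb << 4);
    // Assumption: the signed relative offset is stored little-endian.
    out[2..6].copy_from_slice(&rel_offset.to_le_bytes());
    out
}
```

Calling `encode_reg_reg_branch(BRANCH_LESS_UNSIGNED, 7, 8, -12)` would produce six bytes: the opcode, the packed register byte `0x87`, and the offset bytes.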

All encodings follow the existing methods' patterns verbatim:
opcode emitted with `is_instruction_start = true`, operand bytes
with `false`, immediate via `emit_imm(value as i64, 4)`.
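
A minimal sketch of that emission pattern, using a toy stand-in rather than the real `Assembler`. The struct, its fields, and the `emit` signature are assumptions made for illustration; only the opcode number (132 for `and_imm`), the instruction-start flag convention, and the 4-byte immediate come from this message. The register packing is assumed to follow the same `rA|rB<<4` shape described for branches.

```rust
/// Toy stand-in for the real `Assembler`; names here are assumptions.
#[derive(Default)]
struct ToyAssembler {
    code: Vec<u8>,
    /// One flag per code byte: true where an instruction begins.
    instruction_starts: Vec<bool>,
}

impl ToyAssembler {
    fn emit(&mut self, byte: u8, is_instruction_start: bool) {
        self.code.push(byte);
        self.instruction_starts.push(is_instruction_start);
    }

    /// Emit the low `len` bytes of `value` (assumed little-endian).
    fn emit_imm(&mut self, value: i64, len: usize) {
        let bytes = value.to_le_bytes();
        for &byte in &bytes[..len] {
            self.emit(byte, false);
        }
    }

    /// `and_imm` (opcode 132): opcode byte, packed registers, 4-byte immediate.
    fn and_imm(&mut self, ra: u8, rb: u8, value: i32) {
        self.emit(132, true); // only the opcode byte starts an instruction
        self.emit(ra | (rb << 4), false);
        self.emit_imm(value as i64, 4);
    }
}
```

After `and_imm(1, 2, 0xFF)` on a fresh `ToyAssembler`, `code` holds six bytes and only the first entry of `instruction_starts` is `true`.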

17 new unit tests verify the byte encoding of each method against
the documented opcode + the encoding shape. Existing tests
unchanged. `cargo test -p grey-transpiler --lib` runs 82 tests,
all pass. `cargo check -p grey-bench` confirms no downstream
breakage.

Motivation: building Rostro's RVM bench harness against javm
required hand-assembling non-trivial workloads (primes, sieve,
sort) — without these helpers we hit the same emit_raw +
hand-offset-tracking pattern grey-bench already works around.
Formalizing the helpers benefits any consumer hand-rolling JAM
PVM bytecode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds `grey_primes_blob(n)` and `polkavm_primes_blob(n)` — both
hand-assembled, both count primes in [2, N) by naive trial division,
both return the count in A0.

This is the showcase consumer for the extended Assembler API landed
in the previous commit. Demonstrates four of the new methods in a
real workload:
- `rem_unsigned_64`   — modulo for divisibility test
- `set_less_than_unsigned` — both nonzero-flag and out-of-range mask
- `mul_64`            — flag aggregation (replaces early-exit branches)
- `branch_less_unsigned` — inner + outer loop back-edges

The polkavm-flavor mirrors the same algorithm via `ProgramBlobBuilder`
(matches the existing `grey_fib_blob` / `polkavm_fib_blob` pairing
convention).

Algorithm:
  count = (2 < N) ? 1 : 0          // pre-count i=2 if in range
  for i in 3..(N+1):
      is_prime = 1
      for j in 2..i:               // do-while; i >= 3 so always runs >= 1
          rem = i %u j
          is_prime *= (0 <u rem)
      in_range = (i < N) ? 1 : 0   // masks the final overshoot iteration
      count += is_prime * in_range
  return count

The two `mul *= flag` patterns deliberately replace forward
early-exit branches with backward-only branches. Forward branches
in hand-assembled programs need their target at a basic-block start,
which means a preceding terminator instruction — solvable but
fragile. The mul-by-flag trick sidesteps it entirely while keeping
the same asymptotic O(N^2) cost.
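
For readers who want to trace the control flow, the pseudocode above can be mirrored in plain Rust. This is a sketch of the algorithm's shape, not the blob itself: the function name is hypothetical, and the placement of the increments relative to the loop tests is an assumption. Each `loop` is a do-while whose only conditional is the backward edge, and the flags are folded in by multiplication.

```rust
/// Count primes in [2, n) using only backward loop edges, mirroring the
/// pseudocode in this commit message (hypothetical native rendering).
fn count_primes_branch_free(n: u64) -> u64 {
    let mut count = (2 < n) as u64; // pre-count i = 2 if in range
    let mut i: u64 = 3;
    loop {
        let mut is_prime: u64 = 1;
        let mut j: u64 = 2;
        loop {
            let rem = i % j;
            // The flag drops to 0 once any j divides i and stays 0.
            is_prime *= (0 < rem) as u64;
            j += 1;
            if j >= i {
                break; // back-edge is taken while j < i
            }
        }
        let in_range = (i < n) as u64; // masks the final overshoot iteration
        count += is_prime * in_range;
        i += 1;
        if i >= n + 1 {
            break; // back-edge is taken while i < n + 1
        }
    }
    count
}
```

Because both loops are do-while, the body always runs for `i = 3` even when `n` is 2 or 3. The `in_range` mask is what keeps that iteration from being counted.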

Two unit tests in a new `tests_primes` module:
- `test_grey_primes_matches_native_reference` — sweeps
  N in {2, 5, 10, 30}, asserts blob output matches a native-Rust
  trial-division reference.
- `test_grey_primes_30_is_10` — sanity-check known fact (10 primes
  in [2, 30)).
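
A native reference of the kind the first test compares against might look like the following. The function name is hypothetical and the actual reference in `tests_primes` may be written differently. Running the blob itself is not shown here.

```rust
/// Hypothetical native reference: count primes in [2, n) by trial division.
fn reference_prime_count(n: u64) -> u64 {
    (2..n)
        // i is prime when no j in [2, i) divides it; the range is empty for i = 2.
        .filter(|&i| (2..i).all(|j| i % j != 0))
        .count() as u64
}
```

For the swept sizes N in {2, 5, 10, 30} this reference yields 0, 2, 4, and 10 respectively.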

`cargo test -p grey-bench --lib` — 21 pass (19 pre-existing + 2 new).

Complements the pre-built `grey_sieve_blob` / `polkavm_sieve_blob`
already in this file: different algorithm (O(N^2) vs O(N log log N))
and different construction path (hand-assembled vs Rust->RISC-V via
build-script). Useful as a reference workload for the extended
Assembler API and as an alternate compute-shape for benchmarking.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Contributor

Genesis Review

Comparison targets:

How to review

Post a comment with the following format (rank from best to worst):

/review
difficulty: <commit1>, <commit2>, ..., <commitN>, currentPR
novelty: <commit1>, <commit2>, ..., <commitN>, currentPR
design: <commit1>, <commit2>, ..., <commitN>, currentPR
verdict: merge

Use the short commit hashes above and currentPR for this PR.
Each line ranks all comparison targets + this PR from best to worst.

To meta-review another reviewer's comment, react with 👍 or 👎.

CI's `cargo fmt --all --check` flagged the trailing
`builder.to_vec().expect(...)` chain as a single line; rustfmt
prefers it broken across three lines. No semantic change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@johandroid
Contributor

/review
difficulty: 9720b49, currentPR, 2e03f64, 27e699f, 34d3e3e, 4018dae, 81deabd, 8c12cd7
novelty: currentPR, 9720b49, 2e03f64, 27e699f, 34d3e3e, 4018dae, 81deabd, 8c12cd7
design: 9720b49, currentPR, 27e699f, 34d3e3e, 2e03f64, 4018dae, 81deabd, 8c12cd7
verdict: merge

Adds a useful typed Assembler surface plus a hand-assembled prime workload that exercises the new helpers, with byte-level encoding tests and a native reference check. Ranked below the GRANDPA out-of-order vote fix on difficulty/design because that target handles live consensus behavior, but above the smaller proof/refactor/doc targets. Because this touches grey-bench and grey-transpiler, a benchmark comparison is worth running before final merge.

@github-actions
Contributor

JAR Bot: Review recorded from @johandroid (1 reviews, 0 meta-reviews).
Merge weight: 0/37665 (need >50%).
