wallet: concurrent HTTP + parallel decrypt for cmd_rollup_sync #25

Merged

saroupille merged 1 commit into trilitech:main from saroupille:feat/sync-acceleration on May 10, 2026

Conversation

@saroupille (Collaborator) commented May 10, 2026

Summary

The 67k-commit ushuaianet sync wallclock is dominated by sequential ureq
round trips against the rollup-node's durable-state RPC (~40 ms each,
~95% of total). This PR ships three orthogonal accelerations from the
sync-acceleration design (2.A concurrent HTTP, 2.B/2.C parallel ML-KEM-768
trial-decrypt) and bumps DEFAULT_CHECKPOINT_EVERY per PR #24's deferred
retune. The composition fires on the post-#24 cooperative-yield path that
cmd_rollup_sync actually executes — earlier drafts of this PR added
rayon to the legacy apply_scan_feed (reachable only from the dead
cmd_scan entrypoint), which meant the decrypt parallelism never fired
on a real tzel-wallet sync invocation. A reviewer caught that; this
revision fixes it.

This PR body deliberately drops the speedup numbers it previously claimed
("45 min → 5-6 min", "5-7× HTTP", "4-6× decrypt"). Those were estimates
without measurements behind them; the bench harness in
tzel-infra PR #57
now produces real numbers, and the design doc's §3.3 will be backfilled
when those land. Honest claims: each of the three layers does measurably
fire on the sync path (verified by adversarial mutation), and the
post-fix composition is the shape the design doc intended from the start.

2.A — concurrent HTTP fetch

RollupRpc::load_notes_since_at_block previously issued one synchronous
ureq::get per note. It now drives reqwest + futures_util::FuturesUnordered
with a tunable concurrency budget. Out-of-order completions are reassembled
by index; errors short-circuit the batch (same abort-on-first-error
contract the sequential loop had — verified by a counted mock that
asserts cancellation actually short-circuits).

The async runtime is hosted on a dedicated worker thread so the path
composes with both the synchronous CLI dispatch and the multi-thread
tokio runtime that powers tzel-detect's axum handlers — calling
block_on from inside a tokio runtime panics, the worker-thread bridge
sidesteps that without forcing every caller to be async.
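For illustration, the concurrent-fetch shape can be sketched with std-only
primitives (no reqwest, tokio, or FuturesUnordered here; `fetch_note` is a
hypothetical stand-in for the real per-note HTTP GET). The two invariants
are the same as in the PR: out-of-order completions land in index slots,
and the first error propagates without draining the rest.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical stand-in for the per-note HTTP GET (reqwest in the PR).
fn fetch_note(index: usize) -> Result<String, String> {
    Ok(format!("note-{index}"))
}

// Fetch `count` notes with at most `budget` in flight. Out-of-order
// completions are reassembled by index; the first error aborts the
// batch without waiting for the remaining in-flight requests.
fn fetch_concurrent(count: usize, budget: usize) -> Result<Vec<String>, String> {
    let (tx, rx) = mpsc::channel();
    let mut slots: Vec<Option<String>> = vec![None; count];
    let (mut next, mut in_flight, mut done) = (0usize, 0usize, 0usize);
    while done < count {
        // Keep the pipeline full up to the concurrency budget.
        while next < count && in_flight < budget {
            let (tx, i) = (tx.clone(), next);
            thread::spawn(move || {
                let _ = tx.send((i, fetch_note(i)));
            });
            next += 1;
            in_flight += 1;
        }
        let (i, res) = rx.recv().map_err(|e| e.to_string())?;
        in_flight -= 1;
        done += 1;
        slots[i] = Some(res?); // abort-on-first-error contract
    }
    Ok(slots.into_iter().map(Option::unwrap).collect())
}
```

`fetch_concurrent(10, 4)` returns the ten notes in index order regardless of
completion order, which is the property the N=1-vs-N=8 equivalence test pins.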

SyncFetcher long-lived context: cmd_rollup_sync builds one
reqwest::Client + worker thread + tokio runtime and reuses them across
every per-K-commit batch in the scan. That pool warms exactly once
instead of once per batch (~268 batches at K=250 across 67k commits).
Across --watch iterations the pool is rebuilt; the per-iteration scan
is small once the wallet has caught up, so cross-iteration amortisation
is on the follow-up list rather than in this PR.
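The long-lived context can be sketched the same way. This hypothetical
std-only SyncFetcher uses a plain worker thread where the PR hosts a tokio
runtime plus one reqwest::Client, but the contract is identical: one worker
survives every batch, and callers stay synchronous by blocking on a reply
channel.

```rust
use std::sync::mpsc::{self, Sender};
use std::thread;

// Hypothetical std-only shape of the long-lived fetch context. In the
// PR the worker thread hosts a tokio runtime + one reqwest::Client;
// here a plain thread stands in for both.
struct SyncFetcher {
    jobs: Sender<(Vec<usize>, Sender<Vec<String>>)>,
}

impl SyncFetcher {
    fn new() -> Self {
        let (jobs, rx) = mpsc::channel::<(Vec<usize>, Sender<Vec<String>>)>();
        thread::spawn(move || {
            // One long-lived worker: setup cost is paid once, not once
            // per batch; the loop ends when the job sender is dropped.
            for (batch, reply) in rx {
                let notes = batch.iter().map(|i| format!("note-{i}")).collect();
                let _ = reply.send(notes);
            }
        });
        Self { jobs }
    }

    // Synchronous facade: callers block on the reply channel instead
    // of becoming async themselves.
    fn fetch_batch(&self, indices: Vec<usize>) -> Vec<String> {
        let (reply, rx) = mpsc::channel();
        self.jobs.send((indices, reply)).expect("worker alive");
        rx.recv().expect("worker replied")
    }
}
```

Two back-to-back `fetch_batch` calls hit the same worker, which is the
warm-pool contract `sync_fetcher_amortises_client_across_batches` checks.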

2.B / 2.C — parallel decrypt on the post-#24 hot path

apply_scan_feed_recover_batch (PR #24's per-batch recover function,
the function cmd_rollup_sync actually calls) routes its trial-decrypt
loop through rayon's default global pool. The decrypt is embarrassingly
parallel: try_recover_note takes &self, addresses don't change
inside the call, recovered notes are independent. Recovery results are
merged sequentially after the parallel pass to preserve the existing
println!-then-push order the test suite relies on. Adversarial check:
running the test suite with RAYON_NUM_THREADS=1 adds ~20 s of
wallclock vs the default pool, proving rayon now fires on the sync
path.
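A dependency-free sketch of the same shape (the PR routes this through
rayon's par_iter on the global pool; `try_recover_note` here is a
hypothetical stand-in for the ML-KEM-768 trial-decrypt): per-note work runs
in parallel, and a sequential merge afterwards preserves input order.

```rust
use std::thread;

struct Note { payload: u64 }

// Stand-in for the ML-KEM-768 trial-decrypt: "recovers" even payloads.
fn try_recover_note(note: &Note) -> Option<u64> {
    (note.payload % 2 == 0).then_some(note.payload)
}

// Parallel trial-decrypt with a sequential merge pass. std::thread::scope
// chunks keep the sketch dependency-free while preserving the shape:
// per-note work in parallel, results merged in input order afterwards.
// Assumes `threads >= 1`.
fn recover_batch(notes: &[Note], threads: usize) -> Vec<u64> {
    let chunk = ((notes.len() + threads - 1) / threads).max(1);
    let mut per_chunk = Vec::new();
    thread::scope(|s| {
        let handles: Vec<_> = notes
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().map(try_recover_note).collect::<Vec<_>>()))
            .collect();
        for h in handles {
            per_chunk.push(h.join().unwrap());
        }
    });
    // Sequential merge preserves input order (the println-then-push contract).
    per_chunk.into_iter().flatten().flatten().collect()
}
```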

Tunables

  • TZEL_SYNC_CONCURRENCY (default 4, hard cap 128). 4 matches
    typical CI vCPU count and stays comfortably below any stock
    rollup-node's connection tolerance — deliberately conservative,
    chosen without cross-operator measurement. Higher concurrency
    (8, 16, 32, …) is reasonable on tuned operator boxes but should be
    opt-in once an operator measures their own rollup-node's
    tail-latency curve via the bench harness in tzel-infra PR #57.
    The 128 hard cap rejects misconfiguration before it can fill the
    rollup-node's TCP backlog.
  • --checkpoint-every N / TZEL_SYNC_CHECKPOINT_EVERY /
    DEFAULT_CHECKPOINT_EVERY — bumped from 50 → 250 per PR #24's body
    line 90 ("Re-tune default K to 250–500 once the design doc's
    parallel-fetch patch (P2, see PR #25) lands"). 250 is the
    conservative end of that range — it drops the per-checkpoint fsync
    rate from ~7/s to ~1.4/s sustained at the post-PR throughput.
    Loss-on-interrupt grows from ~2 s to ~10 s, still well under the
    user-visible threshold for cooperative-yield UX.
  • A one-line stderr banner at sync start surfaces both tunables to the
    operator.
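A minimal sketch of how the concurrency tunable might be read, assuming
reject-not-clamp semantics for the cap (the PR text only says the 128 cap
"rejects misconfiguration"; the function name and error strings here are
hypothetical):

```rust
const DEFAULT_SYNC_CONCURRENCY: usize = 4;
const SYNC_CONCURRENCY_HARD_CAP: usize = 128;

// Hypothetical parse of the TZEL_SYNC_CONCURRENCY value. Would be
// called as:
//   parse_sync_concurrency(std::env::var("TZEL_SYNC_CONCURRENCY").ok().as_deref())
fn parse_sync_concurrency(raw: Option<&str>) -> Result<usize, String> {
    let Some(raw) = raw else {
        return Ok(DEFAULT_SYNC_CONCURRENCY); // env var unset: ship-safe default
    };
    let n: usize = raw
        .parse()
        .map_err(|_| format!("TZEL_SYNC_CONCURRENCY: not a number: {raw}"))?;
    // Reject rather than clamp, so misconfiguration fails before the
    // sync starts instead of silently hammering the rollup node.
    if n == 0 || n > SYNC_CONCURRENCY_HARD_CAP {
        return Err(format!(
            "TZEL_SYNC_CONCURRENCY must be in 1..={SYNC_CONCURRENCY_HARD_CAP}, got {n}"
        ));
    }
    Ok(n)
}
```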

Composition with related PRs

  • Builds on --at-tree-size (#23,
    merged on main) — operators that start fresh from an explicit
    cursor still benefit from the concurrent fetch on subsequent syncs.
  • Composes orthogonally with the cooperative-yield work in #24
    (merged): each K-commit batch from #24 is fetched concurrently
    within the batch and trial-decrypted in parallel by this PR.
    PR #24's apply_scan_feed_recover_batch is exactly the function
    that gained the rayon par_iter here.
  • Bench harness + design doc live in
    tzel-infra PR #57
    (separate). The harness wraps tzel-wallet sync in a subprocess and
    times it end-to-end; the README walks through the sandbox-first
    recipe (services/wallet/scripts/install-sandbox.sh → bench).

NOT included (deferred)

  • 2.B (true batched RPC) — deferred to upstream octez (the rollup
    node doesn't currently expose a batch durable-read endpoint).
  • 2.D — monitor stream. Separate concern.
  • Connection-pool reuse across --watch iterations — --watch rebuilds
    SyncFetcher per iteration; per-iteration scans are small enough
    that one fresh handshake per poll stays bounded.

Tests

4 tests in network_profile_tests:

  • concurrent_fetch_returns_same_results_as_sequential — drives a
    100-commit mock rollup, scans with TZEL_SYNC_CONCURRENCY=1 and
    =8, asserts identical NoteMemos in identical order. Validates
    that out-of-order completion is reassembled correctly.
  • concurrent_fetch_aborts_on_5xx — mock returns 503 on the index-7
    length probe; sync at concurrency=4 against a 50-note batch. Asserts
    the error propagates with HTTP status preserved, attributed to a
    rollup RPC URL, AND that the number of requests served by the
    counted mock is strictly less than the index count — a "drain
    everything before propagating" refactor would still satisfy the old
    assertion but would fail this strengthened one.
  • parallel_decrypt_returns_same_results_as_sequential — builds a
    mixed fixture (recoverable + non-recoverable notes) with the
    recoverability of each index pre-computed at fixture-construction
    time as an independent oracle. Both sequential (one-thread rayon
    pool) and parallel (default global pool) branches must produce
    exactly the oracle set, AND must match each other. Verified
    adversarially: mutating apply_scan_feed_recover_batch to drop half
    the recoveries (take(feed.notes.len() / 2)) makes this test fail
    loud — replacing the previous tautological shape that compared
    apply_scan_feed against itself.
  • sync_fetcher_amortises_client_across_batches — pins the contract
    that one reqwest::Client services multiple sequential batches via
    SyncFetcher. Counts served HTTP requests across two back-to-back
    fetches and asserts both batches see the same warm pool.

122 lib tests pass total (was 117 on main before the #24/#25 chain
landed; #24 added the cooperative-yield coverage, this PR adds the four
sync-acceleration tests above).

Test plan

  • cargo +nightly-2025-07-14 build -p tzel-wallet-app
  • cargo +nightly-2025-07-14 test -p tzel-wallet-app --lib — 122 ok
  • Adversarial B1 (drop half recoveries in
    apply_scan_feed_recover_batch) — parallel_decrypt_returns_same_results_as_sequential
    fails loud. Reverted.
  • Adversarial B2 (RAYON_NUM_THREADS=1) — full lib-test runtime
    grows from ~50 s → ~69 s, proving rayon fires on the sync path.
  • cargo +nightly-2025-07-14 clippy -p tzel-wallet-app --all-targets -- -D warnings — no regression vs pre-PR state (37 pre-existing
    tzel-core errors unchanged; nothing new from this PR).

Review-thread audit (pre-merge)

Reviewers A and B caught four blockers + several should-fix items on the
first pass; this rebase addresses each:

  • B1 — parallel_decrypt_returns_same_results_as_sequential was
    tautological (sequential vs parallel both routed through the same
    function). Rewritten with an independent fixture-time oracle.
  • B2 — Rayon parallelism only fired in the legacy apply_scan_feed,
    not in apply_scan_feed_recover_batch (the post-#24 hot path).
    Added rayon to the latter.
  • B3 — reqwest::Client was rebuilt per batch (~268 cold-start
    handshakes per 67k-commit sync at K=250). Added SyncFetcher so the
    client survives the whole cmd_rollup_sync call.
  • B4 — Bench harness called tzel-wallet rollup-sync
    --rollup-node-url (wrong subcommand, wrong flag). Fixed in
    tzel-infra PR #57 to run profile init-shadownet first, then sync.
  • S1 — K bumped 50 → 250 per PR #24 body line 90.
  • S2 — Test count corrected (122, was 117 pre-#24).
  • S3 — Default concurrency dropped 8 → 4 absent measurement
    justifying anything higher.
  • S4 — Design doc updated to reflect that pool reuse is per
    cmd_rollup_sync call, not across --watch iterations.
  • S5 — concurrent_fetch_aborts_on_5xx strengthened with a
    served-requests counter so a "drain everything before propagating"
    refactor would fail it.

🤖 Generated with Claude Code

@saroupille saroupille force-pushed the feat/sync-acceleration branch 2 times, most recently from 447bcce to c68a92d Compare May 10, 2026 15:30
The 67k-commit ushuaianet sync wallclock was dominated by sequential
ureq round trips against the rollup-node's durable-state RPC (~40 ms
each, ~95% of total). Three orthogonal accelerations land here, all
firing on the post-trilitech#24 cooperative-yield path that `cmd_rollup_sync`
actually executes:

* 2.A — concurrent HTTP fetch in `RollupRpc::load_notes_since_at_block`
  via `reqwest` + `futures_util::FuturesUnordered`. The async runtime
  is hosted on a dedicated worker thread so the path composes with both
  the synchronous CLI dispatch and the multi-thread tokio runtime that
  powers `tzel-detect`'s axum handlers (calling `block_on` from inside
  a tokio runtime panics — the worker-thread bridge sidesteps that
  without forcing every caller to be async).

  Concurrency tuned by `TZEL_SYNC_CONCURRENCY` (default 4 — matches CI
  vCPU count, ship-safe across every rollup-node we know about; bumping
  above 4 should be opt-in once an operator measures their own
  rollup-node's tail-latency curve via `services/scan-bench/`). The 128
  hard cap rejects misconfiguration before it can fill the rollup
  node's TCP backlog. Errors short-circuit the batch — same
  abort-on-first-error contract the sequential loop had.

* 2.B — `apply_scan_feed_recover_batch` (PR trilitech#24's per-batch recover
  function, the function `cmd_rollup_sync` actually calls) routes its
  ML-KEM-768 trial-decrypt loop through rayon's default global pool.
  The decrypt is embarrassingly parallel: `try_recover_note` is
  `&self`, addresses don't change inside the call, recovered notes are
  independent. Recovery results are merged sequentially after the
  parallel pass to preserve the existing println-then-push order the
  test suite relies on. (The legacy `apply_scan_feed` on the dead
  `cmd_scan` path was already parallelised in an earlier draft of this
  PR, but the post-trilitech#24 hot path runs through
  `apply_scan_feed_recover_batch` — without this commit, the rayon
  par_iter never fired on a real `tzel-wallet sync` invocation.)

* 2.A continued — `SyncFetcher` long-lived context. Pre-fix:
  `fetch_published_notes_concurrent` was called per-batch and rebuilt
  the `reqwest::Client` (and a fresh tokio runtime + thread) every
  time. At the new K=250 across a 67k-commit sync that's still ~268
  client builds, each warming a fresh connection pool from cold —
  the design doc's "amortised TCP+TLS handshake" claim was false in
  that shape. `SyncFetcher` owns one `reqwest::Client` + worker thread
  + tokio runtime that survive the whole `cmd_rollup_sync` call. Pool
  warms once. (Across `--watch` iterations the pool is rebuilt; the
  per-iteration scan is small once caught up.)

A one-line stderr banner at sync start surfaces both tunables to the
operator. Composes orthogonally with `--at-tree-size` (PR trilitech#23, merged)
and the cooperative-yield work in PR trilitech#24, including PR trilitech#24 body line
90's "re-tune K to 250–500 once P2 lands" — `DEFAULT_CHECKPOINT_EVERY`
bumps from 50 → 250 (conservative end of that range).

NOT included: 2.B (batched RPC, deferred to upstream octez) and 2.D
(monitor stream, a separate concern).

Tests: 4 in `network_profile_tests` (122 lib tests pass total, was 117
on `main` pre-trilitech#24-chain):
  * concurrent_fetch_returns_same_results_as_sequential — out-of-order
    completion is reassembled correctly (N=1 vs N=8 against a
    100-commit mock).
  * concurrent_fetch_aborts_on_5xx — a 503 mid-batch propagates the
    error with HTTP status preserved, no silent retry, AND a counted
    mock asserts cancellation short-circuited (served < total).
  * parallel_decrypt_returns_same_results_as_sequential — independent
    fixture-time oracle (50 mixed recoverable + non-recoverable
    notes); both sequential and parallel branches must produce
    exactly the oracle set. Mutating
    `apply_scan_feed_recover_batch` to drop half the recoveries fails
    this test loud (verified).
  * sync_fetcher_amortises_client_across_batches — pins the contract
    that one `reqwest::Client` services multiple sequential batches.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@saroupille saroupille force-pushed the feat/sync-acceleration branch from c68a92d to 2461c33 Compare May 10, 2026 15:56
@saroupille saroupille merged commit 7e48045 into trilitech:main May 10, 2026
2 checks passed
saroupille pushed a commit to saroupille/tzel that referenced this pull request May 10, 2026
The combined `rust-and-cairo` job started clipping cold runs once
the wallet test suite grew (PR trilitech#25 added the concurrent-fetch +
parallel-decrypt machinery; the resulting +16s on
`tzel-wallet-app --lib` pushed total wallclock past ~9 min and hit a
runner shutdown signal mid-Cairo on the first post-merge main run,
SHA 7e48045).

Two changes:

1. Split `rust-and-cairo` into `rust-tests` and `cairo-tests`
   parallel jobs. Each runs on its own ubuntu-latest runner with its
   own wallclock budget. Cairo doesn't gate on Rust completion any
   more; failures surface independently.

2. Add `Swatinem/rust-cache@v2.7.8` (SHA-pinned per the workflow's
   third-party-action convention) to both Rust-touching jobs
   (`rust-tests` and `ocaml`'s cross-impl interop step). Cargo
   registry + target/ amortise across runs; cache key bumps on
   `Cargo.lock` change. Warm runs go from ~9 min cold to ~3-4 min.

The `ocaml` job is unchanged structurally — just gains the cargo
cache for its Rust cross-impl test.

No code changes; workflow-only.
saroupille added a commit that referenced this pull request May 10, 2026
Co-authored-by: François Thiré <franth2@gmail.com>