wallet: concurrent HTTP + parallel decrypt for cmd_rollup_sync#25
Merged
Conversation
The 67k-commit ushuaianet sync wallclock was dominated by sequential `ureq` round trips against the rollup-node's durable-state RPC (~40 ms each, ~95% of total). Three orthogonal accelerations land here, all firing on the post-trilitech#24 cooperative-yield path that `cmd_rollup_sync` actually executes:

* 2.A — concurrent HTTP fetch in `RollupRpc::load_notes_since_at_block` via `reqwest` + `futures_util::FuturesUnordered`. The async runtime is hosted on a dedicated worker thread so the path composes with both the synchronous CLI dispatch and the multi-thread tokio runtime that powers `tzel-detect`'s axum handlers (calling `block_on` from inside a tokio runtime panics — the worker-thread bridge sidesteps that without forcing every caller to be async). Concurrency is tuned by `TZEL_SYNC_CONCURRENCY` (default 4 — matches CI vCPU count, ship-safe across every rollup-node we know about; bumping above 4 should be opt-in once an operator measures their own rollup-node's tail-latency curve via `services/scan-bench/`). The 128 hard cap rejects misconfiguration before it can fill the rollup node's TCP backlog. Errors short-circuit the batch — the same abort-on-first-error contract the sequential loop had.
* 2.B — `apply_scan_feed_recover_batch` (PR trilitech#24's per-batch recover function, the function `cmd_rollup_sync` actually calls) routes its ML-KEM-768 trial-decrypt loop through rayon's default global pool. The decrypt is embarrassingly parallel: `try_recover_note` is `&self`, addresses don't change inside the call, recovered notes are independent. Recovery results are merged sequentially after the parallel pass to preserve the existing println-then-push order the test suite relies on. (The legacy `apply_scan_feed` on the dead `cmd_scan` path was already parallelised in an earlier draft of this PR, but the post-trilitech#24 hot path runs through `apply_scan_feed_recover_batch` — without this commit, the rayon par_iter never fired on a real `tzel-wallet sync` invocation.)
* 2.A continued — `SyncFetcher` long-lived context. Pre-fix: `fetch_published_notes_concurrent` was called per-batch and rebuilt the `reqwest::Client` (and a fresh tokio runtime + thread) every time. At the new K=250 across a 67k-commit sync that's still ~268 client builds, each warming a fresh connection pool from cold — the design doc's "amortised TCP+TLS handshake" claim was false in that shape. `SyncFetcher` owns one `reqwest::Client` + worker thread + tokio runtime that survive the whole `cmd_rollup_sync` call. The pool warms once. (Across `--watch` iterations the pool is rebuilt; the per-iteration scan is small once caught up.) A one-line stderr banner at sync start surfaces both tunables to the operator.

Composes orthogonally with `--at-tree-size` (PR trilitech#23, merged) and the cooperative-yield work in PR trilitech#24, including PR trilitech#24 body line 90's "re-tune K to 250–500 once P2 lands" — `DEFAULT_CHECKPOINT_EVERY` bumps from 50 → 250 (conservative end of that range). NOT included: batched RPC (deferred to upstream octez) and the monitor stream (a separate concern).

Tests: 4 in `network_profile_tests` (122 lib tests pass total, was 117 on `main` pre-trilitech#24-chain):

* concurrent_fetch_returns_same_results_as_sequential — out-of-order completion is reassembled correctly (N=1 vs N=8 against a 100-commit mock).
* concurrent_fetch_aborts_on_5xx — a 503 mid-batch propagates the error with HTTP status preserved, no silent retry, AND a counted mock asserts cancellation short-circuited (served < total).
* parallel_decrypt_returns_same_results_as_sequential — independent fixture-time oracle (50 mixed recoverable + non-recoverable notes); both sequential and parallel branches must produce exactly the oracle set. Mutating `apply_scan_feed_recover_batch` to drop half the recoveries fails this test loud (verified).
* sync_fetcher_amortises_client_across_batches — pins the contract that one `reqwest::Client` services multiple sequential batches.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
saroupille pushed a commit to saroupille/tzel that referenced this pull request (May 10, 2026):
The combined `rust-and-cairo` job started clipping cold runs once the wallet test suite grew (PR trilitech#25 added the concurrent-fetch + parallel-decrypt machinery; the resulting +16 s on `tzel-wallet-app --lib` pushed total wallclock past ~9 min and hit a runner shutdown signal mid-Cairo on the first post-merge main run, SHA 7e48045). Two changes:

1. Split `rust-and-cairo` into `rust-tests` and `cairo-tests` parallel jobs. Each runs on its own ubuntu-latest runner with its own wallclock budget. Cairo no longer gates on Rust completion; failures surface independently.
2. Add `Swatinem/rust-cache@v2.7.8` (SHA-pinned per the workflow's third-party-action convention) to both Rust-touching jobs (`rust-tests` and `ocaml`'s cross-impl interop step). Cargo registry + target/ amortise across runs; the cache key bumps on `Cargo.lock` change.

Warm runs go from ~9 min cold to ~3-4 min. The `ocaml` job is unchanged structurally — it just gains the cargo cache for its Rust cross-impl test. No code changes; workflow-only.
saroupille added a commit that referenced this pull request (May 10, 2026).
Co-authored-by: François Thiré <franth2@gmail.com>
Summary
The 67k-commit ushuaianet sync wallclock is dominated by sequential `ureq` round trips against the rollup-node's durable-state RPC (~40 ms each, ~95% of total). This PR ships three orthogonal accelerations from the sync-acceleration design (2.A concurrent HTTP, 2.B/2.C parallel ML-KEM-768 trial-decrypt) and bumps `DEFAULT_CHECKPOINT_EVERY` per PR #24's deferred re-tune. The composition fires on the post-#24 cooperative-yield path that `cmd_rollup_sync` actually executes — earlier drafts of this PR added rayon to the legacy `apply_scan_feed` (only reachable from the dead `cmd_scan` entrypoint), which meant the decrypt parallelism never fired on a real `tzel-wallet sync` invocation. A reviewer caught that; it is fixed here.

This PR body deliberately drops the speedup numbers it previously claimed ("45 min → 5-6 min", "5-7× HTTP", "4-6× decrypt"). Those were estimates without measurements behind them; the bench harness in tzel-infra PR #57 now produces real numbers, and the design doc's §3.3 will be backfilled when those land. Honest claims: each of the three layers measurably fires on the sync path (verified by adversarial mutation), and the post-fix composition is the shape the design doc intended from the start.
2.A — concurrent HTTP fetch
`RollupRpc::load_notes_since_at_block` previously issued one synchronous `ureq::get` per note. It now drives `reqwest` + `futures_util::FuturesUnordered` with a tunable concurrency budget. Out-of-order completions are reassembled by index; errors short-circuit the batch (the same abort-on-first-error contract the sequential loop had — verified by a counted mock that asserts cancellation actually short-circuits).
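The contract (out-of-order completions reassembled by index, first error aborting the whole batch) can be sketched with std threads standing in for `reqwest` + `FuturesUnordered`; the function names, the doubled-note payloads, and the 503-at-index-7 failure are all illustrative, not the PR's real code:

```rust
use std::sync::mpsc;
use std::thread;

/// Stand-in for one RPC round trip; the real code issues a reqwest GET.
/// Illustrative failure: pretend the node returns 503 at index 7.
fn fetch_one(index: usize) -> Result<String, String> {
    if index == 7 {
        return Err(format!("HTTP 503 at index {index}"));
    }
    Ok(format!("note-{index}"))
}

/// Fan `total` fetches across `concurrency` workers (concurrency >= 1),
/// reassemble results by index, and abort on the first error.
fn fetch_batch(total: usize, concurrency: usize) -> Result<Vec<String>, String> {
    let (tx, rx) = mpsc::channel();
    thread::scope(|s| {
        // Static split: worker w takes indices w, w+concurrency, w+2*concurrency, …
        for w in 0..concurrency {
            let tx = tx.clone();
            s.spawn(move || {
                for i in (w..total).step_by(concurrency) {
                    // Completions arrive out of order across workers.
                    let _ = tx.send((i, fetch_one(i)));
                }
            });
        }
        drop(tx); // close the channel; workers hold their own clones
        let mut slots: Vec<Option<String>> = vec![None; total];
        for (i, res) in rx {
            slots[i] = Some(res?); // first error short-circuits the batch
        }
        Ok(slots.into_iter().map(Option::unwrap).collect())
    })
}
```

An early `Err` return drops the receiver, so still-running workers see failed sends and wind down; `thread::scope` joins them before `fetch_batch` returns, mirroring the cancellation the counted mock asserts.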
The async runtime is hosted on a dedicated worker thread so the path composes with both the synchronous CLI dispatch and the multi-thread tokio runtime that powers `tzel-detect`'s axum handlers — calling `block_on` from inside a tokio runtime panics; the worker-thread bridge sidesteps that without forcing every caller to be async.
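A minimal std-only sketch of that bridge shape: one thread owns the "runtime" and callers stay synchronous, handing jobs over a channel and blocking on a reply. `WorkerBridge`, its `u32` job type, and the doubling work are invented for illustration; in the real design the spawned thread would own the tokio runtime and `reqwest::Client` and call `block_on` there.

```rust
use std::sync::mpsc;
use std::thread;

/// Caller-facing handle: synchronous API, async-capable owner thread behind it.
struct WorkerBridge {
    tx: mpsc::Sender<(u32, mpsc::Sender<u32>)>,
}

impl WorkerBridge {
    fn new() -> Self {
        let (tx, rx) = mpsc::channel::<(u32, mpsc::Sender<u32>)>();
        // This thread is the only place "block_on" would ever run, so it can
        // never be called from inside another tokio runtime.
        thread::spawn(move || {
            for (job, reply) in rx {
                let _ = reply.send(job * 2); // stand-in for runtime.block_on(fetch(job))
            }
        });
        WorkerBridge { tx }
    }

    /// Synchronous facade: send the job, block on the reply.
    fn run(&self, job: u32) -> u32 {
        let (reply_tx, reply_rx) = mpsc::channel();
        self.tx.send((job, reply_tx)).unwrap();
        reply_rx.recv().unwrap()
    }
}
```

Dropping the `WorkerBridge` closes the job channel, which ends the worker's loop and lets the thread exit cleanly.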
`SyncFetcher` long-lived context: `cmd_rollup_sync` builds one `reqwest::Client` + worker thread + tokio runtime and reuses them across every per-K-commit batch in the scan. That pool warms exactly once instead of once per batch (~268 batches at K=250 across 67k commits). Across `--watch` iterations the pool is rebuilt; the per-iteration scan is small once the wallet has caught up, so cross-iteration amortisation is on the follow-up list rather than in this PR.
2.B / 2.C — parallel decrypt on the post-#24 hot path
`apply_scan_feed_recover_batch` (PR #24's per-batch recover function, the function `cmd_rollup_sync` actually calls) routes its trial-decrypt loop through rayon's default global pool. The decrypt is embarrassingly parallel: `try_recover_note` takes `&self`, addresses don't change inside the call, and recovered notes are independent. Recovery results are merged sequentially after the parallel pass to preserve the existing `println!`-then-push order the test suite relies on. Adversarial check: running the test suite with `RAYON_NUM_THREADS=1` adds ~20 s of wallclock vs the default pool, proving rayon now fires on the sync path.
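The parallel-pass-then-sequential-merge shape can be sketched without rayon, using scoped std threads. `try_recover` is an invented stand-in for `try_recover_note` (every third note "decrypts", yielding a doubled payload); the point is that the parallel pass produces indexed results and the merge restores the original feed order:

```rust
/// Illustrative trial decrypt: Some(payload) when the note is "ours".
fn try_recover(note: u32) -> Option<u32> {
    (note % 3 == 0).then_some(note * 2)
}

/// Parallel trial-decrypt pass, then a sequential merge that preserves
/// feed order (mirroring the println-then-push order the tests rely on).
/// `workers` must be >= 1.
fn recover_batch(notes: &[u32], workers: usize) -> Vec<u32> {
    let chunk_len = ((notes.len() + workers - 1) / workers).max(1);
    let mut indexed: Vec<(usize, Option<u32>)> = std::thread::scope(|s| {
        // Spawn all workers first so chunks really run concurrently.
        let handles: Vec<_> = notes
            .chunks(chunk_len)
            .enumerate()
            .map(|(c, chunk)| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .map(|(i, &n)| (c * chunk_len + i, try_recover(n)))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    });
    // Sequential merge in original feed order; drop non-recoverable notes.
    indexed.sort_by_key(|&(i, _)| i);
    indexed.into_iter().filter_map(|(_, r)| r).collect()
}
```

The independence claims in the paragraph above are what make the parallel pass safe here: each note's trial decrypt reads shared state and writes only its own slot.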
Tunables
`TZEL_SYNC_CONCURRENCY` (default 4, hard cap 128). 4 matches typical CI vCPU count and stays comfortably below any stock rollup-node's connection tolerance — chosen without cross-operator measurement, conservative on purpose. Higher concurrency (8, 16, 32, …) is reasonable on tuned operator boxes but should be opt-in once an operator measures their own rollup-node's tail-latency curve via the bench harness in tzel-infra PR #57. The 128 hard cap rejects misconfiguration before it can fill the rollup-node TCP backlog.

`--checkpoint-every N` / `TZEL_SYNC_CHECKPOINT_EVERY` / `DEFAULT_CHECKPOINT_EVERY` — bumped from 50 → 250 per PR #24's body line 90 ("Re-tune default K to 250–500 once the design doc's parallel-fetch patch (P2, see PR #25) lands"). 250 is the conservative end of that range — it drops the per-checkpoint fsync from ~7/s to ~1.4/s sustained at the post-PR throughput. Loss-on-interrupt grows from ~2 s to ~10 s, still well under the user-visible threshold for cooperative-yield UX. A one-line stderr banner at sync start surfaces both tunables to the operator.
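A sketch of how such a tunable might be validated (the function name and error strings are illustrative, not the PR's actual code). The design point it encodes: values above the hard cap are rejected outright rather than clamped, so a misconfiguration fails before any request is sent.

```rust
/// Hypothetical validation for TZEL_SYNC_CONCURRENCY: default 4 when unset,
/// reject 0, reject anything above the 128 hard cap.
fn sync_concurrency(raw: Option<&str>) -> Result<usize, String> {
    match raw {
        None => Ok(4), // ship-safe default
        Some(s) => match s.parse::<usize>() {
            Ok(0) => Err("TZEL_SYNC_CONCURRENCY must be at least 1".to_string()),
            Ok(n) if n <= 128 => Ok(n),
            Ok(n) => Err(format!("TZEL_SYNC_CONCURRENCY={n} exceeds hard cap 128")),
            Err(_) => Err(format!("TZEL_SYNC_CONCURRENCY={s:?} is not a number")),
        },
    }
}
```

Taking `Option<&str>` instead of reading the environment directly keeps the function testable; a caller might wire it up as `sync_concurrency(std::env::var("TZEL_SYNC_CONCURRENCY").ok().as_deref())`.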
Composition with related PRs
`--at-tree-size` (#23, merged on main) — operators that start fresh from an explicit cursor still benefit from the concurrent fetch on subsequent syncs.

#24 (merged) — each K-commit batch from #24 is fetched concurrently within the batch and trial-decrypted in parallel by this PR. PR #24's `apply_scan_feed_recover_batch` is exactly the function that gained the rayon `par_iter` here.

tzel-infra PR #57 (separate) — the harness wraps `tzel-wallet sync` in a subprocess and times it end-to-end; the README walks through the sandbox-first recipe (`services/wallet/scripts/install-sandbox.sh` → bench).

NOT included (deferred)

Batched RPC — deferred to upstream octez; the rollup node doesn't currently expose a batch durable-read endpoint.

`--watch` connection-pool reuse — `--watch` rebuilds `SyncFetcher` per iteration; per-iteration scans are small enough that one fresh handshake per poll is bounded.
Tests
4 tests in `network_profile_tests`:

`concurrent_fetch_returns_same_results_as_sequential` — drives a 100-commit mock rollup, scans with `TZEL_SYNC_CONCURRENCY=1` and `=8`, asserts identical `NoteMemo`s in identical order. Validates that out-of-order completion is reassembled correctly.

`concurrent_fetch_aborts_on_5xx` — the mock returns 503 on the index-7 length probe; sync at concurrency=4 against a 50-note batch. Asserts the error propagates with HTTP status preserved, attributed to a rollup RPC URL, AND that the number of requests served by the counted mock is strictly less than the index count — a "drain everything before propagating" refactor would still satisfy the old assertion but would fail this strengthened one.

`parallel_decrypt_returns_same_results_as_sequential` — builds a mixed fixture (recoverable + non-recoverable notes) with the recoverability of each index pre-computed at fixture-construction time as an independent oracle. Both the sequential (one-thread rayon pool) and parallel (default global pool) branches must produce exactly the oracle set, AND must match each other. Verified adversarially: mutating `apply_scan_feed_recover_batch` to drop half the recoveries (`take(feed.notes.len() / 2)`) makes this test fail loud — replacing the previous tautological shape that compared `apply_scan_feed` against itself.

`sync_fetcher_amortises_client_across_batches` — pins the contract that one `reqwest::Client` services multiple sequential batches via `SyncFetcher`. Counts served HTTP requests across two back-to-back fetches and asserts both batches see the same warm pool.

122 lib tests pass total (was 117 on `main` before the #24/#25 chain landed; #24 added the cooperative-yield coverage, this PR adds the four sync-acceleration tests above).
Test plan
`cargo +nightly-2025-07-14 build -p tzel-wallet-app`

`cargo +nightly-2025-07-14 test -p tzel-wallet-app --lib` — 122 ok

Adversarial mutation (drop half the recoveries in `apply_scan_feed_recover_batch`) — `parallel_decrypt_returns_same_results_as_sequential` fails loud. Reverted.

Single-threaded rayon run (`RAYON_NUM_THREADS=1`) — full lib-test runtime grows from ~50 s → ~69 s, proving rayon fires on the sync path.

`cargo +nightly-2025-07-14 clippy -p tzel-wallet-app --all-targets -- -D warnings` — no regression vs pre-PR state (37 pre-existing tzel-core errors unchanged; nothing new from this PR).
Review-thread audit (pre-merge)
Reviewers A and B caught four blockers + several should-fix items on the
first pass; this rebase addresses each:
`parallel_decrypt_returns_same_results_as_sequential` was tautological (sequential vs parallel both routed through the same function). Rewritten with an independent fixture-time oracle.

Rayon had been added to the legacy `apply_scan_feed`, not to `apply_scan_feed_recover_batch` (the post-#24 hot path). Added rayon to the latter.

The `reqwest::Client` was rebuilt per batch (~268 cold-start handshakes per 67k-commit sync at K=250). Added `SyncFetcher` so the client survives the whole `cmd_rollup_sync` call.

The bench harness invoked `tzel-wallet rollup-sync --rollup-node-url` (wrong subcommand, wrong flag). Fixed in tzel-infra PR #57 to run `profile init-shadownet` first, then `sync`.

`TZEL_SYNC_CONCURRENCY` stays at the conservative default of 4; there is no measurement yet justifying anything higher.

`SyncFetcher` scope is per-`cmd_rollup_sync` call, not cross-`--watch`.

`concurrent_fetch_aborts_on_5xx` strengthened with a served-requests counter so a "drain everything before propagating" refactor would fail it.
🤖 Generated with Claude Code