
perf(pm): TTY-gate progress + cooperative yield in spawn loop#2915

Draft
elrrrrrrr wants to merge 10 commits into next from experiment/install-tty-gate-coop-yield

Conversation

@elrrrrrrr
Contributor

What

Smaller alternative to #2911. Two install hot-path changes layered on top of the new (post-#2905) zero-copy + par_chunks(64) baseline:

  1. TTY-gate progress bar (carried from experiment/install-tty-gate parent commit): non-TTY callers skip indicatif's internal mutex, removing ~9k atomic ops per install.

  2. Single consume_budget().await at the top of install_packages spawn loop: gives the tokio runtime a per-iteration window to drain socket reads on in-flight tarball downloads, replacing the implicit yield that the indicatif mutex used to provide.

```rust
for (path, package) in packages.iter() {
    tokio::task::consume_budget().await;  // ← only addition
    // ... existing logic, unchanged ...
}
```

Why this instead of the bigger #2911 partition

#2911's classify + 3-phase pipeline did fix the TCP-level starvation (zwin 14 → 0), but the N=4 phases-bench runs measured a consistent +1.05s p3 regression. Mechanism: the partition design pushed all cheap paths (omit / cpu-incompat / file: link) before any spawns, then opened Phase 3 with 64 in-flight downloads simultaneously. That concentrated disk burst widened p3 σ vs the baseline's interleaved cheap+heavy schedule.

This PR keeps the original interleaved schedule (cheap paths inline with spawns) and only adds the runtime-yield hint. Same TCP fix, smaller change, no concentrated burst.

Pcap mechanism evidence (carried over from partition experiments)

| Variant | utoo zwin | utoo-next zwin |
| --- | --- | --- |
| TTY-gate alone (no yield) | 14-16 | 0 |
| TTY-gate + cooperative yield (this PR) | 0 | 0 |

consume_budget was added in tokio 1.41 and is the right primitive for this case: ~5ns when it doesn't yield (the common case), ~100ns when the per-task tick budget is exhausted. utoo pins tokio 1.51.

Companion / supersedes

🤖 Generated with Claude Code

elrrrrrrr and others added 9 commits May 7, 2026 11:20
`install.rs` had 8 raw `PROGRESS_BAR.inc(1)` calls plus a `set_length` that
bypassed the `IS_TTY` short-circuit already present in `ProgressReceiver`.
indicatif always takes the internal `Mutex<ProgressState>` write-lock
even with a hidden draw target, so non-TTY runs (CI, piped output)
were paying ~9k Mutex acquisitions per install. Wrapped them in
`progress_inc` / `progress_set_length` helpers that match the
receiver's gating pattern.

This is 1/3 of the triplet from #2902, split out for independent
A/B benchmarking against the recently-merged baseline #2887.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The pcap-only bench was previously a one-off that captured `p1_resolve`
across `utoo` and `bun`, and assumed the project tree was already
cloned by `pm-bench-phases.sh` running in the same job. That gave us
metadata fan-out, but install-phase regressions (#2902 / #2903 /
#2904 / #2905 σ widening on `p0_full_cold`) live in the tarball
download path, not in resolve.

This commit makes the pcap bench self-contained and covers both
phases for three PMs:

- Self-clone the project if `$PROJECT_DIR` is missing (mirrors
  `pm-bench-phases.sh`), so this script runs as a standalone CI job.
- Add a `<pm>-install` capture per PM: lock pre-existing,
  `cache + node_modules` wiped, then `<pm> install`. This is the
  cold-tarball-download phase where the σ-widening lives.
- Add `utoo-next` as a third PM: built upstream by `build-linux`'s
  bench-baseline step (now also gated on `pm-bench-pcap`), downloaded
  via the same artifact path as `bench-phases-linux`. Skipped in
  local runs where `$UTOO_NEXT_BIN` is unset.

Workflow change:

- `pm-bench-pcap-linux` now downloads the `utoo-next-linux-x64`
  artifact and exports `UTOO_NEXT_BIN` exactly like
  `bench-phases-linux` does.
- `Build next branch utoo` and `Upload utoo-next binary` steps in
  `build-linux` now also fire for `inputs.target == 'pm-bench-pcap'`,
  not only `pm-bench-phases`.

Outputs in `/tmp/pm-bench-pcap`:

  dns.txt
  utoo-{resolve,install}.{pcap,log}
  utoo-next-{resolve,install}.{pcap,log}   (when UTOO_NEXT_BIN set)
  bun-{resolve,install}.{pcap,log}

Drives the analysis of whether the install hot-path's increased
concurrency (FuturesUnordered streaming, zero-copy tar, TTY-gate)
saturates outbound TCP and starves the download path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Building on the install-phase pcap capture from the previous commit,
post-process each .pcap with tshark to extract pre-TLS metrics that
directly probe the "install greediness starves download" hypothesis
without needing TLS session-key dumping:

  zero_windows    — receive buffer full → server paused. Direct evidence
                    that the app's tokio runtime is not draining the
                    socket fast enough between extracts.
  retransmits     — server resent because ACK was late. Indirect
                    evidence of receive-side stall.
  duplicate_acks  — receiver re-sent ACK because it perceived a gap.
  stream_gap_*    — inter-packet gap distribution per TCP stream
                    (p50 / p99 / max in microseconds). p99 / max measure
                    the longest pause an active connection experienced —
                    if utoo shows multi-hundred-ms gaps where utoo-next
                    shows tens of ms, install is freezing the runtime
                    mid-download.
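The stream_gap percentiles can be sketched with a small awk pass over per-stream packet timestamps (which a real run would first pull out of the pcap, e.g. with tshark's `-T fields -e frame.time_epoch` filtered to one `tcp.stream`). The timestamps below are synthetic and the nearest-rank percentile rule is an illustrative choice, not necessarily the script's exact one:

```shell
#!/usr/bin/env bash
# Sketch: inter-packet gap percentiles for one TCP stream. Timestamps
# (epoch seconds) are synthetic; percentile = nearest-rank index,
# clamped to >= 1.
ts='0.000000
0.001000
0.002000
0.004000
0.254000'   # one 250 ms stall at the tail

gaps=$(awk '
NR > 1 { gap[NR-1] = ($1 - prev) * 1e6 }   # gap to previous packet, in µs
{ prev = $1 }
END {
    n = NR - 1
    for (i = 2; i <= n; i++) {             # insertion sort: per-stream n is small
        v = gap[i]
        for (j = i - 1; j >= 1 && gap[j] > v; j--) gap[j+1] = gap[j]
        gap[j+1] = v
    }
    i50 = int(n * 0.50); if (i50 < 1) i50 = 1
    i99 = int(n * 0.99); if (i99 < 1) i99 = 1
    printf "gap_p50_us=%.0f gap_p99_us=%.0f gap_max_us=%.0f\n", gap[i50], gap[i99], gap[n]
}' <<< "$ts")
echo "$gaps"   # gap_p50_us=1000 gap_p99_us=2000 gap_max_us=250000
```

A multi-hundred-millisecond gap_max on a stream whose p50 sits in single-digit milliseconds is exactly the "runtime frozen mid-download" signature these fields exist to surface.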

Per-capture summaries land at $PCAP_DIR/<name>.summary.json. They are
aggregated into a top-level summary.json via jq -s, so artifact
consumers can compare metrics across PMs without re-parsing the 100s
of MB of raw pcaps.

Single-pass tshark over the pcap with -T fields keeps cost bounded
to ~1 minute per 1 GB of capture; the full analysis pass runs after
all captures so it does not bleed into wall-clock measurement.

Workflow change:

  Install pcap tools step now also installs tshark + jq, with
  wireshark-common pre-seeded so tshark installs non-interactively
  (we only read existing pcaps, no setuid dumpcap needed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The analysis pass aborted the whole job on the first .pcap because
the wall-time grep returned no match, and `set -eo pipefail` propagated
that exit-1 through the `local x; x=$(grep | awk)` assignment: the
multi-line `local x; x=$(...)` form does NOT mask the exit code,
unlike one-line `local x=$(...)` (a classic bash gotcha).

Two-part fix:

1. Drop into `set +e` / `set +o pipefail` for the analysis function
   body. The metrics are diagnostic — one tshark hiccup or an empty
   log line should not nuke a 25-minute capture run. Strict mode is
   restored at the end of the function so the rest of the script
   keeps its safety net.

2. Replace `grep -oE | awk` with awk-only. awk returns 0 even when
   no record matches, so empty-result log files no longer trip
   pipefail. Same parse, fewer pipes.
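Both halves of the gotcha are easy to confirm in isolation. A minimal standalone demo (function names are illustrative, this is not the bench script itself):

```shell
#!/usr/bin/env bash
# Demo 1: `local x=$(cmd)` masks cmd's exit status; the two-line form keeps it.
one_line() {
    local x=$(false)    # $? becomes `local`'s own status: 0
    echo "one-line \$?=$?"
}
two_line() {
    local x
    x=$(false)          # bare assignment keeps the substitution's status: 1
    echo "two-line \$?=$?"
}
one_line                # prints: one-line $?=0
two_line                # prints: two-line $?=1

# Demo 2: awk exits 0 when no record matches, grep exits 1, so awk-only
# pipelines survive `set -o pipefail` on empty log files.
printf '' | awk '/wall/ { print $2 }'
echo "awk-on-empty \$?=$?"   # prints: awk-on-empty $?=0
```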

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…iagnosis

The TCP-level analysis (zero-copy retx=123 vs baseline 4-18) gave
strong evidence that utoo's receiver runtime is under back-pressure
during install, but it doesn't tell us *why*. The leading hypothesis
is disk IO saturation: rayon's parallel `fs::create + write_all` over
80k+ files in the ant-design tarball burst can outrun GitHub Actions
runners' Azure-disk IOPS budget, blocking write threads → tokio
threads back up → socket buffers fill.

This commit adds an iostat-x sampler to each capture:

  capture_one() now spawns `iostat -x -y 1` in parallel with
  tcpdump, writing per-second device samples to
  $PCAP_DIR/<name>.iostat.txt. Both samplers are torn down with
  the workload command.

  analyze_pcap() parses the iostat log via column-position lookup
  (sysstat header row → column index map) and extracts:
    io_util_max_pct       — peak disk-busy percentage
    io_util_avg_pct       — average disk-busy percentage
    io_w_iops_max         — peak write IOPS
    io_w_kbs_max          — peak write throughput (kB/s)
    io_w_await_max_ms     — peak write queue wait (ms)
    io_samples            — sample count for sanity check
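The header-driven column lookup can be sketched as below. The iostat sample is synthetic and trimmed; real `iostat -x -y 1` output repeats the header block and interleaves avg-cpu sections, which the actual parser has to skip:

```shell
#!/usr/bin/env bash
# Sketch: resolve %util and w/s by name from the sysstat header row instead
# of hardcoding positions (column order shifts between sysstat versions).
sample='Device            r/s     w/s     rkB/s     wkB/s   w_await  %util
sda              0.00  812.00      0.00  92416.00      4.10  97.60
sda              0.00  655.00      0.00  71204.00      2.80  84.20'

summary=$(awk '
NR == 1 {                                  # header row: build name -> index map
    for (i = 1; i <= NF; i++) col[$i] = i
    next
}
{
    if ($col["%util"] > util_max)  util_max  = $col["%util"]
    if ($col["w/s"]   > wiops_max) wiops_max = $col["w/s"]
    n++
}
END { printf "io_util_max_pct=%s io_w_iops_max=%s io_samples=%d\n", util_max, wiops_max, n }
' <<< "$sample")
echo "$summary"   # io_util_max_pct=97.60 io_w_iops_max=812.00 io_samples=2
```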

These six fields land in summary.json alongside the TCP metrics, so
artifact consumers can directly cross-correlate disk pressure with
TCP back-pressure within the same capture window.

The decision rule:
  * If io_util_max_pct stays high (>80%) on the experiment branch
    while baseline same-PM utoo-next stays low → install path is
    saturating disk and that's the mechanism.
  * If both branches show similar low %util, disk is not the
    bottleneck and we keep looking (e.g. CPU contention).

Workflow: apt install adds `sysstat` (iostat lives there). It is
preinstalled on ubuntu-latest images today, but pinning the
dependency makes future image rebuilds resilient.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Smaller alternative to the partition refactor in #2911. Keep the
original Vec<JoinHandle> + sequential drain loop structure (no
restructuring) and only add a single per-iteration
`tokio::task::consume_budget().await` at the top of the spawn loop
body.

Mechanism check from the partition pcap experiments
(run 25553669984 + run 25552156559): with cooperative-yield hint at
the per-iteration boundary, utoo install zwin events drop from 14
to 0, matching utoo-next baseline. Without it, the synchronous
spawn loop runs ~2000 packages back-to-back on non-TTY CI (after
TTY-gate's mutex removal) without giving the runtime a window to
drain socket reads.

Why the partition was dropped: the bigger refactor showed a
consistent +1s p3 mean regression across 4 attempts (utoo p3 =
7.42s avg, utoo-next = 6.37s, bun = 6.90s). The partition pushed
all cheap paths (omit / cpu-incompat / file: link) before any
spawns, then opened Phase 3 with all 64 in-flight downloads at
once — a more concentrated disk burst than the original
cheap-path/heavy-path interleaved schedule. The TCP-level fix
worked, but disk-side back-pressure widened p3 σ.

This commit keeps the original interleaved schedule (cheap paths
inline with spawns) and adds only the runtime-yield hint at the
top of each iteration. Same structural principle as #2911's Phase
3 yield, applied to the original loop without the surrounding
restructure.

Cost: ~5ns per iteration when the per-task tick budget isn't
exhausted (the common case — JoinHandle.await later in the
iteration resets budget). ~100ns when the budget exhausts after a
run of cheap iterations and the runtime preempts to drain sockets.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@elrrrrrrr elrrrrrrr added benchmark Run pm-bench on PR A-Pkg Manager Area: Package Manager labels May 8, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces cooperative scheduling using tokio::task::consume_budget() in the package installation loop to prevent socket read starvation on non-TTY environments. It also gates progress bar updates behind TTY checks to avoid unnecessary mutex contention in CI or piped outputs. A review comment suggests updating the documentation for progress_inc to accurately reflect its generic parameter.

Comment thread crates/pm/src/util/logger.rs
@elrrrrrrr elrrrrrrr marked this pull request as ready for review May 8, 2026 12:52
@elrrrrrrr elrrrrrrr added benchmark Run pm-bench on PR and removed benchmark Run pm-bench on PR labels May 9, 2026
@github-actions

github-actions Bot commented May 9, 2026

📊 pm-bench-phases · 3258e65 · linux (ubuntu-latest)

Workflow run — ant-design

PMs: utoo (this branch) · utoo-npm (latest published) · bun (latest)

npmjs.org

p0_full_cold

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 8.80s | 0.11s | 9.98s | 9.93s | 745M | 331.0K |
| utoo-next | 8.12s | 0.20s | 10.23s | 12.09s | 953M | 125.7K |
| utoo-npm | 8.01s | 0.03s | 10.69s | 12.23s | 1.28G | 173.3K |
| utoo | 8.52s | 0.61s | 10.25s | 12.17s | 972M | 121.9K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 14.9K | 17.4K | 1.17G | 6M | 1.84G | 1.72G | 1M |
| utoo-next | 119.1K | 82.9K | 1.14G | 5M | 1.68G | 1.68G | 2M |
| utoo-npm | 122.7K | 85.2K | 1.14G | 5M | 1.68G | 1.68G | 2M |
| utoo | 133.4K | 83.8K | 1.14G | 5M | 1.68G | 1.68G | 2M |

p1_resolve

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 1.85s | 0.03s | 3.91s | 1.05s | 502M | 165.5K |
| utoo-next | 3.10s | 0.11s | 5.31s | 1.92s | 610M | 85.7K |
| utoo-npm | 3.05s | 0.02s | 5.30s | 1.84s | 609M | 79.2K |
| utoo | 3.01s | 0.08s | 5.30s | 1.92s | 613M | 83.5K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 7.7K | 4.7K | 201M | 3M | 105M | - | 1M |
| utoo-next | 68.1K | 115.2K | 199M | 2M | 7M | 3M | 2M |
| utoo-npm | 68.3K | 112.6K | 199M | 2M | 7M | 3M | 2M |
| utoo | 67.5K | 112.2K | 199M | 2M | 7M | 3M | 2M |

p3_cold_install

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 6.71s | 0.06s | 6.11s | 9.75s | 645M | 212.8K |
| utoo-next | 6.58s | 1.09s | 4.86s | 10.65s | 456M | 56.3K |
| utoo-npm | 6.05s | 0.08s | 5.27s | 10.96s | 905M | 115.2K |
| utoo | 6.35s | 1.56s | 4.84s | 10.45s | 478M | 59.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 4.1K | 6.9K | 995M | 4M | 1.74G | 1.74G | 1M |
| utoo-next | 101.8K | 49.6K | 964M | 3M | 1.67G | 1.67G | 2M |
| utoo-npm | 103.6K | 62.9K | 964M | 2M | 1.67G | 1.67G | 2M |
| utoo | 91.3K | 50.9K | 964M | 2M | 1.67G | 1.67G | 2M |

p4_warm_link

| PM | wall | ±σ | user | sys | RSS | pgMinor |
| --- | --- | --- | --- | --- | --- | --- |
| bun | 3.32s | 0.04s | 0.22s | 2.30s | 137M | 32.7K |
| utoo-next | 2.20s | 0.24s | 0.49s | 3.73s | 80M | 18.6K |
| utoo-npm | 2.19s | 0.06s | 0.50s | 3.78s | 84M | 19.2K |
| utoo | 2.07s | 0.06s | 0.47s | 3.76s | 79M | 18.3K |

| PM | vCtx | iCtx | netRX | netTX | cache | node_mod | lock |
| --- | --- | --- | --- | --- | --- | --- | --- |
| bun | 262 | 27 | 5M | 48K | 1.88G | 1.72G | 1M |
| utoo-next | 41.0K | 18.9K | 308K | 7K | 1.68G | 1.68G | 2M |
| utoo-npm | 46.0K | 21.1K | 323K | 29K | 1.68G | 1.68G | 2M |
| utoo | 40.8K | 18.9K | 309K | 11K | 1.68G | 1.68G | 2M |

npmmirror.com: no output captured.

@elrrrrrrr elrrrrrrr marked this pull request as draft May 9, 2026 07:59

Labels

A-Pkg Manager Area: Package Manager benchmark Run pm-bench on PR
