v0.6.4: build concurrency cap (host-hang fix) + skill tuning #535

Merged: coseto6125 merged 2 commits into main from feat/build-semaphore on Jun 3, 2026
@coseto6125
Owner

Ships v0.6.4 with the crash-prevention feature you asked to land in this release, plus the A/B-tuned ecp skill.

Features — build concurrency cap (the host-hang fix)

The per-repo .build.lock serializes rebuilds of one repo, but nothing bounded how many different repos rebuild at once. Each L2 build issues ~180 MB of I/O; on WSL2 the ~/.ecp vhdx saturates under a handful of concurrent builds and hangs the whole host (observed: 8 concurrent agents → load 8.4 → freeze).

A global build-slot semaphore (build/semaphore.rs) is acquired at the top of build_inside_locked, the one choke point both build_l2 and force_rebuild_l2 flow through. It uses the same fs2 advisory locks the orchestrator already relies on, so there are no new deps and it is cross-platform.

Cap K is env-aware (ECP_MAX_CONCURRENT_BUILDS overrides). Honest note: only the WSL2 value is backed by a real crash measurement; native Windows/macOS/Linux ceilings are conservative inferences.

Env      K                      Basis
WSL2     clamp(cores/8, 2, 3)   measured-safe band
Windows  clamp(cores/4, 2, 4)   NTFS + Defender (inferred)
Unix     clamp(cores/4, 2, 6)   SSD, I/O not limiting (inferred)
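
For reviewers who want to sanity-check the numbers, the cap arithmetic in the table can be sketched roughly as follows. This is not the code in build/semaphore.rs: the names HostEnv, default_cap and effective_cap are hypothetical, integer division is assumed, and the handling of an unparsable or zero ECP_MAX_CONCURRENT_BUILDS value (fall back to the default) is an assumption rather than documented behavior.

```rust
/// Host classes the PR distinguishes when picking the cap (hypothetical type).
#[derive(Clone, Copy)]
enum HostEnv {
    Wsl2,
    Windows,
    Unix,
}

/// Default cap K per the table: clamp(cores / divisor, min, max).
/// Integer division is assumed here.
fn default_cap(env: HostEnv, cores: usize) -> usize {
    let (divisor, min, max) = match env {
        HostEnv::Wsl2 => (8, 2, 3),
        HostEnv::Windows => (4, 2, 4),
        HostEnv::Unix => (4, 2, 6),
    };
    (cores / divisor).clamp(min, max)
}

/// Apply an ECP_MAX_CONCURRENT_BUILDS-style override on top of the default.
/// Assumption: values that do not parse, or that are zero, are ignored.
fn effective_cap(env: HostEnv, cores: usize, override_value: Option<&str>) -> usize {
    override_value
        .and_then(|v| v.trim().parse::<usize>().ok())
        .filter(|&k| k >= 1)
        .unwrap_or_else(|| default_cap(env, cores))
}
```

Under these assumptions a 16-core WSL2 host gets K = 2, a 32-core one gets K = 3, and a 64-core Unix host is held at the ceiling of 6. A caller would pass std::env::var("ECP_MAX_CONCURRENT_BUILDS").ok().as_deref() as the override.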

Gates only rebuilds: cache hits, warm-attach, and every query are untouched, so steady-state usage never queues. Only a first build or a SHA change can wait, and only when K builds are already running. Degrades open (proceeds unthrottled) if the slot machinery errors or a slot is held >120s.
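
The degrade-open behavior can be illustrated with a small sketch. It is a simplified model, not the shipped implementation: acquire_build_slot and SlotOutcome are hypothetical names, the try_lock closure stands in for a non-blocking advisory lock attempt on one slot file, and treating the 120s limit as a cap on how long a waiter polls is one reading of the description above.

```rust
use std::io;
use std::thread;
use std::time::{Duration, Instant};

/// Result of trying to obtain a build slot (hypothetical type).
#[derive(Debug, PartialEq)]
enum SlotOutcome {
    /// Slot N is held; the build runs under the cap.
    Acquired(usize),
    /// Degraded open: the build proceeds without a slot.
    Unthrottled,
}

/// Poll slots 0..cap until one is free, the deadline passes, or locking fails.
/// `try_lock(n)` returns Ok(true) if slot n was taken, Ok(false) if it is busy.
fn acquire_build_slot<F>(
    cap: usize,
    max_wait: Duration,
    poll: Duration,
    mut try_lock: F,
) -> SlotOutcome
where
    F: FnMut(usize) -> io::Result<bool>,
{
    let deadline = Instant::now() + max_wait;
    loop {
        for slot in 0..cap {
            match try_lock(slot) {
                Ok(true) => return SlotOutcome::Acquired(slot),
                Ok(false) => continue,
                // Never fail a build because the throttle itself is broken.
                Err(_) => return SlotOutcome::Unthrottled,
            }
        }
        if Instant::now() >= deadline {
            // Waited too long: proceed rather than block indefinitely.
            return SlotOutcome::Unthrottled;
        }
        thread::sleep(poll);
    }
}
```

The point of this shape is that the semaphore can only delay a build, never block it forever or turn a locking error into a build failure.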

Also in this release

Tests

4 unit (cap arithmetic) + 2 integration (real index acquires a slot; ECP_MAX_CONCURRENT_BUILDS=1 still builds). Existing build_orchestrator suite green. fmt + clippy --all-targets clean. Supersedes #533 (which lacked the semaphore).

Per-repo .build.lock already serializes rebuilds OF ONE repo, but nothing
bounded how many DIFFERENT repos rebuild at once. Each L2 build issues ~180 MB
of I/O; on WSL2 the ~/.ecp vhdx saturates under a handful of concurrent builds
and hangs the whole host (observed: 8 concurrent agents → load 8.4 → freeze).

Add a global build slot semaphore (build/semaphore.rs) acquired at the top of
build_inside_locked — the single choke point both build_l2 and force_rebuild_l2
flow through. It uses the same fs2 advisory file locks the orchestrator already
relies on (~/.ecp/.build-slots/slot-N), so no new deps and it's cross-platform.

Cap K is environment-aware (only the WSL2 value is backed by a real crash
measurement; native Windows/macOS/Linux ceilings are conservative inferences),
overridable via ECP_MAX_CONCURRENT_BUILDS:
  - WSL2:    clamp(cores/8, 2, 3)   # vhdx, measured-safe band
  - Windows: clamp(cores/4, 2, 4)   # NTFS + Defender amplification (inferred)
  - Unix:    clamp(cores/4, 2, 6)   # SSD, I/O not limiting (inferred)

Gates ONLY the heavy rebuild path: cache hits, warm-attach, and every query are
untouched, so steady-state usage never queues — only first-build / SHA-change
can wait, and only when K builds are already running. Degrades open (proceeds
unthrottled) if the slot machinery itself errors or a slot is held past 120s.

Tests: 4 unit (cap arithmetic across core counts + env classes) + 2 integration
(real index acquires a slot; ECP_MAX_CONCURRENT_BUILDS=1 still builds). Existing
build_orchestrator suite green (no regression).
@coseto6125 enabled auto-merge (squash) June 3, 2026 08:37
@coseto6125 added the merge-queue label (Opt-in to Mergify merge queue) Jun 3, 2026
@github-actions
Contributor

github-actions Bot commented Jun 3, 2026

ecp impact cache (0 symbols) — internal, used by ecp dev pr-analyze

[]

@github-actions Bot added the ecp:risk-low label (ecp signal) Jun 3, 2026
@coseto6125 merged commit 8b21ec7 into main Jun 3, 2026
18 checks passed
@coseto6125 deleted the feat/build-semaphore branch June 3, 2026 09:05