v0.6.4: build concurrency cap (host-hang fix) + skill tuning #535
Merged
Per-repo .build.lock already serializes rebuilds OF ONE repo, but nothing bounded how many DIFFERENT repos rebuild at once. Each L2 build issues ~180 MB of I/O; on WSL2 the ~/.ecp vhdx saturates under a handful of concurrent builds and hangs the whole host (observed: 8 concurrent agents → load 8.4 → freeze).

Add a global build slot semaphore (build/semaphore.rs) acquired at the top of build_inside_locked, the single choke point both build_l2 and force_rebuild_l2 flow through. It uses the same fs2 advisory file locks the orchestrator already relies on (~/.ecp/.build-slots/slot-N), so no new deps and it's cross-platform.

Cap K is environment-aware (only the WSL2 value is backed by a real crash measurement; native Windows/macOS/Linux ceilings are conservative inferences), overridable via ECP_MAX_CONCURRENT_BUILDS:

  - WSL2:    clamp(cores/8, 2, 3)   # vhdx, measured-safe band
  - Windows: clamp(cores/4, 2, 4)   # NTFS + Defender amplification (inferred)
  - Unix:    clamp(cores/4, 2, 6)   # SSD, I/O not limiting (inferred)

Gates ONLY the heavy rebuild path: cache hits, warm-attach, and every query are untouched, so steady-state usage never queues. Only first-build / SHA-change can wait, and only when K builds are already running. Degrades open (proceeds unthrottled) if the slot machinery itself errors or a slot is held past 120s.

Tests: 4 unit (cap arithmetic across core counts + env classes) + 2 integration (real index acquires a slot; ECP_MAX_CONCURRENT_BUILDS=1 still builds). Existing build_orchestrator suite green (no regression).
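The cap arithmetic listed above can be sketched in a few lines of Rust. This is an illustration only, not the code in build/semaphore.rs: the names `HostClass`, `default_cap`, `effective_cap`, and `cap_from_env` are hypothetical, integer division for `cores/8` and `cores/4` is assumed, and the handling of an unparsable or zero override (fall back to the default) is a guess the description does not confirm.

```rust
use std::env;

/// Host classes named in the description (hypothetical enum for this sketch).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HostClass {
    Wsl2,
    Windows,
    Unix,
}

/// Default cap K for a host class and logical core count.
/// Integer division is an assumption; the description only says `cores/8` and `cores/4`.
pub fn default_cap(host: HostClass, cores: usize) -> usize {
    match host {
        HostClass::Wsl2 => (cores / 8).clamp(2, 3),    // vhdx, measured-safe band
        HostClass::Windows => (cores / 4).clamp(2, 4), // inferred ceiling
        HostClass::Unix => (cores / 4).clamp(2, 6),    // inferred ceiling
    }
}

/// Apply an optional override string. Assumption: values that do not parse
/// as an integer >= 1 are ignored and the default cap is used instead.
pub fn effective_cap(host: HostClass, cores: usize, override_val: Option<&str>) -> usize {
    override_val
        .and_then(|s| s.trim().parse::<usize>().ok())
        .filter(|&k| k >= 1)
        .unwrap_or_else(|| default_cap(host, cores))
}

/// Read the override from ECP_MAX_CONCURRENT_BUILDS, the variable named above.
pub fn cap_from_env(host: HostClass, cores: usize) -> usize {
    let raw = env::var("ECP_MAX_CONCURRENT_BUILDS").ok();
    effective_cap(host, cores, raw.as_deref())
}
```

Under these assumptions a 16-core WSL2 host gets K = 2 and a 16-core Unix host gets K = 4, while an override of 1 serializes all rebuilds regardless of host class.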
Ships v0.6.4 with the crash-prevention feature you asked to land in this release, plus the A/B-tuned ecp skill.
Features — build concurrency cap (the host-hang fix)
Per-repo `.build.lock` serializes rebuilds of one repo, but nothing bounded how many different repos rebuild at once. Each L2 build issues ~180 MB of I/O; on WSL2 the `~/.ecp` vhdx saturates under a handful of concurrent builds and hangs the whole host (observed: 8 concurrent agents → load 8.4 → freeze).

A global build-slot semaphore (`build/semaphore.rs`) is acquired at the top of `build_inside_locked`, the one choke point both `build_l2` and `force_rebuild_l2` flow through. It uses the same `fs2` advisory locks the orchestrator already uses (no new deps, cross-platform).

Cap K is env-aware (`ECP_MAX_CONCURRENT_BUILDS` overrides). Honest note: only the WSL2 value is backed by a real crash measurement; native Windows/macOS/Linux ceilings are conservative inferences.

Gates only rebuilds: cache hits, warm-attach, and every query are untouched, so steady-state usage never queues. Only first-build / SHA-change can wait, and only when K builds already run. Degrades open if the slot machinery errors or a slot is held >120s.
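The slot-and-degrade-open behaviour described above can be modelled as follows. This is a sketch, not the actual implementation: the real semaphore uses `fs2` advisory file locks so that it works across processes, whereas this stand-in uses in-process atomics so the example needs only the standard library. `BuildSlots`, `SlotGuard`, and `holds_slot` are hypothetical names, and reading "held >120s" as a bounded wait after which the build proceeds unthrottled is an assumption.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// K build slots. In-process stand-in for the slot files described above.
pub struct BuildSlots {
    slots: Vec<AtomicBool>,
}

/// Held for the duration of a rebuild. `slot` is `None` when the wait
/// timed out and the build proceeds unthrottled (degrade open).
pub struct SlotGuard<'a> {
    slot: Option<&'a AtomicBool>,
}

impl BuildSlots {
    pub fn new(k: usize) -> Self {
        BuildSlots {
            slots: (0..k).map(|_| AtomicBool::new(false)).collect(),
        }
    }

    /// Claim the first free slot, if any, without blocking.
    fn try_acquire(&self) -> Option<&AtomicBool> {
        self.slots.iter().find(|s| {
            s.compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
        })
    }

    /// Poll for a free slot until `max_wait` elapses, then degrade open.
    pub fn acquire(&self, max_wait: Duration, poll: Duration) -> SlotGuard<'_> {
        let deadline = Instant::now() + max_wait;
        loop {
            if let Some(slot) = self.try_acquire() {
                return SlotGuard { slot: Some(slot) };
            }
            if Instant::now() >= deadline {
                // Never block a build forever on the throttle itself.
                return SlotGuard { slot: None };
            }
            thread::sleep(poll);
        }
    }
}

impl SlotGuard<'_> {
    /// True if this guard actually holds one of the K slots.
    pub fn holds_slot(&self) -> bool {
        self.slot.is_some()
    }
}

impl Drop for SlotGuard<'_> {
    fn drop(&mut self) {
        // Release the slot when the rebuild finishes; no-op if degraded open.
        if let Some(slot) = self.slot {
            slot.store(false, Ordering::Release);
        }
    }
}
```

In this model a rebuild would call `acquire(Duration::from_secs(120), ...)` at the single choke point and hold the guard until the build completes. One reason advisory file locks suit the real, cross-process case is that the operating system releases them when a process exits, so a crashed build does not leave a slot permanently occupied.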
Also in this release
Tests
4 unit (cap arithmetic) + 2 integration (real index acquires a slot; `ECP_MAX_CONCURRENT_BUILDS=1` still builds). Existing build_orchestrator suite green. fmt + clippy --all-targets clean. Supersedes #533 (which lacked the semaphore).