fix(daemon): gate acceptLoop on done channel to close untracked-handleClient race (PILOT-253) #178
matthew-pilot wants to merge 1 commit into
🦾 Matthew PR Status: #178 fix(daemon): gate acceptLoop on done channel to close untracked-handleClient race (PILOT-253)
Tickets: 🔗 PILOT-253
Labels: matthew-fix
🦾 Auto-generated status check by matthew-pr-worker
🤖 Hank: CI status
Classification: The build/test failure is a genuine code defect. @matthew-pilot: fix or comment.
Auto-classified at 2026-06-02T12:50:00Z. Re-runs on next push or check completion.
🤖 matthew-pilot Status: PR #178, PILOT-253
📋 matthew-pilot Explain: PR #178 (PILOT-253)
What this does
Closes a race between
Changes
CI Note
Risk / Tier
Jira
PR #155 extracted pkg/registry to pilot-protocol/rendezvous and pkg/secure to pilot-protocol/common, but the architecture-gates workflow still ran 'go test ./pkg/registry/... ./pkg/secure', which now fails with 'no such file or directory' on every PR.
Replace the test target with ./pkg/daemon/... because the daemon-side lock graph (Store.mu, ReplayMu, SalvageMu, tm.mu) is what this gate is actually meant to cover. The extracted layers' lock-graph coverage now runs from their own sibling repos.
Verified locally on an Ubuntu-equivalent environment: the arch-gates command 'go test -race -timeout 5m ./pkg/daemon/...' completes without the missing-directory errors.
Unblocks PRs #177, #178, #179, #180.
Co-authored-by: Teodor Calin <teodor@vulturelabs.io>
f08b16f to 563a772
563a772 to 59492b9
…eClient race (PILOT-253)
Close() now signals done before closing the listener so that acceptLoop, which may be mid-Accept, refuses any conn that raced past listener.Close(). Without this gate, a concurrently accepted connection spawns an untracked handleClient goroutine that holds resources past Close().
Adds closeOnce + done chan to IPCServer; acceptLoop checks s.done after acquiring s.mu but before spawning handleClient.
59492b9 to ca37fb5
📊 Status (PILOT-253)
PR is open and mergeable, but CI is unstable (Architecture gates ❌). Go (ubuntu/macos) ✅, CodeQL ✅, Snyk ✅. Labeled
🤖 matthew-pilot worker tick
What
Closes the race between acceptLoop and Close() where a connection accepted concurrently with listener.Close() spawns an untracked handleClient goroutine.
Root cause
Close() calls listener.Close() and then iterates s.clients, closing each. But acceptLoop's Accept() can succeed just before the listener close takes effect at the kernel level. That connection then gets added to s.clients and spawns handleClient after Close() has already iterated, leaving an untracked goroutine holding resources.
Fix
Added done chan struct{} + closeOnce sync.Once to IPCServer:
- Close() signals done before closing the listener
- acceptLoop checks s.done after acquiring s.mu, before spawning handleClient, closing the conn and returning if done is signaled
Verification
- go build ./... ✓
- go vet ./pkg/daemon/ ✓
- go test -count=1 ./pkg/daemon/ ✓ (all 56s of tests, including all IPCServer close tests)
Scope
1 file, +23 lines. Small tier (matthew-fix).
Fixes: PILOT-253
Triage: matthew-pilot autonomous fix