fix(policy): register new runner before stopping old one (PILOT-310) #7
matthew-pilot wants to merge 2 commits into
Verify that during a policy reload, Get(netID) never returns nil — the new runner must be registered before the old one is stopped. The test hammers managerView.Get in a goroutine while startInternal replaces a live runner with a new policy. Any nil return during the swap is a gap where gate decisions see no policy.
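The shape of such a test can be sketched as below. This is a minimal illustration, not the PR's test file: the `manager` type, its `reload` method, and the `countNilDuringReloads` helper are hypothetical stand-ins for `managerView`, `startInternal`, and the real test body. It only shows the technique of polling `Get` from a goroutine while the runner is replaced repeatedly.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for the PR's policy runner.
type runner struct{ policy string }

// manager is a hypothetical stand-in for the type behind managerView.
type manager struct {
	mu      sync.RWMutex
	runners map[string]*runner
}

// Get returns the runner registered for netID, or nil if there is none.
func (m *manager) Get(netID string) *runner {
	m.mu.RLock()
	defer m.mu.RUnlock()
	return m.runners[netID]
}

// reload registers a new runner in one critical section, so the map
// entry is overwritten rather than deleted and re-added.
func (m *manager) reload(netID, policy string) {
	m.mu.Lock()
	m.runners[netID] = &runner{policy: policy}
	m.mu.Unlock()
}

// countNilDuringReloads polls Get in a goroutine while the runner is
// reloaded, and returns how many times Get returned nil.
func countNilDuringReloads(m *manager, netID string, reloads int) int {
	done := make(chan struct{})
	var wg sync.WaitGroup
	nils := 0 // written only by the poller, read after wg.Wait

	wg.Add(1)
	go func() {
		defer wg.Done()
		for {
			select {
			case <-done:
				return
			default:
				if m.Get(netID) == nil {
					nils++
				}
			}
		}
	}()

	for i := 0; i < reloads; i++ {
		m.reload(netID, fmt.Sprintf("v%d", i))
	}
	close(done)
	wg.Wait()
	return nils
}

func main() {
	m := &manager{runners: map[string]*runner{"net-1": &runner{policy: "v0"}}}
	fmt.Println("nil observations:", countNilDuringReloads(m, "net-1", 1000))
}
```

In this sketch the count is always zero because `reload` never removes the entry. A swap that deleted the old runner before registering the new one could let the poller observe nil.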
In startInternal, the old runner was stopped while holding s.mu and before the new runner was started — creating a brief window where Get(netID) could return nil and gate decisions see no policy. Fix: register and start the new runner first, release the mutex, then stop the old runner outside the lock. This eliminates the window entirely. Closes PILOT-310
Codecov Report
✅ All modified and coverable lines are covered by tests.
🤖 matthew-pr-worker: PR Status
State: OPEN · Mergeable: MERGEABLE (CLEAN) · Draft: No
CI Checks
Overall: 2/2 ✅
Scope
Labels: None
Canary
Status: not yet triggered
automated by matthew-pr-worker | 2026-05-30T21:41 UTC
🤖 matthew-pr-worker: PR Explain
PILOT-310: fix(policy): register new runner before stopping old one
What changed
The runner swap in startInternal.
Before (❌ racy):
→ Gap: between stop and start, Get(netID) can return nil.
After (✅ atomic):
→ No gap.
Files
Test highlights
automated by matthew-pr-worker | 2026-05-30T21:41 UTC
What failed
startInternal in service.go:235-242 stopped the old PolicyRunner while holding s.mu and before the new runner was started, creating a brief window where Get(netID) could return nil and gate decisions would see no active policy.
Why this fix
Reordered the swap: register and start the new runner first (under the mutex), release the mutex, then stop the old runner outside the lock. This eliminates the gap entirely — gate decisions always see a live runner.
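The reordering can be illustrated with a minimal sketch. Apart from the names `startInternal`, `Get`, and `mu`, everything here is an assumption made for illustration: the `service` and `runner` types, the `runners` map, and the `start`/`stop` methods are hypothetical and are not taken from service.go.

```go
package main

import (
	"fmt"
	"sync"
)

// runner is a hypothetical stand-in for the PR's PolicyRunner.
type runner struct {
	policy  string
	stopped bool
}

func (r *runner) start() {}                   // placeholder for real startup work
func (r *runner) stop()  { r.stopped = true } // placeholder for real shutdown work

// service is a hypothetical stand-in for the type that owns startInternal.
type service struct {
	mu      sync.Mutex
	runners map[string]*runner
}

// Get returns the runner registered for netID, or nil if there is none.
func (s *service) Get(netID string) *runner {
	s.mu.Lock()
	defer s.mu.Unlock()
	return s.runners[netID]
}

// startInternal starts and registers the new runner under the mutex,
// then stops the old runner after the mutex is released.
func (s *service) startInternal(netID, policy string) {
	next := &runner{policy: policy}

	s.mu.Lock()
	old := s.runners[netID] // remember the runner being replaced
	next.start()
	s.runners[netID] = next // overwrite in place: the entry is never absent
	s.mu.Unlock()

	if old != nil {
		old.stop() // outside the lock, so Get is not blocked by shutdown
	}
}

func main() {
	s := &service{runners: map[string]*runner{}}
	s.startInternal("net-1", "v1")
	first := s.Get("net-1")
	s.startInternal("net-1", "v2")
	fmt.Println(s.Get("net-1").policy, first.stopped) // prints "v2 true"
}
```

Stopping the old runner after the unlock also keeps a slow shutdown from delaying callers of `Get`, at the cost of the old and new runners briefly coexisting.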
Verification
- go build ./...: clean
- go vet ./...: clean
- go test ./...: all pass (5.7s)
- TestStartInternal_AtomicSwap hammers Get in a goroutine during reload and confirms no nil return
Scope
- service.go: +4/−3 (the fix)
- zz_service_atomic_swap_test.go: +100 (new test)
Closes PILOT-310