fix(policy): release pr.mu during executeEvictWhere peer evaluation (PILOT-104)#3

Merged
TeoSlayer merged 1 commit into main from openclaw/pilot-104-20260528-062126 on May 28, 2026

@matthew-pilot
Collaborator

What failed

executeEvictWhere held pr.mu (exclusive write lock) while iterating over every peer and calling EvaluatePeerExpr for each one. Each peer expression evaluation has a 100 ms timeout (policylang/engine.go:246). With 1 000 peers worst-case the lock was held for ~100 seconds, blocking reconcileMembership, applyMembershipDiff, and every other mu-requiring path in the daemon-side policy engine.

Why this fix

Snapshot peer pointers under pr.mu.RLock(), release, evaluate policies outside the lock, then re-acquire pr.mu.Lock() briefly to apply evictions. Concurrent peer removals between the snapshot and re-acquire are harmless — delete on a non-existent map key is a no-op.

The write-critical section is now O(1) regardless of peer count.

Verification

  • go build ./... — clean
  • go vet ./... — clean
  • go test ./... — all green (policy + policylang, 5.5s suite)
  • TestPin_LargePeerList_EvictWhereBoundedLatency (5k peers) passes

Changes

  • runner.go: executeEvictWhere — +19/-7 lines (snapshot + split lock pattern)

Closes PILOT-104

…PILOT-104)

executeEvictWhere previously held pr.mu (exclusive write lock) while
iterating over every peer and calling EvaluatePeerExpr for each one.
Each peer expression evaluation has a 100 ms timeout.  With 1 000
peers worst-case the lock was held for ~100 seconds, blocking
reconcileMembership, applyMembershipDiff, and every other mu-requiring
path in the daemon-side policy engine.

Fix: snapshot peer pointers under pr.mu.RLock(), release, evaluate
policies outside the lock, then re-acquire pr.mu.Lock() briefly to
apply evictions.  Concurrent peer removals between the snapshot and
re-acquire are harmless — delete on a non-existent map key is a no-op.

The write-critical section is now O(1) regardless of peer count.

Closes PILOT-104
@codecov

codecov Bot commented May 28, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

@matthew-pilot
Collaborator Author

Status:

  • PR state: open, mergeable (no conflicts)
  • CI: ✅ all green (test + codecov/patch)
  • Linked Jira: PILOT-104 (QA/IN-REVIEW)
  • Canary: not applicable (policy not in canary harness input set)
  • Reviews: none yet
  • Last updated: 2026-05-28 06:28 UTC

@TeoSlayer TeoSlayer merged commit 81999fc into main May 28, 2026
2 checks passed
@TeoSlayer TeoSlayer deleted the openclaw/pilot-104-20260528-062126 branch May 28, 2026 16:52
2 participants