Skip to content

Push policy coverage 90.3% -> 99.5%; pin audit-flagged defensives#1

Merged
TeoSlayer merged 1 commit into
mainfrom
add-test-coverage
May 28, 2026
Merged

Push policy coverage 90.3% -> 99.5%; pin audit-flagged defensives#1
TeoSlayer merged 1 commit into
mainfrom
add-test-coverage

Conversation

@TeoSlayer
Copy link
Copy Markdown
Contributor

Summary

  • Coverage: 90.3% → 99.5% combined (99.6% policy / 99.1% policylang).
  • Audit pins: three iter-2 audit findings (default-allow on empty/unrecognized verdict, no expression-eval timeout, peer-list iteration without timeout) now have sticky regression tests.
  • 6 new test files covering the worst-covered functions surfaced by `go tool cover`. One small production change in `runner.go` (PILOT_HOME env hook) is the production-side support the existing TestMain already relies on for parallel-safe persistence — without it 4 tests race through `~/.pilot/policy_*.json`.

Coverage moves (highlights)

Function Before After
runner.evaluatePerPeerCycle 0.0% 100%
runner.EvaluateActions 53.3% 100%
runner.cycleLoop 73.9% 95.7%
runner.reconcileMembership 75.0% 100%
runner.applyMembershipDiff 85.1% 97.0%
runner.bootstrap 86.3% 92.2%
runner.fetchMembersWithTags 82.6% 97.8%
service.handleNetworkJoined 73.3% 100%
service.LoadPersisted 86.7% 93.3%
policylang.evaluateGate 86.4% 100%
policylang.evaluateActions 81.8% 100%

Audit defensives (zz_audit_defensive_test.go)

  1. Default verdict — `TestPin_DefaultVerdictDeny_NoMatchingRule`, `TestPin_DefaultVerdictAllow_NoMatchingRule`, `TestPin_DefaultVerdictUnrecognized_RejectedByValidate`, `TestPin_EvaluateGate_FailOpenOnEvalError`.
  2. Expression timeout — `TestPin_ExprTimeout_GateBoundedLatency` (1s SLA proxy for the 100ms runProgram select), `TestPin_ExprTimeout_PanicSurfacesAsError`.
  3. Peer-list iteration budget — `TestPin_LargePeerList_EvictWhereBoundedLatency` (5k peers <3s), `TestPin_LargePeerList_ApplyMembershipDiffBoundedLatency`, `TestPin_PeerLoopConcurrentReaders` (concurrent Status() never starved).

Honest ceiling (~10 lines, 4 blocks remain uncovered)

  • `policylang/engine.go:233` defer-recover branch — needs a deterministic `expr.Run` panic, not exposed via the public API.
  • `policylang/engine.go:250` 100ms timeout branch — needs a synthetic infinite `expr` program.
  • `runner.go:515` `toRemove<=0` after clamp — defensive dead code; the earlier `total<=min` early-return catches every input that could reach it.
  • `runner.go:1182` `json.MarshalIndent` error — unreachable for `policySnapshot`'s field types.

Test plan

  • `go test -race -count=1 -timeout 180s ./...` passes in ~7s.
  • Combined coverage: 99.5%.
  • No production behaviour changes other than the PILOT_HOME env hook in `NewPolicyRunner` (the existing TestMain already depends on it).

Adds 6 new *_test.go files covering the worst-covered functions surfaced
by go tool cover, plus three defensive pins for the iter-2 audit findings:

  AUDIT PIN #1 (MED): default-allow on empty/unrecognized verdict.
    - DefaultVerdict="deny" + no matching rule MUST deny (zz_audit_defensive_test.go)
    - DefaultVerdict="" + no matching rule MUST allow (backcompat)
    - Bogus DefaultVerdict values MUST be rejected by Validate
    - EvaluateGate fail-open on expression eval error (current contract)

  AUDIT PIN #2 (MED): expression evaluation timeout.
    - EvaluateGate is bounded inside a 1s SLA (proxy for runProgram's 100ms
      select; trips immediately if the goroutine+select disappears)
    - OOB-index expression returns fail-open allow without panicking
      (drives the defer-recover branch end-to-end)

  AUDIT PIN #3 (MED): peer-list iteration is not unbounded.
    - executeEvictWhere over 5k peers completes <3s
    - applyMembershipDiff over 5k peers completes <3s
    - Concurrent Status() readers aren't starved during reconcile pass
      (catches a regression from RLock -> Lock-for-whole-pass refactors)

Coverage holes filled (highlights):
    runner.evaluatePerPeerCycle   0.0% -> 100%
    runner.EvaluateActions       53.3% -> 100% (Evict/EvictWhere/Fill/PruneTrust/FillTrust dispatch + eval-error)
    runner.executeFill           85.3% -> 100% (max_peers clamps, over-capacity no-op)
    runner.executePruneTrust     89.7% ->  96.6% (toRemove promote/clamp/early-return)
    runner.executeFillTrust      85.7% ->  91.4% (already-trusted skip, deficit clamp, handshake error)
    runner.cycleLoop             73.9% ->  95.7% (bad-duration default 24h, sub-1s promote 1s, reconcile + cycle ticks)
    runner.reconcileMembership   75.0% -> 100%
    runner.applyMembershipDiff   85.1% ->  97.0% (join evict/log/webhook, leave dispatches, eval-error continue, cooldown set)
    runner.bootstrap             86.3% ->  92.2% (max_peers clamp, deny cooldown, log/webhook/tag dispatch, eval-error)
    runner.rankTrustLinks        75.0% -> 100% (random branch)
    runner.rankedPeers           77.8% -> 100% (activity branch)
    runner.fetchMembersWithTags  82.6% ->  97.8% (backoff skip, recovery reset, failure increment, non-map entry, missing nodes, 5min cap)
    runner.load                  85.7% ->  93.3% (unmarshal error, nil-peers init)
    runner.NewPolicyRunner       87.5% (PILOT_HOME override + fallback + prior-state load)
    runner.paramInt              87.5% -> 100% (int64 case)
    runner.Stop                  (idempotency)
    service.handleNetworkJoined  73.3% -> 100% (missing netID, already-running, bad JSON)
    service.handleNetworkLeft    75.0% -> 100%
    service.dispatchNetworkEvents  (tags_changed reserved branch)
    service.startInternal        94.1% -> 100% (Compile error)
    service.LoadPersisted        86.7% ->  93.3% (empty home, readdir error, UserHomeDir error)
    service.exprPolicyJSONFromPayload  88.2% -> 100% (channel + nested-channel marshal failure)
    policylang.evaluateGate      86.4% -> 100% (rule.On mismatch, eval error, side-effects accumulate)
    policylang.evaluateActions   81.8% -> 100% (rule.On mismatch, eval error)
    policylang.runProgram        81.2% ->  87.5% (non-bool result, happy path)
    policylang.Validate          97.0% -> 100% (action-validate error propagation)

Final coverage: 99.5% combined (99.6% policy / 99.1% policylang).

runner.go change is the production-side hook the existing TestMain
already relies on: PILOT_HOME env wins over UserHomeDir so parallel
tests get per-binary tmpdirs and don't race through ~/.pilot/policy_*.json.

Remaining ceiling (4 blocks, ~10 lines, all honest):
  policylang/engine.go:233  defer-recover  needs deterministic expr.Run panic
  policylang/engine.go:250  100ms timeout  needs synthetic infinite expr
  runner.go:515             toRemove<=0    dead code (earlier total<=min catches it)
  runner.go:1182            MarshalIndent  unreachable for policySnapshot's types

go test -race -count=1 -timeout 180s ./... passes in 7s.
@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Welcome to Codecov 🎉

Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests.

ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment

Thanks for integrating Codecov - We've got you covered ☂️

@TeoSlayer TeoSlayer merged commit d2d4644 into main May 28, 2026
2 checks passed
@TeoSlayer TeoSlayer deleted the add-test-coverage branch May 28, 2026 01:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants