Skip to content

fix(alerts): page every intermediate escalation level (audit_28 §5 medium)#102

Merged
brownjuly2003-code merged 1 commit into
mainfrom
fix/escalation-intermediate-levels
Jun 29, 2026
Merged

fix(alerts): page every intermediate escalation level (audit_28 §5 medium)#102
brownjuly2003-code merged 1 commit into
mainfrom
fix/escalation-intermediate-levels

Conversation

@brownjuly2003-code

Copy link
Copy Markdown
Owner

Summary

Closes the one clean, real-impact correctness defect from audit_28_06_26.md §5
(medium): alert escalation silently skips intermediate levels.

next_escalation_step returned the highest due escalation step (due_steps[-1]).
When two or more levels became due between evaluation ticks — sparse polling, or
catch-up after a restart — the on-call recipients of the intervening levels were
never paged; only the top level's webhook fired. For tiered escalation where each
level targets a different responder (on-call → lead → manager), the intermediate
responders were silently dropped.

Fix: advance exactly one level per evaluation tick — the lowest due level
above the current one (min(due_steps, key=level), robust to an unsorted
escalation list). Each tier is paged on successive ticks. No public contract
change.

Tests

  • New next_escalation_step regression tests: 3-level all-due alert advances
    L1→L2→L3 across ticks (never jumps straight to L3 and skips L2); lowest-due
    picked regardless of list order.
  • Counterfactual confirmed the test discriminates the bug: old due_steps[-1]
    returns L3 (skip), new returns L2 (correct).

Verification (no-Docker)

  • ruff / ruff format / mypy strict: clean
  • 75 alert/dispatcher unit tests pass
  • Existing 2-level escalation tests (≤1 level due per tick) unaffected

Scope note: the audit's HIGH backlog (16 HIGH + M4) is already merged
(#96#99). This is the first §5-medium item; the rest are deliberate
trade-offs, entangled with recently-merged work, Docker-gated, or latent with
no practical impact — left out intentionally.

🤖 Generated with Claude Code

next_escalation_step returned the highest *due* step (due_steps[-1]).
When two or more escalation levels became due between evaluation ticks
(sparse polling, restart catch-up), the on-call recipients of the
intervening levels were silently skipped — only the top level's webhook
fired. Advance exactly one level per tick (the lowest due level above the
current one, robust to unsorted escalation lists) so each tier is paged.

Adds next_escalation_step regression tests (3-level all-due -> level 2
then 3, never jumping straight to 3; lowest-due-regardless-of-order).
No public contract change; no-Docker verified (ruff/format/mypy clean,
75 alert/dispatcher unit tests pass).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@github-actions

Copy link
Copy Markdown

DORA Metrics

  • Window: last 30 days
  • Branch: main
  • Deployment frequency: 164 total / 38.27 per week
  • Lead time for changes: avg 0.27h / median 0.0h
  • Change failure rate: 58.54% (96/164)
  • MTTR: 0.23h across 4 incident(s)

@brownjuly2003-code brownjuly2003-code merged commit 638b475 into main Jun 29, 2026
22 of 23 checks passed
@brownjuly2003-code brownjuly2003-code deleted the fix/escalation-intermediate-levels branch June 29, 2026 11:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants