Merged
11 changes: 11 additions & 0 deletions .changeset/quieter-sustained-aggregation.md
@@ -0,0 +1,11 @@
+---
+'@iqai/alert-logger': minor
+---
+
+Make sustained alerting quieter and more informative by:
+
+- adding rate-aware early handoff from ramp to sustained mode
+- changing the default sustained update interval from 5 minutes to 15 minutes
+- adding `aggregation.periodCount` for per-update deltas while keeping `suppressedSince` for compatibility
+- exposing `aggregation.rampExitRatePerSecond` and `aggregation.rampExitRateWindowMs` configuration knobs
+- updating sustained formatter output to show both per-period and total counts
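
The handoff rule summarized in this changeset can be sketched roughly as follows. This is a reading of the PR description and README table, not the library's actual code: `decidePhase`, `isRampCheckpoint`, `GroupSnapshot`, and `HandoffConfig` are hypothetical names, and treating "crosses threshold" as `>=` is an assumption.

```typescript
// Sketch only: type and function names are hypothetical. The config keys
// rampThreshold and rampExitRatePerSecond are the ones named in this PR.
type Phase = 'onset' | 'ramp' | 'sustained'

interface HandoffConfig {
  rampThreshold: number // count-based handoff (README default: 64)
  rampExitRatePerSecond: number // rate-based handoff (README default: 0.5)
}

interface GroupSnapshot {
  count: number // total occurrences, including the current one
  rampAlertsSent: number // ramp alerts already delivered for this group
  currentRate: number // events/sec over the rate window
}

// Ramp checkpoints are the powers of two: 2, 4, 8, ...
function isRampCheckpoint(count: number): boolean {
  return count >= 2 && (count & (count - 1)) === 0
}

function decidePhase(s: GroupSnapshot, cfg: HandoffConfig): Phase {
  if (s.count <= 1) return 'onset'
  if (s.count > cfg.rampThreshold) return 'sustained'
  // Assumption: "crosses" is treated as >=; the library may use a strict >.
  const rateExceeded = s.currentRate >= cfg.rampExitRatePerSecond
  if (s.rampAlertsSent >= 1 && rateExceeded) return 'sustained'
  return 'ramp'
}
```

Requiring at least one ramp alert before the rate check applies means a sudden burst still produces the onset alert and one compact ramp message before the group goes quiet.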
22 changes: 15 additions & 7 deletions README.md
@@ -7,7 +7,7 @@ Stop drowning in alert storms. `@iqai/alert-logger` groups repeated errors using
 ## ✨ Features

 - **Unified API** — `logger.error('msg', error, { fields })` routes to every configured adapter
-- **Exponential suppression** — alerts fire at 1, 2, 4, 8, 16, 32, 64... then switch to periodic digests
+- **Rate-aware suppression** — alerts ramp quickly, then switch to quieter periodic updates when an incident is clearly ongoing
 - **Resolution detection** — get a "resolved" message when an error stops occurring
 - **Error fingerprinting** — same bug from different requests groups automatically (strips IDs, timestamps, UUIDs)
 - **Multi-channel routing** — route by severity level or custom tags to different channels
@@ -152,12 +152,14 @@ When the same error fires repeatedly, the library doesn't spam your channel:
 | Phase | Trigger | What gets sent |
 |-------|---------|----------------|
 | **Onset** | 1st occurrence | Full alert with stack trace, fields, tags |
-| **Ramp** | 2nd, 4th, 8th, 16th, 32nd, 64th | Compact: `"Payment failed (x8 — 4 suppressed)"` |
-| **Sustained** | >64 in window | Digest every 5min: `"x4,812 in last 5m"` |
+| **Ramp** | 2nd, 4th, 8th, 16th, 32nd, 64th until rate/count handoff | Compact: `"Payment failed (x8 — 4 suppressed)"` |
+| **Sustained** | >64 total, or current rate crosses threshold after at least one ramp alert | Digest every 15min: `"x37 since last update · x412 total"` |
 | **Resolution** | 0 hits for 2min | `"Resolved: Payment failed — 12,847 total over 23m"` |

 Errors are grouped by **fingerprint** — the library strips variable parts (IDs, timestamps, UUIDs, hex addresses) from the error message and hashes it with the top stack frames. Same bug, different request = same group.

+By default, the rate check uses a 1-minute sliding window and exits ramp early at `0.5` events/sec after the first ramp checkpoint has been sent.
+
 ## 🌍 Per-Environment Config

 Same codebase, different behavior per environment. Dev won't bug you as much as prod:
@@ -169,15 +171,19 @@ AlertLogger.init({
 environments: {
 production: {
 levels: ['warning', 'critical'],
-aggregation: { digestIntervalMs: 5 * 60_000 },
+aggregation: { digestIntervalMs: 15 * 60_000 },
 },
 staging: {
 levels: ['critical'], // only errors, no warnings
 aggregation: { digestIntervalMs: 15 * 60_000 },
 },
 development: {
 levels: ['critical'],
-aggregation: { rampThreshold: 8, digestIntervalMs: 30 * 60_000 },
+aggregation: {
+rampThreshold: 8,
+rampExitRatePerSecond: 0.25,
+digestIntervalMs: 30 * 60_000,
+},
 },
 },
 })
@@ -279,8 +285,10 @@ AlertLogger.init({

 // Aggregation tuning
 aggregation: {
-rampThreshold: 64, // switch from ramp to digest phase
-digestIntervalMs: 5 * 60_000, // how often to send digests
+rampThreshold: 64, // count-based handoff into sustained mode
+rampExitRatePerSecond: 0.5, // early sustained handoff after a ramp alert
+rampExitRateWindowMs: 60_000, // sliding window used for current-rate calculation
+digestIntervalMs: 15 * 60_000, // how often to send sustained updates
 resolutionCooldownMs: 2 * 60_000, // silence before "resolved"
 },
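
For readers tuning `rampExitRateWindowMs`, one way a sliding-window rate could be computed is sketched below. The class and its methods are hypothetical, not part of `@iqai/alert-logger`, and dividing by the full window length (rather than by elapsed time) is an assumption about how "current rate" is defined.

```typescript
// Sketch only: the class and method names are hypothetical. Only the option
// name rampExitRateWindowMs and its 60_000 default come from the README.
class SlidingWindowRate {
  private readonly windowMs: number
  private timestamps: number[] = []

  constructor(windowMs: number) {
    this.windowMs = windowMs
  }

  // Record one occurrence at the given time (milliseconds).
  record(nowMs: number): void {
    this.timestamps.push(nowMs)
    this.prune(nowMs)
  }

  // Assumption: rate = events still inside the window / window length in seconds.
  ratePerSecond(nowMs: number): number {
    this.prune(nowMs)
    return this.timestamps.length / (this.windowMs / 1000)
  }

  // Drop timestamps that have aged out of the window.
  private prune(nowMs: number): void {
    const cutoff = nowMs - this.windowMs
    let oldest = this.timestamps[0]
    while (oldest !== undefined && oldest <= cutoff) {
      this.timestamps.shift()
      oldest = this.timestamps[0]
    }
  }
}
```

Under these assumptions, 30 occurrences inside a 60-second window give 0.5 events/sec, which is exactly the default `rampExitRatePerSecond`.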

5 changes: 4 additions & 1 deletion src/adapters/console/console-adapter.ts
@@ -56,7 +56,9 @@ export class ConsoleAdapter implements AlertAdapter {
 lines.push(` fields: ${pairs}`)
 }

-lines.push(` count: ${aggregation.count} | phase: ${aggregation.phase}`)
+lines.push(
+` count: ${aggregation.count} | periodCount: ${aggregation.periodCount} | phase: ${aggregation.phase}`,
+)

 return lines.join('\n')
 }
@@ -72,6 +74,7 @@ export class ConsoleAdapter implements AlertAdapter {
 aggregation: {
 phase: alert.aggregation.phase,
 count: alert.aggregation.count,
+periodCount: alert.aggregation.periodCount,
 },
 })
 }
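
The `periodCount` field surfaced here is described in the changeset as a per-update delta alongside the running total. A minimal sketch of that bookkeeping, assuming the per-period counter resets each time an update is emitted, might look like this. `OccurrenceCounter` and `sustainedSuffix` are hypothetical names, and the library's exact accounting may differ (the test fixtures in this PR use `periodCount: 0` for onset, for example).

```typescript
// Sketch only: OccurrenceCounter and sustainedSuffix are hypothetical names.
// The field names count and periodCount mirror the ones added in this PR.
interface AggregationCounts {
  count: number // total occurrences for the fingerprint
  periodCount: number // occurrences since the last emitted update
}

class OccurrenceCounter {
  private count = 0
  private periodCount = 0

  record(): void {
    this.count += 1
    this.periodCount += 1
  }

  // Snapshot both counters for an outgoing update, then reset the per-period one.
  flush(): AggregationCounts {
    const snapshot = { count: this.count, periodCount: this.periodCount }
    this.periodCount = 0
    return snapshot
  }
}

// Mirrors the "since last update / total" wording used by the sustained formatters.
function sustainedSuffix(c: AggregationCounts): string {
  return `x${c.periodCount} since last update \u00B7 x${c.count} total`
}
```

With 163 occurrences before one update and 37 after it, the next update would carry `count: 200` and `periodCount: 37`, the same values the formatter tests in this PR assert on.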
1 change: 1 addition & 0 deletions src/adapters/discord/discord-adapter.test.ts
@@ -15,6 +15,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
8 changes: 7 additions & 1 deletion src/adapters/discord/formatter.test.ts
@@ -14,6 +14,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -86,6 +87,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'ramp',
 fingerprint: 'abc123',
 count: 10,
+periodCount: 5,
 suppressedSince: 5,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -106,6 +108,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'sustained',
 fingerprint: 'abc123',
 count: 200,
+periodCount: 37,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -114,7 +117,8 @@ describe('formatDiscordEmbed', () => {
 })
 const embed = formatDiscordEmbed(alert)

-expect(embed.title).toContain('x200')
+expect(embed.title).toContain('x37 since last update')
+expect(embed.title).toContain('x200 total')
 expect(embed.title).toContain('peak rate: 3.7/s')
 })
 })
@@ -127,6 +131,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 50,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: now - 3_600_000,
 lastSeen: now,
@@ -146,6 +151,7 @@ describe('formatDiscordEmbed', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
2 changes: 1 addition & 1 deletion src/adapters/discord/formatter.ts
@@ -94,7 +94,7 @@ export function formatDiscordEmbed(alert: FormattedAlert): DiscordEmbed {

 case 'sustained': {
 const title = truncate(
-`${badge} [${alert.level.toUpperCase()}] ${safeTitle} (x${aggregation.count} in last digest period \u00B7 peak rate: ${aggregation.peakRate.toFixed(1)}/s)`,
+`${badge} [${alert.level.toUpperCase()}] ${safeTitle} (x${aggregation.periodCount} since last update \u00B7 x${aggregation.count} total \u00B7 peak rate: ${aggregation.peakRate.toFixed(1)}/s)`,
 256,
 )

10 changes: 9 additions & 1 deletion src/adapters/slack/formatter.test.ts
@@ -14,6 +14,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -97,6 +98,7 @@ describe('formatSlackPayload', () => {
 phase: 'ramp',
 fingerprint: 'abc123',
 count: 10,
+periodCount: 5,
 suppressedSince: 5,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -116,6 +118,7 @@ describe('formatSlackPayload', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -137,6 +140,7 @@ describe('formatSlackPayload', () => {
 phase: 'ramp',
 fingerprint: 'abc123',
 count: 10,
+periodCount: 5,
 suppressedSince: 5,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -158,6 +162,7 @@ describe('formatSlackPayload', () => {
 phase: 'sustained',
 fingerprint: 'abc123',
 count: 200,
+periodCount: 37,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -167,7 +172,8 @@ describe('formatSlackPayload', () => {
 const payload = formatSlackPayload(alert)

 const header = payload.attachments[0].blocks[0]
-expect(header.text?.text).toContain('x200')
+expect(header.text?.text).toContain('x37 since last update')
+expect(header.text?.text).toContain('x200 total')
 expect(header.text?.text).toContain('peak: 3.7/s')
 })
 })
@@ -180,6 +186,7 @@ describe('formatSlackPayload', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 50,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: now - 3_600_000,
 lastSeen: now,
@@ -200,6 +207,7 @@ describe('formatSlackPayload', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
2 changes: 1 addition & 1 deletion src/adapters/slack/formatter.ts
@@ -105,7 +105,7 @@ export function formatSlackPayload(alert: FormattedAlert): SlackPayload {

 case 'sustained': {
 const title = truncate(
-`${badge} [${alert.level.toUpperCase()}] ${alert.title} (x${aggregation.count} \u00B7 peak: ${aggregation.peakRate.toFixed(1)}/s)`,
+`${badge} [${alert.level.toUpperCase()}] ${alert.title} (x${aggregation.periodCount} since last update \u00B7 x${aggregation.count} total \u00B7 peak: ${aggregation.peakRate.toFixed(1)}/s)`,
 150,
 )

1 change: 1 addition & 0 deletions src/adapters/slack/slack-adapter.test.ts
@@ -15,6 +15,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
8 changes: 7 additions & 1 deletion src/adapters/telegram/formatter.test.ts
@@ -15,6 +15,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -81,6 +82,7 @@ describe('formatTelegramMessage', () => {
 phase: 'ramp',
 fingerprint: 'abc123',
 count: 10,
+periodCount: 5,
 suppressedSince: 5,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -101,6 +103,7 @@ describe('formatTelegramMessage', () => {
 phase: 'sustained',
 fingerprint: 'abc123',
 count: 200,
+periodCount: 37,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
@@ -109,7 +112,8 @@ describe('formatTelegramMessage', () => {
 })
 const msg = formatTelegramMessage(alert)

-expect(msg).toContain('x200')
+expect(msg).toContain('x37 since last update')
+expect(msg).toContain('x200 total')
 expect(msg).toContain('peak: 3.7/s')
 })
 })
@@ -122,6 +126,7 @@ describe('formatTelegramMessage', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 50,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: now - 3_600_000,
 lastSeen: now,
@@ -142,6 +147,7 @@ describe('formatTelegramMessage', () => {
 phase: 'resolution',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),
2 changes: 1 addition & 1 deletion src/adapters/telegram/formatter.ts
@@ -73,7 +73,7 @@ export function formatTelegramMessage(alert: FormattedAlert): string {
 case 'sustained': {
 const emoji = SEVERITY_EMOJI[alert.level] ?? SEVERITY_EMOJI.info
 parts.push(
-`${emoji} <b>${badge} [${alert.level.toUpperCase()}] ${safeTitle} (x${aggregation.count} \u00B7 peak: ${aggregation.peakRate.toFixed(1)}/s)</b>`,
+`${emoji} <b>${badge} [${alert.level.toUpperCase()}] ${safeTitle} (x${aggregation.periodCount} since last update \u00B7 x${aggregation.count} total \u00B7 peak: ${aggregation.peakRate.toFixed(1)}/s)</b>`,
 )
 parts.push('', safeMessage)
 break
1 change: 1 addition & 0 deletions src/adapters/telegram/telegram-adapter.test.ts
@@ -15,6 +15,7 @@ function makeAlert(overrides: Partial<FormattedAlert> = {}): FormattedAlert {
 phase: 'onset',
 fingerprint: 'abc123',
 count: 1,
+periodCount: 0,
 suppressedSince: 0,
 firstSeen: Date.now(),
 lastSeen: Date.now(),