Goal
Add a severity-tiered incident-response escalation ladder so urgent failures route differently from routine retries. Today the only escalation path is a Telegram alert with /retry and /close buttons; severity is undifferentiated.
Success Criteria
- New orchestrator/incident_router.py defines severity tiers (sev1 immediate page, sev2 next-business-hour, sev3 regular digest) and a routing config in config.yaml (e.g. incident_routing: { sev1: { telegram_chat: ops, snooze_minutes: 0 }, sev2: { ... } })
- Existing escalation sites (pr_monitor red-CI, queue exhausted-fallback, agent_scorer SLO burn, deploy_watchdog regression) classify their event into a severity and route via incident_router.escalate(severity, event) instead of calling _send_telegram directly
- sev1 events bypass the kill-switch (they ARE the kill signal) and include a generated runbook link or inline checklist
- sev2/sev3 events are deduplicated within a configurable window so the same recurring incident does not page repeatedly
- All routed incidents are persisted to runtime/incidents/incidents.jsonl with {id, sev, source, event, ack_at, resolved_at} so /ack <id> and /resolve <id> Telegram commands can close them
- Regression test: synthetic sev1 routes immediately and bypasses dedup; synthetic sev3 dedups within window
Constraints
- Severity classification must be deterministic from event metadata — no LLM call inside the router (latency/cost)
- Existing /retry and /close button flows must continue to work for backwards compatibility during migration
- Default config should be conservative: nothing is sev1 unless a repo explicitly opts in
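A conservative default could look like this in config.yaml. Only the incident_routing.sev1 shape appears in the criteria above; default_severity, dedup_window_minutes, and the per-repo allow_sev1 opt-in key are illustrative names:

```yaml
incident_routing:
  default_severity: sev3        # nothing pages unless a repo opts in
  dedup_window_minutes: 15      # applies to sev2/sev3 only
  sev1:
    telegram_chat: ops
    snooze_minutes: 0
  sev2:
    telegram_chat: ops
    snooze_minutes: 60          # next-business-hour batching
  sev3:
    telegram_chat: digest
    snooze_minutes: 1440        # daily digest
repos:
  example-critical-repo:        # hypothetical repo name
    allow_sev1: true            # explicit opt-in per the constraint above
```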
Task Type
architecture
Why
Today every escalation looks the same: a Telegram message. A genuinely urgent regression and a routine missing-context blocker arrive on the same channel with the same priority. As more agents come online (deploy watchdog, SLO tracker, dependency watcher), the operator inbox will become unscannable without an escalation tier model.
Re-queued Context
Last agent summary
Rendered prompt is 134078 bytes, exceeding the 100000-byte ceiling.
Blockers
- Prompt size 134078 bytes exceeds 100000-byte limit.
- Retrying with more prior-attempt context will not help; the task body itself must be trimmed.
Files changed