Testing Status

✅ Unit Tests (25 tests, all passing)

cargo test --lib

Test Coverage:

State Machine Tests (8 tests)

  • ✅ All valid state transitions (Declared → Investigating → Identified → Monitoring → Resolved)
  • ✅ All invalid transitions blocked
  • ✅ Terminal state detection (Resolved cannot transition)
  • ✅ Severity parsing (P1/P2/P3/P4, case-insensitive)
  • ✅ Severity labels ("P1 (Critical)", etc.)
  • ✅ Severity emojis (🔴 P1, 🟡 P2, 🟢 P3/P4)
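
To make the state-machine checks above concrete, here is a minimal sketch of a transition table and severity parser. The type and method names (`IncidentStatus`, `can_transition_to`, `Severity::parse`) are illustrative assumptions, not the project's actual API, and the real transition table may allow more moves than the linear chain listed above (for example, resolving directly from an earlier state).

```rust
/// Incident states, in the order listed in the test coverage above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IncidentStatus {
    Declared,
    Investigating,
    Identified,
    Monitoring,
    Resolved,
}

impl IncidentStatus {
    /// Assumption: only the next state in the chain is a valid target.
    pub fn can_transition_to(self, next: IncidentStatus) -> bool {
        use IncidentStatus::*;
        matches!(
            (self, next),
            (Declared, Investigating)
                | (Investigating, Identified)
                | (Identified, Monitoring)
                | (Monitoring, Resolved)
        )
    }

    /// Resolved is the only terminal state.
    pub fn is_terminal(self) -> bool {
        self == IncidentStatus::Resolved
    }
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Severity {
    P1,
    P2,
    P3,
    P4,
}

impl Severity {
    /// Case-insensitive parsing: "p1" and "P1" both yield `Severity::P1`.
    pub fn parse(input: &str) -> Option<Severity> {
        match input.trim().to_ascii_uppercase().as_str() {
            "P1" => Some(Severity::P1),
            "P2" => Some(Severity::P2),
            "P3" => Some(Severity::P3),
            "P4" => Some(Severity::P4),
            _ => None,
        }
    }
}
```

Encoding the valid pairs in a single `matches!` keeps every other pair invalid by default, which is what the "all invalid transitions blocked" test relies on.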

Channel Naming Tests (5 tests)

  • ✅ Standard naming (inc-YYYYMMDD-service)
  • ✅ Date formatting (zero-padded months/days)
  • ✅ Special character filtering (alphanumeric + hyphens only)
  • ✅ Long service name handling (truncate or use UUID)
  • ✅ Empty service name fallback (UUID-based)
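
The naming rules above could be implemented roughly as follows. This is a sketch under stated assumptions: the function name `channel_name`, the `fallback_id` parameter (standing in for the UUID-based fallback), and the 40-character slug limit are all hypothetical, and the date is passed in as plain numbers to keep the example free of external crates.

```rust
/// Builds a channel name of the form `inc-YYYYMMDD-service`.
///
/// `fallback_id` is a hypothetical stand-in for the UUID-based fallback
/// used when the service name contains no usable characters.
pub fn channel_name(year: u16, month: u8, day: u8, service: &str, fallback_id: &str) -> String {
    // Assumed limit for the service part; not taken from the project.
    const MAX_SLUG_LEN: usize = 40;

    // Keep ASCII alphanumerics and hyphens only, lowercased, then truncate.
    let slug: String = service
        .chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-')
        .map(|c| c.to_ascii_lowercase())
        .take(MAX_SLUG_LEN)
        .collect();

    // Avoid names that start or end the service part with a hyphen.
    let slug = slug.trim_matches('-');
    let suffix = if slug.is_empty() { fallback_id } else { slug };

    // Zero-pad month and day so the date is always eight digits.
    format!("inc-{:04}{:02}{:02}-{}", year, month, day, suffix)
}
```

With these assumptions, `channel_name(2024, 3, 7, "Okta", "9f1c")` yields `inc-20240307-okta`, and an empty or all-punctuation service name falls back to `inc-20240307-9f1c`.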

Slack Signature Verification (2 tests)

  • ✅ Valid HMAC signature acceptance
  • ✅ Invalid signature rejection
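
For context on what these two tests exercise: Slack signs the string `v0:{timestamp}:{raw_body}` with HMAC-SHA256 and sends the result as `v0=<hex>` in the `X-Slack-Signature` header. The sketch below shows only the two standard-library pieces around the HMAC step, building the base string and comparing signatures in constant time. The HMAC computation itself needs a crypto crate and is deliberately not shown, and the function names here are hypothetical.

```rust
/// Builds the string that Slack signs: `v0:{timestamp}:{raw_body}`.
pub fn signature_base_string(timestamp: &str, raw_body: &str) -> String {
    format!("v0:{}:{}", timestamp, raw_body)
}

/// Compares two byte slices without stopping at the first mismatch,
/// so the time taken does not reveal how many leading bytes matched.
pub fn constant_time_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    let mut diff = 0u8;
    for (x, y) in a.iter().zip(b.iter()) {
        // Accumulate any differing bits instead of returning early.
        diff |= x ^ y;
    }
    diff == 0
}
```

A verifier would compute the HMAC of `signature_base_string(...)` with the signing secret, hex-encode it with a `v0=` prefix, and pass both that value and the header value to `constant_time_eq`.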

All unit tests pass with 0 failures.


✅ Integration Tests (8 tests, all passing with PostgreSQL)

# Requires PostgreSQL running
cargo test --test incident_lifecycle_test

Test Coverage:

Incident Lifecycle Tests

  • test_create_incident - Creates incident, verifies DB state, timeline logged
  • test_post_status_update - Commander posts status, timeline updated
  • test_non_commander_cannot_update - Permission denied for non-commander
  • test_change_severity - Severity escalation, timeline logged
  • test_resolve_incident - Resolution, duration calculated, timeline logged
  • test_resolve_idempotent - Multiple resolve calls succeed
  • test_full_incident_lifecycle - Full flow (declare → status → escalate → status → resolve)

Status: Passing locally when PostgreSQL is available and DATABASE_URL is configured.
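
The behavior checked by `test_non_commander_cannot_update` and `test_resolve_idempotent` can be illustrated with an in-memory model. This is only a sketch: the real tests run against PostgreSQL, and the `Incident` struct, `IncidentError` enum, and method names below are assumptions made for the example.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum IncidentError {
    NotCommander,
}

/// Hypothetical in-memory stand-in for an incident row plus its timeline.
#[derive(Debug)]
pub struct Incident {
    pub commander_id: String,
    pub resolved: bool,
    pub timeline: Vec<String>,
}

impl Incident {
    pub fn new(commander_id: &str) -> Self {
        Incident {
            commander_id: commander_id.to_string(),
            resolved: false,
            timeline: vec!["declared".to_string()],
        }
    }

    /// Only the commander may modify the incident.
    fn ensure_commander(&self, actor_id: &str) -> Result<(), IncidentError> {
        if actor_id == self.commander_id {
            Ok(())
        } else {
            Err(IncidentError::NotCommander)
        }
    }

    pub fn post_status(&mut self, actor_id: &str, message: &str) -> Result<(), IncidentError> {
        self.ensure_commander(actor_id)?;
        self.timeline.push(format!("status: {}", message));
        Ok(())
    }

    /// Resolving twice succeeds, but the timeline entry is written once.
    pub fn resolve(&mut self, actor_id: &str) -> Result<(), IncidentError> {
        self.ensure_commander(actor_id)?;
        if !self.resolved {
            self.resolved = true;
            self.timeline.push("resolved".to_string());
        }
        Ok(())
    }
}
```

The sketch leaves out the rejection of status updates on an already resolved incident, which the manual checklist covers separately.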

Running Integration Tests

# 1. Start PostgreSQL
docker run -d \
  --name incident-bot-test-db \
  -e POSTGRES_USER=incident_bot \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=incident_bot_test \
  -p 5433:5432 \
  postgres:16-alpine

# 2. Set DATABASE_URL for tests
export DATABASE_URL=postgres://incident_bot:password@localhost:5433/incident_bot_test

# 3. Run migrations
sqlx migrate run --database-url "$DATABASE_URL"

# 4. Run tests
cargo test --test incident_lifecycle_test

Manual QA Checklist

Prerequisites

  • Slack app configured with all scopes
  • Bot installed to workspace
  • .env file configured
  • PostgreSQL running
  • Ngrok tunnel active
  • Bot server running (cargo run)

Test 1: P1 Full Lifecycle (30 min)

  1. Declare Incident

    • Run /incident declare in Slack
    • Modal appears with title/severity/service/commander fields
    • Fill: Title="Test P1", Severity=P1, Service=Okta
    • Submit
    • Channel created: #inc-YYYYMMDD-okta
    • Commander invited to channel
    • Details pinned to channel
    • #general receives notification
    • Configured execs receive DM
  2. Post Status Updates

    • In incident channel: /incident status Investigating root cause
    • Status posted to channel
    • Commander sees acknowledgment
    • Non-commander attempts status update → permission denied
  3. Change Severity

    • /incident severity P2 Impact decreased
    • Severity change posted to channel
    • No new escalation notifications (downgrade)
  4. Escalate Back to P1

    • /incident severity P1 Impact increased again
    • Escalation posted to channel
    • #general receives escalation notification
    • Execs receive new DMs
  5. Resolve Incident

    • /incident resolved
    • Resolution posted to channel with duration (e.g., "1h 35min")
    • #general receives resolution notification
  6. View Timeline

    • /incident timeline
    • Timeline displays all events chronologically
    • Timestamps match incident channel messages
  7. Generate Postmortem

    • /incident postmortem
    • Markdown template posted as code block
    • Timeline auto-filled
    • Placeholders for root cause/action items present
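
The resolve step in Test 1 expects a human-readable duration such as "1h 35min". A formatter along these lines would produce that string from the stored `duration_minutes` value. The function name and the choice to omit the hour part for durations under an hour are assumptions, not confirmed project behavior.

```rust
/// Formats a duration in minutes as "Xh Ymin", or "Ymin" when under an hour.
/// The sub-hour format is an assumption for this sketch.
pub fn format_duration(total_minutes: u64) -> String {
    let hours = total_minutes / 60;
    let minutes = total_minutes % 60;
    if hours > 0 {
        format!("{}h {}min", hours, minutes)
    } else {
        format!("{}min", minutes)
    }
}
```

For example, `format_duration(95)` gives `1h 35min`.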

Test 2: P3 Minimal Incident (10 min)

  1. Declare P3

    • /incident declare → Severity=P3
    • Channel created
    • NO #general notification
    • NO exec DMs
  2. Resolve P3

    • /incident resolved
    • Resolution posted to incident channel only
    • NO broad notifications
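
The notification expectations in Tests 1 and 2 (P1 notifies #general and execs, P3 stays in the incident channel, downgrades send nothing new) suggest routing logic like the sketch below. The names are hypothetical, and the checklist does not say what a P2 declaration should trigger, so this example treats P2 as quiet purely as an assumption.

```rust
/// Variant order matters: P1 sorts lowest, so "more severe" means "smaller".
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Severity {
    P1,
    P2,
    P3,
    P4,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Target {
    General,
    ExecDm,
}

/// Broad notification targets when an incident is declared at `severity`.
pub fn declaration_targets(severity: Severity) -> Vec<Target> {
    match severity {
        Severity::P1 => vec![Target::General, Target::ExecDm],
        // Assumption: P2 is not described in the checklist; treated as quiet here.
        Severity::P2 | Severity::P3 | Severity::P4 => Vec::new(),
    }
}

/// Broad notification targets when severity changes from `old` to `new`.
/// Only escalations notify; downgrades and no-op changes send nothing.
pub fn severity_change_targets(old: Severity, new: Severity) -> Vec<Target> {
    if new < old {
        declaration_targets(new)
    } else {
        Vec::new()
    }
}
```

Deriving `Ord` on the enum lets the escalation check be a single comparison rather than a table of pairs.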

Test 3: Permission Validation (5 min)

  1. Non-Commander Attempts

    • User B tries /incident status in User A's incident → denied
    • User B tries /incident severity P1 → denied
    • User B tries /incident resolved → denied
  2. Commander-Only Operations

    • Commander can post status ✅
    • Commander can change severity ✅
    • Commander can resolve ✅
  3. Everyone Can View

    • Anyone can run /incident timeline
    • Anyone can run /incident postmortem (if resolved) ✅

Test 4: Error Handling (10 min)

  1. No Active Incident

    • Run /incident status in non-incident channel → "No active incident in this channel"
  2. Empty Message

    • /incident status (no message) → "Status message cannot be empty"
  3. Invalid Severity

    • /incident severity P5 → "Invalid severity. Use P1, P2, P3, or P4"
  4. Already Resolved

    • Resolve incident
    • Try /incident status → rejected with an error such as "Invalid state transition"
    • Try /incident resolved again → succeeds idempotently
  5. Postmortem Before Resolved

    • In active incident, run /incident postmortem → "Incident must be resolved first"
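
The input-validation cases above (empty status message, invalid severity) can be handled while parsing the text that follows `/incident`. The parser below is a sketch: the `Command` enum and `parse_command` function are hypothetical, the "Unknown subcommand" message is invented for the example, and the other two error strings are the ones quoted in the checklist.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Command {
    Declare,
    Status(String),
    Severity { level: String, reason: String },
    Resolved,
    Timeline,
    Postmortem,
}

/// Splits off the first whitespace-delimited word and returns (word, rest).
fn split_first_word(text: &str) -> (&str, &str) {
    match text.split_once(char::is_whitespace) {
        Some((first, rest)) => (first, rest.trim()),
        None => (text, ""),
    }
}

/// Parses the text after `/incident`, e.g. "severity P2 Impact decreased".
pub fn parse_command(text: &str) -> Result<Command, String> {
    let (subcommand, rest) = split_first_word(text.trim());
    match subcommand {
        "declare" => Ok(Command::Declare),
        "status" if rest.is_empty() => Err("Status message cannot be empty".to_string()),
        "status" => Ok(Command::Status(rest.to_string())),
        "severity" => {
            let (level, reason) = split_first_word(rest);
            let level = level.to_ascii_uppercase();
            match level.as_str() {
                "P1" | "P2" | "P3" | "P4" => Ok(Command::Severity {
                    level,
                    reason: reason.to_string(),
                }),
                _ => Err("Invalid severity. Use P1, P2, P3, or P4".to_string()),
            }
        }
        "resolved" => Ok(Command::Resolved),
        "timeline" => Ok(Command::Timeline),
        "postmortem" => Ok(Command::Postmortem),
        // Hypothetical message; the checklist does not cover this case.
        other => Err(format!("Unknown subcommand: {}", other)),
    }
}
```

The state-dependent errors ("No active incident in this channel", "Incident must be resolved first") need the incident record and so belong in the handler, not in the parser.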

Test 5: Database State (5 min)

Query the database after Test 1 completes:

-- Check incident record
SELECT id, title, severity, status, commander_id, resolved_at, duration_minutes
FROM incidents
WHERE title = 'Test P1';

-- Check timeline (should have 5+ events)
SELECT event_type, message, posted_by, timestamp
FROM incident_timeline
WHERE incident_id = 'incident-id-from-above'
ORDER BY timestamp;

-- Check notifications sent
SELECT notification_type, recipient, status
FROM incident_notifications
WHERE incident_id = 'incident-id-from-above';

-- Check audit log
SELECT action, actor_id, timestamp
FROM audit_log
WHERE incident_id = 'incident-id-from-above'
ORDER BY timestamp;

Expected:

  • Incident record exists with correct final state
  • Timeline has all events (declared, status×2, severity×2, resolved)
  • Notifications logged (P1 declaration, escalation, resolution)
  • Audit log complete (all actions logged with actors)

Test Summary

Unit Tests: ✅ 25/25 passing

Integration Tests: ✅ 8/8 passing (with PostgreSQL test database)

Manual QA: 📋 Checklist ready for staging workspace testing

Code Coverage: Not currently measured in CI. Core state-machine, permission, and timeline flows are covered by tests.


Next Steps

  1. For CI/CD: Use testcontainers to auto-provision PostgreSQL in CI
  2. For Local Dev: Document database setup in QUICKSTART.md (already done)
  3. For Production: Run full manual QA checklist before deployment

Known Test Gaps

  • Slack API integration - Not tested (would require wiremock setup or real Slack workspace)
  • Notification throttling - Needs time-based tests (5-min window)
  • Channel deduplication - Needs Slack API mocking
  • Concurrent incident creation - Needs load testing setup

These gaps are acceptable for the current scope. The core business logic (state machine, permissions, timeline) is covered by the unit and integration tests above.
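
One possible way to close the notification-throttling gap without sleeping in tests is to pass the current time into the throttle instead of reading the clock inside it. The sketch below assumes a per-key throttle with a configurable window. The `Throttle` type is hypothetical and is not the project's actual implementation.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical per-key throttle: at most one notification per window.
pub struct Throttle {
    window: Duration,
    last_sent: HashMap<String, Instant>,
}

impl Throttle {
    pub fn new(window: Duration) -> Self {
        Throttle {
            window,
            last_sent: HashMap::new(),
        }
    }

    /// Returns true and records `now` if a notification for `key` may be sent.
    /// `now` is a parameter so tests can simulate the passage of time.
    pub fn allow(&mut self, key: &str, now: Instant) -> bool {
        let throttled = self
            .last_sent
            .get(key)
            .map_or(false, |&last| now.saturating_duration_since(last) < self.window);
        if throttled {
            return false;
        }
        self.last_sent.insert(key.to_string(), now);
        true
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn suppresses_repeats_inside_the_window() {
        let mut throttle = Throttle::new(Duration::from_secs(300));
        let t0 = Instant::now();
        assert!(throttle.allow("inc-1", t0));
        // One second before the 5-minute window closes: still suppressed.
        assert!(!throttle.allow("inc-1", t0 + Duration::from_secs(299)));
        // Exactly at the window boundary: allowed again.
        assert!(throttle.allow("inc-1", t0 + Duration::from_secs(300)));
    }
}
```

Because the test adds offsets to a fixed `Instant`, it runs in microseconds and does not depend on wall-clock timing.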