Incident ID: [Date-Title] Date: [YYYY-MM-DD] Authors: [Names] Status: [Draft / Review / Final]
Briefly describe what happened, the impact (users affected, downtime), and the severity.
Detailed timeline of events (UTC).
- [HH:MM] - Issue detected via [Monitoring/User Report].
- [HH:MM] - Investigation started.
- [HH:MM] - Root cause identified as [cause].
- [HH:MM] - Fix applied [describe fix].
- [HH:MM] - Service restored.
The fundamental reason the incident occurred.
- Direct Cause: [What triggered the failure?]
- Underlying Cause: [Why was the system vulnerable? e.g., configuration error, code bug, capacity limit]
How was the immediate issue fixed?
Action items to prevent recurrence.
| ID | Action Item | Owner | Priority | Status |
|---|---|---|---|---|
| 1 | [e.g., Update alert thresholds] | [Name] | High | TODO |
| 2 | [e.g., Fix bug in code] | [Name] | High | TODO |
| 3 | [e.g., Update documentation] | [Name] | Medium | TODO |