Incidents in Detail
A comprehensive guide to managing incidents in Garmingo Status.
Deep dive into lifecycle, communication, and post‑incident improvements.
Lifecycle States (Typical)
State | Purpose | Guidance |
---|---|---|
Investigating | Awareness + initial triage | Acknowledge quickly even if cause unknown |
Identified | Root cause known | Provide scope + next action |
Monitoring | Fix applied, verifying | Set expectation for final update |
Resolved | Fully fixed | Summarize impact + prevention |
Postmortem (optional tag) | Analysis phase | Link internal doc |
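The lifecycle above can be read as a small state machine. The following is a minimal sketch of that idea, not Garmingo Status's own implementation: the `IncidentState` enum, the `ALLOWED_TRANSITIONS` map, and the rule that Monitoring may fall back to an earlier state are all assumptions made for illustration. Postmortem is left out because the table describes it as an optional tag rather than a state.

```python
from enum import Enum


class IncidentState(Enum):
    INVESTIGATING = "Investigating"
    IDENTIFIED = "Identified"
    MONITORING = "Monitoring"
    RESOLVED = "Resolved"


# Assumed transition rules (hypothetical, not taken from the product):
# incidents normally move forward, and Monitoring may fall back to an
# earlier state if the fix does not hold.
ALLOWED_TRANSITIONS = {
    IncidentState.INVESTIGATING: {
        IncidentState.IDENTIFIED,
        IncidentState.MONITORING,
        IncidentState.RESOLVED,
    },
    IncidentState.IDENTIFIED: {
        IncidentState.MONITORING,
        IncidentState.RESOLVED,
    },
    IncidentState.MONITORING: {
        IncidentState.RESOLVED,
        IncidentState.INVESTIGATING,
        IncidentState.IDENTIFIED,
    },
    IncidentState.RESOLVED: set(),
}


def can_transition(current: IncidentState, target: IncidentState) -> bool:
    """Return True if moving from `current` to `target` is allowed by the map above."""
    return target in ALLOWED_TRANSITIONS[current]
```

A team could use a check like this in internal tooling to catch accidental jumps, such as reopening a Resolved incident instead of creating a new one.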
Creating a High-Quality Initial Incident
Include a short summary, the scope of impacted users, observed symptoms (errors, latency), the time of first detection (if known), and the immediate action being taken.
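As an illustration, the fields listed above can be captured in a small draft structure that flags anything left empty before the incident is published. This is a sketch under assumptions: `InitialIncidentPost` and `missing_fields` are hypothetical names, not part of any Garmingo Status API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class InitialIncidentPost:
    """Hypothetical draft of a first incident post."""
    summary: str
    impacted_scope: str
    symptoms: str
    immediate_action: str
    # Optional, because the time of first detection may not be known yet.
    first_detected_at: Optional[datetime] = None


def missing_fields(post: InitialIncidentPost) -> List[str]:
    """Return the names of required text fields that are empty or whitespace."""
    required = ("summary", "impacted_scope", "symptoms", "immediate_action")
    return [name for name in required if not getattr(post, name).strip()]


draft = InitialIncidentPost(
    summary="Elevated error rate on the API",
    impacted_scope="",
    symptoms="HTTP 500 responses, increased latency",
    immediate_action="Rolling back the latest deployment",
)
print(missing_fields(draft))
```

In this example the draft has no impacted scope yet, so that field name is reported as missing.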
Update Cadence
Severity | Minimum Update Interval |
---|---|
Critical | 15 min |
High | 30 min |
Medium | 60 min |
Low | At meaningful changes |
Set a reminder for the next update to avoid silent gaps.
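The cadence table can be turned into a simple reminder calculation. The sketch below is illustrative only: the `UPDATE_INTERVALS` mapping mirrors the table above, while the function names `next_update_due` and `is_overdue` are hypothetical helpers, not product features.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Minimum update intervals from the table above.
# Low severity has no fixed interval (updates happen at meaningful changes).
UPDATE_INTERVALS = {
    "critical": timedelta(minutes=15),
    "high": timedelta(minutes=30),
    "medium": timedelta(minutes=60),
    "low": None,
}


def next_update_due(severity: str, last_update: datetime) -> Optional[datetime]:
    """Return when the next update is due, or None if no fixed interval applies."""
    interval = UPDATE_INTERVALS[severity.lower()]
    return None if interval is None else last_update + interval


def is_overdue(severity: str, last_update: datetime, now: datetime) -> bool:
    """Return True if the minimum update interval has been exceeded."""
    due = next_update_due(severity, last_update)
    return due is not None and now > due


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(next_update_due("Critical", last))
```

Using timezone-aware timestamps throughout avoids off-by-hours errors when responders work in different regions.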
Impact Section
Answer: Who is affected? What can users do? Are there workarounds?
Root Cause & Resolution
Fill this in once the incident has stabilized. Avoid blame; focus on the chain of events and the remediation.
Postmortem Checklist
- Timeline assembled
- Primary (triggering) cause distinguished from root cause
- Preventative actions logged & assigned
- User communications archived
Linking Monitors
Attach relevant monitors for automatic status indicators and faster filtering later.
See also: Filters for narrowing large histories.