Create an Incident
Open a new incident, assign affected monitors, and begin structured updates.
Creating an incident lets you communicate unexpected downtime or degradation in a structured timeline.
When to Open an Incident
- User-visible impact (latency, errors, partial outage)
- Degradation expected to persist beyond the recovery threshold
- Security or compliance events (with limited disclosure as needed)
Form Fields
| Field | Description | Guidance |
|---|---|---|
| Title | Short external summary | Avoid blame; focus on symptom |
| Description | Initial context / impact statement | Update as investigation evolves |
| Affected Monitors | List of impacted components | Select only those directly affected |
| Impact Status | Severity classification (degraded/outage) | Keep consistent taxonomy |
| Start Time | When impact began | Use earliest known time (estimates okay) |
| End Time | When resolved | Leave empty until confirmed |
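
As a rough illustration of how these fields fit together, the sketch below models an incident draft in Python. The class, attribute, and method names are hypothetical and do not represent the product's API, and the checks in `validate` (a non-empty title, at least one monitor, end time not before start time) are assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class ImpactStatus(Enum):
    # The two severities named in the table; a real taxonomy may define more.
    DEGRADED = "degraded"
    OUTAGE = "outage"


@dataclass
class Incident:
    # Hypothetical model of the form fields; not the product's actual schema.
    title: str
    description: str
    affected_monitors: list[str]
    impact_status: ImpactStatus
    start_time: datetime
    end_time: Optional[datetime] = None  # left empty until resolution is confirmed

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the draft passes these checks."""
        problems = []
        if not self.title.strip():
            problems.append("title is required")
        if not self.affected_monitors:
            problems.append("select at least one affected monitor")
        if self.end_time is not None and self.end_time < self.start_time:
            problems.append("end time precedes start time")
        return problems


draft = Incident(
    title="Elevated API error rates",
    description="Some requests are failing; we are investigating.",
    affected_monitors=["api-gateway"],
    impact_status=ImpactStatus.DEGRADED,
    start_time=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(draft.validate())  # → []
```

The title describes the symptom rather than a cause, and `end_time` stays unset while the incident is open, matching the guidance in the table.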
Workflow Tips
- Open early, even with limited detail; transparency beats silence.
- Post updates every 15–30 minutes, even if you are still investigating.
- Mark the incident resolved only after all monitors are stable and the mitigation is verified.
- After resolution, add a final root cause analysis (RCA) summary: what happened, user impact, and follow-up actions.
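
The cadence and resolution tips above can be expressed as two small checks. This is a minimal sketch: the function names, the `"up"` state string, and the choice of 30 minutes as the hard limit are assumptions for illustration, not product behavior.

```python
from datetime import datetime, timedelta, timezone

# Assumed limit: the upper end of the 15–30 minute update cadence.
MAX_UPDATE_GAP = timedelta(minutes=30)


def update_overdue(last_update: datetime, now: datetime) -> bool:
    """True when more time has passed since the last update than the cadence allows."""
    return now - last_update > MAX_UPDATE_GAP


def can_resolve(monitor_states: dict[str, str], mitigation_verified: bool) -> bool:
    """True only when every affected monitor reports "up" and the mitigation is verified."""
    all_stable = all(state == "up" for state in monitor_states.values())
    return mitigation_verified and all_stable


last = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(update_overdue(last, last + timedelta(minutes=40)))  # → True
print(can_resolve({"api-gateway": "up", "web": "down"}, mitigation_verified=True))  # → False
```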
Avoid Noise
- Do NOT open incidents for planned maintenance; use the Maintenance feature instead.
- Combine only tightly related symptoms; separate unrelated failures.
Aftercare
- Review mean time to detect (MTTD) and mean time to resolve (MTTR) metrics.
- Feed lessons learned into runbooks and use them to improve monitor coverage.
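
As a worked example of the two metrics, the sketch below averages them over two made-up incidents. The record layout and key names are hypothetical, and both metrics are measured from the start of impact here; some teams measure MTTR from detection instead, so adjust the keys to match your own definition.

```python
from datetime import datetime, timedelta
from statistics import mean

# Illustrative records only; timestamps are invented for the example.
incidents = [
    {
        "started": datetime(2024, 1, 1, 12, 0),
        "detected": datetime(2024, 1, 1, 12, 10),
        "resolved": datetime(2024, 1, 1, 13, 0),
    },
    {
        "started": datetime(2024, 1, 8, 9, 0),
        "detected": datetime(2024, 1, 8, 9, 20),
        "resolved": datetime(2024, 1, 8, 9, 40),
    },
]


def mean_delta(records: list[dict], start_key: str, end_key: str) -> timedelta:
    """Average the elapsed time between two timestamps across all records."""
    seconds = mean([(r[end_key] - r[start_key]).total_seconds() for r in records])
    return timedelta(seconds=seconds)


mttd = mean_delta(incidents, "started", "detected")  # (10 min + 20 min) / 2
mttr = mean_delta(incidents, "started", "resolved")  # (60 min + 40 min) / 2
print(mttd)  # → 0:15:00
print(mttr)  # → 0:50:00
```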