Introduction
Incident fundamentals: lifecycle, filtering, impact mapping, and communication tips.
Incidents represent service disruptions or degradations affecting one or more monitors. They let you aggregate related status events under a single human narrative, communicate impact clearly, and track resolution.
Lifecycle
- Detect issue (automated monitor event or manual observation).
- Create incident with clear scope and affected monitors.
- Post updates (events / descriptive changes) as investigation proceeds.
- Mark resolved once service is restored and root cause communicated.
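The lifecycle above can be sketched as a small data model. This is a minimal illustration, not the product's schema: the `Incident` class, its field names, and the `post_update` / `resolve` methods are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class Incident:
    # Hypothetical shape; field names are illustrative only.
    title: str
    description: str
    monitor_ids: List[str]
    started_at: datetime
    updates: List[str] = field(default_factory=list)
    resolved: bool = False
    ended_at: Optional[datetime] = None

    def post_update(self, message: str) -> None:
        # Append a human-readable update as the investigation proceeds.
        self.updates.append(message)

    def resolve(self, root_cause: str, when: datetime) -> None:
        # Communicate the root cause as the final update, then close the incident.
        self.post_update(f"Resolved: {root_cause}")
        self.resolved = True
        self.ended_at = when


# Create the incident with a clear scope and affected monitors.
incident = Incident(
    title="API Latency Degradation in EU Regions",
    description="Slow responses on EU API endpoints.",
    monitor_ids=["api-eu-west", "api-eu-central"],
    started_at=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc),
)
incident.post_update("Investigating elevated latency.")
incident.resolve(
    "Connection pool exhaustion",
    datetime(2024, 1, 1, 12, 45, tzinfo=timezone.utc),
)
```

Recording the root cause as the last update keeps the resolution and its explanation together in the incident's narrative.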
Filtering & Search
- Search looks at the words in incident titles and descriptions to help you quickly narrow results.
- Monitor filtering: query incidents whose `monitorIds` array intersects selected monitors.
- Status filtering: `ongoing` vs `resolved` sets the `resolved` boolean internally.
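The three filters can be combined along these lines. This is a sketch under assumptions: the `filter_incidents` function and the record shape are hypothetical, and search is modeled as a case-insensitive match where every search term must appear in the title or description, which may differ from the actual search behavior. Only `monitorIds` and `resolved` come from the description above.

```python
from typing import Iterable, List, Optional, Set, TypedDict


class Incident(TypedDict):
    # Hypothetical record shape; only monitorIds and resolved are named in the docs.
    title: str
    description: str
    monitorIds: List[str]
    resolved: bool


def filter_incidents(
    incidents: Iterable[Incident],
    search: Optional[str] = None,
    monitor_ids: Optional[Set[str]] = None,
    status: Optional[str] = None,
) -> List[Incident]:
    if status not in (None, "ongoing", "resolved"):
        raise ValueError(f"unknown status filter: {status!r}")
    # "ongoing" maps to resolved == False, "resolved" to resolved == True.
    want_resolved = None if status is None else (status == "resolved")
    terms = search.lower().split() if search else []

    matches = []
    for incident in incidents:
        text = f"{incident['title']} {incident['description']}".lower()
        if not all(term in text for term in terms):
            continue
        # Keep the incident only if it shares at least one selected monitor.
        if monitor_ids and not monitor_ids.intersection(incident["monitorIds"]):
            continue
        if want_resolved is not None and incident["resolved"] != want_resolved:
            continue
        matches.append(incident)
    return matches


incidents: List[Incident] = [
    {
        "title": "API Latency Degradation in EU Regions",
        "description": "Slow responses on EU endpoints.",
        "monitorIds": ["api-eu"],
        "resolved": False,
    },
    {
        "title": "Checkout errors",
        "description": "Intermittent failures at payment step.",
        "monitorIds": ["checkout", "api-eu"],
        "resolved": True,
    },
]

ongoing_eu = filter_incidents(
    incidents, search="latency", monitor_ids={"api-eu"}, status="ongoing"
)
```

Passing `None` for any filter leaves that dimension unrestricted, so the same function covers search-only, monitor-only, and status-only queries.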
Impact Mapping
The UI resolves each monitor’s current status alongside incident context so you can see which components remain degraded mid‑resolution.
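As a rough illustration of this mapping, the sketch below pairs an incident's monitors with their current status and lists those still impacted. The function name `remaining_impact` and the status values `"up"`, `"degraded"`, and `"down"` are assumptions made for the example, not the product's actual values.

```python
from typing import Dict, List


def remaining_impact(
    incident_monitor_ids: List[str],
    current_status: Dict[str, str],
) -> Dict[str, str]:
    # Return the incident's monitors that are not currently "up".
    # Monitors with no known status are reported as "unknown" rather than hidden.
    impacted = {}
    for monitor_id in incident_monitor_ids:
        status = current_status.get(monitor_id, "unknown")
        if status != "up":
            impacted[monitor_id] = status
    return impacted


# Hypothetical snapshot of monitor states taken mid-resolution.
snapshot = {"api-eu-west": "up", "api-eu-central": "degraded"}
still_impacted = remaining_impact(
    ["api-eu-west", "api-eu-central", "api-eu-north"], snapshot
)
```

An empty result is one possible signal that the incident's monitors are green again and the incident is a candidate for resolution.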
Best Practices
- Title: concise, user‑facing (“API Latency Degradation in EU Regions”).
- Description: initial scope + known symptoms; update as clarity improves.
- Avoid bundling unrelated issues; create separate incidents to maintain analytic accuracy.
- Resolve only after monitors are green and user‑visible confirmation is communicated.
Metrics & Reporting
Follow‑on reporting (MTTR, incident count per timeframe) is derived from stored `start`, `end`, and resolution flags. Future analytics may expand automatically.
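A minimal sketch of how these two figures could be computed from the stored fields follows. The `IncidentRecord` class and both helper functions are hypothetical, and MTTR is taken here as the mean of `end - start` over resolved incidents, which is one common definition rather than a confirmed one for this product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class IncidentRecord:
    # Hypothetical record mirroring the stored start, end, and resolution flag.
    start: datetime
    end: Optional[datetime]
    resolved: bool


def mean_time_to_resolve(records: List[IncidentRecord]) -> Optional[timedelta]:
    # Only resolved incidents with an end time contribute to MTTR.
    durations = [r.end - r.start for r in records if r.resolved and r.end is not None]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)


def count_in_window(
    records: List[IncidentRecord], window_start: datetime, window_end: datetime
) -> int:
    # Count incidents that started inside the half-open window [start, end).
    return sum(1 for r in records if window_start <= r.start < window_end)


records = [
    IncidentRecord(datetime(2024, 1, 1, 12, 0), datetime(2024, 1, 1, 12, 30), True),
    IncidentRecord(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 10, 0), True),
    IncidentRecord(datetime(2024, 1, 3, 8, 0), None, False),  # still ongoing
]

mttr = mean_time_to_resolve(records)
first_two_days = count_in_window(records, datetime(2024, 1, 1), datetime(2024, 1, 3))
```

Ongoing incidents are excluded from MTTR but still counted in the per-timeframe total, since they have a start time but no resolution yet.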
Use maintenance windows for planned downtimes; incidents should remain reserved for unexpected impact.