Introduction
Overview of monitors: types, key settings, naming, intervals, and operational best practices.
Monitors continuously check the health and performance of your services (APIs, endpoints, network targets, certificates, schedulers, etc.). The Monitors area is your operational source of truth: use it to add, review, filter, and optimize every check you rely on for reliability.
What You Can Do Here
Task | Where |
---|---|
Create a new monitor | New / Create Monitor button |
See current status & last check | Monitors list columns |
Open detail (history, latency, incidents) | Click monitor name |
Edit configuration | Actions → Edit |
Pause / Resume (if available) | Actions menu |
Monitor Types (Overview)
Type | Typical Use | Link |
---|---|---|
HTTP(S) | Web / API availability & latency | HTTP |
ICMP | Basic network reachability (ping) | ICMP |
TCP | Port-level availability (e.g., DB port) | TCP |
UDP | Services using UDP protocols | UDP |
Heartbeat / Cron | Expected regular signals from jobs | Heartbeat / Cron |
Manual | Track external / non-automated components | Manual |
SMTP | Mail server responsiveness | SMTP |
SSL Certificate | Expiry & validity of certs | SSL Certificate |
DNS | DNS record reachability/consistency | DNS |
Each type page includes specific configuration guidance and best practices.
Quick Start (5 Minutes)
- Choose the appropriate type (usually HTTP for a first monitor).
- Enter the address / target.
- Set the interval (start with a balanced value like 60s for production, longer for non-critical targets).
- Configure retries to avoid false alarms (e.g., 3 retries).
- Add at least one integration (Slack + Email recommended).
- Save and confirm that the first check succeeds.
Next: Add a second monitor for a critical dependency (e.g., database port via TCP) to broaden visibility.
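As a rough illustration of the checklist above, the sketch below models a monitor's settings and reports which Quick Start items are still missing. The `MonitorConfig` class, its field names, and `quick_start_problems` are hypothetical and chosen only for this example; they are not the product's actual schema or API.

```python
from dataclasses import dataclass, field
from typing import List


# Hypothetical field names for illustration; the real configuration form may differ.
@dataclass
class MonitorConfig:
    name: str
    type: str                      # e.g. "http", "tcp", "icmp"
    target: str                    # address or URL to check
    interval_seconds: int = 60     # balanced starting value for production
    retries: int = 3               # failed attempts tolerated before alerting
    integrations: List[str] = field(default_factory=list)


def quick_start_problems(cfg: MonitorConfig) -> List[str]:
    """Return the Quick Start items this config has not satisfied yet."""
    problems = []
    if not cfg.target:
        problems.append("target is empty")
    if cfg.interval_seconds <= 0:
        problems.append("interval must be positive")
    if cfg.retries < 1:
        problems.append("no retries configured; transient failures may alert")
    if not cfg.integrations:
        problems.append("no integration attached; failures will not notify anyone")
    return problems
```

A config with a target, a positive interval, at least one retry, and at least one integration yields an empty list.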
Status & Health Indicators
Indicator | Meaning | Typical Follow-Up |
---|---|---|
Up | All recent checks succeeded | None |
Degraded (if shown) | Partial success or latency issues | Investigate performance |
Down | Consecutive failures exceeded the configured retries | Triage / create incident |
Paused | Not currently checking | Resume when ready |
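One way to read the table above is as a small decision function over recent check results. The sketch below is an assumption about how such a mapping could work, not a description of the product's internal logic; `derive_status` and its parameters are hypothetical, and latency-based degradation is left out for brevity.

```python
from typing import Sequence


def derive_status(recent_ok: Sequence[bool], retries: int, paused: bool = False) -> str:
    """Map recent check results (oldest first) to a status label."""
    if paused:
        return "Paused"

    # Count failures at the end of the history, i.e. the current failing streak.
    trailing_failures = 0
    for ok in reversed(recent_ok):
        if ok:
            break
        trailing_failures += 1

    if trailing_failures > retries:
        return "Down"        # the streak exceeded the retry budget
    if not all(recent_ok):
        return "Degraded"    # some failures, but not enough to call it down
    return "Up"
```

With `retries=3`, a single failed check surrounded by successes reads as Degraded rather than Down, which is how retries filter transient blips.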
Naming & Organization
Aspect | Recommendation |
---|---|
Name | service-component (env) e.g. api-gateway (prod) |
Grouping (if available) | Separate prod vs staging vs internal |
Tags (if available) | severity:tier1, team:payments |
Consistent naming accelerates filtering and alert routing.
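To show how a consistent name helps filtering, the sketch below parses the `service-component (env)` convention with a regular expression. The exact character rules (lowercase letters, digits, hyphens) are an assumption made for this example; adjust the pattern to whatever convention your team adopts.

```python
import re
from typing import Optional, Tuple

# Assumed convention: lowercase hyphenated service name, a space, then the
# environment in parentheses, e.g. "api-gateway (prod)".
NAME_PATTERN = re.compile(
    r"^(?P<service>[a-z0-9]+(?:-[a-z0-9]+)*) \((?P<env>[a-z0-9]+)\)$"
)


def parse_monitor_name(name: str) -> Optional[Tuple[str, str]]:
    """Return (service, env) if the name follows the convention, else None."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return match.group("service"), match.group("env")
```

Names that fail to parse are good candidates for renaming during a periodic review.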
Right-Sizing Intervals
Scenario | Suggested Interval |
---|---|
External customer API | 30–60s |
Internal microservice | 60–120s |
Hourly cron / heartbeat job | No polling; expect a heartbeat from each run |
SSL certificate expiry | 6–12h |
Too-frequent checks create noise; too-infrequent checks increase detection delay.
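The trade-off can be made concrete with some simple arithmetic. The sketch below assumes that retries run at the same interval as regular checks and that an alert fires only after `retries + 1` consecutive failed checks; real schedulers may retry faster, and check timeouts are ignored. Both function names are hypothetical.

```python
def worst_case_detection_seconds(interval_s: int, retries: int) -> int:
    """Upper bound on time from outage start to alert, under the stated assumptions.

    Worst case: the outage begins just after a successful check, so one full
    interval passes before the first failure, then one more per retry.
    """
    return interval_s * (retries + 1)


def checks_per_day(interval_s: int) -> int:
    """Number of checks a single monitor runs in 24 hours."""
    return 86_400 // interval_s
```

For example, a 60s interval with 3 retries gives a worst-case delay of 240 seconds and 1,440 checks per day, while a 6h certificate check runs only 4 times per day.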
Reducing Alert Noise
- Use retries to filter transient blips.
- Add maintenance windows for planned changes.
- Group informational monitors together and keep them off urgent alert channels.
- Periodically review “Monitors without Integrations” and “Monitors without Maintenance” metrics.
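As a minimal sketch of how a maintenance window suppresses alerts, the functions below treat a window as a half-open `(start, end)` pair of timezone-aware datetimes. The names `in_maintenance` and `should_alert` are hypothetical, and the rule that only a Down status alerts is an assumption made for this example.

```python
from datetime import datetime
from typing import Iterable, Tuple

Window = Tuple[datetime, datetime]  # (start, end), end exclusive


def in_maintenance(now: datetime, windows: Iterable[Window]) -> bool:
    """True if `now` falls inside any planned maintenance window."""
    return any(start <= now < end for start, end in windows)


def should_alert(status: str, now: datetime, windows: Iterable[Window]) -> bool:
    """Alert only for Down monitors that are not under planned maintenance."""
    return status == "Down" and not in_maintenance(now, windows)
```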
Security & Safety
- Avoid embedding credentials in URLs; use headers or auth fields.
- Keep tokens rotated and scoped minimally.
- For endpoints requiring auth, provide a lightweight health-specific route.
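A quick way to catch credentials embedded in URLs is to inspect the parsed URL before saving a monitor. The sketch below uses Python's standard `urllib.parse`; the list of suspicious query parameter names is an illustrative assumption, not an exhaustive rule set, and `url_credential_issues` is a hypothetical helper.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit

# Illustrative only: parameter names that often carry secrets.
SUSPECT_PARAMS = {"token", "access_token", "api_key", "apikey", "password", "secret"}


def url_credential_issues(url: str) -> List[str]:
    """Return human-readable warnings about credentials embedded in a URL."""
    parts = urlsplit(url)
    issues = []
    # user:password@host form
    if parts.username or parts.password:
        issues.append("userinfo in URL")
    # secrets passed as query parameters
    for key, _value in parse_qsl(parts.query, keep_blank_values=True):
        if key.lower() in SUSPECT_PARAMS:
            issues.append(f"query parameter '{key}' looks like a secret")
    return issues
```

Any URL that produces warnings is a candidate for moving the secret into a header or a dedicated auth field.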
When to Add More Monitors
- A new critical user-facing feature launches.
- Recurring incidents reveal an unmonitored dependency.
- SLA / compliance reporting requires explicit tracking.
When to Consolidate
- Overlapping monitors measuring identical targets.
- Excess latency graphs adding little signal.
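To find consolidation candidates, you can group monitors by type and target and look for groups with more than one member. The sketch below assumes monitors are available as `(name, type, target)` tuples, for instance from a manual export; that input format and the crude normalization are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Monitor = Tuple[str, str, str]  # (name, type, target)


def find_overlaps(monitors: Iterable[Monitor]) -> Dict[Tuple[str, str], List[str]]:
    """Group monitor names by (type, normalized target); keep groups of 2+."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for name, mtype, target in monitors:
        # Crude normalization: trim, lowercase, drop trailing slash.
        # Note: lowercasing can merge URLs whose paths differ only by case.
        key = (mtype.lower(), target.strip().lower().rstrip("/"))
        groups[key].append(name)
    return {key: names for key, names in groups.items() if len(names) > 1}
```

Review each reported group by hand before removing anything, since two monitors on the same target may still differ in interval or alert routing.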
Next Steps
Proceed to Create a Monitor or dive into a specific type above.
If you are expanding beyond core availability, explore: integrating alert channels, adding status pages, and setting compliance targets for uptime reporting.