What this audit checks
Authentication & access
- API token valid (GET /api/v2/monitors succeeds with the Bearer token, no 401)
- Region host correct (US uptime.betterstack.com / EU uptime.eu.betterstack.com)
- Token scoped to the Uptime team (monitors list returns rows)
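A minimal Python sketch of the three access checks, using only the standard library and a Bearer-token header. The finding labels and the classify_auth / probe names are hypothetical; the two region hosts are the ones listed above.

```python
from __future__ import annotations

import json
import urllib.error
import urllib.request

# Region hosts as listed in this audit.
BASE_URLS = {
    "us": "https://uptime.betterstack.com",
    "eu": "https://uptime.eu.betterstack.com",
}


def classify_auth(status: int, body: dict | None) -> list[str]:
    """Turn one GET /api/v2/monitors response into findings (pure, no I/O)."""
    if status in (401, 403):
        return ["token_invalid"]
    if status != 200:
        return [f"unexpected_status_{status}"]
    if not (body or {}).get("data"):
        # Authenticated but no monitors visible: likely scoped to the wrong team.
        return ["token_scope_empty"]
    return []


def probe(region: str, token: str) -> list[str]:
    """Call the monitors endpoint on one region host and classify the result."""
    req = urllib.request.Request(
        f"{BASE_URLS[region]}/api/v2/monitors",
        headers={"Authorization": f"Bearer {token}"},
    )
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            return classify_auth(resp.status, json.load(resp))
    except urllib.error.HTTPError as err:
        return classify_auth(err.code, None)
    except urllib.error.URLError:
        return ["host_unreachable"]
```

Running probe against both hosts helps separate the first two checks: a token that authenticates on one host but not the other suggests a region mismatch rather than a bad token.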
Monitor coverage (the blind-spot test)
- Critical-path monitors present (checkout / cart / login / homepage)
- Monitors paused for > 7 days (silent gaps in coverage)
- Monitors with no escalation policy / on-call attached (incidents fire but page no one)
- Monitors in pending / validating state > 24h (never went live)
- SSL certificate expiring within 14 days
- Domain expiring within 30 days
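The blind-spot checks reduce to a filter over the monitor inventory. A minimal Python sketch, assuming each monitor has already been flattened into a dict of its attributes. The field names url, status, paused_at, created_at, policy_id and pronounceable_name are assumptions about the monitor payload; ssl_expires_at and domain_expires_at are hypothetical placeholders for wherever the expiry dates come from. The keyword match for critical paths is a deliberately crude heuristic.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from urllib.parse import urlparse

# Crude keyword match against monitor URLs; the homepage is any URL whose path is "" or "/".
CRITICAL_KEYWORDS = ("checkout", "cart", "login")
# Hypothetical expiry fields and their "expiring within N days" windows.
EXPIRY_FIELDS = (("ssl_expires_at", 14, "ssl_expiring"), ("domain_expires_at", 30, "domain_expiring"))


def parse_ts(value: str | None) -> datetime | None:
    """Parse '2024-06-01T10:00:00Z'-style timestamps; empty values stay None."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


def coverage_findings(monitors: list[dict], now: datetime) -> list[str]:
    """Blind-spot findings over flattened monitor attribute dicts."""
    findings = []
    urls = [m.get("url") or "" for m in monitors]
    for kw in CRITICAL_KEYWORDS:
        if not any(kw in u.lower() for u in urls):
            findings.append(f"missing_critical_path:{kw}")
    if not any(urlparse(u).path in ("", "/") for u in urls if u):
        findings.append("missing_critical_path:homepage")

    for m in monitors:
        name = m.get("pronounceable_name") or m.get("url") or "?"
        status = m.get("status")
        paused_at = parse_ts(m.get("paused_at"))
        created_at = parse_ts(m.get("created_at"))
        if status == "paused" and paused_at and now - paused_at > timedelta(days=7):
            findings.append(f"paused_gt_7d:{name}")
        if m.get("policy_id") is None:  # assumption: no policy means nobody gets paged
            findings.append(f"no_escalation:{name}")
        if status in ("pending", "validating") and created_at and now - created_at > timedelta(hours=24):
            findings.append(f"never_live:{name}")
        for field, days, label in EXPIRY_FIELDS:
            expires = parse_ts(m.get(field))
            if expires and expires - now <= timedelta(days=days):  # "within" = at most N days
                findings.append(f"{label}:{name}")
    return findings
```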
Reliability & SLA health
- Availability below SLA target (< 99.5% rolling 30D)
- Open incidents older than 30 min without acknowledgement
- Mean time to acknowledge > 15 min (slow on-call response)
- Mean time to resolve > 60 min (slow recovery)
- Services in degraded state sustained > 15 min
- More than one service down concurrently (correlated outage)
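MTTA, MTTR and open-incident age all come from three timestamps per incident. A sketch assuming flattened incident dicts with started_at, acknowledged_at and resolved_at ISO-8601 strings (assumed field names). The concurrency count is a sweep over incident intervals; it assumes one open incident per monitor, so it counts incidents rather than deduplicated services.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from statistics import mean


def parse_ts(value: str | None) -> datetime | None:
    """Parse '2024-06-01T10:00:00Z'-style timestamps; empty values stay None."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


def response_metrics(incidents: list[dict], now: datetime) -> dict:
    """Mean time to acknowledge / resolve (ms) and stale unacknowledged open incidents."""
    ack_ms, resolve_ms, stale_unacked = [], [], 0
    for inc in incidents:
        started = parse_ts(inc["started_at"])
        acked = parse_ts(inc.get("acknowledged_at"))
        resolved = parse_ts(inc.get("resolved_at"))
        if acked:
            ack_ms.append((acked - started).total_seconds() * 1000)
        if resolved:
            resolve_ms.append((resolved - started).total_seconds() * 1000)
        elif acked is None and now - started > timedelta(minutes=30):
            stale_unacked += 1  # open, never acknowledged, older than 30 min
    return {
        "mtta_ms": mean(ack_ms) if ack_ms else None,
        "mttr_ms": mean(resolve_ms) if resolve_ms else None,
        "open_unacked_gt_30m": stale_unacked,
    }


def max_concurrent_down(incidents: list[dict], now: datetime) -> int:
    """Peak number of overlapping incident intervals (open ones run until now)."""
    events = []
    for inc in incidents:
        events.append((parse_ts(inc["started_at"]), 1))
        events.append((parse_ts(inc.get("resolved_at")) or now, -1))
    # At equal timestamps, -1 sorts before +1 so back-to-back outages do not overlap.
    events.sort(key=lambda e: (e[0], e[1]))
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak
```

The outputs feed the mtta_ms, mttr_ms and services_down_count rows of the severity table directly.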
Incident hygiene
- Incidents resolved without acknowledgement (auto-resolved flapping)
- Repeated incidents on the same monitor in 24h (noisy or unstable)
- Top alerting monitors concentrating > 50% of incidents
- Recurring error type / status code across multiple monitors
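The hygiene checks are counting exercises over the incident list. A sketch assuming each incident dict carries a hypothetical monitor_id (flattened from the incident's monitor relationship) and a cause string. top_n, the number of monitors treated as "top alerting", is an assumed parameter, since the checklist does not fix it.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from datetime import datetime, timedelta


def parse_ts(value: str | None) -> datetime | None:
    """Parse '2024-06-01T10:00:00Z'-style timestamps; empty values stay None."""
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


def hygiene_report(incidents: list[dict], top_n: int = 3) -> dict:
    # Resolved but never acknowledged: usually a check that flapped and auto-resolved.
    auto_resolved = [i["monitor_id"] for i in incidents
                     if i.get("resolved_at") and not i.get("acknowledged_at")]

    # Two or more incidents on the same monitor starting within 24h of each other.
    starts = defaultdict(list)
    for i in incidents:
        starts[i["monitor_id"]].append(parse_ts(i["started_at"]))
    repeat_24h = []
    for monitor_id, ts in starts.items():
        ts.sort()
        if any(later - earlier <= timedelta(hours=24) for earlier, later in zip(ts, ts[1:])):
            repeat_24h.append(monitor_id)

    # Share of all incidents held by the top_n noisiest monitors.
    counts = Counter(i["monitor_id"] for i in incidents)
    top_share = sum(n for _, n in counts.most_common(top_n)) / len(incidents) if incidents else 0.0
    # With top_n or fewer alerting monitors the share is trivially high, so skip the flag.
    concentrated = len(counts) > top_n and top_share > 0.5

    # Same cause string (e.g. a status code) seen on two or more distinct monitors.
    monitors_by_cause = defaultdict(set)
    for i in incidents:
        if i.get("cause"):
            monitors_by_cause[i["cause"]].add(i["monitor_id"])
    recurring_causes = sorted(c for c, ms in monitors_by_cause.items() if len(ms) > 1)

    return {
        "auto_resolved": auto_resolved,
        "repeat_24h": repeat_24h,
        "top_share": top_share,
        "concentrated": concentrated,
        "recurring_causes": recurring_causes,
    }
```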
Cross-channel: revenue-at-risk (the killer area)
- Monitor DOWN while a sibling commerce connector is live: compute $ lost (commerce.revenue_per_min × downtime_minutes × estimated_traffic_loss_pct / 100)
- Checkout / cart monitor failing during peak commerce hours
- Outage window overlapping a campaign push (sibling = google_ads / klaviyo) - paying for traffic that can’t land
- Conversion drop during incident windows (vs 90D baseline)
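The revenue-at-risk arithmetic, with estimated_traffic_loss_pct treated as a 0-100 percentage: for example, 120 $/min × 30 min × 25% = $900. A sketch only: it applies when a sibling commerce connector is live, and the peak-hours window (PEAK_HOURS) and the conversion-rate inputs are hypothetical placeholders for whatever the commerce and campaign connectors supply.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical peak commerce window, in local hours [start, end).
PEAK_HOURS = range(18, 22)


def revenue_at_risk(revenue_per_min: float, downtime_minutes: float,
                    estimated_traffic_loss_pct: float) -> float:
    """Dollars lost: revenue_per_min x downtime_minutes x (traffic loss % / 100)."""
    return revenue_per_min * downtime_minutes * estimated_traffic_loss_pct / 100.0


def overlap_minutes(outage_start: datetime, outage_end: datetime,
                    campaign_start: datetime, campaign_end: datetime) -> float:
    """Minutes of an outage that fall inside a campaign window (0.0 if disjoint)."""
    start = max(outage_start, campaign_start)
    end = min(outage_end, campaign_end)
    return max((end - start).total_seconds() / 60.0, 0.0)


def in_peak_hours(ts: datetime) -> bool:
    """True when a checkout/cart failure starts inside the assumed peak window."""
    return ts.hour in PEAK_HOURS


def conversion_drop_pct(incident_rate: float, baseline_rate: float) -> float:
    """Relative drop of conversion during incident windows vs the 90D baseline, in %."""
    return (baseline_rate - incident_rate) / baseline_rate * 100.0
```

Feeding overlap_minutes into revenue_at_risk as downtime_minutes prices only the part of an outage that burned paid campaign traffic.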
Severity thresholds
| Signal | Warn | Critical |
|---|---|---|
| availability_pct | 99.9 | 99.5 |
| incidents_open_count | 1 | 3 |
| mtta_ms | 300000 | 900000 |
| mttr_ms | 1800000 | 3600000 |
| services_down_count | 1 | 1 |
| services_degraded_count | 1 | 2 |
| error_rate_pct | 1 | 2 |
| p95_latency_ms | 800 | 1500 |
| monitors_paused_count | 1 | 3 |
| monitors_no_escalation_count | 0 | 1 |
| ssl_days_to_expiry | 14 | 7 |
| domain_days_to_expiry | 30 | 14 |
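The table does not say whether thresholds are inclusive or which direction is bad. This sketch assumes strict comparisons (matching the checklist's "> 15 min", "< 99.5%" and "more than one service down") and treats availability_pct and the two days_to_expiry signals as lower-is-worse, everything else as higher-is-worse. Both are assumptions.

```python
from __future__ import annotations

import operator

# Signal -> (warn, critical), copied from the severity table.
THRESHOLDS = {
    "availability_pct": (99.9, 99.5),
    "incidents_open_count": (1, 3),
    "mtta_ms": (300_000, 900_000),
    "mttr_ms": (1_800_000, 3_600_000),
    "services_down_count": (1, 1),
    "services_degraded_count": (1, 2),
    "error_rate_pct": (1, 2),
    "p95_latency_ms": (800, 1500),
    "monitors_paused_count": (1, 3),
    "monitors_no_escalation_count": (0, 1),
    "ssl_days_to_expiry": (14, 7),
    "domain_days_to_expiry": (30, 14),
}

# Assumption: for these signals a lower value is worse.
LOWER_IS_WORSE = {"availability_pct", "ssl_days_to_expiry", "domain_days_to_expiry"}


def severity(signal: str, value: float) -> str:
    """Classify one signal as ok / warn / critical using strict comparisons."""
    warn, critical = THRESHOLDS[signal]
    breaches = operator.lt if signal in LOWER_IS_WORSE else operator.gt
    if breaches(value, critical):
        return "critical"
    if breaches(value, warn):
        return "warn"
    return "ok"


def evaluate(metrics: dict[str, float]) -> dict[str, str]:
    """Classify every known signal present in metrics; unknown keys are ignored."""
    return {name: severity(name, v) for name, v in metrics.items() if name in THRESHOLDS}
```

Under strict comparison, monitors_no_escalation_count warns at 1 and goes critical at 2, and services_down_count jumps from ok straight to critical at 2; switch to operator.le / operator.ge if the thresholds are meant to be inclusive.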
Data sources
- GET https://uptime.betterstack.com/api/v2/monitors - Monitor inventory + status + check frequency + SSL/domain expiry
- GET https://uptime.betterstack.com/api/v2/monitors/{monitor_id}/sla - Per-monitor availability + downtime + incident counts
- GET https://uptime.betterstack.com/api/v2/monitors/{monitor_id}/response-times - Response-time series for apdex / latency / error-rate
- GET https://uptime.betterstack.com/api/v2/incidents - Incident inventory + MTTA / MTTR + revenue-at-risk join
- GET https://uptime.betterstack.com/api/v2/on-calls - On-call rota coverage
- GET https://uptime.betterstack.com/api/v2/escalation-policies - Escalation wiring per monitor (no-escalation detection)
- GET https://uptime.betterstack.com/api/v2/monitor-groups - Group rollups for service-health aggregation
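All of these endpoints are GET reads authorized with the same Bearer token. A minimal fetch loop, assuming list responses carry a data array plus a pagination.next URL that is null on the last page (an assumption about the response shape). get_json is injected so the pager can be exercised without network access; endpoint is a hypothetical helper that fills the {monitor_id} placeholders.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Callable, Iterator

BASE = "https://uptime.betterstack.com"  # swap for the EU host if the token lives there


def http_get_json(url: str, token: str) -> dict:
    """GET one page with the audit's Bearer token and decode the JSON body."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_records(first_url: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield every item in data[], following pagination.next until it is null or absent."""
    url: str | None = first_url
    while url:
        page = get_json(url)
        yield from page.get("data") or []
        url = (page.get("pagination") or {}).get("next")


def endpoint(path: str, **ids: str) -> str:
    """Build a Data sources URL, e.g. endpoint("monitors/{monitor_id}/sla", monitor_id="123")."""
    return f"{BASE}/api/v2/{path.format(**ids)}"
```

Usage: `monitors = list(iter_records(endpoint("monitors"), lambda u: http_get_json(u, token)))`, then one `http_get_json(endpoint("monitors/{monitor_id}/sla", monitor_id=m["id"]), token)` per monitor.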