Better Uptime state means nothing to a merchant unless it’s joined to revenue. This audit answers four questions: (1) is the storefront up and fast right now; (2) are the monitors that should be watching it actually wired to an on-call rota, with no coverage gaps, paused critical monitors, or monitors lacking escalation; (3) are we keeping the uptime SLA and resolving incidents quickly; and (4) when something is down, how much money is on fire per minute while a sibling commerce connector is live.

What this audit checks

Authentication & access

  • API token valid (auth on GET /api/v2/monitors)
  • Region host correct (US uptime.betterstack.com / EU uptime.eu.betterstack.com)
  • Token scoped to the Uptime team (monitors list returns rows)
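
The three access checks above can be folded into a single probe of GET /api/v2/monitors. The sketch below is illustrative only: it assumes the token is sent as a Bearer header, and the names REGION_HOSTS, build_monitors_request and classify_auth_probe are hypothetical helpers, not part of any Better Stack client. How a wrong region host actually responds is not stated here, so any non-200, non-auth status is lumped into one bucket.

```python
import urllib.request

# Region hosts as listed in the check above.
REGION_HOSTS = {
    "us": "uptime.betterstack.com",
    "eu": "uptime.eu.betterstack.com",
}


def build_monitors_request(token: str, region: str = "us") -> urllib.request.Request:
    """Build (but do not send) the GET /api/v2/monitors probe."""
    host = REGION_HOSTS[region]
    return urllib.request.Request(
        f"https://{host}/api/v2/monitors",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )


def classify_auth_probe(status_code: int, row_count: int) -> str:
    """Map the probe outcome onto the three access checks."""
    if status_code in (401, 403):
        return "token_invalid"
    if status_code != 200:
        return "host_or_api_error"  # possibly the wrong region host
    if row_count == 0:
        return "token_not_scoped_or_no_monitors"
    return "ok"
```

Keeping request construction separate from the network call makes the probe easy to exercise without a live token.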

Monitor coverage (the blind-spot test)

  • Critical-path monitors present (checkout / cart / login / homepage)
  • Monitors paused for > 7 days (silent gaps in coverage)
  • Monitors with no escalation policy / on-call attached (fire silently)
  • Monitors in pending / validating state > 24h (never went live)
  • SSL certificate expiring within 14 days
  • Domain expiring within 30 days
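
As a rough sketch, the blind-spot test above can be expressed as one pass over the monitor list. The field names used here (url, status, paused_at, created_at, policy_id, ssl_expires_at, domain_expires_at) are assumptions about the monitor payload and should be checked against the real response; matching critical paths by URL substring is also a simplification.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse

CRITICAL_PATHS = ("checkout", "cart", "login", "homepage")


def _parse(ts):
    """Parse an ISO-8601 timestamp; naive values are treated as UTC."""
    if not ts:
        return None
    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def _covers(url, key):
    """True if the monitor URL plausibly watches the given critical path."""
    path = urlparse(url).path.strip("/")
    return path == "" if key == "homepage" else key in path


def _older_than(ts, now, limit):
    when = _parse(ts)
    return when is not None and now - when > limit


def _expires_within(ts, now, limit):
    when = _parse(ts)
    return when is not None and when - now <= limit


def coverage_findings(monitors, now):
    """Return (finding, subject) tuples for the coverage checks."""
    findings = []
    for key in CRITICAL_PATHS:
        if not any(_covers(m.get("url", ""), key) for m in monitors):
            findings.append(("missing_critical_path", key))
    for m in monitors:
        url, status = m.get("url", "?"), m.get("status")
        if status == "paused" and _older_than(m.get("paused_at"), now, timedelta(days=7)):
            findings.append(("paused_over_7d", url))
        if not m.get("policy_id"):  # assumed escalation-policy field
            findings.append(("no_escalation", url))
        if status in ("pending", "validating") and _older_than(
            m.get("created_at"), now, timedelta(hours=24)
        ):
            findings.append(("never_went_live", url))
        if _expires_within(m.get("ssl_expires_at"), now, timedelta(days=14)):
            findings.append(("ssl_expiring", url))
        if _expires_within(m.get("domain_expires_at"), now, timedelta(days=30)):
            findings.append(("domain_expiring", url))
    return findings
```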

Reliability & SLA health

  • Availability below SLA target (< 99.5% rolling 30D)
  • Open incidents older than 30 min without acknowledgement
  • Mean time to acknowledge > 15 min (slow on-call response)
  • Mean time to resolve > 60 min (slow recovery)
  • Services in degraded state sustained > 15 min
  • More than one service down concurrently (correlated outage)
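
A minimal sketch of how the availability, MTTA and MTTR checks above might be computed from raw incidents. It assumes each incident carries started_at, acknowledged_at and resolved_at timestamps (assumed field names) and that total downtime over the 30-day window is already known in seconds; the limits mirror the figures in the list.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

SLA_TARGET_PCT = 99.5
MTTA_LIMIT_MS = 15 * 60 * 1000
MTTR_LIMIT_MS = 60 * 60 * 1000
UNACKED_LIMIT = timedelta(minutes=30)


def _parse(ts):
    """Parse an ISO-8601 timestamp; naive values are treated as UTC."""
    if not ts:
        return None
    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def availability_pct(downtime_seconds, window_days=30):
    """Share of the rolling window during which the service was up."""
    window_seconds = window_days * 24 * 60 * 60
    return 100.0 * (1 - downtime_seconds / window_seconds)


def reliability_report(incidents, downtime_seconds, now):
    ack_ms, resolve_ms, stale = [], [], []
    for inc in incidents:
        started = _parse(inc["started_at"])
        acked = _parse(inc.get("acknowledged_at"))
        resolved = _parse(inc.get("resolved_at"))
        if acked:
            ack_ms.append((acked - started).total_seconds() * 1000)
        if resolved:
            resolve_ms.append((resolved - started).total_seconds() * 1000)
        # Still open, never acknowledged, and older than 30 minutes.
        if not acked and not resolved and now - started > UNACKED_LIMIT:
            stale.append(inc.get("id"))
    mtta = mean(ack_ms) if ack_ms else None
    mttr = mean(resolve_ms) if resolve_ms else None
    return {
        "sla_breached": availability_pct(downtime_seconds) < SLA_TARGET_PCT,
        "mtta_ms": mtta,
        "mttr_ms": mttr,
        "mtta_slow": mtta is not None and mtta > MTTA_LIMIT_MS,
        "mttr_slow": mttr is not None and mttr > MTTR_LIMIT_MS,
        "stale_unacked": stale,
    }
```

For scale: a 99.5% target over 30 days leaves a downtime budget of 0.5% of 43,200 minutes, that is 216 minutes.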

Incident hygiene

  • Incidents resolved without acknowledgement (auto-resolved flapping)
  • Repeated incidents on the same monitor in 24h (noisy or unstable)
  • Top alerting monitors concentrating > 50% of incidents
  • Recurring error type / status code across multiple monitors
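
The hygiene checks above reduce to a few groupings over the incident list. This is a sketch under assumptions: the fields monitor_id, started_at, acknowledged_at, resolved_at and cause are hypothetical names, and "top alerting monitors" is simplified to the single noisiest monitor.

```python
from collections import Counter, defaultdict
from datetime import datetime, timedelta, timezone


def _parse(ts):
    """Parse an ISO-8601 timestamp; naive values are treated as UTC."""
    when = datetime.fromisoformat(ts.replace("Z", "+00:00"))
    return when if when.tzinfo else when.replace(tzinfo=timezone.utc)


def hygiene_findings(incidents):
    """Return (finding, subject) tuples for the incident-hygiene checks."""
    findings = []
    starts_by_monitor = defaultdict(list)
    monitors_by_cause = defaultdict(set)
    for inc in incidents:
        monitor = inc["monitor_id"]
        starts_by_monitor[monitor].append(_parse(inc["started_at"]))
        if inc.get("cause"):
            monitors_by_cause[inc["cause"]].add(monitor)
        # Resolved but never acknowledged: likely auto-resolved flapping.
        if inc.get("resolved_at") and not inc.get("acknowledged_at"):
            findings.append(("resolved_without_ack", inc.get("id")))
    for monitor, starts in starts_by_monitor.items():
        starts.sort()
        # Any two consecutive incidents starting within 24h of each other.
        if any(b - a <= timedelta(hours=24) for a, b in zip(starts, starts[1:])):
            findings.append(("repeated_within_24h", monitor))
    if incidents:
        top, count = Counter(i["monitor_id"] for i in incidents).most_common(1)[0]
        if count / len(incidents) > 0.5:
            findings.append(("incident_concentration", top))
    for cause, monitors in monitors_by_cause.items():
        if len(monitors) > 1:
            findings.append(("recurring_cause", cause))
    return findings
```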

Cross-channel: revenue-at-risk (the killer area)

  • Monitor DOWN with sibling commerce connector live = compute $/min lost (commerce.revenue_per_min × downtime_minutes × estimated_traffic_loss_pct)
  • Checkout / cart monitor failing during peak commerce hours
  • Outage window overlapping a campaign push (sibling = google_ads / klaviyo) - paying for traffic that can’t land
  • Conversion drop during incident windows (vs 90D baseline)
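
The $/min formula above, together with the campaign-overlap check, can be sketched as follows. It assumes estimated_traffic_loss_pct is expressed on a 0-100 scale and that revenue_per_min comes from the sibling commerce connector; overlap_minutes is a hypothetical helper for measuring how much of an outage fell inside a campaign window.

```python
from datetime import datetime, timezone


def revenue_at_risk(revenue_per_min, downtime_minutes, estimated_traffic_loss_pct):
    """commerce.revenue_per_min x downtime_minutes x estimated_traffic_loss_pct."""
    return revenue_per_min * downtime_minutes * (estimated_traffic_loss_pct / 100.0)


def overlap_minutes(outage_start, outage_end, window_start, window_end):
    """Minutes of the outage that fall inside another window (0 if disjoint)."""
    start = max(outage_start, window_start)
    end = min(outage_end, window_end)
    return max(0.0, (end - start).total_seconds() / 60.0)


# Worked example with made-up figures: $120/min, 18 minutes down, 50% loss.
lost = revenue_at_risk(120.0, 18, 50)  # 120 * 18 * 0.5 = 1080.0

outage = (datetime(2024, 6, 1, 14, 0, tzinfo=timezone.utc),
          datetime(2024, 6, 1, 14, 18, tzinfo=timezone.utc))
campaign = (datetime(2024, 6, 1, 14, 10, tzinfo=timezone.utc),
            datetime(2024, 6, 1, 15, 0, tzinfo=timezone.utc))
paid_traffic_minutes = overlap_minutes(*outage, *campaign)  # 8.0
```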

Severity thresholds

Signal                          Warn       Critical
availability_pct                99.9       99.5
incidents_open_count            1          3
mtta_ms                         300000     900000
mttr_ms                         1800000    3600000
services_down_count             1          1
services_degraded_count         1          2
error_rate_pct                  1          2
p95_latency_ms                  800        1500
monitors_paused_count           1          3
monitors_no_escalation_count    0          1
ssl_days_to_expiry              14         7
domain_days_to_expiry           30         14
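
One way to apply these thresholds is a lookup that also records which direction is bad, since availability and the expiry countdowns get worse as they fall while the other signals get worse as they rise. The sketch assumes strict comparisons (for example, more than 1 service down is critical, fewer than 7 days to SSL expiry is critical); the comparison operator is not stated in the table, so treat that as an assumption, as are the names THRESHOLDS and classify.

```python
# signal: (warn, critical, direction in which the signal gets worse)
THRESHOLDS = {
    "availability_pct": (99.9, 99.5, "below"),
    "incidents_open_count": (1, 3, "above"),
    "mtta_ms": (300000, 900000, "above"),
    "mttr_ms": (1800000, 3600000, "above"),
    "services_down_count": (1, 1, "above"),
    "services_degraded_count": (1, 2, "above"),
    "error_rate_pct": (1, 2, "above"),
    "p95_latency_ms": (800, 1500, "above"),
    "monitors_paused_count": (1, 3, "above"),
    "monitors_no_escalation_count": (0, 1, "above"),
    "ssl_days_to_expiry": (14, 7, "below"),
    "domain_days_to_expiry": (30, 14, "below"),
}


def classify(signal, value):
    """Return 'critical', 'warn' or 'ok' for one signal reading."""
    warn, critical, direction = THRESHOLDS[signal]

    def breached(limit):
        # Strict comparison is an assumption; the table does not say.
        return value < limit if direction == "below" else value > limit

    if breached(critical):
        return "critical"
    if breached(warn):
        return "warn"
    return "ok"
```

Checking the critical limit first matters for rows such as services_down_count, where warn and critical share the same value.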

Data sources

  • GET https://uptime.betterstack.com/api/v2/monitors - Monitor inventory + status + check frequency + SSL/domain expiry
  • GET https://uptime.betterstack.com/api/v2/monitors/{monitor_id}/sla - Per-monitor availability + downtime + incident counts
  • GET https://uptime.betterstack.com/api/v2/monitors/{monitor_id}/response-times - Response-time series for apdex / latency / error-rate
  • GET https://uptime.betterstack.com/api/v2/incidents - Incident inventory + MTTA / MTTR + revenue-at-risk join
  • GET https://uptime.betterstack.com/api/v2/on-calls - On-call rota coverage
  • GET https://uptime.betterstack.com/api/v2/escalation-policies - Escalation wiring per monitor (no-escalation detection)
  • GET https://uptime.betterstack.com/api/v2/monitor-groups - Group rollups for service-health aggregation
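
The list endpoints above may return more rows than fit in one response. The sketch below walks pages on the assumption that each response is JSON with a data array and a pagination.next URL, and that the token is sent as a Bearer header; both should be confirmed against the API reference. fetch_json and iter_rows are hypothetical helper names, and max_pages is a safety cap, not an API limit.

```python
import json
import urllib.request


def fetch_json(url, token):
    """GET one page and decode it as JSON (assumes Bearer token auth)."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def iter_rows(first_url, token, fetch=fetch_json, max_pages=100):
    """Yield every row across pages, following the assumed pagination.next link."""
    url, pages = first_url, 0
    while url and pages < max_pages:
        page = fetch(url, token)
        yield from page.get("data", [])
        url = (page.get("pagination") or {}).get("next")
        pages += 1
```

Passing fetch as a parameter lets the same loop serve monitors, incidents and escalation policies, and makes it testable against canned pages.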