Better Uptime audit profile, Vortex IQ

Better Uptime state means nothing to a merchant unless it’s joined to revenue. This audit answers four questions: (1) is the storefront up and fast right now; (2) are the monitors that should be watching it actually wired to an on-call rota (no coverage gaps, no paused critical monitors, no monitors without escalation); (3) are we keeping the uptime SLA and resolving incidents quickly; and (4) when something IS down, how much money is on fire per minute while a sibling commerce connector is live.

What this audit checks

Authentication & access

API token valid (auth on GET /api/v2/monitors)
Region host correct (US uptime.betterstack.com / EU uptime.eu.betterstack.com)
Token scoped to the Uptime team (monitors list returns rows)
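
The three access checks above can be sketched as one request plus a small classifier. This is a minimal sketch, not the audit's actual implementation: the Bearer auth header and the `data` envelope in the response are assumptions about the API, and `build_monitors_request` and `classify_access` are hypothetical helper names. The region hosts are the ones listed above.

```python
import json
import urllib.request

# Region hosts as listed in this audit profile.
REGION_HOSTS = {
    "us": "uptime.betterstack.com",
    "eu": "uptime.eu.betterstack.com",
}


def build_monitors_request(token: str, region: str = "us") -> urllib.request.Request:
    """Build (but do not send) the GET /api/v2/monitors request."""
    host = REGION_HOSTS[region]
    return urllib.request.Request(
        f"https://{host}/api/v2/monitors",
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="GET",
    )


def classify_access(status: int, body: str) -> str:
    """Map an HTTP status and response body to one audit outcome."""
    if status in (401, 403):
        return "token_invalid"
    if status != 200:
        return "unexpected_status"
    rows = json.loads(body).get("data", [])  # assumed response envelope
    # An authenticated call that returns no monitors suggests wrong team scope.
    return "ok" if rows else "token_not_scoped_to_uptime_team"
```

Keeping request construction separate from classification means the classifier can be exercised against recorded responses without touching the network.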

Monitor coverage (the blind-spot test)

Critical-path monitors present (checkout / cart / login / homepage)
Monitors paused for > 7 days (silent gaps in coverage)
Monitors with no escalation policy / on-call attached (fire silently)
Monitors in pending / validating state > 24h (never went live)
SSL certificate expiring within 14 days
Domain expiring within 30 days
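
As an illustration, three of the checks above (long-paused monitors, monitors with no escalation policy, and monitors stuck in pending / validating) might be evaluated over a list of monitor records as follows. The field names `url`, `status`, `paused_at`, `policy_id` and `created_at` are assumptions about the monitor payload, and `coverage_findings` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def coverage_findings(monitors, now):
    """Return (monitor_url, finding) pairs for the blind-spot checks."""
    findings = []
    for m in monitors:
        name, status = m["url"], m["status"]
        # Paused for more than 7 days: a silent gap in coverage.
        if status == "paused" and m.get("paused_at"):
            if now - parse_ts(m["paused_at"]) > timedelta(days=7):
                findings.append((name, "paused_over_7_days"))
        # No escalation policy attached: the monitor fires silently.
        if m.get("policy_id") is None:
            findings.append((name, "no_escalation_policy"))
        # Still pending / validating after 24h: never went live.
        if status in ("pending", "validating"):
            if now - parse_ts(m["created_at"]) > timedelta(hours=24):
                findings.append((name, "never_went_live"))
    return findings
```

Passing `now` in explicitly, rather than reading the clock inside the function, keeps the checks reproducible when the audit is replayed against a snapshot.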

Reliability & SLA health

Availability below SLA target (< 99.5% rolling 30D)
Open incidents older than 30 min without acknowledgement
Mean time to acknowledge > 15 min (slow on-call response)
Mean time to resolve > 60 min (slow recovery)
Services in degraded state sustained > 15 min
More than one service down concurrently (correlated outage)
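
Mean time to acknowledge and mean time to resolve can be derived from incident timestamps. The sketch below assumes each incident record carries `started_at`, `acknowledged_at` and `resolved_at` as ISO-8601 strings (assumed field names), and returns milliseconds to match the `mtta_ms` / `mttr_ms` signals. `mtta_mttr_ms` is a hypothetical helper.

```python
from datetime import datetime
from statistics import mean


def parse_ts(value: str) -> datetime:
    """Parse an ISO-8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_ms(start: str, end: str) -> float:
    """Milliseconds between two ISO-8601 timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds() * 1000


def mtta_mttr_ms(incidents):
    """Return (MTTA, MTTR) in ms; None where no incident has that timestamp."""
    ack = [elapsed_ms(i["started_at"], i["acknowledged_at"])
           for i in incidents if i.get("acknowledged_at")]
    res = [elapsed_ms(i["started_at"], i["resolved_at"])
           for i in incidents if i.get("resolved_at")]
    return (mean(ack) if ack else None, mean(res) if res else None)
```

Incidents that were never acknowledged are left out of the MTTA average here rather than counted as zero; the hygiene checks can surface them separately.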

Incident hygiene

Incidents resolved without acknowledgement (auto-resolved flapping)
Repeated incidents on the same monitor in 24h (noisy or unstable)
Top alerting monitors concentrating > 50% of incidents
Recurring error type / status code across multiple monitors
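
The concentration check could be computed as the share of incidents produced by the top N alerting monitors. How many monitors count as "top" is not stated above, so `top_n` is an assumption of this sketch, and `top_monitor_share` is a hypothetical helper that takes one monitor id per incident.

```python
from collections import Counter


def top_monitor_share(incident_monitor_ids, top_n=1):
    """Return (top monitor ids, their combined share of all incidents)."""
    counts = Counter(incident_monitor_ids)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(top_n)
    share = sum(n for _, n in top) / total
    return [monitor_id for monitor_id, _ in top], share


def is_concentrated(incident_monitor_ids, top_n=1, threshold=0.5):
    """True when the top monitors account for more than `threshold` of incidents."""
    _, share = top_monitor_share(incident_monitor_ids, top_n)
    return share > threshold
```

For example, if monitor `a` raised three of five incidents, `top_monitor_share(["a", "a", "a", "b", "c"])` reports `a` with a 0.6 share, which trips the 50% rule.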

Cross-channel: revenue-at-risk (the killer area)

Monitor DOWN with sibling commerce connector live = compute $/min lost (commerce.revenue_per_min × downtime_minutes × estimated_traffic_loss_pct)
Checkout / cart monitor failing during peak commerce hours
Outage window overlapping a campaign push (sibling = google_ads / klaviyo) - paying for traffic that can’t land
Conversion drop during incident windows (vs 90D baseline)
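
The revenue-at-risk formula above is a straight product of three terms. The sketch below assumes `estimated_traffic_loss_pct` is expressed on a 0-100 scale (the text does not say whether it is a percentage or a fraction), and `revenue_at_risk` is a hypothetical function name.

```python
def revenue_at_risk(revenue_per_min: float,
                    downtime_minutes: float,
                    estimated_traffic_loss_pct: float) -> float:
    """Estimated revenue lost over an outage window.

    Assumes estimated_traffic_loss_pct is on a 0-100 scale.
    """
    if not 0 <= estimated_traffic_loss_pct <= 100:
        raise ValueError("estimated_traffic_loss_pct must be between 0 and 100")
    loss_fraction = estimated_traffic_loss_pct / 100.0
    return revenue_per_min * downtime_minutes * loss_fraction


# Example: a store earning 40 per minute, down for 12 minutes,
# with half of its traffic assumed lost.
lost = revenue_at_risk(40.0, 12.0, 50.0)
```

With those inputs the estimate is 240.0 in whatever currency `revenue_per_min` uses. Setting `downtime_minutes` to 1 gives the per-minute burn rate.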

Severity thresholds

Signal	Warn	Critical
`availability_pct`	99.9	99.5
`incidents_open_count`	1	3
`mtta_ms`	300000	900000
`mttr_ms`	1800000	3600000
`services_down_count`	1	1
`services_degraded_count`	1	2
`error_rate_pct`	1	2
`p95_latency_ms`	800	1500
`monitors_paused_count`	1	3
`monitors_no_escalation_count`	0	1
`ssl_days_to_expiry`	14	7
`domain_days_to_expiry`	30	14
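
The table mixes signals where a higher value is worse (latency, MTTA) with signals where a lower value is worse (availability, days to expiry). A minimal sketch of the lookup for a subset of signals follows. The direction flags are inferred from the Warn / Critical ordering in each row, and treating a value exactly on the threshold as a breach is an assumption; `severity` is a hypothetical helper.

```python
# (warn, critical, direction in which the signal gets worse), for a subset
# of the table rows. Direction is inferred, not stated in the table.
THRESHOLDS = {
    "availability_pct": (99.9, 99.5, "low"),
    "mtta_ms": (300_000, 900_000, "high"),
    "p95_latency_ms": (800, 1500, "high"),
    "ssl_days_to_expiry": (14, 7, "low"),
}


def severity(signal: str, value: float) -> str:
    """Return 'critical', 'warn' or 'ok' for one signal reading."""
    warn, critical, worse = THRESHOLDS[signal]

    def breached(limit: float) -> bool:
        # Boundary values count as breached (assumption).
        return value <= limit if worse == "low" else value >= limit

    # Check critical first so the more severe level wins.
    if breached(critical):
        return "critical"
    if breached(warn):
        return "warn"
    return "ok"
```

Under these assumptions `severity("availability_pct", 99.7)` is `"warn"` and `severity("ssl_days_to_expiry", 5)` is `"critical"`.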

Data sources

GET https://uptime.betterstack.com/api/v2/monitors - Monitor inventory + status + check frequency + SSL/domain expiry
GET https://uptime.betterstack.com/api/v2/monitors/{monitor_id}/sla - Per-monitor availability + downtime + incident counts
GET https://uptime.betterstack.com/api/v2/monitors/{monitor_id}/response-times - Response-time series for apdex / latency / error-rate
GET https://uptime.betterstack.com/api/v2/incidents - Incident inventory + MTTA / MTTR + revenue-at-risk join
GET https://uptime.betterstack.com/api/v2/on-calls - On-call rota coverage
GET https://uptime.betterstack.com/api/v2/escalation-policies - Escalation wiring per monitor (no-escalation detection)
GET https://uptime.betterstack.com/api/v2/monitor-groups - Group rollups for service-health aggregation
