What this audit checks
Authentication & access
- API key valid (auth on GET /v1/pages - returns the account’s pages)
- page_id resolves to a page the key can read
- Key has read scope on components, incidents, and metrics
Status-page truthfulness (the customer-trust test)
- Components in partial_outage / major_outage with NO open incident published (page is lying)
- Open incident referencing components that are all operational (stale incident, page over-reporting)
- Components stuck in under_maintenance > 24h (forgotten maintenance window)
- only_show_if_degraded components that are degraded but hidden from the public page
- System metrics with no data in > 60 min (stale performance display)
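
The first two truthfulness checks amount to a join between component status and the components referenced by open incidents. A minimal sketch, assuming each component is a dict with `id` and `status`, and each open incident carries a `components` list of dicts with `id` (these field names and both function names are illustrative assumptions, not confirmed by this document):

```python
OUTAGE_STATUSES = {"partial_outage", "major_outage"}


def untruthful_components(components, open_incidents):
    """Components in an outage state that no open incident references."""
    # Assumed shape: incident["components"] is a list of {"id": ...} dicts.
    covered = {
        ref["id"]
        for incident in open_incidents
        for ref in incident.get("components", [])
    }
    return [
        c for c in components
        if c["status"] in OUTAGE_STATUSES and c["id"] not in covered
    ]


def stale_incidents(components, open_incidents):
    """Open incidents whose referenced components are all operational."""
    status_by_id = {c["id"]: c["status"] for c in components}
    stale = []
    for incident in open_incidents:
        refs = [ref["id"] for ref in incident.get("components", [])]
        # An incident with no component references is not flagged here.
        if refs and all(status_by_id.get(cid) == "operational" for cid in refs):
            stale.append(incident)
    return stale
```

An incident that references no components is skipped rather than flagged, since the over-reporting check only makes sense when there is something to compare against.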
Incident hygiene
- Incidents in investigating state > 30 min without an identified/monitoring update
- Mean time to acknowledge > 15 min (slow first update)
- Mean time to resolve > 60 min (slow recovery)
- Repeated incidents on the same component in 24h (noisy or unstable)
- A small set of components accounting for > 50% of incidents (concentrated alerting)
- Major/critical-impact incidents resolved without a postmortem
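
The MTTA and MTTR checks reduce to averaging the gap between two timestamps per incident. A sketch under stated assumptions: timestamps are ISO 8601 strings, `created_at` and `resolved_at` are present on each incident, and `acknowledged_at` is a hypothetical field the audit derives itself (for example, from the first update after incident creation):

```python
from datetime import datetime
from statistics import mean


def parse_ts(value):
    """Parse an ISO 8601 timestamp, tolerating a trailing 'Z'."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def mean_minutes(incidents, start_key, end_key):
    """Mean gap in minutes between two timestamp fields.

    Incidents missing either field are skipped; returns None if no
    incident has both.
    """
    durations = [
        (parse_ts(i[end_key]) - parse_ts(i[start_key])).total_seconds() / 60
        for i in incidents
        if i.get(start_key) and i.get(end_key)
    ]
    return mean(durations) if durations else None


def hygiene_flags(incidents, mtta_limit_min=15, mttr_limit_min=60):
    """Apply the > 15 min MTTA and > 60 min MTTR checks from the list above."""
    mtta = mean_minutes(incidents, "created_at", "acknowledged_at")
    mttr = mean_minutes(incidents, "created_at", "resolved_at")
    return {
        "slow_first_update": mtta is not None and mtta > mtta_limit_min,
        "slow_recovery": mttr is not None and mttr > mttr_limit_min,
    }
```

Unresolved incidents drop out of the MTTR mean because they have no `resolved_at`, which keeps open incidents from skewing the average.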
Reliability & SLA health
- Component-group availability below SLA target (< 99.5% rolling 30D)
- More than one component in major_outage concurrently (correlated outage)
- Components in degraded_performance sustained > 15 min
- Apdex below 0.85 / p95 above 1500ms on published system metrics
- Error rate above 2% on published system metrics
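
For the availability check, it can help to see what a 99.5% target means in minutes over a rolling 30-day window. A small sketch, assuming availability is derived from total downtime minutes in the window (how downtime is measured is not specified here, and both function names are illustrative):

```python
def availability_pct(downtime_minutes, window_days=30):
    """Availability over the window as a percentage."""
    total = window_days * 24 * 60
    if not 0 <= downtime_minutes <= total:
        raise ValueError("downtime must be between 0 and the window length")
    return 100 * (total - downtime_minutes) / total


def downtime_budget_minutes(target_pct, window_days=30):
    """Minutes of downtime allowed before availability drops below target."""
    total = window_days * 24 * 60
    return total * (100 - target_pct) / 100
```

Under these assumptions a 30-day window is 43,200 minutes, so a 99.5% target leaves a budget of 216 minutes of downtime.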
Cross-channel: revenue-at-risk (the killer area)
- Component in major_outage while the sibling commerce connector is live: compute estimated revenue lost (commerce.revenue_per_min × outage_minutes × estimated_traffic_loss_pct)
- Checkout / cart component degraded or down during peak commerce hours
- Outage window overlapping a campaign push (sibling = google_ads / klaviyo) - paying for traffic that lands on a broken page
- Conversion drop during published-incident windows (vs 90D baseline)
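
The revenue-at-risk formula in the first item above is a plain product. A sketch, assuming `estimated_traffic_loss_pct` is expressed on a 0-100 scale (the scale is not stated here, so drop the division if the value is already a fraction):

```python
def revenue_at_risk(revenue_per_min, outage_minutes, estimated_traffic_loss_pct):
    """Estimated revenue lost during an outage window.

    Assumes estimated_traffic_loss_pct is a percentage in [0, 100].
    """
    if not 0 <= estimated_traffic_loss_pct <= 100:
        raise ValueError("estimated_traffic_loss_pct must be in [0, 100]")
    if revenue_per_min < 0 or outage_minutes < 0:
        raise ValueError("revenue_per_min and outage_minutes must be non-negative")
    return revenue_per_min * outage_minutes * (estimated_traffic_loss_pct / 100)
```

For instance, with a hypothetical 120 per minute in revenue, a 45-minute outage, and a 50% traffic loss, the estimate is 120 × 45 × 0.5 = 2,700.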
Severity thresholds
| Signal | Warn | Critical |
|---|---|---|
| availability_pct | 99.9 | 99.5 |
| incidents_open_count | 1 | 3 |
| mtta_ms | 300000 | 900000 |
| mttr_ms | 1800000 | 3600000 |
| components_major_outage_count | 1 | 1 |
| components_degraded_count | 1 | 2 |
| untruthful_components_count | 0 | 1 |
| error_rate_pct | 1 | 2 |
| p95_latency_ms | 800 | 1500 |
| apdex | 0.9 | 0.85 |
| metric_staleness_min | 30 | 60 |
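
The thresholds run in two directions: `availability_pct` and `apdex` breach when the value falls below the threshold, while every other signal breaches when it rises above. A sketch of one way to classify a reading, assuming strict comparisons to match the "> N" and "below N" wording of the checks (the table does not say whether thresholds are inclusive, so swap in `<=` / `>=` if they are meant to be):

```python
# (warn, critical) pairs copied from the severity table.
THRESHOLDS = {
    "availability_pct": (99.9, 99.5),
    "incidents_open_count": (1, 3),
    "mtta_ms": (300000, 900000),
    "mttr_ms": (1800000, 3600000),
    "components_major_outage_count": (1, 1),
    "components_degraded_count": (1, 2),
    "untruthful_components_count": (0, 1),
    "error_rate_pct": (1, 2),
    "p95_latency_ms": (800, 1500),
    "apdex": (0.9, 0.85),
    "metric_staleness_min": (30, 60),
}

# Signals where a lower value is worse.
LOWER_IS_WORSE = {"availability_pct", "apdex"}


def severity(signal, value):
    """Return 'critical', 'warn', or 'ok' for a single reading."""
    warn, critical = THRESHOLDS[signal]
    if signal in LOWER_IS_WORSE:
        if value < critical:
            return "critical"
        if value < warn:
            return "warn"
    else:
        if value > critical:
            return "critical"
        if value > warn:
            return "warn"
    return "ok"
```

Checking the critical threshold first matters for rows where the warn and critical values coincide, such as `components_major_outage_count`.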
Data sources
- GET https://api.statuspage.io/v1/pages - Auth + page inventory
- GET https://api.statuspage.io/v1/pages/{page_id}/components - Component status inventory (healthy/degraded/down truthfulness)
- GET https://api.statuspage.io/v1/pages/{page_id}/component-groups - Group availability rollups for SLA
- GET https://api.statuspage.io/v1/pages/{page_id}/incidents - Incident inventory + MTTA / MTTR + revenue-at-risk join
- GET https://api.statuspage.io/v1/pages/{page_id}/incidents/unresolved - Currently-open incidents for truthfulness cross-check
- GET https://api.statuspage.io/v1/pages/{page_id}/metrics - Published system metrics (apdex / latency / error-rate / throughput)
- GET https://api.statuspage.io/v1/pages/{page_id}/metrics/{metric_id}/data - Metric data series for freshness + threshold checks
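
A sketch of how these endpoints might be called with only the Python standard library. It assumes the API key is sent as an `Authorization: OAuth <key>` header, which is worth confirming against the Statuspage API reference. The helper names are illustrative, and `fetch_json` makes a real network call:

```python
import json
import urllib.request

BASE_URL = "https://api.statuspage.io/v1"


def build_request(path, api_key):
    """Build a GET request for a path such as '/pages/{page_id}/components'."""
    return urllib.request.Request(
        BASE_URL + path,
        headers={
            # Assumed auth scheme; verify against the API reference.
            "Authorization": f"OAuth {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


def fetch_json(path, api_key, timeout=10):
    """Send the request and decode the JSON body (performs network I/O)."""
    with urllib.request.urlopen(build_request(path, api_key), timeout=timeout) as resp:
        return json.load(resp)
```

With a valid key, `fetch_json("/pages", api_key)` would serve as the auth check, and `fetch_json(f"/pages/{page_id}/incidents/unresolved", api_key)` would supply the open incidents for the truthfulness cross-check.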