What this audit checks
Authentication & access
- API token valid (auth on /api/3.1/credits)
- X-Pingdom-Account header correct for multi-user sub-accounts
- Check / TMS quota headroom > 15% (credits endpoint)
- Token scope covers checks + actions + summary endpoints
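
The quota headroom check above can be sketched as a small pure function. This is a minimal illustration, not the audit's actual implementation: the payload shape and the field names `checklimit` and `availablechecks` are assumptions about the credits response and should be verified against a real payload.

```python
def quota_headroom_pct(available: int, limit: int) -> float:
    """Percentage of the check quota that is still unused."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * available / limit


def headroom_ok(credits: dict, min_headroom_pct: float = 15.0) -> bool:
    """True when unused quota is strictly above the minimum headroom."""
    # Assumed payload shape (hypothetical field names):
    # {"credits": {"checklimit": int, "availablechecks": int}}
    c = credits["credits"]
    return quota_headroom_pct(c["availablechecks"], c["checklimit"]) > min_headroom_pct
```

With these assumptions, an account with 20 of 100 checks still available passes, while one with exactly 15 available does not, because the audit requires headroom greater than 15%.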
Uptime & SLA
- Any check fully down (status = down / unconfirmed_down)
- Region-specific outage - probe failing only from some locations (CDN / DNS issue)
- Uptime below the SLA target over the period (default target: 99.5%)
- Degraded checks - slow or intermittent but not fully down
- Paused checks not re-enabled after > 7 days (silent blind spot)
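
One way to derive the uptime figure is from a list of state windows for the period. The sketch below assumes each window is a dict with `status`, `timefrom`, and `timeto` (Unix seconds), and that windows with status `unknown` are excluded from the denominator; both the shape and the exclusion rule are assumptions for illustration.

```python
def uptime_pct(states: list[dict]) -> float:
    """Share of monitored time spent in the 'up' state, as a percentage."""
    up = 0
    total = 0
    for s in states:
        if s["status"] == "unknown":
            continue  # assumption: unmonitored time does not count for or against uptime
        duration = s["timeto"] - s["timefrom"]
        total += duration
        if s["status"] == "up":
            up += duration
    if total == 0:
        raise ValueError("no monitored time in the period")
    return 100.0 * up / total


def below_sla(states: list[dict], target_pct: float = 99.5) -> bool:
    """True when uptime over the period is under the SLA target."""
    return uptime_pct(states) < target_pct
```

For example, 100 seconds of downtime in a 10,000-second period gives 99.0% uptime, which falls below the default 99.5% target.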
Performance & latency
- Average response time above 1500ms sustained
- p95 response time above 1500ms
- p99 response time above 3000ms
- Apdex below 0.85 (frustrated-experience threshold)
- Throughput dropped > 30% vs prior period (capacity / outage signal)
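
The percentile and Apdex signals above can be computed from raw response-time samples. This is a sketch under stated assumptions: it uses the nearest-rank percentile method and the standard Apdex formula (satisfied plus half of tolerating, divided by total), and the target time `t_ms` is a hypothetical parameter because this document does not state the value of T used by the audit.

```python
import math


def percentile(samples_ms: list[float], pct: int) -> float:
    """Nearest-rank percentile of the samples (pct is an integer from 1 to 100)."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def apdex(samples_ms: list[float], t_ms: float) -> float:
    """Apdex score: satisfied <= T, tolerating <= 4T, frustrated otherwise."""
    if not samples_ms:
        raise ValueError("no samples")
    satisfied = sum(1 for s in samples_ms if s <= t_ms)
    tolerating = sum(1 for s in samples_ms if t_ms < s <= 4 * t_ms)
    return (satisfied + tolerating / 2) / len(samples_ms)
```

Nearest-rank always returns a value that was actually observed, which keeps the p95 and p99 figures easy to trace back to a specific probe result.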
Incident response & coverage (the blind-spot test)
- MTTA above 15 min (alerts firing but nobody acknowledging)
- MTTR above 60 min (slow recovery)
- Checks without a notification / integration channel (fires silently)
- Checks flapping repeatedly in 24h (noisy threshold / real instability)
- Critical path uncovered - no check on the storefront / checkout URL
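
Two of the checks above, flapping and silent checks, lend themselves to a short sketch. The state-window shape (`status`, `timefrom`, `timeto`) and the check fields `integrationids`, `userids`, and `teams` are assumptions for illustration, and the flap threshold is a hypothetical parameter since the document only says "repeatedly".

```python
def count_flaps(states: list[dict], window_start: int, window_end: int) -> int:
    """Count up-to-down transitions whose down window starts inside the time window."""
    flaps = 0
    previous = None
    for s in sorted(states, key=lambda s: s["timefrom"]):
        starts_in_window = window_start <= s["timefrom"] < window_end
        if previous == "up" and s["status"] == "down" and starts_in_window:
            flaps += 1
        previous = s["status"]
    return flaps


def is_flapping(states: list[dict], window_start: int, window_end: int,
                max_flaps: int = 3) -> bool:
    # max_flaps is a hypothetical threshold, not a value from the audit spec
    return count_flaps(states, window_start, window_end) > max_flaps


def is_silent(check: dict) -> bool:
    """True when a check has no notification target at all (assumed field names)."""
    return not (check.get("integrationids") or check.get("userids") or check.get("teams"))
```

Counting only up-to-down transitions avoids double-counting each incident when it recovers.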
Cross-channel: revenue-at-risk (the killer area)
- Storefront check down while a sibling commerce connector is live: compute revenue lost (commerce.revenue_per_min × down_minutes × estimated_traffic_loss_pct)
- Checkout / cart URL check degraded (p95 > 3s) during peak hours
- Probe failing in a region during a campaign push (sibling = google_ads / klaviyo) - paying for traffic that can’t load the page
- Conversion drop during outage windows (vs 90D commerce baseline)
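
The revenue-at-risk formula in the first bullet multiplies out to a total dollar figure for the outage window. A minimal sketch, assuming `estimated_traffic_loss_pct` is expressed on a 0 to 100 scale (the document does not say whether it is a percentage or a fraction):

```python
def revenue_at_risk(revenue_per_min: float, down_minutes: float,
                    estimated_traffic_loss_pct: float) -> float:
    """Estimated revenue lost during an outage window, in the store's currency."""
    if not 0 <= estimated_traffic_loss_pct <= 100:
        raise ValueError("estimated_traffic_loss_pct must be between 0 and 100")
    # Assumption: the loss percentage is on a 0-100 scale, so divide by 100.
    return revenue_per_min * down_minutes * estimated_traffic_loss_pct / 100.0
```

For instance, a storefront earning 40 per minute that is down for 30 minutes with an estimated 50% traffic loss puts 600 at risk.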
Severity thresholds
| Signal | Warn | Critical |
|---|---|---|
| apdex | 0.9 | 0.85 |
| error_rate_pct | 1 | 2 |
| avg_response_ms | 1000 | 1500 |
| p95_latency_ms | 1000 | 1500 |
| p99_latency_ms | 1500 | 3000 |
| sla_compliance_pct | 99.9 | 99.5 |
| services_down_count | 0 | 1 |
| services_degraded_count | 1 | 2 |
| incidents_open_count | 1 | 3 |
| mtta_ms | 300000 | 900000 |
| mttr_ms | 1800000 | 3600000 |
| throughput_change_pct_vsP | -15 | -30 |
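
A table like this maps naturally onto a lookup plus a direction flag, since some signals get worse as they rise (latency) and others as they fall (Apdex, SLA compliance, throughput change). The sketch below covers a subset of the rows and assumes a threshold is breached only when the value is strictly beyond it, following the "above" and "below" wording used elsewhere in the audit; the table itself does not state whether the comparison is inclusive.

```python
# signal -> (warn, critical, higher_is_worse)
THRESHOLDS = {
    "apdex": (0.9, 0.85, False),
    "avg_response_ms": (1000, 1500, True),
    "p95_latency_ms": (1000, 1500, True),
    "p99_latency_ms": (1500, 3000, True),
    "sla_compliance_pct": (99.9, 99.5, False),
    "throughput_change_pct_vsP": (-15, -30, False),
}


def severity(signal: str, value: float) -> str:
    """Classify a signal value as 'ok', 'warn', or 'critical'."""
    warn, critical, higher_is_worse = THRESHOLDS[signal]

    def breached(limit: float) -> bool:
        # Assumption: strict comparison, so a value equal to the limit is not a breach.
        return value > limit if higher_is_worse else value < limit

    if breached(critical):
        return "critical"
    if breached(warn):
        return "warn"
    return "ok"
```

Checking the critical limit first keeps the function correct for both directions without duplicating the comparison logic.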
Data sources
- GET https://api.pingdom.com/api/3.1/credits - Auth + token sanity + quota headroom
- GET https://api.pingdom.com/api/3.1/checks - Check inventory + current up/down/paused states + tags
- GET https://api.pingdom.com/api/3.1/summary.performance/{checkid} - Response-time percentiles + uptime per check
- GET https://api.pingdom.com/api/3.1/summary.outage/{checkid} - Outage windows for SLA compliance + incident duration
- GET https://api.pingdom.com/api/3.1/actions - Alert events for MTTA + top-alerting + acknowledgement
- GET https://api.pingdom.com/api/3.1/results/{checkid} - Raw probe results for top error-type clustering
- GET https://api.pingdom.com/api/3.1/tms/checks - Transaction (multi-step) checks for critical-path coverage
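
As a rough illustration of how these endpoints might be addressed, the sketch below builds (but does not send) a GET request for any of the paths listed. It assumes Bearer-token authorization; the helper name `build_request` and its parameters are hypothetical and not part of the audit.

```python
from typing import Optional
from urllib.request import Request

BASE_URL = "https://api.pingdom.com/api/3.1"


def build_request(path: str, token: str, check_id: Optional[int] = None) -> Request:
    """Build a GET request for an endpoint, optionally scoped to one check."""
    url = f"{BASE_URL}/{path}"
    if check_id is not None:
        url = f"{url}/{check_id}"  # fills the {checkid} path segment
    # Assumption: the API token is passed as a Bearer token.
    return Request(url, headers={"Authorization": f"Bearer {token}"}, method="GET")
```

The returned object can be passed to `urllib.request.urlopen` to perform the call, for example `build_request("summary.outage", token, 1234)` for the outage windows of check 1234.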