PagerDuty audit profile, Vortex IQ - Vortex IQ Help Centre

A paging surface is only worth anything if the page actually lands and someone acts on it. This audit answers four questions: (1) are the REST API token and every per-service routing key valid and mapped, (2) does every Vortex IQ severity tier route to a service with a real escalation policy and an always-on rota, (3) are pages submitted reliably and answered fast enough (submission success, MTTA / MTTR / escalation rate, back-sync lag), and (4) when a sev1 incident is open, how much commerce revenue is burning for every minute it stays unresolved.

What this audit checks

Authentication & access

REST API token valid + has read scope (probe on /abilities)
Region host correct (us → api.pagerduty.com, eu → api.eu.pagerduty.com)
Every configured routing key resolves to an existing, active service
Backup channel (Slack / Teams) configured so that a failed page is never silently dropped
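A minimal sketch of the token probe in Python, using only the standard library. It assumes PagerDuty's REST API v2 header conventions (`Authorization: Token token=...` and `Accept: application/vnd.pagerduty+json;version=2`); the function names and the `us` / `eu` region keys are illustrative, not part of the Vortex IQ product. A 401 or 403 from `/abilities` is treated as a bad token or missing read scope.

```python
import urllib.error
import urllib.request

# Region -> REST API host, as listed above.
REGION_HOSTS = {
    "us": "api.pagerduty.com",
    "eu": "api.eu.pagerduty.com",
}


def region_host(region: str) -> str:
    """Return the REST API host for a region, rejecting unknown regions."""
    try:
        return REGION_HOSTS[region.lower()]
    except KeyError:
        raise ValueError(f"unknown PagerDuty region: {region!r}") from None


def build_abilities_probe(token: str, region: str) -> urllib.request.Request:
    """Build (but do not send) the GET /abilities auth probe."""
    return urllib.request.Request(
        f"https://{region_host(region)}/abilities",
        headers={
            # REST API v2 token auth header format.
            "Authorization": f"Token token={token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
        method="GET",
    )


def probe_token(token: str, region: str, timeout: float = 10.0) -> bool:
    """Send the probe: True on HTTP 200, False on 401/403 (bad token or scope)."""
    req = build_abilities_probe(token, region)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code in (401, 403):
            return False
        raise  # 429 / 5xx etc. are not an auth verdict
```

Calling `probe_token(token, "eu")` against a token issued on the US service region would typically fail, which is exactly the wrong-region case the check above is meant to catch.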

Routing & escalation coverage (the blind-spot test)

Services with a revoked or rotated Events API v2 routing key (pages go nowhere)
Vortex IQ severity tier (sev1 / sev2 / sev3) with no PagerDuty service mapped
Sev1-mapped escalation policy without a 24/7 always-on rota
Escalation policy with zero escalation steps (no fallback responder)
On-call schedule with an uncovered gap in the next 24h
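The routing checks reduce to set membership and interval arithmetic. The sketch below assumes a hypothetical `tier_to_service` mapping, escalation policies shaped like PagerDuty's policy objects (an `id` and an `escalation_rules` list), and on-call shifts supplied as `(start, end)` datetime pairs, for example from a schedule rendered over the next 24h. It merges overlapping shifts and reports every uncovered gap.

```python
from datetime import datetime, timedelta, timezone

SEVERITY_TIERS = ("sev1", "sev2", "sev3")


def unmapped_tiers(tier_to_service: dict) -> list:
    """Tiers with no PagerDuty service mapped (missing key or empty value)."""
    return [t for t in SEVERITY_TIERS if not tier_to_service.get(t)]


def zero_step_policies(policies: list) -> list:
    """IDs of escalation policies that define no escalation rules at all."""
    return [p["id"] for p in policies if not p.get("escalation_rules")]


def schedule_gaps(entries, window_start, window_end):
    """Return uncovered (start, end) gaps inside [window_start, window_end).

    `entries` is a list of (start, end) datetimes for on-call shifts;
    overlapping or adjacent shifts are merged by advancing a cursor.
    """
    gaps = []
    cursor = window_start  # everything before the cursor is covered
    for start, end in sorted(entries):
        if end <= cursor:
            continue  # shift lies entirely in already-covered time
        if start > cursor:
            gaps.append((cursor, min(start, window_end)))
        cursor = max(cursor, end)
        if cursor >= window_end:
            break
    if cursor < window_end:
        gaps.append((cursor, window_end))
    return gaps


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    shifts = [(now, now + timedelta(hours=20))]
    # One 4h gap at the end of the 24h window.
    print(schedule_gaps(shifts, now, now + timedelta(hours=24)))
```

`len(schedule_gaps(...))` maps directly onto `schedule_gap_count`; the sev1 always-on rota test would additionally need the schedule to have zero gaps over a full rotation, not just 24h.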

Submission pipeline reliability

Event submission success rate below 99% (events being dropped)
Median submission latency above 5000ms (page delayed)
Retried submissions (HTTP 429 / 5xx) more than 2σ above the 30-day baseline
Any fail-open, audit-logged events in the last 24h (events the API could not accept)
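A sketch of the four pipeline checks, assuming the caller already has per-window counts of attempted and accepted events, per-event latencies, and a daily retry baseline. The 2σ test compares today's retry count with the mean and population standard deviation of the baseline days; the finding codes are hypothetical labels.

```python
import statistics


def success_rate_pct(accepted: int, attempted: int) -> float:
    """Share of submitted events the Events API accepted, as a percentage."""
    return 100.0 if attempted == 0 else 100.0 * accepted / attempted


def retries_anomalous(today: int, baseline_daily: list, sigmas: float = 2.0) -> bool:
    """True if today's 429 / 5xx retry count sits more than `sigmas` standard
    deviations above the mean of the daily baseline (e.g. the last 30 days)."""
    if not baseline_daily:
        return False  # no baseline yet, so no verdict
    mean = statistics.mean(baseline_daily)
    sd = statistics.pstdev(baseline_daily)
    return today > mean + sigmas * sd


def pipeline_findings(accepted, attempted, latencies_ms, retries_today,
                      retry_baseline, fail_open_24h):
    """Apply the four submission-pipeline checks and return finding codes."""
    findings = []
    if success_rate_pct(accepted, attempted) < 99.0:
        findings.append("event_success_rate_below_99")
    if latencies_ms and statistics.median(latencies_ms) > 5000:
        findings.append("median_latency_above_5000ms")
    if retries_anomalous(retries_today, retry_baseline):
        findings.append("retries_above_2_sigma")
    if fail_open_24h > 0:
        findings.append("fail_open_events")
    return findings
```

With a perfectly flat baseline (σ = 0) any increase in retries is flagged; a production check might add a small absolute floor.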

Response performance

MTTA above 15 min (sev1 above 5 min)
MTTR above 4h
Escalation rate above 30% (first-line overloaded / under-staffed)
Incident volume on any service above 2× the service average (noisy surface)
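MTTA, MTTR, escalation rate and noisy-service detection can all be computed from a flat incident list. The record shape here is hypothetical (epoch seconds, `None` for a step that has not happened yet), and the per-service average counts only services that raised at least one incident; averaging over every configured service would lower the bar and flag more surfaces.

```python
from collections import Counter
from statistics import mean

# Hypothetical incident record:
# {"service": str, "created_at": float, "acked_at": float | None,
#  "resolved_at": float | None, "escalations": int}


def _mean_delta(incidents, field):
    """Mean of (field - created_at) over incidents where field is set."""
    deltas = [i[field] - i["created_at"] for i in incidents if i.get(field) is not None]
    return mean(deltas) if deltas else None


def mtta_sec(incidents):
    """Mean time from trigger to acknowledgement, in seconds."""
    return _mean_delta(incidents, "acked_at")


def mttr_sec(incidents):
    """Mean time from trigger to resolution, in seconds."""
    return _mean_delta(incidents, "resolved_at")


def escalation_rate_pct(incidents):
    """Percentage of incidents that escalated past the first responder."""
    if not incidents:
        return 0.0
    escalated = sum(1 for i in incidents if i["escalations"] > 0)
    return 100.0 * escalated / len(incidents)


def noisy_services(incidents, factor=2.0):
    """Services whose incident count exceeds `factor` x the per-service average."""
    counts = Counter(i["service"] for i in incidents)
    if not counts:
        return []
    avg = sum(counts.values()) / len(counts)
    return sorted(s for s, n in counts.items() if n > factor * avg)
```

For the stricter sev1 rule, filter the list to sev1 incidents before calling `mtta_sec` and compare the result against 300 s instead of 900 s.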

Back-sync integration health

Webhook back-sync lag above 300s (stale Vortex IQ incident timeline)
Webhook subscription with a failed last delivery
Webhook subscription toggled inactive (state changes stop flowing back silently)
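A sketch of the back-sync checks. `active` mirrors the flag on a PagerDuty webhook subscription; `last_delivery_failed` is a hypothetical field standing in for however delivery status is obtained, and judging lag by the worst observation in the window is an assumption. Thresholds default to the 120 s / 300 s warn / critical values in the table.

```python
from datetime import datetime


def backsync_lag_sec(occurred_at: datetime, received_at: datetime) -> float:
    """Seconds between a state change in PagerDuty and its arrival in Vortex IQ."""
    return (received_at - occurred_at).total_seconds()


def webhook_findings(subscriptions, lags_sec, lag_warn=120, lag_crit=300):
    """Return (subject, finding) pairs for inactive or failing subscriptions
    and for back-sync lag above the warn / critical thresholds."""
    findings = []
    for sub in subscriptions:
        if not sub.get("active", False):
            findings.append((sub["id"], "inactive"))
        if sub.get("last_delivery_failed", False):  # hypothetical field
            findings.append((sub["id"], "last_delivery_failed"))
    worst = max(lags_sec, default=0.0)
    if worst > lag_crit:
        findings.append(("backsync_lag", "critical"))
    elif worst > lag_warn:
        findings.append(("backsync_lag", "warn"))
    return findings
```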

Cross-channel: revenue-at-risk paging (the killer area)

Open sev1 incident while a sibling commerce connector is live: compute revenue at risk as $/min lost (commerce.revenue_per_min × estimated_traffic_loss_pct) and cumulative loss ($/min lost × incident_minutes)
Sev1 page unacknowledged for more than 5 min during the peak trading window (the worst-case missed page)
Incident on a service whose name matches the commerce checkout / payment path (revenue-critical)
Routing key revoked on the sev1 commerce-paging service (pages silently go nowhere on the highest-value path)
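Worked through in code, the revenue-at-risk rules look like this. `traffic_loss_pct` is assumed to be on a 0 to 100 scale, the checkout / payment regex is a hypothetical naming convention, and deciding whether "now" falls in the peak trading window is left to the caller.

```python
import re

# Hypothetical naming convention for the checkout / payment path.
REVENUE_CRITICAL = re.compile(r"checkout|payment", re.IGNORECASE)


def revenue_at_risk(revenue_per_min: float, traffic_loss_pct: float,
                    incident_minutes: float) -> tuple:
    """Return ($/min lost, cumulative $ lost) for an open sev1 incident.

    Assumes traffic_loss_pct is a percentage (0-100), not a fraction.
    """
    loss_per_min = revenue_per_min * traffic_loss_pct / 100.0
    return loss_per_min, loss_per_min * incident_minutes


def is_revenue_critical(service_name: str) -> bool:
    """True if the service name looks like part of the checkout / payment path."""
    return bool(REVENUE_CRITICAL.search(service_name))


def missed_peak_page(severity: str, minutes_unacked: float,
                     in_peak_window: bool, limit_min: float = 5.0) -> bool:
    """Sev1 page still unacknowledged past the limit during peak trading."""
    return severity == "sev1" and in_peak_window and minutes_unacked > limit_min
```

For example, $1,200/min of commerce revenue, an estimated 25% traffic loss and an incident open for 18 minutes gives $300/min at risk and $5,400 lost so far.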

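Severity thresholds

For `event_success_rate_pct` and `routing_key_health_pct`, lower is worse: a value below the threshold trips it. For every other signal, higher is worse. Each `_count` signal has warn and critical both set to 1, so a single occurrence is already critical.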

Signal	Warn	Critical
`event_success_rate_pct`	99.5	99
`submission_latency_ms`	2000	5000
`mtta_sec`	600	900
`mttr_sec`	7200	14400
`escalation_rate_pct`	20	30
`routing_key_health_pct`	99	95
`revoked_routing_key_count`	1	1
`unmapped_service_count`	1	1
`schedule_gap_count`	1	1
`webhook_backsync_lag_sec`	120	300
`webhook_failure_count`	1	1
`fail_open_event_count`	1	1
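The table can be applied mechanically once each signal's direction is known. This sketch encodes warn, critical and direction per signal and treats reaching a threshold as a breach, an assumption chosen so the count signals trip at 1; the "above" / "below" wording elsewhere in the audit may instead be strict at the exact boundary value.

```python
# signal -> (warn, critical, higher_is_worse), copied from the table above.
THRESHOLDS = {
    "event_success_rate_pct": (99.5, 99, False),
    "submission_latency_ms": (2000, 5000, True),
    "mtta_sec": (600, 900, True),
    "mttr_sec": (7200, 14400, True),
    "escalation_rate_pct": (20, 30, True),
    "routing_key_health_pct": (99, 95, False),
    "revoked_routing_key_count": (1, 1, True),
    "unmapped_service_count": (1, 1, True),
    "schedule_gap_count": (1, 1, True),
    "webhook_backsync_lag_sec": (120, 300, True),
    "webhook_failure_count": (1, 1, True),
    "fail_open_event_count": (1, 1, True),
}


def _breached(value: float, threshold: float, higher_is_worse: bool) -> bool:
    """Inclusive comparison in the signal's bad direction."""
    return value >= threshold if higher_is_worse else value <= threshold


def classify(signal: str, value: float) -> str:
    """Return "critical", "warn" or "ok" for one signal reading."""
    warn, crit, higher_is_worse = THRESHOLDS[signal]
    if _breached(value, crit, higher_is_worse):
        return "critical"
    if _breached(value, warn, higher_is_worse):
        return "warn"
    return "ok"
```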

Data sources

GET https://{region_host}/abilities - Auth + scope sanity
GET https://{region_host}/incidents - Incident inventory + ack/resolve timing + escalation count
GET https://{region_host}/services - Service inventory + routing-key status + severity-tier mapping
GET https://{region_host}/escalation_policies - Escalation policy depth + always-on rota presence
GET https://{region_host}/schedules - On-call schedule coverage gaps
GET https://{region_host}/webhook_subscriptions - Webhook delivery status + back-sync lag
POST https://{region_host}/analytics/metrics/incidents/all - Aggregated MTTA / MTTR / escalation rate
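The GET list endpoints above return paged results. A sketch of draining one, assuming PagerDuty's classic offset pagination (`limit` / `offset` query parameters and a `more` flag in the response body) and a hypothetical caller-supplied `fetch(path, params)` that performs the authenticated GET and returns parsed JSON.

```python
from typing import Callable, Iterator


def paginate(fetch: Callable[[str, dict], dict], path: str, key: str,
             limit: int = 100) -> Iterator[dict]:
    """Yield every object under `key` from a paged list endpoint."""
    offset = 0
    while True:
        page = fetch(path, {"limit": limit, "offset": offset})
        items = page.get(key, [])
        yield from items
        # Stop when the API says there is nothing more, or returns an empty page.
        if not page.get("more") or not items:
            break
        offset += len(items)
```

Usage would look like `list(paginate(fetch, "/incidents", "incidents"))`, with `"services"`, `"escalation_policies"`, `"schedules"` or `"webhook_subscriptions"` as the key for the other list endpoints. The analytics call is a POST with its own request body and is not covered by this helper.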
