
# PagerDuty audit profile, Vortex IQ

> What the Vortex IQ PagerDuty health audit checks: routing health, response performance and revenue-at-risk paging.

**[Nerve Centre KPIs](/nerve-centre/kpi-cards/pagerduty) · [Audit Profile](/nerve-centre/kpi-cards/pagerduty/audit) · [Sentiment Settings](/nerve-centre/kpi-cards/pagerduty/sentiment)**

A paging surface is only worth anything if the page actually lands and someone acts on it. This audit answers four questions: (1) is the REST token valid, and is every per-service routing key valid and mapped; (2) does every Vortex IQ severity tier route to a service with a real escalation policy and an always-on rota; (3) are incidents acknowledged and resolved fast enough (MTTA, MTTR, escalation rate, submission success, back-sync lag); and (4) while a sev1 incident is open, how much commerce revenue is on fire for every minute it stays unresolved.

## What this audit checks

### Authentication & access

* REST API token valid + has read scope (probe on /abilities)
* Region host correct (us → api.pagerduty.com, eu → api.eu.pagerduty.com)
* Every configured routing key resolves to an existing, active service
* Backup channel (Slack / Teams) configured, so a failed page is never silently dropped
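A minimal sketch of the region-host and token checks, assuming Python's standard-library `urllib`. It only builds the `GET /abilities` probe for the configured region; actually sending it (for example with `urllib.request.urlopen`) is left out so the sketch runs offline. `REST_HOSTS` and `build_abilities_probe` are illustrative names, not part of Vortex IQ or any PagerDuty SDK.

```python
import urllib.request

# Region -> REST API host, matching the region check above.
REST_HOSTS = {
    "us": "api.pagerduty.com",
    "eu": "api.eu.pagerduty.com",
}


def build_abilities_probe(api_token: str, region: str) -> urllib.request.Request:
    """Build (but do not send) the GET /abilities auth probe for a region."""
    try:
        host = REST_HOSTS[region]
    except KeyError:
        raise ValueError(f"unknown PagerDuty region: {region!r}") from None
    return urllib.request.Request(
        f"https://{host}/abilities",
        method="GET",
        headers={
            # REST API v2 token auth plus the versioned Accept header.
            "Authorization": f"Token token={api_token}",
            "Accept": "application/vnd.pagerduty+json;version=2",
        },
    )
```

Building the request against the wrong region host is the usual way an otherwise valid token fails the probe, so the host lookup is kept explicit and fails loudly on an unknown region. A 401 response to the probe points to an invalid or revoked token.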

### Routing & escalation coverage (the blind-spot test)

* Services with a revoked or rotated Events API v2 routing key (pages go nowhere)
* Vortex IQ severity tier (sev1 / sev2 / sev3) with no PagerDuty service mapped
* Sev1-mapped escalation policy without a 24/7 always-on rota
* Escalation policy with zero escalation steps (no fallback responder)
* On-call schedule with an uncovered gap in the next 24h
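The uncovered-gap check is an interval-coverage problem: sort the on-call spans, sweep a cursor across the next 24 hours, and record every stretch the cursor jumps over. A minimal sketch, assuming the schedule's rendered on-call spans have already been parsed into `(start, end)` datetime pairs; `schedule_gaps` is a hypothetical helper name.

```python
from datetime import datetime


def schedule_gaps(entries, window_start: datetime, window_end: datetime):
    """Return uncovered (start, end) intervals inside [window_start, window_end).

    `entries` is an iterable of (start, end) datetimes for on-call spans
    (hypothetical input shape; overlapping spans are fine).
    """
    # Keep only spans that touch the window, clipped to its edges.
    clipped = sorted(
        (max(s, window_start), min(e, window_end))
        for s, e in entries
        if e > window_start and s < window_end
    )
    gaps = []
    cursor = window_start  # Everything before `cursor` is covered.
    for start, end in clipped:
        if start > cursor:
            gaps.append((cursor, start))  # Nobody on call in this stretch.
        cursor = max(cursor, end)
    if cursor < window_end:
        gaps.append((cursor, window_end))  # Coverage ends before the window does.
    return gaps
```

The length of the returned list maps directly onto `schedule_gap_count`.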

### Submission pipeline reliability

* Event submission success rate below 99% (events being dropped)
* Median submission latency above 5000ms (page delayed)
* Retried submissions (429 / 5xx) more than 2σ above the 30-day baseline
* Fail-open events audit-logged in the last 24h > 0 (events the API could not accept)
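A sketch of the four submission-pipeline checks, assuming each event submission has been recorded as an `(accepted, latency_ms)` pair and that the 30-day retry baseline is a list of daily retry counts. The function name, parameters and result keys other than the table's signal names are hypothetical.

```python
import statistics


def submission_health(results, retries_today, retry_baseline, fail_open_count):
    """Evaluate the submission-pipeline checks.

    results: list of (accepted: bool, latency_ms: float), one per submission
    retries_today: 429 / 5xx retried submissions in the current window
    retry_baseline: daily retry counts for the previous 30 days (>= 2 values)
    fail_open_count: fail-open events audit-logged in the last 24h
    """
    total = len(results)
    accepted = sum(1 for ok, _ in results if ok)
    success_pct = 100.0 * accepted / total if total else 100.0
    latencies = [lat for _, lat in results]
    median_latency = statistics.median(latencies) if latencies else 0.0

    # Retry spike: more than two sample standard deviations above the mean.
    mean = statistics.mean(retry_baseline)
    stdev = statistics.stdev(retry_baseline)

    return {
        "event_success_rate_pct": success_pct,
        "submission_latency_ms": median_latency,
        "success_breach": success_pct < 99.0,
        "latency_breach": median_latency > 5000,
        "retry_spike": retries_today > mean + 2 * stdev,
        "fail_open_breach": fail_open_count > 0,
    }
```

The median is used for latency so a handful of very slow submissions does not mask an otherwise healthy pipeline; the success rate catches the dropped events instead.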

### Response performance

* MTTA above 15 min (sev1 above 5 min)
* MTTR above 4h
* Escalation rate above 30% (first-line overloaded / under-staffed)
* Incident volume on any service above 2× the service average (noisy surface)
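A sketch of how the response-performance figures could be derived from an incident list, assuming each incident has already been reduced to a dict with the hypothetical keys shown in the docstring. It reads "the service average" as the mean incident count across services, which is an assumption.

```python
from collections import Counter
from statistics import mean


def _avg(values):
    # Mean of a list, or None when there is nothing to average.
    return mean(values) if values else None


def response_metrics(incidents):
    """Compute MTTA / MTTR / escalation rate and flag noisy services.

    Each incident is a dict with hypothetical keys:
      created_at, acknowledged_at, resolved_at: datetime (the last two may be None)
      tier: "sev1" | "sev2" | "sev3"
      escalations: int
      service: str
    """
    def secs(incident, field):
        return (incident[field] - incident["created_at"]).total_seconds()

    acked = [i for i in incidents if i["acknowledged_at"] is not None]
    resolved = [i for i in incidents if i["resolved_at"] is not None]

    # Noisy surface: a service with more than 2x the mean per-service volume.
    volume = Counter(i["service"] for i in incidents)
    avg_volume = _avg(list(volume.values()))
    noisy = sorted(s for s, n in volume.items() if avg_volume and n > 2 * avg_volume)

    escalated = sum(1 for i in incidents if i["escalations"] > 0)
    return {
        "mtta_sec": _avg([secs(i, "acknowledged_at") for i in acked]),
        "sev1_mtta_sec": _avg(
            [secs(i, "acknowledged_at") for i in acked if i["tier"] == "sev1"]
        ),
        "mttr_sec": _avg([secs(i, "resolved_at") for i in resolved]),
        "escalation_rate_pct": 100.0 * escalated / len(incidents) if incidents else None,
        "noisy_services": noisy,
    }
```

The results are then compared with the limits above: `mtta_sec` above 900 (or `sev1_mtta_sec` above 300), `mttr_sec` above 14400 and `escalation_rate_pct` above 30.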

### Back-sync integration health

* Webhook back-sync lag above 300s (stale Vortex IQ incident timeline)
* Webhook subscription with a failed last delivery
* Webhook subscription toggled inactive (state changes stop flowing back silently)
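A sketch of the back-sync checks, assuming each state change that came back via webhook has been recorded as an `(occurred_at, received_at)` pair and that each subscription is summarised by the hypothetical keys `id`, `active` and `last_delivery_ok`. Taking the worst observed delay as "the lag" is also an assumption; a percentile would be a reasonable alternative.

```python
def backsync_health(deliveries, subscriptions, warn_sec=120, crit_sec=300):
    """Evaluate webhook back-sync lag and subscription state.

    deliveries: (occurred_at, received_at) datetime pairs
    subscriptions: dicts with hypothetical keys 'id', 'active', 'last_delivery_ok'
    """
    # Lag per delivered state change, in seconds; worst case drives the level.
    lags = [(received - occurred).total_seconds() for occurred, received in deliveries]
    worst = max(lags, default=0.0)
    if worst > crit_sec:
        level = "critical"
    elif worst > warn_sec:
        level = "warn"
    else:
        level = "ok"
    return {
        "webhook_backsync_lag_sec": worst,
        "lag_level": level,
        # Inactive subscriptions stop state changes flowing back silently.
        "inactive": [s["id"] for s in subscriptions if not s["active"]],
        "failed_last_delivery": [s["id"] for s in subscriptions if not s["last_delivery_ok"]],
    }
```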

### Cross-channel: revenue-at-risk paging (the killer area)

* Open sev1 incident while a sibling commerce connector is live: compute revenue lost per minute (`commerce.revenue_per_min` × `estimated_traffic_loss_pct`) and cumulative revenue lost (that rate × `incident_minutes`)
* Sev1 page unacknowledged for more than 5 min during the peak trading window (worst-case missed page)
* Incident on a service whose name matches the commerce checkout / payment path (revenue-critical)
* Routing key revoked on the sev1 commerce-paging service (pages on the highest-value path are silently dropped)
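A sketch of the revenue-at-risk arithmetic and the two peak-path checks. It separates the per-minute rate (`commerce.revenue_per_min` × `estimated_traffic_loss_pct`) from the cumulative loss (that rate × `incident_minutes`), and assumes the loss percentage is expressed as 0 to 100. The peak window as a time-of-day range and the checkout/payment name pattern are hypothetical choices, as are all function names.

```python
import re
from datetime import datetime, time
from typing import Optional

# Hypothetical pattern for revenue-critical service names.
REVENUE_CRITICAL = re.compile(r"checkout|payment", re.IGNORECASE)


def revenue_at_risk(revenue_per_min: float, traffic_loss_pct: float, incident_minutes: float):
    """Return (loss_per_min, cumulative_loss) for an open sev1 incident."""
    loss_per_min = revenue_per_min * traffic_loss_pct / 100.0
    return loss_per_min, loss_per_min * incident_minutes


def missed_peak_page(created_at: datetime, acknowledged_at: Optional[datetime],
                     now: datetime, peak_start: time, peak_end: time,
                     ack_limit_min: float = 5) -> bool:
    """True if a sev1 page raised in the peak window waited > ack_limit_min for an ack."""
    in_peak = peak_start <= created_at.time() < peak_end
    # Still unacknowledged: measure the wait up to `now`.
    waited_min = ((acknowledged_at or now) - created_at).total_seconds() / 60
    return in_peak and waited_min > ack_limit_min


def is_revenue_critical(service_name: str) -> bool:
    return REVENUE_CRITICAL.search(service_name) is not None
```

For example, at 1200 per minute of commerce revenue and an estimated 25% traffic loss, a 10-minute sev1 incident costs 300 per minute and 3000 in total.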

## Severity thresholds

| Signal                      | Breaches when value is | Warn | Critical |
| --------------------------- | ---------------------- | ---- | -------- |
| `event_success_rate_pct`    | below                  | 99.5 | 99       |
| `submission_latency_ms`     | above                  | 2000 | 5000     |
| `mtta_sec`                  | above                  | 600  | 900      |
| `mttr_sec`                  | above                  | 7200 | 14400    |
| `escalation_rate_pct`       | above                  | 20   | 30       |
| `routing_key_health_pct`    | below                  | 99   | 95       |
| `revoked_routing_key_count` | at or above            | 1    | 1        |
| `unmapped_service_count`    | at or above            | 1    | 1        |
| `schedule_gap_count`        | at or above            | 1    | 1        |
| `webhook_backsync_lag_sec`  | above                  | 120  | 300      |
| `webhook_failure_count`     | at or above            | 1    | 1        |
| `fail_open_event_count`     | at or above            | 1    | 1        |
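A sketch of how the thresholds could be applied to a measured value. It assumes success and health percentages breach when they fall below a threshold, latency, time and rate signals when they rise above it, and counts as soon as they reach it (so a single revoked key or schedule gap is already critical). These directions are inferred from the checks listed above rather than taken from a documented Vortex IQ setting; `THRESHOLDS` and `classify` are hypothetical names.

```python
# (warn, critical, direction) per signal, copied from the thresholds table.
THRESHOLDS = {
    "event_success_rate_pct": (99.5, 99, "below"),
    "submission_latency_ms": (2000, 5000, "above"),
    "mtta_sec": (600, 900, "above"),
    "mttr_sec": (7200, 14400, "above"),
    "escalation_rate_pct": (20, 30, "above"),
    "routing_key_health_pct": (99, 95, "below"),
    "revoked_routing_key_count": (1, 1, "at_least"),
    "unmapped_service_count": (1, 1, "at_least"),
    "schedule_gap_count": (1, 1, "at_least"),
    "webhook_backsync_lag_sec": (120, 300, "above"),
    "webhook_failure_count": (1, 1, "at_least"),
    "fail_open_event_count": (1, 1, "at_least"),
}


def classify(signal: str, value: float) -> str:
    """Return 'critical', 'warn' or 'ok'; critical is checked first."""
    warn, crit, direction = THRESHOLDS[signal]
    if direction == "below":      # lower is worse
        breached = lambda limit: value < limit
    elif direction == "above":    # higher is worse
        breached = lambda limit: value > limit
    else:                         # counts: any occurrence at the limit counts
        breached = lambda limit: value >= limit
    if breached(crit):
        return "critical"
    if breached(warn):
        return "warn"
    return "ok"
```

Checking the critical limit first means signals whose warn and critical values are identical go straight to critical.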

## Data sources

`{region_host}` is `api.pagerduty.com` (US) or `api.eu.pagerduty.com` (EU).

* `GET https://{region_host}/abilities` - Auth + scope sanity
* `GET https://{region_host}/incidents` - Incident inventory + ack/resolve timing + escalation count
* `GET https://{region_host}/services` - Service inventory + routing-key status + severity-tier mapping
* `GET https://{region_host}/escalation_policies` - Escalation policy depth + always-on rota presence
* `GET https://{region_host}/schedules` - On-call schedule coverage gaps
* `GET https://{region_host}/webhook_subscriptions` - Webhook delivery status + back-sync lag
* `POST https://{region_host}/analytics/metrics/incidents/all` - Aggregated MTTA / MTTR / escalation rate
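The `GET` list endpoints above return results a page at a time using PagerDuty's classic offset pagination (`offset` and `limit` query parameters, and a `more` flag in the response body). A minimal sketch of walking every page, assuming a hypothetical `fetch_page(offset, limit)` callable that performs the HTTP request and returns the decoded JSON body; injecting it keeps the paging logic testable without network access.

```python
from typing import Callable, Iterator


def paginate(fetch_page: Callable[[int, int], dict], key: str,
             limit: int = 100) -> Iterator[dict]:
    """Yield every object under `key` (e.g. "incidents") across all pages."""
    offset = 0
    while True:
        body = fetch_page(offset, limit)
        items = body.get(key, [])
        yield from items
        # Stop when the API reports no more pages (or returns an empty page).
        if not body.get("more") or not items:
            break
        offset += len(items)
```

For example, `paginate(fetch, "services")` would stream the full service inventory that the routing-key and severity-tier checks read from.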
