
# Opsgenie audit profile, Vortex IQ

> What the Vortex IQ Opsgenie health audit checks: coverage, response speed, SLA, and revenue-at-risk.

**[Nerve Centre KPIs](/nerve-centre/kpi-cards/opsgenie) · [Audit Profile](/nerve-centre/kpi-cards/opsgenie/audit) · [Sentiment Settings](/nerve-centre/kpi-cards/opsgenie/sentiment)**

Opsgenie alert and incident state means little to a merchant unless it's joined to the revenue those services protect. This audit answers four questions: (1) is the API key still valid, and are alerts, incidents, and services readable; (2) is the on-call process actually covering the alerts that fire (un-acknowledged alerts, no-routing gaps, noisy services); (3) are we acknowledging and resolving fast enough to hold SLA (MTTA / MTTR / SLA compliance); and (4) when a service that fronts a commerce-critical path is on fire, how much revenue is burning per minute?

## What this audit checks

### Authentication & access

* API key valid (auth on /v2/account) and not revoked
* Region host correct (US = api.opsgenie.com / EU = api.eu.opsgenie.com)
* Key has read scope on Alerts, Incidents, and Services
* Request-quota headroom > 15% (429 / Retry-After avoidance)
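
As a rough illustration of the first two checks, the sketch below builds (but does not send) the auth probe for `/v2/account` against the region host. The helper name `build_account_request` and the `"us"` / `"eu"` keys are assumptions made for this example; the `GenieKey` authorization scheme is the one Opsgenie's REST API uses for API keys.

```python
from urllib.request import Request

# Region hosts as listed in the check above.
REGION_HOSTS = {"us": "api.opsgenie.com", "eu": "api.eu.opsgenie.com"}


def build_account_request(api_key: str, region: str = "us") -> Request:
    """Build the /v2/account auth probe without sending it (hypothetical helper)."""
    host = REGION_HOSTS[region.lower()]  # KeyError on an unknown region
    return Request(
        f"https://{host}/v2/account",
        headers={"Authorization": f"GenieKey {api_key}"},
        method="GET",
    )


req = build_account_request("example-key", region="eu")
print(req.full_url)
```

A key created in an EU account will typically fail authentication against the US host, so an auth failure is worth re-testing against the other region before the key is reported as revoked.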

### Alert coverage (the blind-spot test)

* Open alerts un-acknowledged > 30 min (no on-call pickup)
* Alerts with no responder / routing rule match (fires into the void)
* P1 / P2 alerts un-acknowledged at all (highest-severity coverage gap)
* Services with sustained alert volume but no declared incident (noise drowning signal)
* Alerts auto-closed without acknowledgement (silent dismissals)
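
The first and third checks can be sketched as a single pass over already-fetched alert records. This is a minimal sketch, assuming each record carries `status`, `acknowledged`, `createdAt`, and `priority` fields as in Opsgenie alert listings; the function name `coverage_gaps` and the sample data are illustrative, not part of the audit itself.

```python
from datetime import datetime, timedelta, timezone


def coverage_gaps(alerts, now, max_unacked=timedelta(minutes=30)):
    """Count open, un-acknowledged alerts past the cutoff, and open P1/P2 gaps."""
    unacked_30min = 0
    p1_p2_unacked = 0
    for alert in alerts:
        if alert["status"] != "open" or alert["acknowledged"]:
            continue
        # Normalise a trailing "Z" so fromisoformat works on older Pythons.
        created = datetime.fromisoformat(alert["createdAt"].replace("Z", "+00:00"))
        if now - created > max_unacked:
            unacked_30min += 1
        if alert["priority"] in ("P1", "P2"):
            p1_p2_unacked += 1
    return {
        "alerts_unacknowledged_30min_count": unacked_30min,
        "p1_p2_unacknowledged_count": p1_p2_unacked,
    }


# Hypothetical sample data.
now = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
alerts = [
    {"status": "open", "acknowledged": False, "createdAt": "2024-01-01T10:00:00Z", "priority": "P1"},
    {"status": "open", "acknowledged": False, "createdAt": "2024-01-01T10:45:00Z", "priority": "P3"},
    {"status": "open", "acknowledged": True, "createdAt": "2024-01-01T09:00:00Z", "priority": "P2"},
    {"status": "closed", "acknowledged": False, "createdAt": "2024-01-01T09:00:00Z", "priority": "P1"},
]
gaps = coverage_gaps(alerts, now)
print(gaps)
```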

### Response speed & SLA health

* MTTA above 5 min sustained (acknowledgement lag = routing / coverage problem)
* MTTR above 60 min sustained (resolution lag = capacity problem)
* SLA compliance below 99.5% (reliability commitment slipping)
* Incidents open > 0 with no update in last 30 min (stalled response)
* Apdex below 0.85 / error rate > 2% / p95 > 1500ms on a tracked service
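
For reference, MTTA and MTTR can be read as mean elapsed time from alert creation to acknowledgement and to resolution respectively. The sketch below works on hypothetical `(created, acknowledged, resolved)` timestamp triples; how the audit windows a "sustained" breach is not specified here, so the example computes a single mean over the sample.

```python
from datetime import datetime
from statistics import mean


def mean_seconds(pairs):
    """Mean elapsed seconds across (start, end) datetime pairs; None if empty."""
    durations = [(end - start).total_seconds() for start, end in pairs]
    return mean(durations) if durations else None


# Hypothetical records: (created, acknowledged, resolved).
records = [
    (datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 4), datetime(2024, 1, 1, 10, 50)),
    (datetime(2024, 1, 1, 11, 0), datetime(2024, 1, 1, 11, 10), datetime(2024, 1, 1, 12, 30)),
]

mtta_seconds = mean_seconds((created, acked) for created, acked, _ in records)
mttr_seconds = mean_seconds((created, resolved) for created, _, resolved in records)

# Compare against the 5 min (300 s) and 60 min (3600 s) limits from the list above.
print(mtta_seconds > 300, mttr_seconds > 3600)
```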

### Alert economics & noise

* Top alerting service alert volume > 2σ vs its 30-day baseline (noise spike)
* Recurring error-type cluster trending up (fix-at-source candidate)
* Flapping alerts (open -> close -> open > 3 times in 24h on same alias)
* Throughput on a tracked service dropped > 30% WoW (capacity / outage signal)
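
One way to read the "> 2σ vs its 30-day baseline" check is as a z-score of the current alert count against the baseline's daily counts. The sketch below assumes daily buckets and the sample standard deviation; both are assumptions for the example, and the audit may bucket or smooth differently.

```python
from statistics import mean, stdev


def volume_sigma(baseline_daily_counts, current_count):
    """Deviation of current_count from the baseline mean, in standard deviations.

    Returns None when the baseline has no variance, since the score is undefined.
    """
    mu = mean(baseline_daily_counts)
    sd = stdev(baseline_daily_counts)  # sample standard deviation (assumption)
    if sd == 0:
        return None
    return (current_count - mu) / sd


# Hypothetical baseline, shortened to five days for readability.
baseline = [10, 12, 8, 10, 10]
sigma = volume_sigma(baseline, 16)
print(sigma is not None and sigma > 2)
```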

### Cross-channel: revenue-at-risk (the killer area)

* Open incident whose impactedServices intersect a commerce sibling's checkout / payment service = compute revenue lost: \$/min = commerce.revenue\_per\_min × estimated\_traffic\_loss\_pct; total = \$/min × incident\_minutes
* Alert storm (> 10 alerts/h) on a service that fronts checkout / payments / search during peak hours
* Alert spike on a commerce-critical service during a campaign push (sibling = google\_ads / amazon\_ads / klaviyo) - paying for traffic that can't convert
* MTTR degradation on commerce-critical services correlated with a sibling commerce conversion / abandonment regression
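
The revenue-at-risk arithmetic in the first bullet can be sketched as follows. This assumes `estimated_traffic_loss_pct` is expressed on a 0 to 100 scale and that the commerce-critical service names are supplied by the sibling integration; the function names and sample values are illustrative only.

```python
def commerce_impact(impacted_services, commerce_critical_services):
    """Services named on the incident that also front a commerce-critical path."""
    return set(impacted_services) & set(commerce_critical_services)


def revenue_at_risk(revenue_per_min, incident_minutes, estimated_traffic_loss_pct):
    """Return (loss per minute, total loss so far) for an open incident."""
    loss_fraction = estimated_traffic_loss_pct / 100.0  # assumes a 0-100 scale
    per_minute = revenue_per_min * loss_fraction
    return per_minute, per_minute * incident_minutes


# Hypothetical incident: checkout is impacted, 25% of traffic lost for 45 minutes.
hit = commerce_impact(["checkout-api", "image-cdn"], ["checkout-api", "payments"])
per_minute, total = revenue_at_risk(200.0, 45, 25)
print(sorted(hit), per_minute, total)
```

With these sample inputs the incident costs 50 per minute and 2,250 in total so far, in whatever currency `revenue_per_min` is reported in.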

## Severity thresholds

| Signal                              | Warn | Critical |
| ----------------------------------- | ---- | -------- |
| `alerts_unacknowledged_30min_count` | 1    | 5        |
| `p1_p2_unacknowledged_count`        | 0    | 1        |
| `mtta_seconds`                      | 300  | 600      |
| `mttr_seconds`                      | 3600 | 7200     |
| `sla_compliance_pct`                | 99.9 | 99.5     |
| `incidents_open_count`              | 1    | 3        |
| `services_degraded_count`           | 1    | 2        |
| `services_down_count`               | 0    | 1        |
| `top_service_alert_volume_sigma`    | 2    | 3        |
| `throughput_change_pct_wow`         | -15  | -30      |
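
To make the table concrete, the sketch below maps a signal value to `ok`, `warn`, or `critical`. It rests on two assumptions the table does not state: comparisons are inclusive, and the direction of "worse" is inferred from whether the critical value sits above or below the warn value. Only a subset of rows is included, because rows with a warn value of 0 depend on exactly how the comparison is defined.

```python
# (warn, critical) pairs copied from the table above.
THRESHOLDS = {
    "mtta_seconds": (300, 600),
    "mttr_seconds": (3600, 7200),
    "sla_compliance_pct": (99.9, 99.5),
    "throughput_change_pct_wow": (-15, -30),
}


def classify(signal, value):
    """Return 'critical', 'warn', or 'ok' for a signal value (inclusive limits assumed)."""
    warn, critical = THRESHOLDS[signal]
    higher_is_worse = critical >= warn

    def breached(limit):
        return value >= limit if higher_is_worse else value <= limit

    if breached(critical):
        return "critical"
    if breached(warn):
        return "warn"
    return "ok"


print(classify("mtta_seconds", 420), classify("sla_compliance_pct", 99.7))
```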

## Data sources

* `GET https://api.{region}opsgenie.com/v2/account` - Auth + key sanity + region check
* `GET https://api.{region}opsgenie.com/v2/alerts` - Alert inventory + acknowledgement + routing coverage + MTTA
* `GET https://api.{region}opsgenie.com/v2/alerts/count` - Alert-volume counts for top-N + noise / baseline checks
* `GET https://api.{region}opsgenie.com/v1/incidents` - Incident inventory + impactedServices + MTTR (revenue-at-risk join)
* `GET https://api.{region}opsgenie.com/v1/services` - Service inventory + health state + open alert/incident counts
