Opsgenie audit profile, Vortex IQ - Vortex IQ Help Centre

Opsgenie alert and incident state means little to a merchant unless it’s joined to the revenue those services protect. This audit answers four questions: (1) is the API key still good, and are alerts, incidents, and services readable, (2) is the on-call process actually covering the alerts that fire (un-acknowledged alerts, no-routing gaps, noisy services), (3) are we acknowledging and resolving fast enough to hold SLA (MTTA / MTTR / SLA compliance), and (4) when a service IS on fire and it fronts a commerce-critical path, how much money is burning per minute?

What this audit checks

Authentication & access

API key valid (auth on /v2/account) and not revoked
Region host correct (US = api.opsgenie.com / EU = api.eu.opsgenie.com)
Key has read scope on Alerts, Incidents, and Services
Request-quota headroom > 15% (429 / Retry-After avoidance)
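As a rough illustration of the first and last checks in this list, the sketch below builds the authorization header and tests quota headroom against the 15% floor. The `GenieKey` scheme is how Opsgenie API keys are normally sent; `used` and `limit` are hypothetical inputs, since how you obtain request counts depends on what rate-limit information your account exposes.

```python
def auth_headers(api_key: str) -> dict:
    """Build the Authorization header for an Opsgenie API key."""
    return {"Authorization": f"GenieKey {api_key}"}


def quota_headroom_pct(used: int, limit: int) -> float:
    """Share of the request quota still unused, as a percentage.

    `used` and `limit` are hypothetical inputs (assumption, not an API field).
    """
    if limit <= 0:
        raise ValueError("limit must be positive")
    return 100.0 * (limit - used) / limit


def quota_ok(used: int, limit: int, min_headroom_pct: float = 15.0) -> bool:
    """True when headroom is strictly above the floor (15% in this audit)."""
    return quota_headroom_pct(used, limit) > min_headroom_pct


if __name__ == "__main__":
    print(auth_headers("example-key"))
    print(quota_ok(used=80, limit=100))   # 20% headroom
    print(quota_ok(used=85, limit=100))   # exactly 15%, not above the floor
```

The comparison is strict because the check is phrased as "headroom > 15%"; a key sitting exactly on the floor is treated as out of headroom.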

Alert coverage (the blind-spot test)

Open alerts un-acknowledged > 30 min (no on-call pickup)
Alerts with no responder / routing rule match (fires into the void)
Any P1 / P2 alert left un-acknowledged (highest-severity coverage gap)
Services with sustained alert volume but no declared incident (noise drowning signal)
Alerts auto-closed without acknowledgement (silent dismissals)
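The first three coverage checks can be sketched as a single pass over an alert list. This is a minimal sketch, not the audit's actual implementation: the field names (`status`, `acknowledged`, `createdAt`, `priority`, `responders`) are assumed to match what the alerts endpoint returns, and `alerts_without_responder_count` is a made-up signal name for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def coverage_gaps(alerts, now, stale_after=timedelta(minutes=30)):
    """Count coverage gaps among open, un-acknowledged alerts.

    Field names are assumptions about the alert payload.
    """
    open_unacked = [
        a for a in alerts
        if a.get("status") == "open" and not a.get("acknowledged", False)
    ]
    return {
        "alerts_unacknowledged_30min_count": sum(
            1 for a in open_unacked
            if now - parse_ts(a["createdAt"]) > stale_after
        ),
        "p1_p2_unacknowledged_count": sum(
            1 for a in open_unacked if a.get("priority") in ("P1", "P2")
        ),
        # Hypothetical signal name: open alerts with no responder attached.
        "alerts_without_responder_count": sum(
            1 for a in open_unacked if not a.get("responders")
        ),
    }


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 11, 0, tzinfo=timezone.utc)
    sample = [
        {"status": "open", "acknowledged": False, "priority": "P1",
         "createdAt": "2024-01-01T10:00:00Z", "responders": []},
        {"status": "open", "acknowledged": True, "priority": "P2",
         "createdAt": "2024-01-01T09:00:00Z", "responders": [{"type": "team"}]},
    ]
    print(coverage_gaps(sample, now))
```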

Response speed & SLA health

MTTA above 5 min sustained (acknowledgement lag = routing / coverage problem)
MTTR above 60 min sustained (resolution lag = capacity problem)
SLA compliance below 99.5% (reliability commitment slipping)
Open incidents with no update in the last 30 min (stalled response)
Apdex below 0.85 / error rate > 2% / p95 > 1500ms on a tracked service
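For the MTTA and MTTR checks, one plausible reading is a plain mean over the reporting window. The sketch below assumes each record carries epoch-second timestamps under the hypothetical keys `created_at`, `acked_at`, and `resolved_at`; these names are illustrative and are not Opsgenie field names.

```python
from statistics import mean


def mtta_seconds(records):
    """Mean time to acknowledge, over records that were acknowledged."""
    deltas = [
        r["acked_at"] - r["created_at"]
        for r in records if r.get("acked_at") is not None
    ]
    return mean(deltas) if deltas else None


def mttr_seconds(records):
    """Mean time to resolve, over records that were resolved."""
    deltas = [
        r["resolved_at"] - r["created_at"]
        for r in records if r.get("resolved_at") is not None
    ]
    return mean(deltas) if deltas else None


if __name__ == "__main__":
    window = [
        {"created_at": 0, "acked_at": 120, "resolved_at": 1800},
        {"created_at": 0, "acked_at": 480, "resolved_at": 5400},
        {"created_at": 0, "acked_at": None, "resolved_at": None},  # still open
    ]
    print(mtta_seconds(window), mttr_seconds(window))
```

Records that were never acknowledged or resolved are left out of the mean here, so a backlog of untouched alerts will not inflate MTTA; the un-acknowledged counts are what surface that case.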

Alert economics & noise

Top alerting service alert volume > 2σ vs its 30-day baseline (noise spike)
Recurring error-type cluster trending up (fix-at-source candidate)
Flapping alerts (more than 3 open -> close -> open cycles in 24h on the same alias)
Throughput on a tracked service dropped > 30% WoW (capacity / outage signal)
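The noise-spike and flapping checks above might be computed roughly as follows. This is a sketch under assumptions: the baseline is a list of daily alert counts for the previous 30 days, the population standard deviation is used, and an alias's history is a chronological list of `"open"` / `"closed"` states within the 24h window.

```python
from statistics import mean, pstdev


def volume_sigma(today_count, baseline_counts):
    """How many standard deviations today's volume sits above the baseline mean."""
    mu = mean(baseline_counts)
    sd = pstdev(baseline_counts)
    if sd == 0:
        # Flat baseline: any increase is treated as an unbounded spike.
        return 0.0 if today_count <= mu else float("inf")
    return (today_count - mu) / sd


def reopen_count(states):
    """Number of closed -> open transitions in a chronological state list."""
    return sum(
        1 for prev, cur in zip(states, states[1:])
        if prev == "closed" and cur == "open"
    )


def is_flapping(states, max_reopens=3):
    """True when the alias reopened more than `max_reopens` times."""
    return reopen_count(states) > max_reopens


if __name__ == "__main__":
    baseline = [8, 12] * 15          # 30 days, mean 10, std dev 2
    print(volume_sigma(16, baseline))
    print(is_flapping(["open", "closed"] * 4 + ["open"]))
```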

Cross-channel: revenue-at-risk (the killer area)

Open incident whose impactedServices intersect a commerce sibling’s checkout / payment service = compute revenue lost (commerce.revenue_per_min × incident_minutes × estimated_traffic_loss_pct); the $/min burn rate is the same product without incident_minutes
Alert storm (> 10 alerts/h) on a service that fronts checkout / payments / search during peak hours
Alert spike on a commerce-critical service during a campaign push (sibling = google_ads / amazon_ads / klaviyo) - paying for traffic that can’t convert
MTTR degradation on commerce-critical services correlated with a sibling commerce conversion / abandonment regression
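The revenue-at-risk join in the first check can be sketched in two steps: intersect the incident's impacted services with the commerce-critical set, then apply the formula. The sketch assumes `estimated_traffic_loss_pct` is expressed as a fraction between 0 and 1; the service identifiers and function names are hypothetical.

```python
def affected_commerce_services(impacted_services, commerce_critical):
    """Services that are both impacted by the incident and commerce-critical."""
    return set(impacted_services) & set(commerce_critical)


def burn_rate_per_min(revenue_per_min, traffic_loss_fraction):
    """Revenue lost per minute while the incident is open."""
    if not 0.0 <= traffic_loss_fraction <= 1.0:
        raise ValueError("traffic_loss_fraction must be between 0 and 1")
    return revenue_per_min * traffic_loss_fraction


def revenue_at_risk(revenue_per_min, incident_minutes, traffic_loss_fraction):
    """Total revenue lost: revenue_per_min x incident_minutes x traffic loss."""
    return burn_rate_per_min(revenue_per_min, traffic_loss_fraction) * incident_minutes


if __name__ == "__main__":
    # Hypothetical service identifiers, for illustration only.
    hit = affected_commerce_services(
        ["svc-checkout", "svc-blog"], {"svc-checkout", "svc-payments"}
    )
    if hit:
        print(sorted(hit), revenue_at_risk(200.0, 45, 0.4))
```

With 200 per minute of revenue, a 45-minute incident, and an estimated 40% traffic loss, the burn rate is 80 per minute and the total at risk is about 3,600.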

Severity thresholds

Signal	Warn	Critical
`alerts_unacknowledged_30min_count`	1	5
`p1_p2_unacknowledged_count`	0	1
`mtta_seconds`	300	600
`mttr_seconds`	3600	7200
`sla_compliance_pct`	99.9	99.5
`incidents_open_count`	1	3
`services_degraded_count`	1	2
`services_down_count`	0	1
`top_service_alert_volume_sigma`	2	3
`throughput_change_pct_wow`	-15	-30
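To make the table concrete, the sketch below maps a signal value to `ok`, `warn`, or `critical`. It rests on two assumptions that the table does not state: comparisons are strict (matching the "above" / "below" / ">" wording of the checks), and `sla_compliance_pct` and `throughput_change_pct_wow` get worse as they fall while every other signal gets worse as it rises.

```python
# (warn, critical, higher_is_worse) per signal, copied from the table above.
THRESHOLDS = {
    "alerts_unacknowledged_30min_count": (1, 5, True),
    "p1_p2_unacknowledged_count": (0, 1, True),
    "mtta_seconds": (300, 600, True),
    "mttr_seconds": (3600, 7200, True),
    "sla_compliance_pct": (99.9, 99.5, False),
    "incidents_open_count": (1, 3, True),
    "services_degraded_count": (1, 2, True),
    "services_down_count": (0, 1, True),
    "top_service_alert_volume_sigma": (2, 3, True),
    "throughput_change_pct_wow": (-15, -30, False),
}


def classify(signal: str, value: float) -> str:
    """Return 'ok', 'warn', or 'critical' using strict comparisons (assumption)."""
    warn, critical, higher_is_worse = THRESHOLDS[signal]
    if higher_is_worse:
        if value > critical:
            return "critical"
        if value > warn:
            return "warn"
    else:
        if value < critical:
            return "critical"
        if value < warn:
            return "warn"
    return "ok"


if __name__ == "__main__":
    print(classify("mtta_seconds", 420))
    print(classify("sla_compliance_pct", 99.4))
```

Under the strict reading, a warn threshold of 0 (as on `p1_p2_unacknowledged_count`) means a single occurrence warns and a second one is critical.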

Data sources

GET https://api.{region}opsgenie.com/v2/account - Auth + key sanity + region check
GET https://api.{region}opsgenie.com/v2/alerts - Alert inventory + acknowledgement + routing coverage + MTTA
GET https://api.{region}opsgenie.com/v2/alerts/count - Alert-volume counts for top-N + noise / baseline checks
GET https://api.{region}opsgenie.com/v1/incidents - Incident inventory + impactedServices + MTTR (revenue-at-risk join)
GET https://api.{region}opsgenie.com/v2/services - Service inventory + health state + open alert/incident counts
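The `{region}` placeholder in these URLs resolves to an empty string for US accounts and to `eu.` for EU accounts, which yields the two hosts listed under Authentication & access. A minimal sketch of that substitution, with the paths copied from the list above and the dictionary keys chosen for illustration:

```python
# Region prefix inserted between "api." and "opsgenie.com".
REGION_PREFIX = {"US": "", "EU": "eu."}

# Paths as listed in this section; the short names are illustrative.
ENDPOINTS = {
    "account": "/v2/account",
    "alerts": "/v2/alerts",
    "alerts_count": "/v2/alerts/count",
    "incidents": "/v1/incidents",
    "services": "/v2/services",
}


def endpoint_url(name: str, region: str) -> str:
    """Build the full URL for a data source in the given region."""
    prefix = REGION_PREFIX[region.upper()]
    return f"https://api.{prefix}opsgenie.com{ENDPOINTS[name]}"


if __name__ == "__main__":
    for name in ENDPOINTS:
        print(endpoint_url(name, "EU"))
```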
