# Splunk audit profile, Vortex IQ

> What the Vortex IQ Splunk health audit checks: coverage, reliability, incident response, and revenue-at-risk.

Splunk state means nothing to a merchant unless it is joined to revenue. This audit answers four questions: (1) is the merchant's stack healthy right now (services down or degraded, alerts firing, latency and error rate in band), (2) is the on-call rotation responding fast enough (MTTA, MTTR, open-incident backlog), (3) is the merchant meeting its SLA, and (4) when a commerce-path service is down or alerting, how much money is on fire per minute?

## What this audit checks

### Authentication & access

* Observability API token valid (auth on /v2/organization)
* Realm host correct for region (us0 / us1 / us2 / eu0 / eu1 / jp0 / au0)
* Splunk On-Call API ID + key present when incidents & alerts cards are enabled
* Token scope covers APM services, detectors, and SLO read
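The realm and token checks above can be sketched as a small pre-flight step. This is a minimal illustration, not the audit's actual implementation: `build_auth_probe` and `KNOWN_REALMS` are hypothetical names, the realm list is copied from the checklist, and the `X-SF-TOKEN` header is the one the Splunk Observability API conventionally uses rather than something this page states. The request is built but not sent.

```python
from urllib.request import Request

# Realms copied from the checklist above.
KNOWN_REALMS = {"us0", "us1", "us2", "eu0", "eu1", "jp0", "au0"}


def build_auth_probe(realm: str, token: str) -> Request:
    """Build (but do not send) the token sanity request for a realm."""
    if realm not in KNOWN_REALMS:
        raise ValueError(f"unknown realm: {realm!r}")
    if not token:
        raise ValueError("empty API token")
    url = f"https://api.{realm}.signalfx.com/v2/organization"
    # Header name is an assumption; it is not stated on this page.
    return Request(url, headers={"X-SF-TOKEN": token}, method="GET")
```

Rejecting an unknown realm before any network call separates a misconfigured region from a genuinely invalid token, which would otherwise both surface as a failed request.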

### Performance & reliability

* Apdex below 0.85 sustained
* p95 latency above 1500ms sustained
* p99 latency above 3000ms sustained
* Error rate above 2% sustained
* Throughput dropped > 30% WoW (capacity / outage signal)
* SLA compliance below 99.5% over the reporting window
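The week-over-week throughput check is plain percentage arithmetic. The sketch below assumes throughput is summarised as one number per week (for example total requests); the function name mirrors the `throughput_change_pct_wow` signal, but how the audit actually aggregates throughput is not specified here.

```python
def throughput_change_pct_wow(this_week: float, last_week: float) -> float:
    """Percentage change in throughput versus the previous week.

    Negative values mean throughput dropped.
    """
    if last_week <= 0:
        raise ValueError("last_week must be positive to compute a change")
    return (this_week - last_week) * 100.0 / last_week
```

For example, a service that handled 1,000 requests last week and 650 this week yields a 35% drop, which exceeds the 30% drop in the checklist.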

### Service health & coverage (the blind-spot test)

* Any service in DOWN state (active outage)
* More than 2 services in DEGRADED state
* Commerce-path services (checkout / cart / catalogue / search) without an active detector
* Detectors in paused/draft state on commerce-path services (the detector cannot fire, so failures go unnoticed)
* Alerting concentrated on a single service (the top alerting service is on the commerce path)

### Incident response (on-call economics)

* Open incident backlog above 3 (rotation falling behind)
* MTTA above 30 minutes (slow first response)
* MTTR above 60 minutes (slow resolution)
* Incidents acknowledged but unresolved > 24h (stuck triage)
* Repeat incidents on the same service within 7 days (unfixed root cause)
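As a rough illustration of the on-call metrics above, MTTA and MTTR can be computed as mean durations from incident timestamps. The record shape (`opened_at`, `acked_at`, `resolved_at`) and the function names are hypothetical, and measuring MTTR from open to resolve is one common convention, not necessarily the one the audit uses.

```python
from datetime import datetime
from statistics import mean
from typing import Optional


def _mean_minutes(incidents: list, end_key: str) -> Optional[float]:
    """Mean minutes from opened_at to end_key, skipping incidents without it."""
    deltas = [
        (i[end_key] - i["opened_at"]).total_seconds() / 60.0
        for i in incidents
        if i.get(end_key) is not None
    ]
    return mean(deltas) if deltas else None


def response_stats(incidents: list) -> dict:
    """Summarise acknowledgement time, resolution time, and open backlog."""
    return {
        "mtta_minutes": _mean_minutes(incidents, "acked_at"),
        "mttr_minutes": _mean_minutes(incidents, "resolved_at"),
        "incidents_open_count": sum(
            1 for i in incidents if i.get("resolved_at") is None
        ),
    }
```

Unacknowledged or unresolved incidents are left out of the respective mean rather than counted as zero, so a growing backlog shows up in `incidents_open_count` instead of flattering the averages.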

### Cross-channel: revenue-at-risk (the killer area)

* Commerce-path service DOWN/DEGRADED while a sibling commerce connector is live: compute \$ lost (commerce.revenue\_per\_min × down\_minutes × estimated\_traffic\_loss\_pct)
* Alerts firing on the checkout service during peak commerce hours
* Top alerting service maps to a commerce-path service (cart / checkout / catalogue / search)
* Error-rate spike on a commerce service during a campaign push (sibling = google\_ads / amazon\_ads / klaviyo) - paying for traffic that can't convert
* SLA breach window overlapping a commerce traffic peak (lost-order estimate)
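The revenue-at-risk formula in the first bullet can be worked through directly. This sketch assumes `estimated_traffic_loss_pct` is expressed on a 0 to 100 scale and is therefore divided by 100; the page does not say whether it is a percentage or a fraction, so treat that scaling as an assumption.

```python
def revenue_at_risk(
    revenue_per_min: float,
    down_minutes: float,
    estimated_traffic_loss_pct: float,
) -> float:
    """Estimated revenue lost while a commerce-path service is down or degraded."""
    if not 0 <= estimated_traffic_loss_pct <= 100:
        raise ValueError("estimated_traffic_loss_pct must be between 0 and 100")
    if revenue_per_min < 0 or down_minutes < 0:
        raise ValueError("revenue_per_min and down_minutes must be non-negative")
    # Assumption: the loss percentage is on a 0-100 scale.
    return revenue_per_min * down_minutes * (estimated_traffic_loss_pct / 100.0)
```

For instance, a store earning 120 per minute with a checkout outage of 15 minutes and an estimated 50% traffic loss gives 120 × 15 × 0.5 = 900 in lost revenue.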

## Severity thresholds

| Signal                      | Warn | Critical |
| --------------------------- | ---- | -------- |
| `apdex`                     | 0.9  | 0.85     |
| `error_rate_pct`            | 1    | 2        |
| `p95_latency_ms`            | 1000 | 1500     |
| `p99_latency_ms`            | 2000 | 3000     |
| `avg_response_ms`           | 500  | 1000     |
| `throughput_change_pct_wow` | -15  | -30      |
| `sla_compliance_pct`        | 99.9 | 99.5     |
| `services_degraded_count`   | 1    | 2        |
| `services_down_count`       | 0    | 1        |
| `incidents_open_count`      | 2    | 3        |
| `mtta_minutes`              | 15   | 30       |
| `mttr_minutes`              | 30   | 60       |
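One way to read the table is as a lookup that maps a signal value to a severity. The sketch below makes two assumptions that the table itself does not state: a signal is "lower is worse" whenever its critical value is below its warn value (`apdex`, `sla_compliance_pct`, `throughput_change_pct_wow`), and a threshold is breached only when the value goes strictly beyond it, matching the "above" and "below" wording in the checklist. The `classify` function is hypothetical.

```python
# (warn, critical) pairs copied from the table above.
THRESHOLDS = {
    "apdex": (0.9, 0.85),
    "error_rate_pct": (1, 2),
    "p95_latency_ms": (1000, 1500),
    "p99_latency_ms": (2000, 3000),
    "avg_response_ms": (500, 1000),
    "throughput_change_pct_wow": (-15, -30),
    "sla_compliance_pct": (99.9, 99.5),
    "services_degraded_count": (1, 2),
    "services_down_count": (0, 1),
    "incidents_open_count": (2, 3),
    "mtta_minutes": (15, 30),
    "mttr_minutes": (30, 60),
}


def classify(signal: str, value: float) -> str:
    """Return 'ok', 'warn', or 'critical' for a signal value."""
    warn, critical = THRESHOLDS[signal]
    # Assumption: direction is inferred from the ordering of the thresholds.
    lower_is_worse = critical < warn

    def beyond(limit: float) -> bool:
        # Assumption: strict comparison, as in "above 2%" or "below 0.85".
        return value < limit if lower_is_worse else value > limit

    if beyond(critical):
        return "critical"
    if beyond(warn):
        return "warn"
    return "ok"
```

Under this strict reading, `classify("services_down_count", 1)` returns `"warn"` and only two or more DOWN services are critical; an inclusive comparison would classify a single DOWN service as critical instead, so the comparison rule is worth confirming before relying on it.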

## Data sources

* `GET https://api.{realm}.signalfx.com/v2/organization` - Auth + token sanity
* `GET https://api.{realm}.signalfx.com/v2/apm/services` - APM service inventory + health states
* `POST https://api.{realm}.signalfx.com/v2/signalflow` - Run metric programs for latency / error-rate / throughput / apdex checks
* `GET https://api.{realm}.signalfx.com/v2/detector` - Detector inventory + firing/active state + notification coverage
* `GET https://api.{realm}.signalfx.com/v2/slo` - SLO state + SLA compliance
* `GET https://api.{realm}.signalfx.com/api-public/v1/incidents` - On-Call incident inventory (MTTA / MTTR / backlog + revenue-at-risk join)
* `GET https://api.{realm}.signalfx.com/api-public/v1/reporting/v2/metrics` - Incident reporting metrics for MTTA / MTTR aggregates
