What this audit checks
Performance
- Apdex score below 0.9
- p95 response time
- Throughput trend
Errors & availability
- Error rate trend
- Top error classes
- Recent deployment correlated with error spike
Alert coverage
- Alert policies exist
- Critical conditions configured
- Notification channels set
Service health
- Services down (>0 is critical)
- Services degraded above tolerance
- SLA compliance below target
Incident response
- Open incidents linger unresolved
- Mean time to acknowledge above band
- Mean time to resolve above band
Severity thresholds
| Signal | Warn | Critical |
|---|---|---|
sla_compliance | 99.9 | - |
Data sources
POST https://api.newrelic.com/graphql- NerdGraph GraphQL queries for entities + metrics