p95 Response Time, New Relic

Metrics type: Key Metrics • Category: Monitoring

At a glance

The 95th-percentile response time across server-side `Transaction` events: the “tail latency” number that marks where the slowest 5% of requests begin. An average can hide a slow tail; p95 tells you whether your slowest checkouts are still acceptable.


What it counts	`percentile(duration, 95) FROM Transaction WHERE appName IN (...)`, displayed in milliseconds (New Relic records `duration` in seconds). The duration covers server-side processing only: no DNS, no network transit, no browser parse. For end-to-end latency see RUM Page Load p95.
NerdGraph endpoint	NRQL via NerdGraph: `SELECT percentile(duration, 95) FROM Transaction WHERE appName IN (...) SINCE 5 MINUTES AGO FACET host`. The `FACET host` returns per-host p95 so you can spot a single bad node.
Metric basis	Percentile computed over the event distribution itself rather than derived from averages; on sampled accounts it is estimated via t-digest. Sample-stable: even at 50% sampling, the p95 estimate is within ~5ms of the true value on most distributions.
Aggregation window	5-minute rolling for the live KPI; 1-hour rollup on the trend chart; 1-day rollup for vsP comparison.
Browser vs APM scope	APM-only (server-side duration). Browser RUM equivalents are in the Real User Monitoring section. APM p95 of 220ms with Browser p95 of 4.2s means most of customer-perceived latency is client-side (heavy JS, slow CDN, third-party scripts).
Filtered hosts / services	`appName IN (...)` per-merchant scope; background workers excluded by default. To narrow to checkout: add `WHERE name LIKE 'Controller/Checkout/%'`.
Sample basis	t-digest percentile is sample-stable. Heavy-tail edge cases (a single 30s outlier in a 50% sample) can shift the estimate slightly; over a 5-minute window with 5,000+ transactions the effect is sub-millisecond.
Severity threshold	All transactions equally weighted by default. Checkout-only scoping recommended for revenue-protective alerting.
Time zone	Account timezone for chart axes; UTC for raw event timestamps.
Time window	`T/7D vsP` (today vs prior 7-day average for the same time-of-day)
Alert trigger	`>1500ms`, calibrated to “above the SOASTA-derived 1.5s perception threshold” beyond which conversion drops measurably on commerce flows. Tune to 800ms for premium / luxury brands, 2500ms for B2B integrations.
Sentiment key	`avg_response`
Roles	owner, engineering
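
As a rough illustration of the NerdGraph row above, the sketch below builds the NRQL string and wraps it in a GraphQL request body. The function names, the example app name, and the account ID are hypothetical; only the NRQL text and the `actor { account { nrql } }` query shape come from New Relic. Nothing is sent over the network here.

```python
# US-region NerdGraph endpoint; EU-region accounts use api.eu.newrelic.com instead.
NERDGRAPH_URL = "https://api.newrelic.com/graphql"


def build_p95_nrql(app_names, since_minutes=5, facet_host=True):
    """Return the NRQL for server-side p95 across the given apps (hypothetical helper)."""
    if not app_names:
        raise ValueError("at least one app name is required")
    for name in app_names:
        # Keep the sketch simple: refuse names that would need quote escaping.
        if "'" in name:
            raise ValueError(f"unsupported character in app name: {name!r}")
    quoted = ", ".join(f"'{name}'" for name in app_names)
    nrql = (
        "SELECT percentile(duration, 95) FROM Transaction "
        f"WHERE appName IN ({quoted}) SINCE {since_minutes} MINUTES AGO"
    )
    if facet_host:
        nrql += " FACET host"  # per-host p95, to spot a single bad node
    return nrql


def build_nerdgraph_payload(account_id, nrql):
    """Wrap an NRQL string in a NerdGraph GraphQL request body (hypothetical helper)."""
    query = (
        "query($accountId: Int!, $nrql: Nrql!) {"
        " actor { account(id: $accountId) { nrql(query: $nrql) { results } } } }"
    )
    return {"query": query, "variables": {"accountId": account_id, "nrql": nrql}}
```

To run it for real, POST the payload as JSON to `NERDGRAPH_URL` with a New Relic user key in the `API-Key` header. Remember that `duration` comes back in seconds, so multiply by 1,000 before comparing against a millisecond threshold.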

Calculation

Calculated automatically from your New Relic data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
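
To make the idea concrete, here is a minimal sketch of what a 95th percentile means, using the exact nearest-rank method on a small in-memory sample. This is not New Relic's implementation (which estimates the percentile with t-digest); the sample values and the function name are invented for illustration.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of the samples are at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical sample: 18 fast requests and 2 slow ones, in milliseconds.
durations_ms = [250] * 18 + [1900, 2100]

mean_ms = sum(durations_ms) / len(durations_ms)      # 425.0
p50_ms = nearest_rank_percentile(durations_ms, 50)   # 250
p95_ms = nearest_rank_percentile(durations_ms, 95)   # 1900
```

The mean (425ms) and median (250ms) both look healthy, while p95 (1,900ms) exposes the two slow requests. That is the pattern this card is designed to surface.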

Worked example

A Shopify Plus storefront on a Node.js BFF (backend-for-frontend) layer. The Apdex t is configured to 0.5s for browse pages, 1.0s for checkout. The 5-minute window covers 09:00 to 09:05 on 02 May 26.

Metric	Value	Reading
Mean response time	380ms	Looks fast
p50 (median)	290ms	Most users get a snappy response
p95	1,840ms	Tail problem
p99	3,400ms	Worst 1% are very slow
Apdex (`t = 0.5`)	0.74	Below the 0.85 healthy threshold
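
The Apdex row follows the standard formula: satisfied requests count fully, tolerating requests count half, frustrated requests count zero. The sketch below applies it to a hypothetical mix of durations chosen only to be consistent with the table (52% satisfied, 44% tolerating, 4% frustrated); the real distribution behind the 0.74 is not given here.

```python
def apdex(durations_s, t):
    """Apdex = (satisfied + tolerating / 2) / total, with thresholds t and 4t."""
    total = len(durations_s)
    if total == 0:
        raise ValueError("durations_s must not be empty")
    satisfied = sum(1 for d in durations_s if d <= t)
    tolerating = sum(1 for d in durations_s if t < d <= 4 * t)
    # Everything above 4t is frustrated and contributes nothing to the score.
    return (satisfied + tolerating / 2) / total


# Hypothetical mix, in seconds: 52 satisfied, 44 tolerating, 4 frustrated.
sample_s = [0.29] * 52 + [1.2] * 44 + [3.4] * 4
score = apdex(sample_s, t=0.5)  # (52 + 22) / 100
```

Note how only 4% of requests are frustrated in this mix, yet the score is pulled well below 0.85 by the large tolerating group between 0.5s and 2.0s.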

The mean of 380ms looks fine, but p95 reveals that 5% of requests (about 600 of every 12,000 in a 5-minute window) are waiting 1.84 seconds or more for a server response, before any browser parsing or network transit. With Apdex t = 0.5, the frustrated threshold is 4 x 0.5 = 2.0s, so this group is sitting right on the frustrated boundary. The Apdex of 0.74 confirms that a meaningful chunk has already crossed it.

Conversion impact. Akamai 2017 and Deloitte 2020 data converge on roughly a 7% conversion drop per 100ms of additional p95 latency on commerce checkouts. From a healthy 1,000ms baseline to 1,840ms is +840ms, which maps to a ~58% conversion drop on the affected slice (8.4 x 7% = 58.8%). On a £2M/month checkout flow with 5% of visitors in the slow-tail group, the exposure is roughly £2M x 5% x 58% = £58k/month of at-risk conversion. The exposure ends the moment p95 returns to baseline.

Apdex calibration interaction. Apdex with t = 0.5 rewards fast servers heavily. Frustrated requests contribute nothing to the score, so once more than half your traffic is over 2s the score cannot reach 0.5. Apdex going from 0.86 to 0.5 typically corresponds to p95 going from 1.0s to roughly 2.5 to 3.0s on a real distribution.

If the next 5-minute window’s p95 climbs to 2,400ms (still common on a hot evening or during a deploy), the conversion impact roughly doubles, the frustrated bucket expands from 5% to 12 to 15% of traffic, and Apdex drops below 0.6. At that point this card alone justifies pulling the deploy or rolling back.
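
The exposure arithmetic in the worked example can be reproduced with a few lines. This is a back-of-envelope sketch under the linear 7%-per-100ms assumption quoted in the example; the function name and the cap at 100% are additions for illustration, not part of the card's calculation.

```python
def conversion_exposure(monthly_revenue, slow_share, baseline_ms, observed_ms,
                        drop_per_100ms=0.07):
    """Estimate monthly revenue at risk from tail latency (linear model, hypothetical helper)."""
    extra_ms = max(0, observed_ms - baseline_ms)
    # Cap at 100%: a linear model would otherwise exceed a total loss at ~1,430ms extra.
    drop = min(1.0, drop_per_100ms * extra_ms / 100)
    return monthly_revenue * slow_share * drop


# Figures from the worked example: £2M/month, 5% slow tail, 1,000ms -> 1,840ms.
exposure_gbp = conversion_exposure(2_000_000, 0.05, baseline_ms=1000, observed_ms=1840)
```

The result is about £58,800; the example rounds the 58.8% drop down to 58%, which gives the quoted £58k.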

Sibling cards merchants should reference together

Card	Why pair it with p95 Response Time
Avg Response Time	Companion mean view. Mean stable + p95 rising = a tail problem (a slow dependency, a single bad host). Mean + p95 both rising = system-wide slowdown.
p99 Response Time	The worst-1% view. p99 rising while p95 holds means a small group is hitting an edge case (cold caches, a saturated worker pool).
Database Query Latency p95	The most common cause of server p95 going up. Pair to confirm whether the slowdown is application-level or DB-level.
Apdex Score	Satisfaction-weighted view of the same distribution. p95 tells you “how slow”; Apdex tells you “how many customers feel it”.
Slowest Transactions	The drill-down. Tells you which endpoint is producing the slow-tail requests.
Datadog p95 Latency	Cross-connector peer. Should agree within 5 to 15ms; persistent gaps indicate probe-coverage drift.
GA4 LCP (Web Vitals)	Customer-side counterpart. NR p95 = server only; GA4 LCP includes network + browser. The gap = client-side overhead.
Shopify Sales / Min	Revenue-side outcome. Watch sales/min co-move with p95 to quantify conversion impact during latency events.
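
The mean-versus-p95 reading from the Avg Response Time row can be expressed as a small decision rule. This is only a sketch: the 10% "rising" threshold and the function name are assumptions chosen for illustration, not values used by the cards.

```python
def classify_latency_shift(mean_change_pct, p95_change_pct, rising_threshold_pct=10.0):
    """Interpret how mean and p95 moved together versus the comparison period."""
    mean_up = mean_change_pct >= rising_threshold_pct
    p95_up = p95_change_pct >= rising_threshold_pct
    if mean_up and p95_up:
        return "system-wide slowdown"
    if p95_up:
        return "tail problem"  # e.g. a slow dependency or a single bad host
    if mean_up:
        return "mean-only shift"  # unusual; check for a change in traffic mix
    return "no significant shift"
```

For example, `classify_latency_shift(1.0, 35.0)` returns `"tail problem"`, which is the cue to open Slowest Transactions or the per-host facet next.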

Reconciling against the vendor’s own dashboard

Where to look in New Relic:

APM > Application > Summary shows the response-time chart with selectable percentiles.
Dashboards > “Latency analysis” pre-built.
Service Levels if a latency SLO is configured.
Alerts & AI > Conditions for the alert backing this card.

Why our number may legitimately differ from New Relic’s own screens:

Reason	Direction of divergence
Account timezone vs UTC. NR APM follows the account timezone; NRQL via Vortex IQ runs in UTC. p95 differences across midnight boundaries can show 5 to 50ms drift.	Either direction at boundaries
NRQL retention windows. Full-resolution data is 8 days standard, 13 months on Data Plus. Past the retention window, p95 is computed on hourly aggregates with slightly less precision.	Older data ~5 to 20ms coarser
Ingest sampling. NR samples `Transaction` events on high-cardinality accounts. t-digest percentile is sample-stable but heavy-tail edge cases can shift estimates ~5ms.	Either direction, sub-millisecond on high-volume windows
NerdGraph rate limits. Default 3,000 NRQL points / minute / account. Stale-by-30s during heavy investigation.	Stale, not wrong
Filter scope. Vortex IQ scopes `appName` to merchant-relevant apps; NR’s all-apps default chart includes background workers which often have very different latency profiles.	Vortex IQ < NR raw on most stores

Cross-connector reconciliation: NR APM agent and Datadog APM trace agent instrument at slightly different points in the request lifecycle. NR typically measures from the framework middleware entry to response finish; Datadog often includes a few hundred microseconds of trace-collection overhead. Steady-state p95 numbers should agree within 5 to 15ms. A persistent gap of 30ms or more is itself diagnostic, not noise. The most common causes: (a) one platform missing the framework’s pre-routing middleware (auth, CORS); (b) one agent using a different sampling cutoff (NR samples >100ms requests at a higher rate by default); (c) a different appName / service filter scope.

NR Browser RUM and GA4 Web Vitals also disagree on Browser p95 latency: Browser RUM instruments every page load, GA4 samples by session. On stores with low session rates (B2B), the GA4 number can be 15 to 25% noisier than RUM.
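
One way to turn the agreement bands above into a check is sketched below. It takes paired per-window p95 readings from both platforms and uses the median signed difference as a stand-in for "persistent gap"; that choice, the function name, and the sample readings are assumptions for illustration only.

```python
from statistics import median


def cross_connector_gap(nr_p95_ms, dd_p95_ms, agree_within_ms=15.0, diagnostic_from_ms=30.0):
    """Classify the typical NR-minus-Datadog p95 gap across paired windows."""
    if not nr_p95_ms or len(nr_p95_ms) != len(dd_p95_ms):
        raise ValueError("need two equal-length, non-empty series")
    # Median of signed differences: robust to one noisy window.
    typical_gap = median(nr - dd for nr, dd in zip(nr_p95_ms, dd_p95_ms))
    if abs(typical_gap) <= agree_within_ms:
        status = "within expected agreement"
    elif abs(typical_gap) >= diagnostic_from_ms:
        status = "diagnostic gap: audit instrumentation and filter scope"
    else:
        status = "borderline: keep watching"
    return typical_gap, status


# Hypothetical readings for four consecutive 5-minute windows, in milliseconds.
gap_ms, status = cross_connector_gap([812, 820, 805, 818], [804, 811, 799, 808])
```

With these sample readings the typical gap is 8.5ms, inside the expected band.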

Known limitations / merchant FAQs

NR vs Datadog: which one’s p95 should I trust?
Both, on different scopes. If the two agree within 15ms, neither is wrong; both are valid samples of the same underlying distribution. If the gap is larger and persistent, audit instrumentation coverage on each platform; the discrepancy almost always traces to one platform missing a route or a middleware. We’ve seen NR run 8ms higher than DD on Express.js apps (because NR includes the route-matching middleware DD skips), and 12ms lower on Spring Boot (because DD’s interceptor sits earlier in the chain).

Apdex math: how does p95 relate to Apdex?
They measure overlapping but distinct things. Apdex with t = 0.5s puts everything above 2s (4 x t) in the frustrated bucket. So p95 going from 1s to 2.5s typically drops Apdex from ~0.85 to ~0.55 because the frustrated bucket roughly doubles. p95 is the raw number; Apdex is the satisfaction-weighted reading. Both should be on your wall.

NRQL retention: can I see p95 over 90 days?
Yes, but on aggregated data past day 8 (standard plan) or day 395 (Data Plus). The p95 number remains valid; what you lose is the ability to drill into a specific 5-minute window past the retention boundary. For 90-day trend analysis use the rolled aggregates; for incident forensics stay within the full-resolution window.

Why does my p95 disagree with Datadog by 12ms?
Probe-overhead difference. Both are right; the gap is a known constant for your stack, and as long as it stays constant, trends match across platforms. If a deploy lands and NR shows +50ms while DD shows the same +50ms, the regression is real on both. The 12ms baseline gap is not a problem; a sudden change in the gap (NR +50, DD +5) would be.

Sampling impact on p95: am I missing slow requests?
No. NR’s t-digest percentile algorithm is designed to be sample-stable. Even at 25% sampling, p95 estimates are within ~10ms of true on most production distributions (Akamai 2018 study). What sampling does affect is the count of slow requests, not the percentile threshold. If you need exact slow-request counts (for SLA enforcement), use the raw event count alongside the percentile.

Multi-account: my US and EU accounts have different p95 baselines, how do I monitor both?
Connect each NR account as a separate Vortex IQ integration and stack the cards in the Nerve Centre. Stack panels handle weighted aggregation correctly: the combined p95 is computed by merging t-digest sketches, not by averaging the two p95s (which would be mathematically wrong).

Ingest cost vs visibility: can I save money without losing p95 accuracy?
Yes. Drop the sample rate on non-checkout transactions to 25%, keep checkout at 100%, and keep all error events at 100%. The p95 estimate stays sample-stable, the error rate stays accurate, and ingest cost typically drops 40 to 60%. This is the standard Vortex IQ recommended config.

Alert tuning: my p95 alert flutters at 1500ms during peak traffic, what should I tune?
Three options in order: (a) raise the threshold to 1750ms if your peak baseline is reliably 1400 to 1500ms; (b) add a 5-minute “must stay above 1500ms” duration clause; (c) split the alert into two: a peak-traffic alert (12:00 to 21:00) at 1750ms and an off-peak alert at 1200ms. Option (c) is most accurate for stores with strong diurnal traffic patterns.

My p99 is 8s but p95 is 800ms, is that OK?
A 10x gap between p95 and p99 means you have a long tail. It is usually one of: a slow background job competing for shared resources, a cold-cache penalty on the first request to a region, or a single overloaded host. Open p99 Response Time faceted by host to see if the slow tail is concentrated.
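
To see why averaging two p95s is wrong for the multi-account case, the sketch below compares the average of per-account p95s with the p95 of the pooled requests. Pooling raw samples stands in for merging t-digest sketches here (an assumption made to keep the example self-contained); the traffic figures are invented.

```python
import math


def nearest_rank_percentile(values, pct):
    """Smallest value such that at least pct% of the samples are at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical durations in milliseconds.
us_account = [200] * 900 + [1000] * 100   # high volume, with a slow tail
eu_account = [300] * 100                  # low volume, uniformly quick

us_p95 = nearest_rank_percentile(us_account, 95)   # 1000
eu_p95 = nearest_rank_percentile(eu_account, 95)   # 300

naive_average = (us_p95 + eu_p95) / 2              # 650.0, ignores traffic volume
pooled_p95 = nearest_rank_percentile(us_account + eu_account, 95)  # 1000
```

The naive average (650ms) understates the combined tail because it gives the small EU account the same weight as the much busier US one; the pooled p95 stays at 1,000ms.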

Tracked live in Vortex IQ Nerve Centre

p95 Response Time is one of hundreds of KPI pulses Vortex IQ tracks across New Relic and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
