p95 Response Time, Datadog

Metrics type: Key Metrics • Category: Monitoring

At a glance

The 95th-percentile response time across all instrumented services: 19 of every 20 shopper requests are at least this fast, and the slowest 1 in 20 are slower. For a merchant, this is “for the unlucky 5% of shoppers, how long do they wait?” Above 1.5 seconds abandonment becomes measurable; above 3 seconds the site feels broken and shoppers abandon at high rates.


API endpoint	Datadog Metrics API, `GET /api/v1/query` with `p95:trace.servlet.request{*}` (or the runtime equivalent: `trace.aspnet_core.request`, `trace.express.request`, `trace.flask.request`, etc.).
Metric basis	APM span duration percentile, NOT individual request timing. Datadog computes p95 from histograms sampled at the agent.
Aggregation window	1-minute rollup at source; the card displays the rolling 5-minute p95 against the 7-day comparison window.
Severity threshold	P1 = p95 above 3,000 ms (revenue impact likely); P2 = p95 above 1,500 ms (alert trigger); P3 = p95 above 800 ms (warning).
Alert pre-filtering	Synthetic test traffic (`@user_agent:Datadog/Synthetic`) and health-check endpoints (`/health`, `/ping`, `/metrics`) are excluded by default. Without this, your synthetic test cadence dominates the percentile distribution.
Log Management gating	Not used. Latency is APM-derived; the card returns valid values regardless of whether Logs is enabled.
Why p95 and not average	Average latency hides the long tail. A p50 of 200 ms looks fine even when the slowest 5% is 8 seconds. p95 is the merchant-meaningful number because it captures the experience of the unlucky shoppers most likely to bounce.
Why p95 and not p99	p99 is too noisy at typical merchant traffic levels; a single GC pause or cold start skews the number. p99 is appropriate for sites above 100,000 req/min. See p99 Response Time.
Filtered hosts / services	All instrumented services in aggregate. For per-service breakdown see Top Slow Endpoints.
Time zone	Account timezone for chart axes; UTC for cross-connector windowing.
Time window	`T/7D vsP` (today vs prior 7-day average)
Alert trigger	`> 1500ms`, p95 above 1.5 seconds sustained for 5 minutes pages on-call.
Sentiment key	`avg_response`
Roles	owner, engineering
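
The severity thresholds and the alert trigger above can be sketched in a few lines of Python. This is an illustration only, not the card's implementation: the function names are hypothetical, and reading "sustained for 5 minutes" as five consecutive 1-minute rollups above 1,500 ms is an assumption.

```python
from typing import Optional, Sequence

# Thresholds from the Severity threshold row, in milliseconds.
P1_MS, P2_MS, P3_MS = 3000, 1500, 800


def classify_p95(p95_ms: float) -> Optional[str]:
    """Return "P1", "P2", "P3", or None for a single p95 reading."""
    if p95_ms > P1_MS:
        return "P1"
    if p95_ms > P2_MS:
        return "P2"
    if p95_ms > P3_MS:
        return "P3"
    return None


def should_page(rollups_ms: Sequence[float], minutes: int = 5) -> bool:
    """True when the last `minutes` one-minute rollups all exceed 1,500 ms.

    Assumption: "sustained" means every rollup in the window is above
    the threshold, with no gaps.
    """
    recent = rollups_ms[-minutes:]
    return len(recent) == minutes and all(v > P2_MS for v in recent)
```

With the worked-example readings further down, `classify_p95(1890)` returns `"P2"` and `classify_p95(850)` returns `"P3"`. A single dip below 1,500 ms inside the window resets the page condition in this sketch.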

Calculation

Calculated automatically from your Datadog data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
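
For readers who want to see roughly what the underlying request looks like, here is a minimal Python sketch that builds (but does not send) a `GET /api/v1/query` call for the rolling 5-minute window. The host shown is an assumption (Datadog's US1 site; other sites use a different host), `build_p95_request` is a hypothetical helper name, and the sketch does not apply the synthetic-traffic or health-check exclusions described above.

```python
import urllib.parse
import urllib.request

# Assumption: US1 site. Other Datadog sites use a different API host.
DD_API_HOST = "https://api.datadoghq.com"


def build_p95_request(api_key: str, app_key: str, now: int,
                      window_s: int = 300,
                      metric: str = "trace.servlet.request") -> urllib.request.Request:
    """Build a GET /api/v1/query request for the last `window_s` seconds.

    `now` is a POSIX timestamp in seconds. Nothing is sent over the network.
    """
    params = urllib.parse.urlencode({
        "from": now - window_s,
        "to": now,
        "query": f"p95:{metric}{{*}}",  # e.g. p95:trace.servlet.request{*}
    })
    return urllib.request.Request(
        f"{DD_API_HOST}/api/v1/query?{params}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )
```

Passing `now=int(time.time())` gives the current window; swapping `metric` for `trace.express.request` or another runtime equivalent changes only the query string.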

Worked example

A UK supplements brand on Shopify with Datadog APM on the storefront, checkout, and search services. The team upgraded a third-party recommendations widget on 22 Apr 26 and missed that it added a 600 ms server-side fetch to every product-detail page.

Hour (UTC)	p50	p95	p99	Throughput
09:00 (before deploy)	180 ms	800 ms	1,420 ms	4,100 req/min
11:00 (deploy at 10:50)	210 ms	1,200 ms	2,100 ms	4,250 req/min
13:00	240 ms	1,650 ms	3,200 ms	3,890 req/min
15:00	260 ms	1,890 ms	3,800 ms	3,520 req/min
16:00 (rolled back)	195 ms	850 ms	1,500 ms	4,180 req/min

Two stories the table tells. First, p50 barely moved (180 ms to 260 ms is invisible to most shoppers); the regression hit the tail, which is exactly what p95 is designed to surface. Second, throughput dropped from 4,100 to 3,520, a 14% decline. That is shoppers abandoning slow pages and not coming back. Same period: Shopify orders/min dropped from 32 to 24. Conversion rate dropped from 1.86% to 1.43%, a 23% relative drop.

Revenue impact (estimated):
  - Slow window: 11:00 to 16:00, 5 hours
  - Baseline orders/min: 32
  - Observed orders/min during slowdown: 26 (average across the window)
  - Lost orders ≈ (32 − 26) × 60 × 5 = 1,800 orders
  - At AOV £58: lost revenue ≈ £104,400
  - Cost of the recommendations widget: ~£300/month
  - Net impact: roughly £104,400 of lost revenue in one afternoon, against a widget that costs ~£300/month
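
The estimate above is simple arithmetic, sketched here in Python so it can be rerun with your own numbers. The function name is hypothetical, and the calculation assumes the order shortfall is constant across the whole window.

```python
from typing import Tuple


def lost_revenue(baseline_per_min: float, observed_per_min: float,
                 hours: float, aov: float) -> Tuple[float, float]:
    """Return (lost orders, lost revenue) for a slow window."""
    lost_orders = (baseline_per_min - observed_per_min) * 60 * hours
    return lost_orders, lost_orders * aov


# Figures from the worked example: 32 vs 26 orders/min, 5 hours, AOV of 58.
orders, revenue = lost_revenue(32, 26, 5, 58)
print(orders, revenue)  # 1800 104400
```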

Three takeaways merchants should remember:

p95 is the first metric to read when conversion drops without a clear cause. Average latency, error rate, and uptime all looked fine in this incident; only p95 told the story. The slowest 5% of shoppers are the ones most likely to bounce, and they are invisible to average-based metrics.
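600 ms was enough to do real damage. A common rule of thumb, attributed to industry research (Akamai, Google), is that every 100 ms of added latency costs roughly 1% of conversion at typical ecommerce sites. On that rule, a 600 ms fetch predicts a loss of about 6%. This brand fared worse: conversion fell from 1.86% to 1.43%, which is 0.43 percentage points in absolute terms and 23% in relative terms. Note that p95 itself rose by more than the 600 ms fetch, from 800 ms to 1,890 ms by 15:00.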
Third-party fetches are the most common p95 regression cause. Recommendations widgets, A/B test SDKs, chat widgets, review widgets, paid-social pixels, fraud-detection scripts. They each add a small amount of latency individually; collectively they dominate the tail. Run Top Slow Endpoints once a quarter and audit anything in the top 10 that is third-party.
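
The difference between an absolute and a relative conversion drop is easy to blur, so here is a short Python sketch using the worked-example figures. The helper names are hypothetical, and treating the "1% per 100 ms" rule of thumb as a relative loss is an assumption.

```python
from typing import Tuple


def conversion_drop(before_pct: float, after_pct: float) -> Tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop in percent)."""
    abs_pp = before_pct - after_pct
    rel_pct = abs_pp / before_pct * 100
    return abs_pp, rel_pct


def rule_of_thumb_loss(added_ms: float, pct_per_100ms: float = 1.0) -> float:
    """Predicted conversion loss (percent) under the 1%-per-100 ms rule."""
    return added_ms / 100 * pct_per_100ms


abs_pp, rel_pct = conversion_drop(1.86, 1.43)
print(f"{abs_pp:.2f} pp, {rel_pct:.1f}%")   # 0.43 pp, 23.1%
print(rule_of_thumb_loss(600))              # 6.0
```

The observed 23% relative drop is well above the 6% the rule of thumb predicts for a 600 ms fetch, which is a reminder that the rule is a rough guide rather than a forecast.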

Sibling cards merchants should reference together

Card	Why pair it with p95 Response Time	What the combination tells you
p99 Response Time	The deeper-tail view. Above 100k req/min, p99 is the right percentile; below, p95 is more stable.	Both moving together equals general slowdown; only p99 moving equals isolated bad-luck requests.
Apdex Score	Apdex scores each request against a latency tolerance threshold and counts errors as frustrated requests, giving a single shopper-perception number.	If Apdex is fine but p95 is high, your tolerance threshold may be too generous; tighten it.
Top Slow Endpoints	The breakdown card. When p95 jumps, this card tells you which endpoint is responsible.	One endpoint dominating equals a code-path regression; many endpoints equals an infrastructure regression.
Database Query Latency p95	The most common cause of application p95 regression.	App p95 up + DB p95 up equals slow query or pool exhaustion; app up + DB flat equals upstream/CPU/lock contention.
Throughput (req/s)	Latency and throughput trade off.	Latency up + throughput down equals capacity-limited; latency up + throughput flat equals slowness without queue building.
Deploy Markers vs Latency	Overlay deploy events on the latency chart.	Almost every p95 regression aligns with a recent deploy.
Page Load p95	Browser-side peer. APM measures server time; RUM measures end-to-end shopper time.	RUM p95 minus APM p95 equals network + browser render time; widening gap equals CDN, third-party, or asset bloat.
Shopify / BC / Adobe Total Revenue	The merchant-impact card.	Sustained p95 above 1,500 ms typically corresponds to a 5-15% revenue dip within an hour.
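
The Database Query Latency and Throughput rows above amount to a small decision table. The Python sketch below encodes just those two rows; `triage_hints` is a hypothetical helper, and treating "throughput not down" as "flat" is a simplifying assumption.

```python
from typing import List


def triage_hints(app_p95_up: bool, db_p95_up: bool,
                 throughput_down: bool) -> List[str]:
    """Map the sibling-card combinations to the likely cause named in the table."""
    hints: List[str] = []
    if not app_p95_up:
        return hints  # nothing to triage
    if db_p95_up:
        hints.append("slow query or pool exhaustion")
    else:
        hints.append("upstream, CPU, or lock contention")
    if throughput_down:
        hints.append("capacity-limited")
    else:
        hints.append("slowness without queue building")
    return hints
```

For example, `triage_hints(True, False, True)` returns `["upstream, CPU, or lock contention", "capacity-limited"]`. The hints are starting points for investigation, not diagnoses.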

Reconciling against the vendor’s own dashboard

Where to look in Datadog:

  - APM → Service List for per-service latency percentiles.
  - APM → Traces filtered by `@duration:>1500ms` for individual slow request examples.
  - Dashboards → APM Overview for the time series of p50, p95, and p99.
  - APM → Service Map to see latency by upstream/downstream dependency.

Why our number may legitimately differ from Datadog’s UI:

Reason	Direction	Why
Time zone	Boundary days off	Datadog UI displays in account timezone; Vortex IQ uses UTC for cross-connector windowing.
API rate limits	Brief gaps	The Metrics query API is rate-limited; on burst minutes a polled value may use the cached prior result.
Log indexing latency	Not applicable	Latency is APM-derived, not log-derived.
Monitor state cache	Up to 60 seconds	Monitor state refreshes once per minute.
Span sampling	Both directions	If your APM uses head-based sampling at <100%, the p95 is computed from the sample which may slightly differ from the population. Tail-based sampling that prefers slow spans inflates Datadog’s p95 vs raw.
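
The time zone row is the one that most often causes confusion, so here is a small Python illustration of how the same request can land on different calendar days. The account time zone is an assumption for the example: UK summer time, modelled as a fixed UTC+1 offset.

```python
from datetime import datetime, timedelta, timezone

# Assumption: an account on UK summer time, modelled as a fixed UTC+1 offset.
ACCOUNT_TZ = timezone(timedelta(hours=1), "BST")

# A request served late in the evening, UTC.
event_utc = datetime(2026, 4, 22, 23, 30, tzinfo=timezone.utc)
event_local = event_utc.astimezone(ACCOUNT_TZ)

print(event_utc.date())    # 2026-04-22 (the day used for UTC windowing)
print(event_local.date())  # 2026-04-23 (the day shown in account-timezone charts)
```

Requests in the hour before UTC midnight therefore count towards different days in the two views, which is why daily figures can differ at the boundaries while agreeing over longer windows.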

Cross-connector reconciliation:

Card	Expected relationship	What causes the divergence
`google_analytics.ga_property_health` and RUM peers	Datadog APM measures server-side timing; RUM and GA4 measure browser-side end-to-end timing. RUM client-side vs APM server-side discrepancy is healthy and expected.	RUM p95 minus APM p95 equals network + browser render + third-party scripts. A 1.5x-2x ratio (RUM higher) is typical and healthy.
`shopify.total_revenue` / `bigcommerce.total_revenue` / `adobe_commerce.total_revenue`	Inverse: when p95 sustains above 1,500 ms, conversion typically drops 5-15% within the hour.	If p95 is up but revenue is steady, the slowness is on a non-revenue path (admin, workers, internal API).
Datadog logs	No direct relationship: log indexing latency measures data freshness, not app latency.	Do not conflate the two.
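
The RUM versus APM relationship in the first row can be checked with a couple of lines of Python. The helper name and the input values are hypothetical; the 1.5x to 2x band is the "typical and healthy" range quoted in the table.

```python
from typing import Tuple


def rum_apm_gap(rum_p95_ms: float, apm_p95_ms: float) -> Tuple[float, float, bool]:
    """Return (gap in ms, RUM/APM ratio, whether the ratio is in the 1.5x-2x band)."""
    gap = rum_p95_ms - apm_p95_ms
    ratio = rum_p95_ms / apm_p95_ms
    return gap, ratio, 1.5 <= ratio <= 2.0


# Hypothetical readings: RUM p95 of 1,500 ms against APM p95 of 850 ms.
gap, ratio, typical = rum_apm_gap(1500, 850)
print(gap, round(ratio, 2), typical)  # 650 1.76 True
```

A ratio drifting above the band over several days is the "widening gap" signal that points at CDN, third-party, or asset bloat rather than server-side code.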

Known limitations / merchant FAQs

My average response time is 250 ms. Why is the dashboard amber?
Because p95 is amber, not the average. The average hides the long tail. The 5% of shoppers in the slowest bucket are typically the ones most likely to abandon the cart. A p95 of 1,800 ms with a p50 of 250 ms is a real problem even though the “average shopper experience” looks fine.

What is a healthy p95 for an ecommerce site?
Below 800 ms is excellent. 800-1,200 ms is good. 1,200-1,500 ms is the warning zone. Above 1,500 ms is the alert zone (page abandonment becomes measurable). Above 3,000 ms shoppers abandon at high rates and the site feels broken. These thresholds are calibrated against industry research; your specific brand may tolerate slightly more (luxury) or less (high-velocity discount) before conversion impact.

Datadog says p95 is fine but customers are complaining the site is slow.
This is the classic Datadog blind spot. APM measures server-side timing, NOT browser-side experience. Three places to check: (1) Page Load p95, the RUM equivalent that includes network and browser render time; (2) Browser Test Latency p95, a synthetic browser bot that runs the full page load including third-party scripts; (3) your CDN cache hit rate, because a degraded CDN will not affect APM but will tank shopper experience. Also check whether a third-party script (chat widget, A/B SDK, fraud check) added blocking JS execution that delays interactivity.

Why does my p95 spike at 03:00 UTC every day?
Two common causes: (1) your nightly batch jobs (sitemap regeneration, search-index rebuild, fraud-pattern training) run at low-traffic hours, and the few requests that do arrive during the batch wait behind the batch’s DB locks; (2) your APM agent’s metric flush may coincide with another scheduled task. Check whether the spike is driven by 1-2 endpoints (the batch case) or all endpoints (the agent case).

Should I optimise for p95 or p99?
For most merchants, p95. p99 is too noisy at typical traffic levels (under 100,000 req/min) because a single GC pause moves it 30-50%. p95 is stable enough to alert on and meaningful enough that improvements correspond to measurable shopper-experience gains. Optimise p99 only if you are at high traffic AND p95 is already excellent (<400 ms). For most stores, p99 is decoration.

My Logs API returns 400 No valid indexes. Does this card still work?
Yes. p95 latency is APM-derived, not log-derived. Log Management gating only affects log-volume cards. Vortex IQ logs the gating event once at INFO level and skips log-only cards.

Why does my p95 differ between weekdays and weekends?
Most ecommerce sites see lower p95 on weekends because traffic is lower and queues do not build. If your weekend p95 is higher, look for: (1) reduced ops staff causing slower issue resolution; (2) batch jobs scheduled for weekends when traffic is light but still affecting users; (3) infrastructure auto-scaling that scales down too aggressively for weekend traffic patterns.

Datadog measures p95 per-service; how is the headline computed?
Datadog’s API supports `p95:trace.servlet.request{*}`, which computes the percentile across all instrumented services together. This is the right number for a merchant headline because it captures the experience across whatever endpoint a shopper happened to hit. For per-service breakdown, use Top Slow Endpoints. Note: the headline is NOT a weighted average of per-service p95 values; it is a true cross-service percentile.

My multi-region site has different p95 in different regions. Which one is shown?
The headline is computed across all regions. For per-region breakdown, use Uptime by Region for synthetic-test results, or filter the Datadog query by the `@datacenter:` tag in the Datadog UI. Vortex IQ does not currently expose per-region APM percentiles in the headline (planned for a future release).
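
The point made in the first FAQ, that an average can look healthy while the tail is painful, can be seen with a toy sample in Python. This is an illustration only: the latencies are invented, and the nearest-rank percentile used here is a textbook method, not how Datadog computes its percentiles.

```python
import math
from statistics import mean, median
from typing import Sequence


def percentile_nearest_rank(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% of data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Invented sample: 94 fast requests at 200 ms and 6 slow ones at 8,000 ms.
latencies = [200] * 94 + [8000] * 6

print(median(latencies))                       # 200.0 (p50 looks fine)
print(mean(latencies))                         # 668 (the average still looks tolerable)
print(percentile_nearest_rank(latencies, 95))  # 8000 (the tail does not)
```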

Tracked live in Vortex IQ Nerve Centre

p95 Response Time is one of hundreds of KPI pulses Vortex IQ tracks across Datadog and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
