At a glance
Search Latency p95 (ms) is the time below which 95% of search queries complete: only the slowest 5% take longer. This is the storefront-facing number that matters most to real users. The median (p50) can look healthy while p95 is quietly miserable, and it is the p95 experience that shows up as a sluggish search box on a category page. For a storefront backed by Elasticsearch, p95 is directly user-impacting, which is why it carries a hard 200ms alert.
| Field | Detail |
|---|---|
| What it tracks | The 95th-percentile query service time across all search shards for the selected period. The slowest 5% of queries take longer than this value; the other 95% are faster. |
| Data source | Reconstructed from the per-poll delta of indices.search.query_time_in_millis divided by the per-poll delta of indices.search.query_total, both read from the Elasticsearch node stats API (GET /_nodes/stats/indices/search). Vortex IQ builds a percentile distribution across the window rather than reporting a flat counter average. |
| Time window | RT/5m (real-time, rolling 5-minute window, refreshed continuously). |
| Alert trigger | > 200ms. A sustained p95 above 200ms means the slow tail is wide enough that a noticeable share of shoppers are waiting on search. |
| Why it matters | Search is the highest-intent path on a storefront. A slow p95 suppresses conversion long before it ever turns into an error or an outage; it is the silent revenue leak. |
| What counts | Query-phase service time on the data nodes for search and _search-type requests. |
| What does NOT count | Network round-trip from the browser, application-tier serialisation, the fetch phase for very large _source payloads (measured separately), and aggregations-only requests if the connector scopes them out. |
| Roles | engineering, operations, owner |
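The alert trigger describes a sustained breach, not a single reading. Below is a minimal sketch of one way to express that check, assuming "sustained" means several consecutive 5-minute readings above the threshold. The function name and the min_consecutive parameter are hypothetical illustrations, not Vortex IQ settings.

```python
from typing import Sequence

THRESHOLD_MS = 200.0  # alert trigger from the table above


def is_sustained_breach(p95_readings_ms: Sequence[float],
                        threshold_ms: float = THRESHOLD_MS,
                        min_consecutive: int = 3) -> bool:
    """Return True if the latest `min_consecutive` readings all exceed the threshold.

    `min_consecutive` is a hypothetical knob chosen for this sketch.
    """
    if len(p95_readings_ms) < min_consecutive:
        return False
    recent = p95_readings_ms[-min_consecutive:]
    # Strictly greater than, matching the "> 200ms" trigger.
    return all(reading > threshold_ms for reading in recent)


# Illustrative readings, oldest first (made-up values).
readings = [150.0, 230.0, 180.0, 210.0, 240.0, 264.0]
print(is_sustained_breach(readings))
```

With these readings the last three values are all above 200ms, so the check reports a breach; the isolated 230ms reading earlier in the series would not have triggered it on its own.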
Calculation
Elasticsearch exposes two monotonic counters per node in the search index stats: query_total (the number of query-phase operations completed) and query_time_in_millis (the cumulative milliseconds spent in the query phase). On their own these give only a lifetime average. Vortex IQ samples the counters on each poll, takes the delta between consecutive samples, and aggregates the per-shard service times into a distribution across the rolling 5-minute window. The 95th percentile is read from that distribution and reported as the card value in milliseconds.
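The delta-then-percentile idea can be sketched as follows. This is an approximation under stated assumptions, not the Vortex IQ implementation, which the text above does not specify in detail: each poll interval yields one mean service time (time delta divided by query delta), and a nearest-rank percentile is read from those interval means. The function names and sample values are hypothetical.

```python
import math
from typing import List, Tuple

# One poll of the counters: (query_total, query_time_in_millis).
Sample = Tuple[int, int]


def interval_means(samples: List[Sample]) -> List[float]:
    """Mean query-phase time (ms) for each interval between consecutive polls."""
    means = []
    for (q0, t0), (q1, t1) in zip(samples, samples[1:]):
        dq, dt = q1 - q0, t1 - t0
        # Skip idle intervals (no queries) and counter resets (negative deltas).
        if dq > 0 and dt >= 0:
            means.append(dt / dq)
    return means


def percentile(values: List[float], pct: float) -> float:
    """Nearest-rank percentile; raises ValueError on empty input."""
    if not values:
        raise ValueError("no values to take a percentile of")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Made-up counter readings from four consecutive polls.
samples = [(1000, 40000), (1100, 44000), (1250, 50000), (1300, 63000)]
means = interval_means(samples)
print(means, percentile(means, 95))
```

Because each data point here is already an interval average, the result is a percentile of averages and will understate the latency of individual slow queries. In practice the series from every in-scope shard would be pooled before taking the percentile.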
Because the figure is a percentile and not a counter ratio, it is robust to a few very slow queries skewing a window average, and it captures the experience of the unlucky tail. The value is cluster-wide by default: it blends every searchable index. Where a connector is scoped to a specific index pattern (for example the product catalogue index that powers storefront search), the distribution is built only from those shards, which is the reading most ops teams want because it isolates the customer-facing path from background analytics queries.
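Scoping to an index pattern amounts to filtering which samples enter the distribution. The sketch below assumes service-time samples are already grouped by index name; the index names, the pattern, and the in_scope helper are all hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def in_scope(samples_by_index: Dict[str, List[float]], pattern: str) -> List[float]:
    """Pool service-time samples (ms) from indices whose name matches the pattern."""
    pooled: List[float] = []
    for index_name, times in samples_by_index.items():
        if fnmatchcase(index_name, pattern):
            pooled.extend(times)
    return pooled


# Made-up samples: a storefront index and a background analytics index.
samples_by_index = {
    "products-v3": [35.0, 41.0, 270.0],
    "analytics-events": [900.0, 1200.0],
}

storefront_only = in_scope(samples_by_index, "products-*")
cluster_wide = in_scope(samples_by_index, "*")
print(storefront_only, len(cluster_wide))
```

The cluster-wide pool includes the slow analytics samples, which is why a scoped reading can differ noticeably from the default.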
Worked example
A platform team runs a 6-node Elasticsearch cluster behind a high-traffic storefront. The product-search index has 3 primary shards and 1 replica each. Snapshot taken on 14 Apr 26 at 19:40 BST during the evening traffic peak.
| Percentile | Reading | Window |
|---|---|---|
| p50 | 38ms | RT/5m |
| p95 | 264ms | RT/5m |
| p99 | 910ms | RT/5m |
- The breach is in the tail, not the bulk. p50 at 38ms against p95 at 264ms is a 7x spread. That pattern points at a subset of expensive queries, not at an undersized cluster. Likely culprits: deep pagination (from+size reaching into the thousands), unbounded wildcard or leading-wildcard terms, or a heavy aggregation riding on the same index.
- It co-occurs with the peak. The breach started at 19:25 as traffic climbed. Pairing with Search Queries per Second (live) shows QPS up 40% over the afternoon baseline, so the tail is partly load-driven. The search thread pool is queueing.
- Heap is warm but not critical. JVM Heap Used % reads 71%, just below the 75% GC-pressure line. Garbage-collection pauses are starting to nibble at the tail.
search_after. The takeaway: p95 is the number that tells you customers are feeling it, well before any error card lights up.
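The spread quoted in the worked example is simple arithmetic on the table readings. A small sketch, with a hypothetical helper name:

```python
def tail_spread(lower_ms: float, upper_ms: float) -> float:
    """Ratio of a higher percentile reading to a lower one."""
    if lower_ms <= 0:
        raise ValueError("lower percentile must be positive")
    return upper_ms / lower_ms


# Readings from the worked-example table.
p50, p95, p99 = 38.0, 264.0, 910.0

print(f"p95/p50 = {tail_spread(p50, p95):.1f}x")  # 6.9x, rounded to 7x in the text
print(f"p99/p95 = {tail_spread(p95, p99):.1f}x")  # 3.4x
```

A p95/p50 ratio near 7 alongside a healthy 38ms median is what marks this as a tail problem, not a cluster that is slow across the board.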
Sibling cards
| Card | Why pair it with Search Latency p95 | What the combination tells you |
|---|---|---|
| Search Latency p50 (ms) | The median, the bulk-of-traffic baseline. | A wide p50-to-p95 gap means a slow tail (expensive queries); a narrow gap that rises together means a broadly overloaded cluster. |
| Search Latency p99 (ms) | The extreme tail, the worst 1%. | p99 spiking while p95 holds means a handful of pathological queries; both rising means systemic pressure. |
| Search Queries per Second (live) | The load driving the latency. | p95 up with QPS up points to a capacity limit; p95 up with flat QPS points to a query-shape or heap problem. |
| Slow-Query Rate % | The share of searches breaching the slowlog threshold. | Confirms whether the tail is a few outliers or a growing fraction of all traffic. |
| Top 10 Slow Searches | The actual query shapes behind the tail. | Names the offending queries so you can fix the cause, not the symptom. |
| JVM Heap Used % | Heap pressure drives GC pauses that inflate the tail. | Heap above 75% with a rising p95 means GC pauses are the cause. |
| Search Error Rate % | The failure peer to the latency reading. | Latency high then errors high means the search pool is saturating into rejections. |
| HTTP Connection Saturation % | Connection-tier headroom under load. | Saturation high with rising p95 means clients are queueing before queries even start. |
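Some of the pairings in the table can be read as a rough first-pass triage rule. The sketch below encodes only the QPS and heap rows, assuming the caller has already decided whether p95 and QPS are "rising"; the function and its labels are hypothetical and are a reading aid, not a product feature.

```python
from typing import List

HEAP_GC_PRESSURE_PCT = 75.0  # GC-pressure line used in the table above


def triage(p95_rising: bool, qps_rising: bool, heap_used_pct: float) -> List[str]:
    """Return candidate explanations for a rising p95, per the sibling-card table."""
    if not p95_rising:
        return []
    hypotheses = []
    if qps_rising:
        hypotheses.append("capacity")             # p95 up with QPS up
    else:
        hypotheses.append("query shape or heap")  # p95 up with flat QPS
    if heap_used_pct > HEAP_GC_PRESSURE_PCT:
        hypotheses.append("GC pauses")            # heap above 75% with rising p95
    return hypotheses


print(triage(p95_rising=True, qps_rising=True, heap_used_pct=71.0))
print(triage(p95_rising=True, qps_rising=False, heap_used_pct=80.0))
```

The output is a list of hypotheses to check against the other sibling cards, such as Top 10 Slow Searches, and should not be treated as a verdict.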
Reconciling against the source
Where to look in Elasticsearch’s own tooling:
- GET /_nodes/stats/indices/search for the raw query_total and query_time_in_millis counters per node; the lifetime ratio is query_time_in_millis / query_total.
- GET /<index>/_stats/search for the same counters scoped to a single index pattern.
- Kibana Stack Monitoring → Overview → Search for the latency chart over time, and the search-slowlog (configured via index.search.slowlog.threshold.query.warn) for the actual slow queries.
- On Elastic Cloud or AWS OpenSearch Service, the search-latency series appears in the cluster’s monitoring dashboard.
Why our number may legitimately differ:
| Reason | Direction | Why |
|---|---|---|
| Percentile vs counter average | Usually higher | The node stats ratio is an average (over the node's lifetime, or over the window if you diff the counters); Vortex IQ reports a 95th percentile across the window, which is usually higher than the average. |
| Window length | Either | Vortex IQ uses a rolling 5-minute window; a Kibana chart bucketed at 1-minute or 1-hour will smooth differently. |
| Index scope | Usually lower | A connector scoped to the storefront index excludes background analytics queries that inflate a cluster-wide average. |
| Phase boundary | Usually lower | This card measures the query phase only; end-to-end request time also includes the fetch phase and coordinating-node overhead. |
| Time zone | Axis shift only | Chart axes render in the merchant’s display time zone; Elasticsearch stores UTC. |
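When reconciling by hand, the lifetime ratio can be computed from a node stats response. A minimal sketch, assuming the JSON body of GET /_nodes/stats/indices/search has already been fetched and parsed; the payload below is trimmed to the relevant fields, and the node IDs and counter values are made up.

```python
from typing import Any, Dict


def lifetime_avg_ms(node_stats: Dict[str, Any]) -> Dict[str, float]:
    """Lifetime mean query-phase time per node: query_time_in_millis / query_total."""
    averages: Dict[str, float] = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        search = node["indices"]["search"]
        total = search["query_total"]
        if total > 0:  # a node that has served no queries has no average
            averages[node_id] = search["query_time_in_millis"] / total
    return averages


# Trimmed, illustrative response body (values are invented).
payload = {
    "nodes": {
        "node-a": {"indices": {"search": {"query_total": 200000,
                                          "query_time_in_millis": 9000000}}},
        "node-b": {"indices": {"search": {"query_total": 0,
                                          "query_time_in_millis": 0}}},
    }
}

print(lifetime_avg_ms(payload))
```

This figure is an average since node start, so expect it to sit below the card's p95 for the reasons listed in the table.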
Expected relationships with related cards:
| Card | Expected relationship | What causes divergence |
|---|---|---|
| Slow Searches During Checkout Window (5m) | A p95 breach should correlate with slow searches landing in checkout windows. | Breach with no checkout-window slow searches means the tail is on non-purchase paths (admin, analytics). |
| ES Search Pool Saturation vs Ecom Burst | p95 rises as the search pool saturates during an ecom traffic burst. | p95 up with low pool saturation means a query-shape problem, not capacity. |
Known limitations / FAQs
My users complain search is slow but p95 reads 90ms. Why?
p95 measures only the query phase service time on the data nodes. The user’s experience also includes browser-to-app network latency, application-tier query construction, the fetch phase for large result payloads, and any front-end rendering. If p95 is healthy but users are not, look upstream of Elasticsearch, or check Search Latency p99 (ms) in case the specific users hitting trouble are in the worst 1%.
Why 200ms as the threshold and not something lower?
200ms is the point at which a noticeable share of shoppers begin to perceive the search box as laggy on a storefront. It is a sensible default, not a law. The threshold is configurable per profile in the Sensitivity tab; a latency-sensitive catalogue may want 150ms, a complex faceted search may tolerate 300ms.
p95 spiked for one window then recovered. Should I worry?
A single 5-minute spike that self-recovers is often a segment merge, a brief GC pause, or a one-off heavy aggregation. Worry when the breach is sustained across several windows, or when it recurs at the same time each day (a scheduled job or a daily traffic pattern). Pair with GC Pause Time (5m total ms) to rule out garbage collection.
Does p95 include aggregation queries?
By default the card blends all query-phase operations on the in-scope indices, which includes aggregations. Heavy aggregations are a common tail driver. If you want to isolate plain search from analytics, scope the connector to the storefront index pattern only.
How is the percentile calculated if Elasticsearch only exposes counters?
Elasticsearch node stats give cumulative query_total and query_time_in_millis, which alone yield only an average. Vortex IQ samples per-shard deltas across the 5-minute window and reconstructs a distribution, then reads the 95th percentile from it. This is why the card value can sit above the simple counter ratio you would compute by hand.
Can a healthy p50 hide a bad p95?
Yes, and that is exactly why p95 is a Hero card. A median of 35ms with a p95 of 280ms means most queries are fine but the slow tail is wide enough to hurt conversion. Always read p50 and p95 together; the gap between them is the diagnostic.