At a glance
Search Latency p99 (ms) is the time below which 99% of search queries complete: only the worst 1% take longer. This is the extreme tail, the experience of your unluckiest shoppers and the canary for cluster stress. p99 is volatile by nature, so its 500ms alert threshold is set higher than the one for p95. When p99 spikes while p95 stays calm, a small number of pathological queries are to blame; when p99 and p95 climb together, the whole cluster is under pressure.
| What it tracks | The 99th-percentile query service time across all search shards for the selected period. The worst 1% of queries take longer than this value. |
| Data source | Reconstructed from the change in indices.search.query_time_in_millis divided by the change in query_total between polls, both read from the Elasticsearch node stats API (GET /_nodes/stats/indices/search). Vortex IQ builds a percentile distribution across the window. |
| Time window | RT/5m (real-time, rolling 5-minute window, refreshed continuously). |
| Alert trigger | > 500ms. A sustained p99 above 500ms means even the worst-case search is crossing into clearly painful territory and the tail is at risk of widening further. |
| Why it matters | p99 is the early-warning canary. Tail latency degrades before the median does, so a rising p99 buys a DBA time to act before p95 (and conversion) follow it. |
| What counts | Query-phase service time on the data nodes for search and _search-type requests, including heavy aggregations on in-scope indices. |
| What does NOT count | Browser-to-app network time, application-tier overhead, the fetch phase measured separately, and queries on indices excluded by the connector scope. |
| Roles | engineering, operations, owner |
Calculation
The two source counters are the same as for the other latency cards: query_total (query-phase operations completed) and query_time_in_millis (cumulative query-phase milliseconds), exposed per node in the search index stats. Vortex IQ samples both on each poll, takes consecutive deltas, and assembles the per-shard service times into a distribution across the rolling 5-minute window. The 99th percentile is read from that distribution and reported in milliseconds.
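The delta-then-percentile steps can be sketched roughly as follows. This is an illustration under stated assumptions, not Vortex IQ's actual implementation: the `Sample` shape and the function names are hypothetical, each interval contributes one mean service time (the counters alone cannot recover individual query times), and the percentile uses the simple nearest-rank method.

```python
import math
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of the search counters for a single node (hypothetical shape)."""
    query_total: int           # cumulative query-phase operations
    query_time_in_millis: int  # cumulative query-phase milliseconds


def interval_latencies(samples: list[Sample]) -> list[float]:
    """Mean query-phase ms per operation for each pair of consecutive polls."""
    latencies = []
    for prev, curr in zip(samples, samples[1:]):
        ops = curr.query_total - prev.query_total
        millis = curr.query_time_in_millis - prev.query_time_in_millis
        # Skip idle intervals and counter resets (e.g. after a node restart).
        if ops <= 0 or millis < 0:
            continue
        latencies.append(millis / ops)
    return latencies


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; assumes `values` is non-empty."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Illustrative counter readings, not real cluster data.
polls = [
    Sample(1000, 40_000),
    Sample(1100, 44_100),
    Sample(1200, 48_000),
    Sample(1210, 55_400),
]
window = interval_latencies(polls)  # [41.0, 39.0, 740.0]
p99 = percentile(window, 99)        # 740.0
```

With only three intervals the 99th percentile is simply the slowest one, which is why a low sample count makes the reading noisy.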
The 99th percentile is far more sensitive to individual slow operations than p95 or p50. A single deep-pagination request, an unbounded wildcard, a cold-cache query after a segment merge, or a GC pause on one node can push p99 up sharply while leaving the median untouched. That sensitivity is the point: p99 is meant to surface the worst case so it can be caught before it spreads. Like its siblings, the value is cluster-wide unless the connector is scoped to a specific index pattern, in which case only those shards contribute, isolating the storefront-facing path from background workloads.
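A small sketch of that sensitivity, using made-up latencies rather than real measurements: one slow operation among a hundred leaves the median where it was but moves p99 past the 500ms alert line. It uses Python's `statistics.quantiles`, which interpolates between samples, so the exact figure will differ from other percentile methods.

```python
import statistics


def p99(values: list[float]) -> float:
    """99th percentile: the last of the 99 cut points for n=100."""
    return statistics.quantiles(values, n=100)[98]


# Hypothetical windows: 100 operations each, in milliseconds.
calm = [40.0] * 100
spiky = [40.0] * 99 + [900.0]  # one pathological query

calm_median, calm_p99 = statistics.median(calm), p99(calm)
spiky_median, spiky_p99 = statistics.median(spiky), p99(spiky)
```

Here `spiky_median` equals `calm_median`, while `spiky_p99` lands well above the 500ms threshold.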
Worked example
The same 6-node cluster behind a high-traffic storefront. Snapshot taken on 22 Apr 26 at 02:10 BST, during an overnight batch reindex.

| Percentile | Reading | Window |
|---|---|---|
| p50 | 41ms | RT/5m |
| p95 | 180ms | RT/5m |
| p99 | 740ms | RT/5m |
- Only the worst 1% is affected. p95 holding at 180ms means 95% of shoppers are fine; the pain is concentrated in a thin tail. With p99 at 740ms against p95 at 180ms, that tail is steep, pointing at a handful of expensive operations rather than a saturated cluster.
- It coincides with the reindex. A nightly batch reindex is running, generating large segment merges. Indexing Rate (docs/sec) is elevated, and merges compete for I/O and heap with search. Cold-cache queries hitting freshly merged segments land in the tail.
- Heap is the multiplier. JVM Heap Used % sits at 78%, above the 75% GC-pressure line, and GC Pause Time (5m total ms) shows 1,200ms of cumulative pause. A 300ms stop-the-world pause lands directly in p99 for any query unlucky enough to overlap it.
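The reading of the snapshot above can be expressed as a rough triage rule. This is a sketch, not product logic: `classify_tail` is a hypothetical helper, the 500ms p99 threshold comes from this card, and the p95 threshold is passed in by the caller because this card does not state it (the 300ms used below is an assumed value for illustration only).

```python
def classify_tail(p95_ms: float, p99_ms: float,
                  p95_alert_ms: float, p99_alert_ms: float = 500.0) -> str:
    """Rough triage of a p95/p99 pair.

    Returns "ok" when p99 is within its threshold, "pathological-tail" when
    only p99 has breached, and "systemic-pressure" when both have.
    """
    if p99_ms <= p99_alert_ms:
        return "ok"
    if p95_ms <= p95_alert_ms:
        return "pathological-tail"
    return "systemic-pressure"


# Worked-example readings; p95_alert_ms=300 is an assumed threshold.
verdict = classify_tail(p95_ms=180, p99_ms=740, p95_alert_ms=300)
# verdict == "pathological-tail"
```

A verdict of "pathological-tail" is the cue to open Top 10 Slow Searches and the GC cards rather than to add capacity.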
Sibling cards
| Card | Why pair it with Search Latency p99 | What the combination tells you |
|---|---|---|
| Search Latency p95 (ms) | The broad storefront-facing tail. | p99 up with p95 calm equals a few pathological queries; both up equals systemic pressure spreading. |
| Search Latency p50 (ms) | The median baseline. | A huge p50-to-p99 gap confirms the problem is purely tail; a rising p50 means the whole distribution is shifting. |
| GC Pause Time (5m total ms) | GC pauses land directly in the tail. | High GC pause with a p99 spike means stop-the-world pauses are the cause. |
| JVM Heap Used % | Heap pressure drives the GC pauses. | Heap above 75% with p99 climbing means heap is the multiplier. |
| Top 10 Slow Searches | The actual queries in the tail. | Names the pathological query shapes feeding p99. |
| Slow-Query Rate % | The share of all searches over the slowlog threshold. | A p99 spike with a flat slow-query rate confirms it is the thin 1%, not a growing fraction. |
| Indexing Rate (docs/sec) | Heavy indexing and merges compete with search. | p99 spiking alongside an indexing surge points at merge contention. |
| Search Error Rate % | The failure peer. | p99 climbing then errors appearing means queries are timing out, not just slowing. |
Reconciling against the source
Where to look in Elasticsearch’s own tooling:
- GET /_nodes/stats/indices/search for the raw query_total and query_time_in_millis counters per node; the lifetime ratio is an average, not a percentile.
- GET /<index>/_stats/search for the same counters scoped to one index pattern.
- Kibana Stack Monitoring → Overview → Search for the latency series over time, and the search slowlog for the queries feeding the tail.
- On Elastic Cloud or AWS OpenSearch Service, the search-latency chart in the cluster monitoring dashboard.

Why our number may legitimately differ:

| Reason | Direction | Why |
|---|---|---|
| Percentile vs counter average | Our value higher | The node stats ratio is a window average; the 99th percentile sits well above the average, especially with a steep tail. |
| Window length | Either | The rolling 5-minute window resolves spikes that a coarser Kibana bucket would smooth away. |
| Sample density | Either | At low QPS the tail is built from fewer samples, so p99 is noisier; high QPS gives a more stable estimate. |
| Index scope | Usually lower | A connector scoped to the storefront index excludes background analytics queries. |
| Phase boundary | Usually lower | Query phase only; end-to-end request time adds the fetch phase and coordinating-node overhead. |
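To see the "percentile vs counter average" gap for yourself, the lifetime ratio can be computed from a parsed node stats response. The sketch below assumes the usual `nodes.<id>.indices.search` layout of `GET /_nodes/stats/indices/search`; the node IDs and counter values are invented, and `lifetime_average_ms` is a hypothetical helper name.

```python
def lifetime_average_ms(node_stats: dict) -> dict[str, float]:
    """Per-node lifetime mean query-phase time from a parsed node stats body."""
    averages = {}
    for node_id, node in node_stats["nodes"].items():
        search = node["indices"]["search"]
        total = search["query_total"]
        # Nodes that have served no queries have no meaningful average.
        if total > 0:
            averages[node_id] = search["query_time_in_millis"] / total
    return averages


# Trimmed, made-up response body for illustration.
response = {
    "nodes": {
        "node-a": {"indices": {"search": {"query_total": 2000,
                                          "query_time_in_millis": 90_000}}},
        "node-b": {"indices": {"search": {"query_total": 0,
                                          "query_time_in_millis": 0}}},
    }
}
averages = lifetime_average_ms(response)  # {"node-a": 45.0}
```

The result is a mean since node start, so it is expected to sit well below the p99 on this card.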
| Card | Expected relationship | What causes divergence |
|---|---|---|
| ES Search Pool Saturation vs Ecom Burst | p99 tends to lead pool saturation during a burst. | p99 spiking with low pool saturation means a query-shape or GC cause, not capacity. |
| Slow Searches During Checkout Window (5m) | Tail-latency queries should appear in the slow-search list when they land near checkout. | A p99 spike with no checkout-window slow searches means the tail is on non-purchase paths. |