At a glance
Search Queries per Second (live) is the rate at which the cluster is serving search queries right now. It is the single best measure of demand on the search path, the denominator behind every latency and error percentage, and the first number to check when anything else moves. A latency card means little without the QPS context: 300ms p95 at 50 QPS is a query-shape problem; the same p95 at 800 QPS is a capacity problem.
| What it tracks | The current rate of completed search queries across the cluster, expressed in queries per second. |
| Data source | Derived from the indices.search.query_total counter in the Elasticsearch node stats API (GET /_nodes/stats/indices/search), differenced between consecutive polls and divided by the elapsed time. |
| Time window | RT (real-time, refreshed continuously; the live rate, not a period sum). |
| Alert trigger | None. QPS is a demand signal, not a health threshold; it is read for context and capacity, not paged on directly. |
| Why it matters | It is the load denominator. Every other Performance card (latency percentiles, error rate, pool saturation) is only interpretable against the QPS it was measured at. |
| What counts | Query-phase operations on searchable indices in the connector scope, including aggregations and _msearch sub-queries. |
| What does NOT count | Indexing operations (see Indexing Rate (docs/sec)), management API calls, and _cat/_cluster administrative requests. |
| Roles | owner, engineering, operations |
Calculation
Elasticsearch exposes a monotonic query_total counter per node in the search index stats: every completed query-phase operation increments it. The counter is cumulative since node start, so the instantaneous rate is the delta between two consecutive samples divided by the seconds between them:
QPS = (query_total at t2 - query_total at t1) / (t2 - t1), with t1 and t2 in seconds
query_total counts shard-level query operations, so a single user-facing search that fans out across several shards increments the counter once per shard. The card reports cluster-wide query operations per second, which is the figure that maps to thread-pool load. Where the connector is scoped to a specific index pattern, only that pattern’s shards contribute, isolating storefront search demand from background analytics traffic. The value is a rate, not a period total, so it tracks the current pulse of demand rather than a cumulative count.
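The delta-over-time calculation described above can be sketched as follows. This is a minimal illustration, not the connector's actual code: the function names node_totals and cluster_qps are hypothetical, the sample payloads are made up, and skipping a node whose counter went backwards is only one assumed way to handle a restart.

```python
from typing import Dict


def node_totals(stats: dict) -> Dict[str, int]:
    """Pull query_total per node ID out of a node stats response body."""
    return {
        node_id: node["indices"]["search"]["query_total"]
        for node_id, node in stats["nodes"].items()
    }


def cluster_qps(prev: dict, curr: dict, elapsed_seconds: float) -> float:
    """Cluster-wide shard-level query operations per second between two polls."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    before, after = node_totals(prev), node_totals(curr)
    delta = 0
    for node_id, total in after.items():
        previous = before.get(node_id)
        # Assumption: a node that is new, or whose counter went backwards
        # (restart), contributes nothing for this interval.
        if previous is None or total < previous:
            continue
        delta += total - previous
    return delta / elapsed_seconds


# Illustrative samples, shaped like GET /_nodes/stats/indices/search output.
sample_a = {"nodes": {
    "n1": {"indices": {"search": {"query_total": 1_000_000}}},
    "n2": {"indices": {"search": {"query_total": 2_500_000}}},
}}
sample_b = {"nodes": {
    "n1": {"indices": {"search": {"query_total": 1_004_500}}},
    "n2": {"indices": {"search": {"query_total": 2_504_550}}},
}}

qps = cluster_qps(sample_a, sample_b, elapsed_seconds=10.0)
print(qps)
```

Summing per-node deltas, rather than differencing a single cluster total, keeps one restarted node from dragging the whole interval negative.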
Worked example
A platform team watches the QPS card across a normal trading day on the cluster behind their storefront. Readings taken on 14 Apr 26.
| Time (BST) | QPS | What is happening |
|---|---|---|
| 04:00 | 22 | Overnight trough; mostly bots and health checks. |
| 09:30 | 210 | Morning ramp as traffic builds. |
| 12:45 | 480 | Lunchtime peak; steady. |
| 19:20 | 905 | Evening peak, plus an email campaign drop at 19:00. |
| 19:25 | 1,640 | Sudden doubling. |
- Demand is the headline, but it needs a partner. On its own a QPS spike could be good (a successful campaign) or bad (a runaway client or a crawler). The team immediately pairs it with Search QPS Spike vs Ecom Traffic, which shows storefront sessions flat while QPS doubled. That divergence is the signature of a bot crawler hammering search, not real shopper demand.
- It reframes the latency cards. During the spike, Search Latency p95 (ms) climbed from 150ms to 240ms. Without QPS context that looks like a regression; with it, the cause is plainly load. The fix is to shed the bot traffic, not to retune queries.
- It sets the capacity baseline. Knowing the cluster comfortably serves ~900 QPS at a healthy p95, but degrades past ~1,500 QPS, gives the team a concrete headroom figure for the next sale event and for replica-count planning.
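The headroom figure in the last point is simple arithmetic on the readings above. A minimal sketch, assuming the ~1,500 QPS degradation point is treated as the ceiling; the function name headroom is hypothetical:

```python
def headroom(current_qps: float, ceiling_qps: float) -> float:
    """Extra load, as a fraction of current QPS, available before the ceiling.

    A negative result means the cluster is already past the ceiling.
    """
    if current_qps <= 0:
        raise ValueError("current_qps must be positive")
    return (ceiling_qps - current_qps) / current_qps


CEILING_QPS = 1500  # approximate degradation point from the worked example

evening_peak = headroom(905, CEILING_QPS)    # 19:20 reading
during_spike = headroom(1640, CEILING_QPS)   # 19:25 reading

print(f"evening peak headroom: {evening_peak:.1%}")
print(f"headroom during spike: {during_spike:.1%}")
```

A positive value at the evening peak and a negative one during the spike matches the narrative: the normal peak fits comfortably, the bot-driven burst does not.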
Sibling cards
| Card | Why pair it with Search Queries per Second | What the combination tells you |
|---|---|---|
| Search Latency p95 (ms) | Latency is only interpretable against load. | p95 up with QPS up equals capacity; p95 up with flat QPS equals a query-shape or heap problem. |
| Search Latency p99 (ms) | The tail under the current demand. | A p99 spike at flat QPS is a pathological query, not load. |
| Search Latency p50 (ms) | The median under load. | A rising p50 as QPS climbs marks the cluster approaching its comfortable ceiling. |
| Search Error Rate % | Errors as a share of the QPS denominator. | Error rate climbing as QPS climbs means the search pool is saturating into rejections. |
| HTTP Connection Saturation % | Connection headroom under demand. | Saturation rising with QPS shows the connection tier nearing its limit before queries even run. |
| Indexing Rate (docs/sec) | The write-side load competing with search. | Heavy indexing alongside high QPS means search and indexing are contending for the same resources. |
| Search QPS Spike vs Ecom Traffic | Distinguishes real demand from bot traffic. | QPS up with storefront traffic flat equals a crawler, not shoppers. |
| ES Search Pool Saturation vs Ecom Burst | Whether the pool can absorb the current QPS. | High QPS plus high pool saturation during a burst signals imminent rejections. |
Reconciling against the source
Where to look in Elasticsearch’s own tooling:
- GET /_nodes/stats/indices/search for the raw query_total counter per node; two samples seconds apart give the live rate.
- GET /<index>/_stats/search for the counter scoped to one index pattern.
- GET /_cat/thread_pool/search?v&h=name,active,queue,completed to see the search thread pool under the current load.
- Kibana Stack Monitoring → Overview → Search for the search-rate chart over time.
- On Elastic Cloud or AWS OpenSearch Service, the search-rate series in the cluster monitoring dashboard.
Why our number may legitimately differ:
| Reason | Direction | Why |
|---|---|---|
| Shard fan-out | Our value higher | query_total counts shard-level operations; one user search across N shards increments the counter N times. A dashboard reporting request-level QPS will read lower. |
| Sample interval | Either | A live rate over a short poll interval resolves spikes that a 1-minute or 1-hour Kibana bucket averages out. |
| Index scope | Our value usually lower | A connector scoped to the storefront index excludes background analytics and admin queries. |
| _msearch expansion | Our value higher | Multi-search bundles expand into individual query operations on the counter. |
| Counter reset | Brief dip | A node restart resets query_total; the first post-restart sample is discarded to avoid a negative delta. |
How this card should relate to its sibling cards:
| Card | Expected relationship | What causes divergence |
|---|---|---|
| Search QPS Spike vs Ecom Traffic | QPS should track storefront session volume. | QPS rising while sessions stay flat means non-shopper traffic (crawler, runaway client, retries). |
| ES Search Pool Saturation vs Ecom Burst | Pool saturation should rise and fall with QPS. | Saturation high at modest QPS means slow queries holding threads, not raw demand. |
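The first relationship in the table can be checked with a simple growth ratio. The sketch below is illustrative only: the session counts and the 1.5 threshold are hypothetical values chosen for the example, and the function name demand_divergence is not part of any product API.

```python
def demand_divergence(
    qps_before: float,
    qps_after: float,
    sessions_before: float,
    sessions_after: float,
) -> float:
    """Ratio of QPS growth to session growth; about 1.0 when search tracks shoppers."""
    if min(qps_before, sessions_before, sessions_after) <= 0:
        raise ValueError("baseline and session values must be positive")
    return (qps_after / qps_before) / (sessions_after / sessions_before)


DIVERGENCE_THRESHOLD = 1.5  # hypothetical cut-off, tune to your own traffic

# QPS readings from the worked example; session counts are made up and flat.
ratio = demand_divergence(905, 1640, sessions_before=4000, sessions_after=4000)
looks_like_non_shopper = ratio > DIVERGENCE_THRESHOLD

print(round(ratio, 2), looks_like_non_shopper)
```

A ratio well above 1.0 with flat sessions points at crawlers, runaway clients, or retries rather than real shopper demand.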
Known limitations / FAQs
Why does the QPS look higher than my application’s request rate?
Elasticsearch counts query operations at the shard level. A single user-facing search that fans out across, say, 5 shards increments query_total five times. The card reports cluster query operations per second, which is the figure that maps to thread-pool load, not the request-level rate your application sees. To compare like for like, divide the card value by the number of shards each search fans out to (for a single index, its primary shard count).
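As a rough illustration of that conversion, here is a minimal sketch. It assumes every search touches the same number of shards, which will not hold when routing or multi-index patterns are in play; the function name request_level_qps is hypothetical.

```python
def request_level_qps(shard_level_qps: float, shards_per_search: int) -> float:
    """Approximate request-level QPS from the shard-level rate on the card."""
    if shards_per_search < 1:
        raise ValueError("shards_per_search must be at least 1")
    return shard_level_qps / shards_per_search


# Example: the card reads 1,640 and each search fans out to 5 shards.
approx_requests = request_level_qps(1640, shards_per_search=5)
print(approx_requests)
```

Treat the result as an estimate for comparison against application metrics, not as an exact request count.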
Why is there no alert on this card?
QPS is a demand signal, not a health signal. High QPS is usually good news (traffic). The point is to read it as context for the cards that do alert: latency, error rate, pool saturation. A QPS spike that hurts is caught by those cards crossing their own thresholds. If you want an alert on unusual demand, set one on Search QPS Spike vs Ecom Traffic, which compares QPS against storefront traffic.
QPS dropped to near zero but the site is up. Should I worry?
Possibly. A genuine traffic trough is fine, but a sudden drop to near zero during trading hours can mean search requests are failing before they reach Elasticsearch (an application-tier or load-balancer fault), or the connector lost its scope. Cross-check storefront sessions and Search Error Rate %; a real demand trough shows low QPS with no errors, a fault shows low QPS with sessions still arriving.
Does QPS include indexing?
No. This card is search only, derived from query_total. The write-side equivalent is Indexing Rate (docs/sec). The two together describe total cluster load, since search and indexing compete for heap and I/O.
How quickly does the live value update?
QPS is reported in real time on the standard poll interval. Because it is a rate over a short interval, it responds within seconds to a genuine change in demand, which is what makes it useful as the first card to check when latency or errors move.
My QPS looks spiky even when traffic is smooth. Why?
Short-interval rates are inherently more variable than smoothed dashboard charts, and _msearch bundles or scheduled aggregations can arrive in bursts. If you want a smoother view for capacity discussions, read the trend over several intervals rather than the instantaneous value, or compare against the Kibana search-rate chart bucketed over a minute.
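Reading the trend over several intervals can be as simple as a rolling mean over the last few live values. This is a sketch under assumptions: the class name RollingMean, the window of 3, and the sample readings are all illustrative, and the card itself may smooth differently or not at all.

```python
from collections import deque


class RollingMean:
    """Mean of the most recent `window` samples."""

    def __init__(self, window: int) -> None:
        if window < 1:
            raise ValueError("window must be at least 1")
        # deque with maxlen drops the oldest sample automatically.
        self._samples = deque(maxlen=window)

    def add(self, value: float) -> float:
        """Record a new sample and return the current rolling mean."""
        self._samples.append(value)
        return sum(self._samples) / len(self._samples)


smoother = RollingMean(window=3)
live_readings = [400, 520, 390, 610, 405]  # made-up spiky QPS samples
smoothed = [smoother.add(q) for q in live_readings]

for raw, mean in zip(live_readings, smoothed):
    print(f"{raw:>5} -> {mean:7.1f}")
```

A wider window gives a calmer line for capacity discussions at the cost of reacting more slowly to a genuine change in demand.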