Search Error Rate %, Elasticsearch - Vortex IQ Help Centre

Card class: Hero • Category: Errors

At a glance

The share of search requests that fail, expressed as a percentage of total search requests over the window. A failed search is one that returns an error (non-2xx) or completes with shard failures rather than a clean result set. This is the most direct measure of “is search working for users right now”. Unlike latency, which degrades gracefully, an error rate spike is binary from the user’s point of view: their query returned nothing usable. For a storefront, a climbing search error rate maps straight onto shoppers who cannot find products.


API basis	Search counters from `GET /_nodes/stats/indices/search` and per-request shard-failure data. Errors are counted from failed search requests plus searches that completed with `_shards.failed > 0` (partial results); total is `search.query_total` delta over the window.
Metric basis	A ratio: failed searches divided by total searches in the window, as a percentage. Both hard failures (rejected, timed out, malformed) and partial failures (some shards failed) are counted.
Aggregation window	`5m` rolling, so a brief blip self-clears while a sustained problem is caught quickly.
Alert threshold	`> 1%`. Above 1% of searches failing, a meaningful slice of users is affected and the gauge trips red.
Why a gauge	The value is a bounded percentage with a clear danger band, so it renders as a gauge; the needle crossing into red is the page-worthy signal.
What counts	HTTP non-2xx search responses, thread-pool rejections on the search pool, query-phase and fetch-phase failures, timeouts, and searches returning `_shards.failed > 0`.
What does NOT count	Indexing/write errors (a separate pipeline, see Bulk Rejections), client-side network failures that never reached the cluster, and zero-result searches (a successful query that simply matched nothing is not an error).
Time window	`5m` (rolling)
Alert trigger	`> 1%`. More than one failing search in a hundred is user-visible breakage.
Roles	platform, sre, dba
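
The counting rules in the table above can be sketched as a small classifier. This is an illustration only, not the card's implementation: the function names and the outcome labels are hypothetical, and the sketch assumes each search is observed as an HTTP status plus an optional parsed JSON body containing the `_shards` object.

```python
from typing import Optional


def classify_search(status: int, body: Optional[dict]) -> str:
    """Label one search response as 'ok', 'partial', or 'hard' (hypothetical labels)."""
    # Any non-2xx response (rejection, server error, malformed query)
    # is a hard failure: the client got nothing usable.
    if not 200 <= status < 300:
        return "hard"
    shards = (body or {}).get("_shards", {})
    # A 2xx with failed shards is a partial failure: results are incomplete.
    if shards.get("failed", 0) > 0:
        return "partial"
    # Zero hits with no failed shards is still a successful search.
    return "ok"


def is_error(status: int, body: Optional[dict]) -> bool:
    """Both hard and partial failures count toward the error rate."""
    return classify_search(status, body) != "ok"
```

Note that a zero-result search falls through to `"ok"`, matching the "What does NOT count" row.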

Calculation

The card computes a delta ratio over the five-minute window:

failed   = (failed_search_requests + searches_with_shard_failures) over 5m
total    = search.query_total(now) - search.query_total(5m ago)
error_rate_pct = (failed / total) * 100      # guard: 0 when total == 0

“Failed” deliberately includes two distinct categories. Hard failures are searches that returned an error to the client: a 5xx, a search thread-pool rejection (`es_rejected_execution_exception`), a timeout, or a malformed-query 4xx. Partial failures are searches that returned a 200 but with `_shards.failed > 0`, meaning some shards could not respond and the result set is incomplete; from a user’s perspective this is silent data loss in the results and is treated as an error here. Counting both is important because partial failures are insidious: the application sees a 200 and renders results, but those results are missing whatever the failed shards held.

The `5m` window balances responsiveness against noise: a single transient failure in a low-traffic minute will not trip the gauge, but a genuine spike that affects a sustained fraction of traffic shows up within minutes.

The `> 1%` threshold reflects that search is a primary user journey: even a low single-digit error rate means a noticeable cohort of users got a broken experience.
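
As a minimal sketch, the delta ratio from the formula above could be computed like this. The function and parameter names are hypothetical. Treating a negative delta (for example after a counter reset on node restart) the same as zero traffic is an assumption of this sketch, not something the card is documented to do.

```python
def error_rate_pct(failed_requests: int,
                   shard_failure_searches: int,
                   query_total_now: int,
                   query_total_5m_ago: int) -> float:
    """Search error rate over the window, as a percentage."""
    failed = failed_requests + shard_failure_searches
    total = query_total_now - query_total_5m_ago
    # Guard: report 0 when there was no traffic in the window.
    # (Assumption: a negative delta is also reported as 0.)
    if total <= 0:
        return 0.0
    return failed / total * 100
```

For example, 200 hard failures plus 96 partial failures against a `search.query_total` delta of 8,000 gives roughly 3.7%.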

Worked example

A platform team runs an Elasticsearch cluster behind the search bar of a fashion retailer. Baseline search error rate is ~0.02% (the odd malformed query from a bot). On 09 Apr 26 at 12:50, during the lunchtime traffic peak, the Search Error Rate gauge climbs to 3.7% and trips red. Breaking down the failures from GET /_nodes/stats/indices/search and the cluster’s error logs:

Failure type	Share of failures	Signature
Search thread-pool rejection	71%	`es_rejected_execution_exception`, search queue full
Partial shard failure	24%	200 responses with `_shards.failed: 1`
Timeout	5%	queries exceeding the client’s 1s timeout

The dominant cause is search thread-pool rejection: the search queue on the data nodes is full and the cluster is rejecting incoming searches outright to protect itself. The team checks HTTP Connection Saturation % (88%, high but not the cause) and Search Latency p95 (ms) (climbing from 90ms to 640ms) and finds the real trigger.

Root cause chain:
  - A marketing email went out at 12:45 driving a 3x traffic spike to the search bar.
  - A new "search suggestions" feature fires an extra wildcard query per keystroke.
  - Wildcard queries are expensive; each one holds a search thread far longer than a normal term query.
  - The search thread pool (fixed size = number of CPUs * 1.5 + 1) filled up.
  - With the pool full, the queue filled, and new searches were rejected -> the 71% rejections.
  - Some shards on the busiest node timed out mid-query -> the 24% partial failures.
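
The pool-exhaustion step in the chain above can be illustrated with a rough capacity model. The sizing function follows Elasticsearch's documented default for the search pool, `int((allocated processors * 3) / 2) + 1`. The throughput function is a deliberate simplification that treats one query as one thread-hold and ignores shard fan-out, and the hold times in the usage note are hypothetical numbers, not measurements from the incident.

```python
def search_pool_size(allocated_processors: int) -> int:
    """Default size of the fixed search thread pool on one node."""
    return (allocated_processors * 3) // 2 + 1


def max_sustained_qps(threads: int, avg_hold_seconds: float) -> float:
    """Rough per-node ceiling: each thread completes 1 / hold queries per second.

    Simplification: one query occupies one thread for its whole duration.
    """
    return threads / avg_hold_seconds
```

On an 8-CPU node the pool has 13 threads. If a term query held a thread for 10 ms the ceiling would be about 1,300 queries/s; if a wildcard query held it for 200 ms the ceiling would drop to about 65 queries/s, which is why a change in query mix can fill the pool faster than a change in traffic.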

Immediate mitigation: the team debounces the suggestion feature on the client (fire after 300ms of no typing instead of per keystroke), which cuts query volume immediately. Within five minutes the error rate falls to 0.4%. Structurally, they replace the expensive leading-wildcard suggestion query with a query against a `search_as_you_type` field (far cheaper), add an `index.search.idle` setting, and add a sensible client-side timeout with backed-off retries. By 13:10 the error rate is back to baseline.
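
For illustration, a `search_as_you_type` setup could look roughly like the request bodies below, built here as Python dictionaries. The field name `name`, the helper name, and the result size are assumptions; the worked example does not say which field or index the team used. The query shape (a `multi_match` of type `bool_prefix` over the field and its `._2gram` and `._3gram` subfields) follows the pattern in the Elasticsearch documentation for this field type.

```python
# Hypothetical index mapping: the field name "name" is an assumption.
suggest_mapping = {
    "mappings": {
        "properties": {
            "name": {"type": "search_as_you_type"}
        }
    }
}


def suggestion_query(prefix: str, field: str = "name", size: int = 5) -> dict:
    """Build a search body that matches as-you-type input without wildcards."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": prefix,
                "type": "bool_prefix",
                # The shingle subfields are created automatically by the
                # search_as_you_type field type.
                "fields": [field, f"{field}._2gram", f"{field}._3gram"],
            }
        },
    }
```

The body returned by `suggestion_query("red dre")` would be sent to the index's `_search` endpoint in place of the wildcard query.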

What 3.7% cost during the spike:
  - At the peak ~8,000 searches/min, 3.7% = ~296 failed searches/min.
  - Each failed search is a shopper who typed a query and saw "no results" or an error.
  - Pair with the storefront conversion cards: a search error during peak traffic is
    a direct, measurable drop in the search-to-cart funnel.
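
The arithmetic above can be reproduced in a few lines. The per-type split applies the failure shares from the breakdown table to the peak-minute figure, which assumes the mix was the same at the peak as over the whole incident; the variable names are illustrative.

```python
peak_searches_per_min = 8_000
error_rate_pct = 3.7

# Failed searches per minute at the peak.
failed_per_min = peak_searches_per_min * error_rate_pct / 100

# Shares taken from the failure-type breakdown in the worked example.
failure_shares = {
    "thread_pool_rejection": 0.71,
    "partial_shard_failure": 0.24,
    "timeout": 0.05,
}
failed_by_type = {
    kind: round(failed_per_min * share)
    for kind, share in failure_shares.items()
}
```

Under that assumption, the ~296 failed searches per minute split into roughly 210 rejections, 71 partial shard failures, and 15 timeouts.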

Three takeaways:

Most search-error spikes are self-inflicted load, not cluster faults. A new feature, an expensive query pattern, or a traffic burst fills the fixed-size search thread pool, and the cluster sheds load by rejecting. The fix is usually on the query/client side, not the cluster.
Partial shard failures are silent and dangerous. A 200 with _shards.failed > 0 looks fine to the application but returns incomplete results. Counting these in the error rate surfaces a failure mode that latency and HTTP-status monitoring miss.
Read it alongside latency and saturation. Error rate is the outcome; latency and connection saturation are the leading indicators. A rising p95 that crosses into rejections is the typical path to a search-error spike.

Sibling cards

Card	Why pair it with Search Error Rate	What the combination tells you
Search Latency p95 (ms)	The leading indicator before errors begin.	Rising p95 that tips into rejections is the standard route to a search-error spike.
Search Latency p99 (ms)	The tail that times out first.	A p99 blowout often becomes the timeout portion of the error rate.
Search Queries per Second (live)	The load that fills the search pool.	An error spike that tracks a QPS spike is load-driven; one that does not is a query or cluster fault.
HTTP Connection Saturation %	The front door that refuses clients when full.	High saturation plus errors means clients are refused before queries even run.
Circuit Breaker Trips (24h)	The memory-protection mechanism that rejects queries.	Breaker trips plus search errors means heavy queries are being rejected to avoid OOM.
JVM Heap Used %	Heap pressure causes rejections and breaker trips.	High heap plus search errors points at memory-bound query failures.
Slow-Query Rate %	Slow queries precede timeouts and rejections.	A rising slow-query rate is the early warning before errors climb.
Slow Searches During Checkout Window (5m)	The cross-channel revenue framing of search failure.	Correlates search errors with the checkout funnel to size revenue impact.

Reconciling against the source

Where to look in Elasticsearch itself:

GET /_nodes/stats/indices/search gives query_total and query_time_in_millis; combined with the search thread-pool stats it shows the denominator and the rejection signal. GET /_cat/thread_pool/search?v&h=node_name,active,queue,rejected is the fastest way to confirm search thread-pool rejections, the most common error cause; a non-zero and rising rejected column is the smoking gun. The cluster logs (or the slow log) capture per-query failures and _shards.failed details; the application or proxy access logs hold the authoritative non-2xx HTTP rate as the client experienced it.
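
Because `rejected` is a cumulative counter, confirming a live problem means comparing two readings rather than looking at one. The sketch below parses the text output of the `_cat/thread_pool/search` request shown above and computes the per-node increase between two snapshots. The function names are hypothetical, and the handling of restarted or newly joined nodes is an assumption of this sketch.

```python
from typing import Dict


def parse_rejected(cat_output: str) -> Dict[str, int]:
    """Extract {node_name: rejected} from `_cat/thread_pool/search?v` text output."""
    lines = [ln.split() for ln in cat_output.strip().splitlines() if ln.strip()]
    header, rows = lines[0], lines[1:]
    name_i = header.index("node_name")
    rej_i = header.index("rejected")
    return {row[name_i]: int(row[rej_i]) for row in rows}


def rejected_delta(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Per-node rejections that occurred between two snapshots."""
    delta = {}
    for node, now in after.items():
        if node not in before:
            # Assumption: skip nodes with no earlier reading.
            continue
        prev = before[node]
        # Assumption: a lower value means the node restarted and the
        # counter reset, so everything it shows is new.
        delta[node] = now - prev if now >= prev else now
    return delta
```

A non-zero delta on any node over a few minutes corresponds to the "non-zero and rising" signal described above.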

Why our number may legitimately differ from a manual reading:

Reason	Direction	Why
Partial-failure counting	Card higher	We count 200 responses with `_shards.failed > 0` as errors; a pure HTTP-status check at a proxy would not.
Window boundary	Either	The card’s 5-minute delta and your manual snapshot bracket different intervals.
Rejection accounting	Either	Thread-pool `rejected` is a cumulative counter; reading it raw versus as a windowed delta gives different rates.
Where errors are measured	Either	The cluster’s view (rejections, shard failures) can differ from the client/proxy view (which also sees network failures the cluster never saw).
Managed service abstraction	Either	Elastic Cloud and AWS-managed consoles may present an aggregated request-error metric at their own granularity.

Cross-connector reconciliation:

Card	Expected relationship	What causes divergence
Search Latency p95 (ms)	Errors should follow a latency climb under load.	Errors spiking with calm latency points at malformed queries or shard failures, not load.
Search Queries per Second (live)	A load-driven error spike tracks a QPS spike.	Errors rising with flat QPS means a query pattern changed or a node degraded, not volume.
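
The two rows above amount to a small decision table, sketched below. The function name and the boolean inputs are hypothetical; deciding what counts as "rising" for QPS or latency is left to the caller.

```python
def diagnose_error_spike(qps_rising: bool, latency_rising: bool) -> str:
    """Map the two sibling-card signals to the likely cause class during an error spike."""
    if not latency_rising:
        # Errors with calm latency are not capacity-related.
        return "malformed queries or shard failures, not load"
    if qps_rising:
        # Errors that follow both a QPS spike and a latency climb are load-driven.
        return "load-driven: traffic volume is filling the search pool"
    # Latency climbing with flat QPS: the work per query or per node changed.
    return "query pattern changed or a node degraded, not volume"
```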

Known limitations / FAQs

Does a zero-result search count as an error?
No. A search that runs successfully and simply matches no documents is a valid result, not a failure; it returns a 200 with an empty hits array. This card counts only searches that errored (non-2xx, rejection, timeout) or completed with `_shards.failed > 0`. A high zero-result rate is a relevance/merchandising concern, not a reliability one, and is tracked separately.

What is a partial shard failure and why does it count as an error?
When a search fans out to all shards of an index and one or more shards cannot respond (the node is overloaded, the shard is recovering, a circuit breaker tripped), Elasticsearch can still return a 200 with the partial results it did get, flagged by `_shards.failed > 0` in the response. The application usually renders those incomplete results as if they were complete, so users silently miss whatever the failed shards held. Because that is a broken result from the user’s perspective, we count it as an error.

My error rate spiked but every failure is `es_rejected_execution_exception`. What does that mean?
The search thread pool is full and the cluster is shedding load by rejecting new searches to protect itself. The pool is fixed-size (roughly the node’s CPU count times 1.5 plus 1) by design, so the fix is to reduce the load reaching it: debounce or cache client queries, replace expensive query patterns (leading wildcards, deep pagination, huge aggregations) with cheaper equivalents, and add client-side timeouts with backoff. Scaling out data nodes adds search threads if the load is genuinely legitimate.

Errors are climbing but latency looks fine. How is that possible?
That pattern usually means the failures are not load-driven. Common causes: a deploy shipped a malformed query that 4xxs, a mapping change broke a query against a now-missing field, a specific shard or node is failing (partial failures) while the rest serve fast, or a circuit breaker is rejecting only the heavy queries. Look at the failure-type breakdown rather than assuming a capacity problem.

Can I tune the alert threshold?
Yes, the sensitivity threshold is configurable per profile. The default `> 1%` suits user-facing storefront search where any meaningful failure cohort matters. A purely internal analytics cluster with retrying batch clients might tolerate a higher threshold. Set it against your own baseline and the user impact of a failed search, not the generic default.

Why count both HTTP errors and shard failures together instead of separately?
Because from the user’s standpoint both produce a broken search experience: a hard error returns nothing, and a partial failure returns incomplete results the user cannot tell are incomplete. A single combined rate is the truest “search is broken for users” signal. For root-cause work you still get the breakdown by failure type; the headline gauge intentionally unifies them so nothing user-visible hides behind a clean HTTP-status number.

A retry on the client masks these errors. Should I still care?
Yes. Client retries can paper over a transient spike for the end user, but the cluster is still rejecting and re-serving requests, which amplifies load (each retry is another query against an already-strained pool) and can turn a small spike into a retry storm. The card measures the cluster’s true error rate before client retries, which is the honest signal of cluster health; a high rate that users do not feel today is a fragility waiting to tip over under more load.
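
The client-side backoff mentioned in the FAQ above could be sketched as follows. This uses exponential backoff with full jitter, which is one common choice rather than something this article prescribes. The base delay, the cap, and the set of retryable status codes are assumptions; 429 is the status Elasticsearch returns for a thread-pool rejection.

```python
import random
from typing import List, Optional

# Assumption: only retry responses that signal temporary overload.
RETRYABLE_STATUSES = {429, 503}


def should_retry(status: int, attempt: int, max_retries: int) -> bool:
    """Retry overload responses only, and only while attempts remain."""
    return status in RETRYABLE_STATUSES and attempt < max_retries


def backoff_delays(max_retries: int,
                   base: float = 0.1,
                   cap: float = 2.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Delay in seconds before each retry: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return [
        rng.uniform(0, min(cap, base * 2 ** attempt))
        for attempt in range(max_retries)
    ]
```

Spreading retries out with jitter and capping their number limits the load amplification described above; a malformed-query 4xx is never retried because repeating it cannot succeed.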

Tracked live in Vortex IQ Nerve Centre

Search Error Rate % is one of hundreds of KPI pulses Vortex IQ tracks across Elasticsearch and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
