At a glance
The number of cluster-state change tasks queued on the elected master node, read from GET /_cluster/pending_tasks. Every shard allocation, index create or delete, mapping update, and settings change flows through this single ordered queue. A healthy cluster drains it to zero within milliseconds. A persistently non-zero queue means the master is overloaded with cluster-state updates and cannot keep pace, which delays shard recovery, blocks new indices, and can cascade into a yellow or red cluster.
| API endpoint | Cluster Pending Tasks API, GET /_cluster/pending_tasks. Returns each queued task with its priority, source (what triggered it), insert_order, and time_in_queue_millis. |
| Metric basis | A point-in-time count of tasks waiting in the master’s cluster-state update queue. This is queue depth, not throughput. The companion field time_in_queue_millis on the oldest task tells you how long the head of the queue has been stuck. |
| Aggregation window | Real-time (RT), polled on the standard cluster-health cadence. The value is instantaneous, so brief spikes during a legitimate operation (a rolling restart, a large reindex) are expected. |
| Alert threshold | > 10 sustained for 5 minutes. A momentary spike is normal; a queue that stays above 10 for five minutes means the master cannot drain faster than work arrives. |
| Priority awareness | Tasks carry a priority (IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID). The master processes higher priorities first, so a queue full of LOW reindex tasks behind one URGENT shard allocation is less alarming than ten URGENT tasks stacked up. |
| What counts | Cluster-state mutations only: shard allocation and relocation decisions, index create/delete/open/close, mapping and settings updates, alias changes, ILM and template applications. |
| What does NOT count | Search and indexing traffic (those never touch this queue), per-node tasks visible in GET /_tasks, and background segment merges. Confusing _cluster/pending_tasks with _tasks is a common mistake; they are different queues. |
| Time window | RT (real-time, polled on the cluster-health cadence) |
| Alert trigger | > 10 sustained 5m. A queue that will not drain points to master-node saturation. |
| Roles | platform, sre, dba |
Calculation
The card reads the tasks array returned by GET /_cluster/pending_tasks and counts its length. Read that count alongside the oldest task's time_in_queue_millis so you can tell a deep-but-fast-draining queue from a shallow-but-stuck one.
Cluster-state updates are single-threaded on the elected master by design: this guarantees a consistent, ordered view of the cluster, but it also means the master is the bottleneck. When the count climbs and stays up, the master is either CPU-bound, GC-bound, or generating cluster states so large that publishing each one to the other nodes takes too long.
The alert fires on > 10 sustained 5m so that genuine bursts (a rolling restart relocates many shards at once) do not page anyone, while a master that has truly fallen behind does.
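The count-and-oldest-wait reading described above can be sketched in a few lines. This is a minimal illustration, not the card's actual implementation: the function name `summarize_pending_tasks` and the sample payload are hypothetical, and the payload is only shaped like a GET /_cluster/pending_tasks response body (a top-level `tasks` array).

```python
from typing import Any, Dict


def summarize_pending_tasks(response: Dict[str, Any]) -> Dict[str, int]:
    """Return queue depth and the longest wait (ms) from a pending_tasks response."""
    tasks = response.get("tasks", [])
    # default=0 covers the healthy case of an empty queue.
    oldest = max((t["time_in_queue_millis"] for t in tasks), default=0)
    return {"count": len(tasks), "oldest_wait_ms": oldest}


# Hypothetical sample shaped like a GET /_cluster/pending_tasks response body.
sample = {
    "tasks": [
        {"insert_order": 101, "priority": "URGENT", "source": "shard-started",
         "time_in_queue_millis": 86, "executing": True},
        {"insert_order": 102, "priority": "HIGH", "source": "create-index [logs-example]",
         "time_in_queue_millis": 40, "executing": False},
    ]
}

summary = summarize_pending_tasks(sample)
print(summary)
```

Keeping the two numbers together is the point: a high count with a small oldest wait is a busy but healthy master, while a low count with a large oldest wait is a stuck one.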
Worked example
A platform team runs a 6-node Elasticsearch 8.x cluster (3 dedicated master-eligible nodes, 3 data nodes) backing product search and log analytics for a mid-size retailer. At 09:14 on 14 Apr 26 the on-call SRE sees the Pending Cluster Tasks card jump from its usual 0 to 47 and hold there.
Drilling into the raw API response:
| insert_order | priority | source | time_in_queue_millis |
|---|---|---|---|
| 88412 | URGENT | shard-failed | 41,800 |
| 88413 | URGENT | shard-started | 39,200 |
| 88414 | HIGH | create-index [logs-2026.04.14] | 12,500 |
| … (44 more, mostly NORMAL) | NORMAL | put-mapping / ilm-execute | 1,000 to 30,000 |
URGENT shard-failed/shard-started pairs sit at the head, so the cluster is trying to recover shards but the master cannot publish the resulting cluster states fast enough.
The SRE checks the master’s vitals and finds the symptom: JVM Heap Used % on the elected master is at 91% and GC Pause Time (5m total ms) shows 3,400ms of stop-the-world pauses in the last five minutes. The master is spending so long in garbage collection that it cannot drain its own task queue.
NORMAL put-mapping churn. Within 90 seconds of GC pressure easing, the queue drains to 0 and the cluster returns to green.
Three takeaways:
- Pending tasks is a master-health signal, not a traffic signal. It moves because of cluster-state work, so always read it alongside master-node JVM heap and GC. A spiking queue with a calm master usually self-heals; a spiking queue with a hot master is the real incident.
- Read the priority mix and the oldest wait, not just the count. Fifty LOW reindex tasks draining steadily is fine. Two URGENT tasks stuck for 40 seconds is not.
- Dedicated master nodes earn their keep here. If master duties share a node with data and search load, this queue is the first thing to suffer under traffic.
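Reading the priority mix rather than the raw count can be sketched as a small grouping step. This is an illustrative sketch only: `priority_mix` is a hypothetical helper, and the input holds just the three head-of-queue rows shown in the worked example (the remaining 44 rows are elided in the table and are not reproduced here).

```python
from collections import defaultdict
from typing import Any, Dict, List


def priority_mix(tasks: List[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Group tasks by priority, keeping a count and the oldest wait for each."""
    mix: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"count": 0, "oldest_wait_ms": 0}
    )
    for task in tasks:
        entry = mix[task["priority"]]
        entry["count"] += 1
        entry["oldest_wait_ms"] = max(
            entry["oldest_wait_ms"], task["time_in_queue_millis"]
        )
    return dict(mix)


# Head-of-queue rows from the worked example table.
tasks = [
    {"insert_order": 88412, "priority": "URGENT", "source": "shard-failed",
     "time_in_queue_millis": 41800},
    {"insert_order": 88413, "priority": "URGENT", "source": "shard-started",
     "time_in_queue_millis": 39200},
    {"insert_order": 88414, "priority": "HIGH", "source": "create-index [logs-2026.04.14]",
     "time_in_queue_millis": 12500},
]

mix = priority_mix(tasks)
for priority, stats in mix.items():
    print(priority, stats)
```

Two URGENT tasks with an oldest wait above 40 seconds stand out immediately in this view, which is the signal the second takeaway describes.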
Sibling cards
| Card | Why pair it with Pending Cluster Tasks | What the combination tells you |
|---|---|---|
| Cluster Status (green / yellow / red) | The outcome a stuck queue eventually produces. | A growing queue plus a slide to yellow means shard recovery is blocked on the master. |
| JVM Heap Used % | The most common root cause: a heap-pressured master. | High master heap plus a high queue equals “the master cannot drain cluster-state work”. |
| GC Pause Time (5m total ms) | The mechanism that stalls cluster-state publishing. | Long GC pauses on the master directly translate into rising queue depth. |
| Initializing / Relocating Shards | The work that floods the queue during recovery. | Many initializing shards plus a high queue equals a recovery the master cannot keep up with. |
| Unassigned Shards | What stays broken while the queue is stuck. | Unassigned shards that will not allocate often trace back to a backed-up pending-tasks queue. |
| Active Node Count | A node loss is a classic trigger for a queue spike. | A drop in node count followed by a queue spike is the shard-failed/shard-started recovery storm. |
| Elasticsearch Health Score | The composite that folds queue depth into overall health. | A health-score dip with no obvious traffic cause often points back here. |
Reconciling against the source
Where to look in Elasticsearch itself:
- GET /_cluster/pending_tasks is the canonical source; the card reads it verbatim. The human-friendly view is GET /_cat/pending_tasks?v, which prints insertOrder, timeInQueue, priority, and source as a table.
- GET /_cluster/health shows the downstream effect (status, unassigned_shards, initializing_shards).
- GET /_nodes/stats/jvm on the elected master shows the heap and GC pressure that usually drives a stuck queue. Identify the master with GET /_cat/master?v.
Why our number may legitimately differ from a manual API call:
| Reason | Direction | Why |
|---|---|---|
| Polling instant vs your instant | Either | The queue can change in milliseconds. The card’s last poll and your manual curl are rarely the exact same moment, so a draining queue may read 12 for us and 3 for you seconds later. |
| Sustained-window smoothing | Card may not alert when a raw call is high | The alert needs > 10 sustained 5m; a single high reading you catch by hand will not trip the card. |
| Managed-service proxies | Either | On Elastic Cloud or AWS OpenSearch/Elasticsearch-compatible offerings, the console may sample at its own cadence; compare like-for-like timestamps. |
| _tasks confusion | Large divergence | If you are comparing against GET /_tasks (per-node task framework), that is a different queue entirely and will not match. |
| Card | Expected relationship | What causes divergence |
|---|---|---|
| JVM Heap Used % | A sustained high queue should coincide with master heap pressure. | If heap is calm but the queue is high, suspect oversized cluster states (too many indices/shards) rather than GC. |
| Cluster Status | A stuck queue and a non-green status usually move together during recovery. | A green cluster with a high queue is an early warning before any status change. |
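The heap-versus-queue relationship in the table above can be turned into a rough triage check. The sketch below is an assumption-laden illustration: the helper names, the node names and IDs, and the 85% heap cut-off are all hypothetical, and the payloads are only shaped like the JSON returned by GET /_cat/master?format=json and GET /_nodes/stats/jvm. The master is matched by node name.

```python
from typing import Any, Dict, List, Optional


def master_heap_percent(cat_master: List[Dict[str, Any]],
                        nodes_stats: Dict[str, Any]) -> Optional[int]:
    """Find the elected master by node name and return its heap_used_percent."""
    master_name = cat_master[0]["node"]
    for node in nodes_stats["nodes"].values():
        if node["name"] == master_name:
            return node["jvm"]["mem"]["heap_used_percent"]
    return None


def likely_cause(queue_depth: int, heap_percent: int,
                 depth_threshold: int = 10, heap_threshold: int = 85) -> str:
    """Rough triage; heap_threshold is an illustrative assumption, not a card setting."""
    if queue_depth <= depth_threshold:
        return "queue within threshold"
    if heap_percent >= heap_threshold:
        return "master heap pressure"
    return "suspect oversized cluster state"


# Hypothetical payloads; node names and IDs are invented for illustration.
cat_master = [{"id": "node-id-1", "host": "10.0.0.1", "ip": "10.0.0.1", "node": "master-1"}]
nodes_stats = {"nodes": {
    "node-id-1": {"name": "master-1", "jvm": {"mem": {"heap_used_percent": 91}}},
    "node-id-2": {"name": "data-1", "jvm": {"mem": {"heap_used_percent": 58}}},
}}

heap = master_heap_percent(cat_master, nodes_stats)
print(heap, likely_cause(47, heap))
```

With the worked example's numbers (queue at 47, master heap at 91%), the check lands on heap pressure; the same queue depth with a calm heap would point toward cluster-state size instead.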
Known limitations / FAQs
The queue spiked to 60 during a rolling restart but never alerted. Is the card broken?
No, that is the design. A rolling restart relocates many shards at once, so a transient spike is expected and healthy. The alert only fires on > 10 sustained 5m. If the spike drained within a minute or two, the master kept up and there is nothing to act on. The card is protecting you from paging on normal maintenance.
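One way to picture the "sustained" rule is as a check on the most recent unbroken run of readings above the threshold. This is a minimal sketch, not the card's actual alerting code: `sustained_breach` is a hypothetical function, and the 30-second polling interval in the samples is an assumption made for illustration.

```python
from typing import List, Tuple


def sustained_breach(samples: List[Tuple[int, int]],
                     threshold: int = 10, window_s: int = 300) -> bool:
    """samples: (timestamp_seconds, queue_depth) pairs in ascending time order.

    True only if the latest unbroken run of readings above the threshold
    has lasted at least window_s seconds.
    """
    breach_start = None
    for ts, depth in samples:
        if depth > threshold:
            if breach_start is None:
                breach_start = ts  # a new run of high readings begins
        else:
            breach_start = None  # the queue dropped back, so the run resets
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= window_s


# Rolling-restart style spike: peaks at 60, drains within two minutes.
spike = [(0, 0), (30, 60), (60, 35), (90, 4), (120, 0)]

# Stuck queue: holds at 47 for five and a half minutes (assumed 30 s polling).
stuck = [(t, 47) for t in range(0, 331, 30)]

print(sustained_breach(spike), sustained_breach(stuck))
```

The spike never accumulates a five-minute run, so it stays quiet; the stuck queue does, which matches the behaviour described in the answer above.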
What is the difference between _cluster/pending_tasks and _tasks?
_cluster/pending_tasks is the single, ordered queue of cluster-state updates on the elected master (shard allocation, index creation, mapping changes). _tasks is the per-node task-management framework that tracks in-flight operations like a long-running search or reindex. This card reads the former. A backed-up search will show in _tasks, not here.
The count is high but every task is priority LOW. Should I worry?
Less so. The master processes by priority, so URGENT and HIGH tasks (the ones that affect availability) jump the queue. A deep tail of LOW reindex or ILM tasks that is draining steadily is usually fine. Watch the oldest time_in_queue_millis: if even the URGENT tasks are aging, that is the problem, not the raw count.
My queue is stuck but JVM heap looks fine. What else causes this?
Oversized cluster states. If you have tens of thousands of shards or indices, each cluster-state publish is large and slow to serialise and send to every node, independent of heap. Check total shard count (aim for under ~20 shards per GB of heap as a rule of thumb), prune unused indices, and consolidate small indices. Network latency between master-eligible nodes during the two-phase publish can also stall the queue.
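The shard rule of thumb in the answer above is simple arithmetic, sketched here for one node. The function names and the 30 GB heap figure are hypothetical, and the ~20 shards per GB ratio is taken from the text as a rough guide rather than a hard limit.

```python
def shard_budget(heap_gb: float, shards_per_gb: int = 20) -> int:
    """Rule-of-thumb shard ceiling for a node with the given heap size."""
    return int(heap_gb * shards_per_gb)


def over_budget(total_shards: int, heap_gb: float, shards_per_gb: int = 20) -> bool:
    """True if the shard count exceeds the rule-of-thumb ceiling."""
    return total_shards > shard_budget(heap_gb, shards_per_gb)


# Hypothetical node with a 30 GB heap.
budget = shard_budget(30)
print(budget, over_budget(900, 30), over_budget(450, 30))
```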
Can I clear or reorder the pending-tasks queue manually?
No. There is no API to flush or reprioritise it; the ordering and single-threaded processing are what give Elasticsearch a consistent cluster state. The only levers are relieving the master (heap, GC, CPU), reducing the rate of cluster-state changes (throttle reindex/ILM churn), and shrinking the cluster state (fewer shards/indices).
Does a high queue mean I am losing data?
Not directly. It means cluster-state changes are delayed, which can block new index creation and slow shard recovery, and that recovery delay is what risks availability (yellow/red). Ingestion already in flight to existing indices is largely unaffected unless the delay is severe enough to push the cluster red. Pair this card with Unassigned Shards to gauge real data-availability risk.
Why is this single-threaded? Surely parallelising would help.
Cluster-state updates must be applied in a strict, total order so every node agrees on the same view of the cluster. Parallel application would break that consistency guarantee. The trade-off is that the master is a serial bottleneck, which is exactly why this card matters and why dedicated, well-provisioned master nodes are recommended for any cluster of meaningful size.