At a glance
Avg Query Queue Depth per Warehouse measures how many queries, on average, are waiting in line before they can run on each warehouse. In Snowflake a query queues when its warehouse has no free slot, either because all running slots are busy (overload) or because a multi-cluster warehouse is still spinning up a cluster (provisioning). A small, transient queue is normal under burst; a queue that stays deep is the clearest single signal that a warehouse is undersized for its workload. This is a Snowflake-distinctive metric: because compute and storage are decoupled and warehouses scale independently, sustained queueing is something you fix by resizing or enabling multi-cluster scaling, not by tuning queries.
| What it tracks | The average number of queries waiting before execution, per warehouse, over the selected period, derived from queue-time signals in query history. |
| Data source | QUEUED_PROVISIONING_TIME plus QUEUED_OVERLOAD_TIME in QUERY_HISTORY. Snowflake-distinctive: a sustained queue means the warehouse is undersized. |
| Time window | 1h (rolling last hour, refreshed on the live polling cycle). |
| Alert trigger | > 5 sustained. A queue depth holding above 5 queries pages the platform on-call. |
| Roles | owner, platform, SRE, data engineering, FinOps |
Calculation
The card derives queue depth from the queue-time columns Snowflake records on every query in QUERY_HISTORY: QUEUED_OVERLOAD_TIME (milliseconds a query waited because the warehouse had no free compute slot) and QUEUED_PROVISIONING_TIME (milliseconds a query waited while a multi-cluster warehouse provisioned an additional cluster). For each warehouse, the engine uses the accumulated queue time across concurrent queries in the window to estimate the average number of queries waiting at once, rather than reporting raw wait milliseconds. A reading of “0 to 1” means queries rarely wait; a reading of “5 sustained” means, on average, five queries are stacked behind the running set at any moment, which is the line where the workload is materially throttled by compute, not by the queries themselves. See the worked example below for how to read it against credit cost.
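One way to picture the conversion from wait milliseconds to an average depth is a time-weighted estimate: total queued milliseconds in the window divided by the window length. The sketch below is an assumption about the shape of that calculation, not the card's confirmed formula. The `QueryRow` type, the function name, and the sample values are hypothetical, and it simplifies by assuming each query's queue time falls entirely inside the window.

```python
from collections import defaultdict
from dataclasses import dataclass

WINDOW_MS = 60 * 60 * 1000  # the card's 1h rolling window, in milliseconds


@dataclass
class QueryRow:
    """Hypothetical stand-in for one QUERY_HISTORY row."""
    warehouse_name: str
    queued_overload_time: int      # ms, QUEUED_OVERLOAD_TIME
    queued_provisioning_time: int  # ms, QUEUED_PROVISIONING_TIME


def avg_queue_depth(rows, window_ms=WINDOW_MS):
    """Estimate the average number of queries waiting, per warehouse.

    Assumption: total queued ms / window ms approximates the time-averaged
    number of waiting queries (each wait lies wholly inside the window).
    """
    queued_ms = defaultdict(int)
    for row in rows:
        queued_ms[row.warehouse_name] += (
            row.queued_overload_time + row.queued_provisioning_time
        )
    return {wh: total / window_ms for wh, total in queued_ms.items()}


# Illustrative data: 12 queries that each waited 30 minutes on BI_WH,
# and one query that waited 18 minutes (provisioning) on ELT_WH.
sample = [QueryRow("BI_WH", 30 * 60 * 1000, 0) for _ in range(12)]
sample.append(QueryRow("ELT_WH", 0, 18 * 60 * 1000))
depths = avg_queue_depth(sample)
print(depths)
```

Under these assumptions, twelve 30-minute waits inside a one-hour window work out to an average of six queries waiting at any moment, which is how a pile of per-query wait times becomes a single depth reading.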
Worked example
A data platform team runs BI_WH (a Medium, single-cluster warehouse) serving fifteen concurrent dashboard users plus an hourly ELT job that lands on the same warehouse. Snapshot taken on 16 Apr 26 at 10:15 BST, mid-morning peak.
| Warehouse | Size | Clusters | Avg queue depth (1h) | Read |
|---|---|---|---|---|
| BI_WH | Medium | 1 (max 1) | 6.4 | Sustained breach, queries stacking up |
| ELT_WH | Large | 1 | 0.3 | Healthy |
| ADHOC_WH | Small | 1 (max 3) | 1.1 | Brief provisioning waits, acceptable |
BI_WH sits at an average queue depth of 6.4, above the threshold of 5 sustained. The platform team’s read:
- BI_WH is undersized for its concurrent load, full stop. Six queries waiting on average at peak means dashboard users are watching spinners while their queries sit behind the ELT job and each other. The queue is QUEUED_OVERLOAD_TIME dominant (no free slots), not provisioning, because BI_WH is single-cluster and cannot spin up a second cluster to absorb the burst.
- The cause is two workloads sharing one warehouse. The hourly ELT job and fifteen interactive users are competing for the same Medium warehouse. Interactive dashboards need low-latency concurrency; batch ELT needs throughput. Putting them on the same single-cluster warehouse guarantees contention at the top of every hour.
- The fix is a warehouse change, and it can save money. Two options: enable multi-cluster scaling on BI_WH (set MIN_CLUSTER_COUNT = 1, MAX_CLUSTER_COUNT = 3, SCALING_POLICY = STANDARD) so it adds clusters automatically under queue pressure and drops them when the burst passes, or move the ELT job to its own warehouse so interactive users stop competing with it. Multi-cluster only bills extra clusters while they run, so it absorbs the peak without paying for a permanently larger warehouse.
- Sustained queue depth is the cleanest “undersized warehouse” signal Snowflake gives you. Unlike latency, which mixes execution time and wait time, queue depth isolates the wait. A deep, sustained queue is almost never a query problem; it is a capacity problem you fix by resizing or scaling clusters.
- Separate overload queueing from provisioning queueing. QUEUED_OVERLOAD_TIME means “no free slots, add capacity”; QUEUED_PROVISIONING_TIME means “a cluster is still warming up”, which is brief and self-correcting on multi-cluster warehouses. A queue that is mostly provisioning time settles on its own; a queue that is mostly overload time will not.
- Queueing and credit cost pull in opposite directions, so read them together. Resizing up clears the queue but burns more credits; under-provisioning saves credits but throttles users. Pair this card with Credits by Warehouse (7d) and Avg Cost per Query ($) to find the size that clears the queue without overspending.
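The word “sustained” does a lot of work in the reading above. How long the depth must hold is not stated here, so the sketch below assumes a hypothetical rule: the reading must stay above the threshold for a set number of consecutive samples. The function name, the sample cadence, and the `min_samples` value are illustrative only.

```python
def is_sustained_breach(samples, threshold=5.0, min_samples=3):
    """Return True if `samples` contains at least `min_samples`
    consecutive readings strictly above `threshold`.

    Assumption: `samples` is an ordered series of queue-depth readings
    taken at a fixed interval; the real alert window may differ.
    """
    run = 0
    for depth in samples:
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if depth > threshold else 0
        if run >= min_samples:
            return True
    return False


# A top-of-hour burst that clears quickly versus a queue that stays deep.
burst = [0.4, 7.2, 6.1, 1.0, 0.3, 0.2]
deep = [5.8, 6.4, 6.9, 6.2, 6.4, 6.0]
print(is_sustained_breach(burst))
print(is_sustained_breach(deep))
```

The burst series breaches twice and then resets, so it does not count as sustained; the deep series holds above 5 throughout and does.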
Sibling cards to reference together
| Card | Why pair it with Avg Query Queue Depth | What the combination tells you |
|---|---|---|
| Warehouse Saturation % | Confirms the running slots are maxed before queries even queue. | High saturation plus deep queue equals a warehouse fully spoken for and turning work away. |
| Query Latency p95 (ms) | p95 includes queue time; this card isolates the wait. | A high p95 explained by a deep queue is a capacity problem, not a query problem. |
| Query Latency p99 (ms) | A queued query can land in the extreme tail. | Flat queue plus high p99 means a heavy query; deep queue plus high p99 means waiting, not work. |
| Credits by Warehouse (7d) | The cost side of any resize decision. | Find the smallest warehouse change that clears the queue without overspending. |
| Avg Cost per Query ($) | Quantifies the credit impact of resizing up. | A modest cost-per-query rise to clear a deep queue is usually worth it. |
| Warehouse Queueing Sustained (>5 queries queued) | The Nerve Centre alert built on this metric. | This card is the live gauge; the alert is the paging event when the queue holds above 5. |
| Active Warehouses | Context on how many warehouses are running. | Deep queue on one warehouse while others sit idle suggests workload routing, not total capacity. |
| Snowflake Health Score | The composite that weights queueing. | Sustained queueing drags the composite down even when latency averages look acceptable. |
Reconciling against the source
Where to look in Snowflake’s own tooling:
- Snowsight > Admin > Warehouses: select a warehouse and read the activity chart. It shows running versus queued queries over time, which is the visual equivalent of this card.
- SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_LOAD_HISTORY: query AVG_RUNNING and AVG_QUEUED_LOAD per warehouse to inspect average queued load over the last hour. This is the most direct native source.
- QUERY_HISTORY: the per-query QUEUED_OVERLOAD_TIME and QUEUED_PROVISIONING_TIME columns that underpin the card.
| Reason | Direction | Why |
|---|---|---|
| Source view | Marginal | The Warehouses chart in Snowsight samples WAREHOUSE_LOAD_HISTORY; the card derives from per-query queue times in QUERY_HISTORY. Both measure the same thing but aggregate slightly differently. |
| ACCOUNT_USAGE latency | Lag | ACCOUNT_USAGE views trail live activity: QUERY_HISTORY by up to 45 minutes and WAREHOUSE_LOAD_HISTORY by up to 3 hours. |
| Overload vs provisioning | Variable | If the card weights overload and provisioning queueing equally and you read only overload in the UI, a multi-cluster warehouse mid-provision can show a higher card value. |
| Per-warehouse vs account | Apparent gap | The card reports per warehouse; an account-wide view averages across all warehouses and reads lower when most are idle. |
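The per-warehouse versus account-wide gap is easy to see with the three readings from the worked example. The sketch below assumes the account-wide view is a simple unweighted mean across warehouses, which is an illustration rather than a documented aggregation; the variable names are hypothetical.

```python
from statistics import mean

# Per-warehouse readings from the worked example snapshot.
per_warehouse = {"BI_WH": 6.4, "ELT_WH": 0.3, "ADHOC_WH": 1.1}

# Assumed account-wide view: unweighted mean across all warehouses.
account_wide = mean(list(per_warehouse.values()))

# The per-warehouse view keeps the worst offender visible.
worst_name, worst_depth = max(per_warehouse.items(), key=lambda kv: kv[1])

print(f"account-wide average: {account_wide:.1f}")
print(f"deepest queue: {worst_name} at {worst_depth}")
```

Averaged this way, the account reads about 2.6, comfortably under the threshold of 5, while BI_WH on its own is breaching at 6.4.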
| Card | Expected relationship | What causes divergence |
|---|---|---|
| slow-analytics-queries-during-checkout-window | Queueing during peak ecom windows is higher-impact. | A queue building during checkout peak delays the live dashboards the business is watching. |
| Ecom order volume (Shopify / BigCommerce / Adobe) | No direct causal link. | A queue spike during a promotion slows merchandising’s reporting exactly when they need it fastest. |
Known limitations / FAQs
My queue depth spikes briefly at the top of every hour then clears. Should I worry?
Usually not. A short spike that clears within minutes is a normal burst, typically a scheduled job kicking off. The alert fires on sustained queue depth above 5, not on transient spikes, precisely because brief queueing is expected. Only act if the depth holds above the threshold for a sustained period; that is the signal of a genuine capacity shortfall.
What is the difference between overload queueing and provisioning queueing?
QUEUED_OVERLOAD_TIME is time spent waiting because the warehouse’s running slots were all busy: the warehouse is simply too small or too contended, and the fix is more capacity. QUEUED_PROVISIONING_TIME is time spent waiting while a multi-cluster warehouse spins up an additional cluster: it is brief and self-correcting. A queue dominated by overload time needs a resize or more clusters; a queue dominated by provisioning time settles on its own.
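The overload-versus-provisioning distinction can be turned into a quick triage step. This is a minimal sketch, assuming you have already summed the two queue-time columns per warehouse for the window; the function name, the tie-breaking rule, and the sample totals are hypothetical.

```python
def dominant_queue_cause(overload_ms, provisioning_ms):
    """Classify a warehouse's queueing by whichever wait type is larger.

    Inputs are summed QUEUED_OVERLOAD_TIME and QUEUED_PROVISIONING_TIME
    (milliseconds) over the window. Ties are treated as overload, an
    arbitrary choice that errs toward the case needing action.
    """
    total = overload_ms + provisioning_ms
    if total == 0:
        return "none", 0.0
    overload_share = overload_ms / total
    cause = "overload" if overload_ms >= provisioning_ms else "provisioning"
    return cause, overload_share


# Illustrative totals: a single-cluster warehouse with no free slots,
# and a multi-cluster warehouse mostly waiting on clusters to start.
print(dominant_queue_cause(21_600_000, 0))
print(dominant_queue_cause(600_000, 3_400_000))
```

An "overload" result points at resizing or adding clusters; a "provisioning" result usually means waiting out the warm-up.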
Should I resize the warehouse up or enable multi-cluster scaling?
Resize up when a single query is genuinely too heavy for the current size (it needs more memory and compute per query). Enable multi-cluster scaling when the problem is concurrency: many queries competing for slots. Most sustained-queue cases are concurrency problems, so multi-cluster is usually the better and cheaper answer, because it only bills extra clusters while they are needed.
Why is queue depth a Snowflake-specific concern?
Because Snowflake decouples compute from storage and lets each warehouse scale independently, queueing is something you control directly by sizing and multi-cluster policy. On a traditional database, contention is a tuning and indexing problem; on Snowflake, sustained queueing is first and foremost a warehouse-configuration problem. That is why this metric points so cleanly at “the warehouse is undersized”.
Can a deep queue cause query failures?
Indirectly, yes. A query that queues long enough can breach STATEMENT_TIMEOUT_IN_SECONDS (the timeout clock includes queue time) and be cancelled. If STATEMENT_QUEUED_TIMEOUT_IN_SECONDS is set, a query can also be cancelled while it is still waiting in the queue. If you see a deep queue alongside a rising Query Error Rate %, some queries are timing out while waiting. Clearing the queue resolves both.
One warehouse is queueing while others sit idle. What does that mean?
It means the problem is workload routing, not total account capacity. You have spare compute; it is just on the wrong warehouse. Either move some of the queued workload to an idle warehouse, or, if the workloads genuinely belong together, scale the busy one. Check Active Warehouses and Credits by Warehouse (7d) to see where the spare capacity and the cost sit.
Why does the card show queueing when Snowsight’s warehouse chart looks flat?
Two common reasons: ACCOUNT_USAGE latency (the chart and the card may be reading slightly different time slices), and the difference between per-query queue time in QUERY_HISTORY and the sampled load in WAREHOUSE_LOAD_HISTORY. Cross-check against WAREHOUSE_LOAD_HISTORY over the same hour to confirm the underlying signal before assuming a discrepancy.