At a glance
Slow-Query Rate % is the share of SQL statements on your Databricks SQL warehouses that exceed a “slow” duration threshold, measured over a rolling 15-minute window. Where the latency-percentile cards tell you how slow the tail is, this card tells you how broad the slowness is. A p99 spike caused by one monster query barely moves the slow-query rate; a warehouse-wide slowdown pushes it up sharply. It is the single best “is this affecting lots of people or just one query?” signal in the Performance category.
| Field | Detail |
|---|---|
| What it tracks | The percentage of completed SQL statements classified as slow (duration above the slow threshold) over the window, for the warehouses in scope. |
| Data source | Databricks SQL query history (system.query.history / Query History API): count of statements over the slow threshold divided by total completed statements. |
| Time window | 15m (rolling 15-minute window) |
| Alert trigger | > 5%. When more than 5% of queries are slow over the window, the on-call data engineer is notified. |
| Roles | owner, engineering |
| Card class | Hero and Sensitivity card: it drives the Performance health signal and both the slow threshold and the 5% alert level are configurable in the Sensitivity tab. |
Calculation
Over the rolling 15-minute window, Vortex IQ reads completed statements from the warehouse query history, counts how many had a total duration above the configured “slow” threshold, and divides by the total number of completed statements:
Slow-Query Rate % = (completed statements with total duration above the slow threshold ÷ total completed statements) × 100
Worked example
An online grocer runs a shared Serverless Small SQL warehouse serving operational dashboards across merchandising, supply chain, and finance. The slow threshold is set to 5 seconds. Snapshot taken on 28 May 26 at 16:20 BST.
| Reading | Value |
|---|---|
| Total queries in window | 1,640 |
| Queries over 5s | 138 |
| Slow-Query Rate | 8.4% (alert: above 5%) |
| p50 latency | 1,900ms |
| p95 latency | 7,100ms |
| Warehouse saturation | 84% |
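The arithmetic behind the Slow-Query Rate row can be checked with a short sketch. The function name and the handling of an empty window are assumptions made for illustration; only the counts and the 5% alert level come from this page.

```python
def slow_query_rate(slow_count: int, total_count: int) -> float:
    """Return slow statements as a percentage of completed statements."""
    if total_count <= 0:
        # Assumption: an empty window is reported as 0% rather than raising.
        return 0.0
    return 100.0 * slow_count / total_count


ALERT_THRESHOLD_PCT = 5.0  # alert level stated in the card definition

# Counts taken from the worked-example table above.
rate = slow_query_rate(slow_count=138, total_count=1640)
alerting = rate > ALERT_THRESHOLD_PCT

print(f"Slow-Query Rate: {rate:.1f}% (alerting: {alerting})")
# prints: Slow-Query Rate: 8.4% (alerting: True)
```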
- The slowness is broad, not a single offender. 138 of 1,640 queries are slow. That is not one bad query; a meaningful chunk of the whole workload is degraded. p50 has risen to 1.9s (the typical query is now slow-ish too), confirming the problem is system-wide.
- Saturation at 84% points to the cause. The warehouse is near capacity. With many teams hitting the same small warehouse at 16:20 (end-of-day reporting crunch), queries queue and a growing fraction tip over the 5-second line.
- The lever is capacity allocation. Because the cause is load on a shared warehouse, the fix is structural: enable multi-cluster auto-scaling so the warehouse adds a cluster during the end-of-day crunch, or split the heaviest team (finance’s large aggregations) onto its own warehouse so it stops crowding out lightweight merchandising dashboards.
- Slow-Query Rate measures breadth; percentiles measure depth. A high rate means many users are affected. Always read it alongside SQL Query Latency p95 (ms) and SQL Query Latency p99 (ms) to know both how widespread and how severe the slowness is.
- High rate plus high saturation equals a capacity problem. High rate with low saturation, by contrast, points to degraded table layout or missing data pruning, which is a query/data fix, not a scaling one.
- The threshold is two numbers; set both. The slow-duration threshold defines “slow” for your workload, and the 5% alert defines your tolerance. A warehouse of heavy aggregations may use a higher slow threshold; a latency-sensitive BI warehouse may use a tighter one.
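The rate-versus-saturation reading described above can be expressed as a small triage rule. This is a sketch under stated assumptions: the `triage` function and the 80% cut-off for "high" saturation are hypothetical and are not Vortex IQ settings; only the 5% alert level comes from the card definition.

```python
RATE_ALERT_PCT = 5.0        # alert level from the card definition
HIGH_SATURATION_PCT = 80.0  # assumed cut-off for "high" saturation (illustrative)


def triage(rate_pct: float, saturation_pct: float) -> str:
    """Suggest where to look first, given the rate and warehouse saturation."""
    if rate_pct <= RATE_ALERT_PCT:
        return "ok"  # slow fraction is within tolerance
    if saturation_pct >= HIGH_SATURATION_PCT:
        return "capacity"  # overload: consider scaling or splitting workloads
    return "query-or-data"  # look at table layout, pruning, or the queries


# Worked-example readings: 8.4% slow at 84% saturation.
print(triage(8.4, 84.0))  # prints: capacity
```

In practice the saturation cut-off would be tuned per warehouse, in the same way the slow threshold is.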
Sibling cards
| Card | Why pair it with Slow-Query Rate | What the combination tells you |
|---|---|---|
| SQL Query Latency p95 (ms) | The severity of the tail. | High rate plus high p95 equals broad and deep slowness; high rate alone means many queries just over the line. |
| SQL Query Latency p99 (ms) | The extreme tail. | Low rate plus high p99 equals one or two pathological queries, not a broad problem. |
| SQL Query Latency p50 (ms) | The median baseline. | A rising p50 alongside the rate confirms even typical queries are now slow. |
| SQL Warehouse Saturation % | The capacity cause. | High rate plus high saturation equals overload; high rate plus low saturation equals table-layout/query problems. |
| Avg Cluster CPU Utilisation % | The compute-pressure peer. | Confirms whether the warehouse is CPU-bound during the slow window. |
| Top 10 Slowest SQL Queries | The named offenders. | Identifies the statements making up the slow fraction so you can rewrite or reschedule them. |
| SQL Query Error Rate % | The failure peer. | A rising error rate alongside slow-query rate means queries are starting to time out, not just run slowly. |
| Slow SQL Queries During Checkout Window | The revenue cross-channel view. | Tells you whether the slow fraction overlaps live checkout traffic. |
Reconciling against the source
Where to look in Databricks:
- Query History in the Databricks SQL workspace, filtered to the same warehouse and 15-minute range: count the statements with duration above your slow threshold against the total to reproduce the rate.
- system.query.history (Unity Catalog system tables) is the exact source; a single query reproduces the card.
- Warehouse monitoring on the warehouse page shows live queue depth and cluster count, which explain a load-driven rate spike.
To match the card precisely:
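A minimal sketch of such a reconciliation query, built in Python so the threshold and window are explicit parameters. It is not necessarily the exact query the card runs. The column names (`total_duration_ms`, `execution_status`, `compute.warehouse_id`, `end_time`) and the `'FINISHED'` status value are assumptions about the `system.query.history` schema and should be checked against your workspace; the helper name and the warehouse ID are hypothetical.

```python
def build_reconciliation_sql(warehouse_id: str,
                             slow_threshold_ms: int = 5000,
                             window_minutes: int = 15) -> str:
    """Return SQL that approximates the Slow-Query Rate % for one warehouse."""
    if not warehouse_id.isalnum():
        # Guard against injecting arbitrary text into the SQL string.
        raise ValueError("warehouse_id must be alphanumeric")
    threshold = int(slow_threshold_ms)
    minutes = int(window_minutes)
    return f"""
SELECT
  COUNT(*) AS total_completed,
  COUNT_IF(total_duration_ms > {threshold}) AS slow_count,
  ROUND(100.0 * COUNT_IF(total_duration_ms > {threshold})
        / NULLIF(COUNT(*), 0), 1) AS slow_query_rate_pct
FROM system.query.history
WHERE compute.warehouse_id = '{warehouse_id}'
  AND execution_status = 'FINISHED'  -- completed statements only
  AND end_time >= current_timestamp() - INTERVAL {minutes} MINUTES
""".strip()


# Hypothetical warehouse ID, for illustration only.
print(build_reconciliation_sql("abc123def456"))
```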
(Replace 5000, the slow threshold in milliseconds, with whatever slow threshold you have configured in the Sensitivity tab.)
Why our number may legitimately differ from the Databricks UI:
| Reason | Direction | Why |
|---|---|---|
| Slow-threshold definition | Variable | The rate depends entirely on your configured slow threshold; if you compare against a different cut-off in the UI, the percentages will not match. |
| Duration definition | Vortex IQ may read more queries as slow | We use total duration including queue wait; an execution-time-only comparison classifies fewer queries as slow. |
| System-table latency | Brief lag | system.query.history can lag completion by a few seconds, so the most recent statements may be missing from a live reading. |
| Denominator scope | Variable | We divide by completed statements; if you include cancelled/failed statements or metadata-only commands, the denominator (and the rate) shift. |
| Time zone / window edges | Marginal | Vortex IQ aligns the 15-minute window to your reporting time zone. |
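Two of the rows above, duration definition and denominator scope, are easiest to see on a toy data set. The sketch below uses made-up statements purely for illustration, and it simplifies total duration to queue wait plus execution time; the `Statement` type and `rate` function are hypothetical.

```python
from typing import List, NamedTuple


class Statement(NamedTuple):
    queue_wait_ms: int
    execution_ms: int
    status: str  # "FINISHED", "FAILED" or "CANCELED"


def rate(statements: List[Statement], threshold_ms: int,
         include_queue_wait: bool = True,
         completed_only: bool = True) -> float:
    """Slow-query rate (%) under a chosen duration and denominator definition."""
    pool = [s for s in statements
            if s.status == "FINISHED" or not completed_only]
    if not pool:
        return 0.0
    slow = sum(
        1 for s in pool
        if s.execution_ms + (s.queue_wait_ms if include_queue_wait else 0)
        > threshold_ms
    )
    return 100.0 * slow / len(pool)


# Made-up sample, not real query history.
sample = [
    Statement(0, 1200, "FINISHED"),
    Statement(3000, 2500, "FINISHED"),  # slow only when queue wait is counted
    Statement(0, 6200, "FINISHED"),     # slow under either definition
    Statement(500, 900, "FINISHED"),
    Statement(0, 7000, "FAILED"),       # excluded from a completed-only denominator
]

print(rate(sample, 5000))                            # prints: 50.0
print(rate(sample, 5000, include_queue_wait=False))  # prints: 25.0
print(rate(sample, 5000, completed_only=False))      # prints: 60.0
```

The same five statements give three different rates, which is why a comparison against the Databricks UI needs the same duration definition and the same denominator before the numbers can be expected to agree.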
| Card | Expected relationship | What causes divergence |
|---|---|---|
| shopify.total_revenue / bigcommerce.total_revenue | A broad slow-query rate spike during peak browsing can correspond to degraded storefront features if the lakehouse feeds them synchronously. | Revenue steady during a rate spike means the slowness is internal-only (BI/reporting), not customer-facing. |
| google_analytics | Independent front-end timing measurement. | Lakehouse rate high but GA4 timings normal equals back-office-only impact. |