At a glance
SQL Query Latency p95 (ms) is the 95th-percentile completion time for SQL statements on your Databricks SQL warehouses: 95% of queries finish faster than this, 5% take longer. For a lakehouse this is the number your dashboards and downstream consumers actually feel, because the slow tail is where timeouts, abandoned BI refreshes, and stalled checkout-analytics joins live. Lakehouse SQL p95 is typically measured in seconds rather than milliseconds; the alert threshold (5000ms) reflects a busy warehouse workload, not a single trivial query.
| What it tracks | The 95th-percentile total query duration on your Databricks SQL warehouses over the window. Lakehouse SQL p95 measured in seconds typically; the threshold reflects warehouse workload. |
| Data source | Databricks SQL query history (system.query.history / Query History API), durations aggregated to the 95th percentile. |
| Time window | RT/5m (real-time, evaluated over a rolling 5-minute window) |
| Alert trigger | > 5000ms. When the trailing-window p95 exceeds 5 seconds, the tail of your workload is degrading and the on-call data engineer is notified. |
| Roles | owner, engineering |
| Card class | Hero and Sensitivity card: it drives the Performance health signal and is configurable in the Sensitivity tab. |
Calculation
Vortex IQ reads completed statements from your Databricks SQL warehouse query history over the rolling 5-minute window and computes the 95th percentile of total query duration. “Total duration” is the full wall-clock time from submission to result availability, which includes queue wait, compilation/planning, photon/execution time, and result fetch. It is not execution time alone, so a query that ran for 800ms but waited 6 seconds in a saturated queue counts as roughly 6.8 seconds toward the percentile. Because it is the 95th percentile, the value answers “how slow is a bad-but-not-worst-case query right now?”. It deliberately ignores the absolute worst 5% (those belong to SQL Query Latency p99 (ms)) and ignores the median (that is SQL Query Latency p50 (ms)). The card reports the value in milliseconds for consistency across the Performance category; a reading of 4,200 means 4.2 seconds.Worked example
A retail analytics team runs aServerless Medium SQL warehouse backing both the executive Tableau dashboard and a Delta table that feeds the storefront’s “recommended for you” widget. Snapshot taken on 14 Apr 26 at 09:35 BST, just as the morning dashboard-refresh wave hits.
| Reading | Value |
|---|---|
| p50 latency | 480ms |
| p95 latency | 6,300ms (alert: above 5,000ms) |
| p99 latency | 11,200ms |
| Warehouse saturation | 88% |
| Queued queries (peak) | 14 |
- Saturation at 88% plus 14 queued queries explains the tail. During the 09:30 refresh wave, more queries arrive than the warehouse has slots for. Fast queries that land in an empty slot stay near 480ms (keeping p50 low), but queries that arrive during the burst wait several seconds before execution, inflating p95.
- The business impact is concrete. The “recommended for you” widget query is in the slow 5%. When its total duration crosses ~5 seconds the storefront falls back to a generic placeholder, so a slow lakehouse tail is quietly degrading on-site personalisation during peak browsing.
- The fix is right-sizing, not query rewriting. Because p50 is fine, rewriting individual queries will not help much. The lever is concurrency: enable or raise multi-cluster auto-scaling on the warehouse so the 09:30 burst spins up a second cluster, or move the storefront-feeding query to its own dedicated warehouse so dashboard traffic cannot queue behind it.
- Always read p95 next to p50. Low p50 with high p95 means a tail/queueing problem (scale concurrency). High p50 and high p95 means the whole warehouse is slow (scale size or fix table layout).
- p95 in seconds is normal for a lakehouse. Unlike a row-store OLTP database, Databricks SQL handles scan-heavy analytical queries; a multi-second p95 is acceptable for heavy aggregations. Tune the Sensitivity threshold to your real workload rather than expecting sub-second tails.
- Pair p95 with saturation to choose the lever. Saturation high equals scale concurrency; saturation low but p95 high equals optimise table layout (run OPTIMIZE / Z-ORDER, check file sizes).
Sibling cards
| Card | Why pair it with p95 Latency | What the combination tells you |
|---|---|---|
| SQL Query Latency p50 (ms) | The median baseline. | Low p50 with high p95 equals a tail/queue problem; high in both equals a warehouse-wide slowdown. |
| SQL Query Latency p99 (ms) | The extreme tail. | If p99 dwarfs p95, a few pathological queries (skew, missing pruning) are the cause, not general load. |
| Slow-Query Rate % | The count-based view of the same tail. | Rising slow-query rate confirms p95 breach is broad, not one outlier. |
| SQL Warehouse Saturation % | The queueing cause. | High saturation explains p95 inflation without any query getting genuinely slower. |
| Avg Cluster CPU Utilisation % | The compute-pressure peer. | High CPU plus high p95 equals undersized warehouse; low CPU plus high p95 equals queueing or I/O bound. |
| Top 10 Slowest SQL Queries | The named offenders. | Identifies which statements sit in the slow tail so you can rewrite or reschedule them. |
| SQL Query Error Rate % | The failure peer. | High p95 plus rising errors equals queries timing out, not just running slowly. |
| Slow SQL Queries During Checkout Window | The revenue cross-channel view. | Confirms whether the slow tail co-occurs with storefront checkout activity. |
Reconciling against the source
Where to look in Databricks:
Query History in the Databricks SQL workspace, filtered to the same warehouse and time range, sorted by duration descending, gives you the slow tail to inspect directly.
system.query.history (Unity Catalog system tables) is the system-table source; compute the percentile yourself to match the card exactly.
Warehouse monitoring on the warehouse’s own page shows the live queue depth and running-cluster count that explain a tail spike.
To match the card precisely, query the system table over the same window:
| Reason | Direction | Why |
|---|---|---|
| Duration definition | Vortex IQ may read higher | We use total duration (queue + compile + execute + fetch); the Query History UI can show execution time in some columns, which excludes queue wait. |
| System-table latency | Brief lag | system.query.history can lag completion by a few seconds, so a very recent burst may not yet be reflected. |
| Time zone / window edges | Marginal | Vortex IQ aligns the 5-minute window to your reporting time zone; the UI defaults to workspace time. |
| Warehouse scope | Variable | If the connector is scoped to specific warehouses, the card excludes others the UI shows. |
| Statement-type filtering | Slight | Metadata-only and cancelled statements may be filtered out of the percentile; the raw UI list includes them. |
| Card | Expected relationship | What causes divergence |
|---|---|---|
shopify.total_revenue / bigcommerce.total_revenue | A p95 breach during peak browsing can correspond to degraded personalisation and a small conversion dip if the lakehouse feeds storefront features. | Revenue unaffected during a breach means the slow tail is internal-only (BI dashboards), not customer-facing. |
google_analytics | Independent measurement of front-end timing. | Lakehouse p95 high but GA4 timings normal equals back-office-only impact. |