At a glance
The DBU burned over the last 24 hours by clusters that were running but had no active job or query on them. This is pure waste: compute you paid for that did no work. The most common cause is auto-termination set too generously (or off), so an all-purpose cluster sits warm for an hour after the last analyst logs off, or a job cluster lingers after its run completes. The card is the single best signal for auto-termination tuning, and the alert fires when idle waste crosses a tenth of total spend, the point at which it stops being rounding error and starts being a line item.
| Field | Detail |
|---|---|
| Data source | Databricks billable usage (system.billing.usage for the DBU figure) cross-referenced against cluster activity (the Jobs runs API and SQL query history / cluster event timeline) to determine, for each billed interval, whether any workload was actually running on that cluster. |
| What it counts | DBU charged while a cluster was in RUNNING state but had zero active job runs and zero active interactive commands or queries for that interval. The result is summed across all clusters over the 24h window. |
| What does NOT count | DBU burned while a job or query was executing (that is productive spend); DBU burned during cluster start-up / spin-down (unavoidable overhead, reported separately); terminated clusters (they burn nothing); and serverless compute, which auto-scales to zero and so cannot be “idle” in this sense. |
| Idle definition | An interval is idle if no Spark job is active and no command has executed on the cluster within the interval. A cluster waiting out its auto-termination countdown is the canonical idle case. |
| Aggregation window | Rolling 24 hours. The headline is total idle DBU; the alert compares it to total DBU for the same window. |
| Time window | 24h (rolling 24 hours) |
| Alert trigger | > 10% of total. When idle DBU exceeds 10% of total DBU burned in the window, the card flags it as a tuning opportunity. |
| Roles | owner, platform engineering, finance / FinOps |
Calculation
For each cluster and each billed interval in the last 24 hours, Vortex IQ decides whether the interval was idle (cluster running, no workload active) and, if so, adds that interval’s DBU to the waste total.

- Idle is the absence of work, not low utilisation. A cluster running a small query at 5% CPU is busy, not idle, and its DBU is productive spend (right-sizing is a different conversation, handled by Avg Cluster CPU Utilisation %). This card counts only intervals with literally no workload.
- Start-up and shutdown overhead is excluded. Every cluster pays a few minutes of DBU to acquire instances and initialise Spark, and a moment to tear down. That is unavoidable and is not waste; the card carves it out so a workload that spins clusters up and down frequently is not unfairly penalised.
- The alert is a ratio, not an absolute. Ten DBU of idle on a 2,000 DBU day is noise; ten DBU on an 80 DBU day is 12.5% and worth acting on. Expressing waste as a share of total keeps the signal meaningful for both large and small workspaces.
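
The rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Vortex IQ's actual implementation: the `BilledInterval` record, its boolean flags, and the `idle_summary` helper are hypothetical names, and the sketch assumes each billed interval has already been labelled as running, busy, or overhead.

```python
from dataclasses import dataclass


@dataclass
class BilledInterval:
    """Hypothetical record for one billed interval on one cluster."""
    cluster: str
    dbu: float
    running: bool      # cluster was in RUNNING state
    active_work: bool  # a job run, command, or query was active
    overhead: bool     # start-up / spin-down time, carved out of idle


def is_idle(iv: BilledInterval) -> bool:
    # Idle means running with no workload at all; low CPU does not count.
    return iv.running and not iv.active_work and not iv.overhead


def idle_summary(intervals, threshold=0.10):
    # Assumption: total DBU includes every billed interval in the window.
    total = sum(iv.dbu for iv in intervals)
    idle = sum(iv.dbu for iv in intervals if is_idle(iv))
    share = idle / total if total else 0.0
    return {
        "total_dbu": total,
        "idle_dbu": idle,
        "idle_share": share,
        "alert": share > threshold,  # alert is a ratio, not an absolute
    }


# The two cases from the text: ten idle DBU on a large day and on a small day.
large_day = idle_summary([
    BilledInterval("etl", 1990, True, True, False),
    BilledInterval("etl", 10, True, False, False),
])
small_day = idle_summary([
    BilledInterval("adhoc", 70, True, True, False),
    BilledInterval("adhoc", 10, True, False, False),
])
```

With these inputs, `small_day["idle_share"]` is 0.125 and the alert flag is set, while the same ten idle DBU on the 2,000 DBU day stays well under the threshold.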
Worked example
A platform team supports a shared analytics workspace plus a set of scheduled production jobs. Snapshot taken 17 Apr 26 at 09:00, covering the previous 24 hours.

| Cluster | Total DBU (24h) | Idle DBU | Idle share | Auto-term setting |
|---|---|---|---|---|
| analyst-shared-ap | 560 | 188 | 34% | 120 min |
| prod-etl-nightly | 610 | 22 | 4% | 10 min |
| ml-sandbox-ap | 140 | 96 | 69% | off |
| prod-etl-hourly | 150 | 14 | 9% | 10 min |
| Workspace total | 1,460 | 320 | 22% | mixed |
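
The idle-share column and the workspace-level alert can be re-derived from the table's own figures. The short Python check below is only an arithmetic illustration; the variable names are arbitrary and the 10% threshold is the one stated in the alert trigger.

```python
# (total DBU, idle DBU) per cluster, copied from the table above.
clusters = {
    "analyst-shared-ap": (560, 188),
    "prod-etl-nightly": (610, 22),
    "ml-sandbox-ap": (140, 96),
    "prod-etl-hourly": (150, 14),
}

# Idle share per cluster, rounded to a whole percent as in the table.
idle_share_pct = {
    name: round(100 * idle / total)
    for name, (total, idle) in clusters.items()
}

total_dbu = sum(total for total, _ in clusters.values())
idle_dbu = sum(idle for _, idle in clusters.values())
workspace_share = idle_dbu / total_dbu

# The alert compares idle DBU to total DBU for the same window.
alert_fires = workspace_share > 0.10

for name, pct in idle_share_pct.items():
    print(f"{name}: {pct}% idle")
print(f"workspace: {idle_dbu}/{total_dbu} = {workspace_share:.0%}, alert={alert_fires}")
```

The workspace sits at roughly 22% idle, more than double the 10% trigger, so the card would flag it.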
- ml-sandbox-ap is the worst offender at 69% idle with auto-termination off. A data scientist spun it up to prototype, ran a few cells, and left it running all night. Two-thirds of its spend was paid for nothing. The fix is immediate and high-ROI: turn on auto-termination at 30 minutes via a cluster policy so sandbox clusters cannot be left running indefinitely. Estimated saving: roughly 90 DBU/day, every day.
- analyst-shared-ap is the biggest absolute waste at 188 DBU, driven by a 120-minute timeout. Fourteen analysts share it, so it is genuinely useful during the day, but the two-hour idle window means it sits warm long after the last person leaves. Dropping the timeout to 30 minutes would recover most of the 188 DBU without affecting working hours, because nobody needs a cluster to stay warm for two hours of inactivity.
- The two production ETL clusters are healthy (4% and 9% idle). Their tight 10-minute auto-termination keeps idle near the unavoidable start-up overhead. They are the template: apply the same policy to the interactive clusters.
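
As an illustration of the 30-minute fix, a cluster policy can constrain the auto-termination setting. The sketch below builds such a policy definition in Python and serialises it to JSON. It assumes the Databricks cluster policy definition syntax (a `range` rule on `autotermination_minutes`), which should be checked against the current Databricks documentation before use. The `minValue` of 10 is an assumption added here so that a value of 0, which disables auto-termination, is rejected. The name `sandbox_policy` is hypothetical.

```python
import json

# Hypothetical policy definition for sandbox clusters.
# Assumption: "range" rules accept minValue, maxValue, and defaultValue.
sandbox_policy = {
    "autotermination_minutes": {
        "type": "range",
        "minValue": 10,      # assumption: blocks 0, which would turn it off
        "maxValue": 30,      # the 30-minute ceiling from the worked example
        "defaultValue": 30,
    }
}

# JSON text that could be pasted into a policy definition.
policy_json = json.dumps(sandbox_policy, indent=2)
print(policy_json)
```

A range rule rather than a fixed value leaves room for teams that want an even tighter timeout, such as the 10 minutes used by the production ETL clusters.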
Sibling cards to read alongside
| Card | Why pair it with Idle Cluster DBU Wasted | What the combination tells you |
|---|---|---|
| DBU Burned (24h) | The denominator the 10% alert is measured against. | Idle DBU is only actionable as a share of total; read them together. |
| DBU by Cluster (7d) | Identifies which clusters spend the most over the week. | A top spender that is also mostly idle is the highest-ROI fix. |
| Avg Cluster CPU Utilisation % | Distinguishes idle (no work) from under-used (some work, oversized). | Low CPU but not idle equals right-sizing; idle equals auto-termination. |
| Active Clusters | How many clusters are live right now. | A high live count overnight is a leading indicator of idle waste. |
| Avg DBU per Job Run | Per-run efficiency for the job clusters. | Idle job clusters inflate per-run DBU even when the job itself is efficient. |
| DBU Burn +50% Week-over-Week | The anomaly alert idle waste can quietly feed. | A creeping idle share can push total DBU into the WoW anomaly band. |
| DBU Burn vs Ecom Order Volume | The cross-channel efficiency check. | Idle waste is spend that grows DBU without serving any order at all. |
Reconciling against the source
Where to look in Databricks:

- Compute → cluster → Event log shows TERMINATING reasons (INACTIVITY is the auto-termination event) and lets you see how long a cluster idled before it shut down.
- System tables: join system.billing.usage (DBU per interval) against system.compute.clusters / the cluster event timeline to attribute DBU to intervals with no active run. This is the source the card reconstructs.
- Cluster settings: the autotermination_minutes field per cluster is the lever that drives this number.

An approximate reconciling query infers idle DBU wherever no job or query overlaps the usage interval.
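
The overlap-based inference can be sketched in Python for a single usage row. This is a rough approximation under stated assumptions, not the card's actual logic: the helper names `uncovered_seconds` and `idle_dbu` are hypothetical, activity spans are assumed to have been collected already from job runs and command/query history, all timestamps are assumed to share one time zone, and start-up overhead is not carved out here.

```python
from datetime import datetime


def uncovered_seconds(start, end, activity):
    """Seconds in [start, end) not covered by any (a_start, a_end) span."""
    # Clip each activity span to the usage interval, then sort by start time.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in activity
        if s < end and e > start
    )
    covered = 0.0
    cursor = start  # everything before the cursor is already accounted for
    for s, e in clipped:
        if e <= cursor:
            continue  # span fully inside time already counted
        covered += (e - max(s, cursor)).total_seconds()
        cursor = e
    return (end - start).total_seconds() - covered


def idle_dbu(usage_start, usage_end, dbu, activity):
    """Apportion a usage row's DBU to the time with no overlapping activity."""
    span = (usage_end - usage_start).total_seconds()
    if span <= 0:
        return 0.0
    return dbu * uncovered_seconds(usage_start, usage_end, activity) / span


# One hourly usage row of 12 DBU with two overlapping bursts of activity
# covering 09:00 to 09:30 in total.
row_start = datetime(2026, 4, 16, 9, 0)
row_end = datetime(2026, 4, 16, 10, 0)
spans = [
    (datetime(2026, 4, 16, 9, 0), datetime(2026, 4, 16, 9, 20)),
    (datetime(2026, 4, 16, 9, 10), datetime(2026, 4, 16, 9, 30)),
]
example_idle = idle_dbu(row_start, row_end, 12.0, spans)
```

In this example half of the hour has no workload, so half of the row's DBU is treated as idle. Apportioning by time is also why a manual reconciliation can drift slightly from the card when a cluster goes idle mid-interval.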
| Reason | Direction | Why |
|---|---|---|
| Start-up overhead handling | Vortex IQ may read lower | The card excludes spin-up / spin-down DBU as unavoidable; a naive “any DBU with no job” calculation would count it as idle. |
| Interactive command detection | Vortex IQ may read lower | A notebook command keeps a cluster busy even if no Jobs run exists. The card reads command/query activity, not just the Jobs API; a Jobs-only check would overstate idle. |
| Billing-interval granularity | Small drift | Usage rows are bucketed; a cluster that goes idle mid-interval is apportioned, which can differ slightly from the exact second of last activity. |
| System-table lag | Vortex IQ live, table delayed | The latest hour can still be settling in system.billing.usage, so a same-minute manual query may read lower than the card. |
| Serverless excluded | Not comparable | Serverless scales to zero, so it never idles; do not expect serverless warehouses in this figure. |
| Card | Expected relationship | What causes divergence |
|---|---|---|
| DBU Burned (24h) | Idle DBU is always a subset of total DBU; the ratio is the alert. | If idle exceeds productive spend, auto-termination is effectively off across the board. |
| DBU Burn vs Ecom Order Volume | Recovering idle waste should lower DBU without lowering order volume. | If trimming idle hurts a workload, it was not truly idle; recheck the activity signal. |