At a glance
An anomaly alert that fires when total DBU consumption for the trailing 7 days is more than 50% above the prior 7 days, without a proportional rise in the underlying workload. DBUs are the unit Databricks bills on, so at a constant SKU mix a 50% week-over-week jump in DBUs is roughly a 50% jump in the compute bill. The “without proportional workload” qualifier is what makes this an anomaly rather than just a busy week: if order volume, query volume, and job count all rose 50% too, the spend is justified. If they did not, something is burning money inefficiently.
| Field | Detail |
|---|---|
| Data source | Databricks billable usage data from the system.billing.usage system table (the same data behind the account-level Usage dashboard), aggregated to total DBUs per 7-day window. Workload comparators come from system.lakeflow.job_run_timeline (job runs) and system.query.history (SQL volume). |
| Metric basis | A week-over-week ratio: this_7d_DBU / prior_7d_DBU. The alert is an anomaly verdict, not a raw number: it fires only when the DBU rise outpaces the workload rise. |
| Aggregation window | 7d trailing versus the immediately prior 7d, recomputed daily as billable usage settles. |
| Alert trigger | +50% DBU WoW without proportional workload. A +50% rise that is matched by a +50% rise in jobs or query volume is suppressed as “expected growth”. |
| What “proportional workload” means | The engine compares DBU growth against job-run count growth and SQL query-volume growth. If DBU growth exceeds workload growth by a meaningful margin, the rise is flagged as inefficiency rather than genuine demand. |
| What does NOT trigger it | (1) A spend rise fully explained by more jobs, more queries, or more orders; (2) a one-off backfill that the team has annotated; (3) intra-week spikes that wash out over the 7-day average. |
| Time zone | Workspace time zone for window boundaries; UTC for cross-connector alignment. |
| Time window | 7d versus prior 7d. |
| Roles | owner, platform engineering, finance / FinOps |
Calculation
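The alert evaluates two ratios and compares them: the DBU ratio (this_7d_DBU / prior_7d_DBU) and the matching week-over-week ratio for the workload comparators (job-run count and SQL query volume). It fires only when the DBU ratio is above 1.5 and outpaces the workload ratio.

system.billing.usage settles over a few hours, so the comparison is recomputed daily rather than minute-by-minute. The alert is intentionally a weekly signal: it is the FinOps tripwire, not the real-time one. For real-time cost runaway, DBU Burned (24h) and Long-Running Jobs (>1h) move faster.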
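A minimal sketch of the comparison, assuming the two 7-day totals are already in hand. The +50% threshold comes from the alert trigger above. The `margin` default, the choice to take the larger of the two workload growth rates, and every name here (`WindowTotals`, `growth`, `dbu_anomaly`) are illustrative assumptions, not the engine's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class WindowTotals:
    """Totals for one 7-day window (hypothetical container)."""
    dbu: float
    job_runs: int
    sql_queries: int


def growth(current: float, prior: float) -> float:
    """Fractional week-over-week growth, e.g. 0.5 for +50%."""
    if prior <= 0:
        raise ValueError("prior window must be positive to compute a ratio")
    return current / prior - 1.0


def dbu_anomaly(this_7d: WindowTotals, prior_7d: WindowTotals,
                dbu_threshold: float = 0.50, margin: float = 0.25):
    """Return (fires, dbu_growth, workload_growth).

    Assumptions: workload growth is the larger of job-run growth and
    SQL query growth, and `margin` stands in for the unspecified
    "meaningful margin" by which DBU growth must exceed it.
    """
    dbu_growth = growth(this_7d.dbu, prior_7d.dbu)
    workload_growth = max(
        growth(this_7d.job_runs, prior_7d.job_runs),
        growth(this_7d.sql_queries, prior_7d.sql_queries),
    )
    fires = dbu_growth > dbu_threshold and (dbu_growth - workload_growth) > margin
    return fires, dbu_growth, workload_growth
```

Taking the larger workload growth rate makes the sketch conservative: a rise in either jobs or queries is enough to count as an explanation, which mirrors the "jobs or query volume" wording of the trigger. A real deployment would tune `margin` to its own tolerance.
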
Worked example
A data platform team supports an ecommerce analytics estate. The FinOps lead reviews this card every Monday morning. Snapshot taken on 11 May 26.

| Window | Total DBU | Job runs | SQL queries | Orders processed |
|---|---|---|---|---|
| Prior 7d (28 Apr to 04 May) | 9,200 | 4,100 | 612,000 | 88,000 |
| This 7d (05 May to 11 May) | 14,400 | 4,250 | 631,000 | 90,500 |

DBU rose 56.5% week over week while job runs rose 3.7%, SQL queries 3.1%, and orders 2.8%, so the alert fires. The FinOps lead then works through it:
- Attribute the spend. DBU by Cluster (7d) shows the extra 5,200 DBU is concentrated in analytics-shared, an all-purpose cluster, not the job clusters. That immediately rules out “the pipelines got heavier”.
- Check for idle waste. Idle Cluster DBU Wasted (24h) confirms analytics-shared ran 24/7 all week. Someone changed its auto-termination from 30 minutes to “never” during a debugging session and forgot to revert it.
- Quantify the bill. 5,200 extra DBU at an illustrative blended rate comes to $2,860 for one week (an implied $0.55 per DBU), or roughly $149,000/year if left unchecked.
- Fix and annotate. Auto-termination is restored to 30 minutes. The lead annotates the week so the following Monday’s comparison (which will show a drop of roughly 36% back to baseline) is understood as a fix, not a new anomaly.
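
The arithmetic behind this example can be reproduced in a few lines. This is a sketch using only the figures in the table above. The $0.55 per DBU rate is not stated directly; it is inferred from $2,860 / 5,200 DBU and should be treated as illustrative.

```python
# Figures copied from the worked-example table.
prior = {"dbu": 9_200, "job_runs": 4_100, "sql_queries": 612_000, "orders": 88_000}
this = {"dbu": 14_400, "job_runs": 4_250, "sql_queries": 631_000, "orders": 90_500}

# Week-over-week growth per measure, in percent, rounded to one decimal.
growth_pct = {k: round((this[k] / prior[k] - 1) * 100, 1) for k in prior}

# Cost of the extra DBU at the inferred illustrative blended rate.
RATE_USD_PER_DBU = 0.55  # assumption: implied by $2,860 / 5,200 DBU
extra_dbu = this["dbu"] - prior["dbu"]
weekly_cost = round(extra_dbu * RATE_USD_PER_DBU, 2)
annual_cost = round(weekly_cost * 52, 2)

print(growth_pct)
print(extra_dbu, weekly_cost, annual_cost)
```

The gap between DBU growth and every workload measure is what separates this week from ordinary growth; the annualised figure simply multiplies one week's excess by 52 and assumes the misconfiguration is never corrected.
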
Sibling cards
| Card | Why pair it with DBU Burn +50% WoW | What the combination tells you |
|---|---|---|
| DBU Burned (24h) | The faster, real-time cost pulse. | A 24h spike that compounds into the weekly anomaly tells you the runaway started recently. |
| DBU by Cluster (7d) | Attributes the extra spend to a specific cluster. | Pinpoints which cluster caused the week-over-week jump. |
| Idle Cluster DBU Wasted (24h) | The most common root cause: clusters alive with no work. | High idle DBU is usually the explanation for a decoupled rise. |
| Avg DBU per Job Run | Detects per-job efficiency regressions. | If cost per job rose, an individual job got heavier, not the schedule. |
| Avg Cluster CPU Utilisation % | Shows whether the extra compute was used. | High DBU with low CPU equals waste; high DBU with high CPU equals genuine load. |
| DBU Burn vs Ecom Order Volume | The cross-channel view of the same decoupling. | Spend up while orders flat is the business-level confirmation of inefficiency. |
| Active Clusters | The simple count of live compute. | A rising count alongside the alert points at clusters not terminating. |
Reconciling against the source
Where to look in Databricks:
- Account console → Usage dashboard for the canonical DBU spend by day, SKU, and workspace. This is the authoritative source the alert is built on.
- system.billing.usage system table in Unity Catalog: query DBU per day directly to reproduce the 7-day windows exactly.
- system.lakeflow.job_run_timeline and system.query.history for the workload comparators (job runs and SQL volume) that determine the “proportional workload” verdict.

Why our verdict may legitimately differ from the Usage dashboard:
| Reason | Direction | Why |
|---|---|---|
| Usage settlement lag | Brief | system.billing.usage settles over a few hours; the latest day in the window may still be filling in. Vortex IQ recomputes daily as it settles. |
| Window definition | Variable | This card uses trailing 7d vs prior 7d; the Usage dashboard defaults to calendar months. Match the date range to reconcile. |
| SKU inclusion | Vortex IQ broader | The card totals all DBU SKUs (jobs, SQL, all-purpose). If you filter the Usage dashboard to one SKU you will see a smaller figure. |
| Workload qualifier | Vortex IQ may stay quiet | The Usage dashboard shows raw spend with no anomaly logic; it has no notion of “proportional workload”, so it cannot suppress a legitimate growth week the way this card does. |
| Workspace scope | Variable | This card scopes to the connected workspace; the account Usage dashboard can aggregate across all workspaces. |
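
When reconciling by hand, most mismatches come from window boundaries. The sketch below shows one way to fold daily DBU totals into the trailing and prior 7-day windows. It assumes the daily totals have already been pulled from system.billing.usage (for example, usage_quantity summed per usage_date) into a plain dictionary. The function name `window_totals`, the sample values, and the reading of "11 May 26" as 2026 are illustrative assumptions.

```python
from datetime import date, timedelta


def window_totals(daily_dbu: dict[date, float], snapshot: date) -> tuple[float, float]:
    """Return (this_7d, prior_7d) DBU totals.

    Each window is 7 calendar days inclusive. The current window ends on
    `snapshot`; the prior window ends the day before the current one starts.
    Days missing from `daily_dbu` count as zero.
    """

    def total(end: date) -> float:
        return sum(daily_dbu.get(end - timedelta(days=i), 0.0) for i in range(7))

    return total(snapshot), total(snapshot - timedelta(days=7))


# Hypothetical daily totals: 1,000 DBU/day in the prior week, 2,000 DBU/day in this one.
snapshot = date(2026, 5, 11)
daily = {snapshot - timedelta(days=i): (2000.0 if i < 7 else 1000.0) for i in range(14)}
this_7d, prior_7d = window_totals(daily, snapshot)
print(this_7d / prior_7d)  # week-over-week DBU ratio
```

With a snapshot of 11 May, the windows cover 05 May to 11 May and 28 Apr to 04 May, matching the worked example. If the latest day is still settling, its total will be low and the ratio will rise on the next daily recompute.
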
| Card | Expected relationship | What causes divergence |
|---|---|---|
| DBU Burn vs Ecom Order Volume | When this alert fires for inefficiency, order volume will be flat. | If orders also rose 50%, the spend was demand-driven and this card should not have fired; check the workload comparator config. |
| Shopify / BigCommerce / Adobe Total Revenue | A sale week lifts both revenue and DBU together. | Revenue flat but DBU up 50% is the business-level signature of the inefficiency this alert catches. |