At a glance
A single 0 to 100 composite that rolls the workspace’s most important operational signals (job reliability, query errors, latency, saturation, and cost efficiency) into one number a platform lead can read at a glance. It is the executive-overview answer to “is Databricks healthy right now?” without making anyone open five dashboards. A score in the 90s means the lakehouse is doing its job quietly; a score under 70 means at least one pillar has degraded enough to need attention today.
| Field | Detail |
|---|---|
| Data source | A weighted composite computed by Vortex IQ from the underlying Databricks connector cards: job success rate, SQL query error rate, query latency, warehouse and cluster saturation, and DBU efficiency. These components are in turn sourced from the Jobs Runs API, the query-history series, cluster metrics, and the billable usage API. |
| Metric basis | Weighted average of normalised component scores, each mapped onto a 0 to 100 sub-score, then combined. Reliability and error pillars carry the heaviest weight because they are the most directly tied to broken data and broken queries. |
| Aggregation window | A real-time read blended with a 7-day trend (RT/7D): the live score reflects current state, the trend line shows whether health is improving or sliding over the week. |
| Healthy band | 90 to 100 healthy, 70 to 89 watch, below 70 degraded. |
| What pulls it down | Failed or timed-out jobs, sustained query errors, latency above target, warehouse or cluster saturation, and DBU burn rising out of step with workload. |
| What does NOT move it | Cosmetic or metadata-only changes, terminated idle clusters with no jobs queued, and one-off transient spikes that clear within a sample. |
| Time window | RT/7D (live read with a 7-day trend) |
| Alert trigger | <70 (composite degraded, at least one operational pillar needs attention) |
| Roles | platform lead, data engineering, FinOps, executive |
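
The band boundaries and the alert trigger in the table above can be expressed as a small lookup. This is a minimal sketch, not Vortex IQ's implementation: the function names `health_band` and `should_alert` are hypothetical, and it assumes a fractional score such as 89.5 stays in the watch band until it reaches 90.

```python
def health_band(score: float) -> str:
    """Map a 0 to 100 composite onto the bands listed in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "healthy"
    if score >= 70:
        # Assumption: anything from 70 up to (but not including) 90 is "watch".
        return "watch"
    return "degraded"


def should_alert(score: float) -> bool:
    """Alert trigger from the table above: fire when the composite is below 70."""
    return score < 70


if __name__ == "__main__":
    for sample in (92, 70, 64):
        print(sample, health_band(sample), should_alert(sample))
```

A score of exactly 70 lands in the watch band and does not alert, which matches the `<70` trigger.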
Calculation
The score is a weighted blend of normalised component sub-scores. Each contributing card is mapped onto a 0 to 100 scale where 100 is ideal and 0 is the worst tolerable state, then the sub-scores are combined by weight:

| Pillar | Sourced from | Why weighted as it is |
|---|---|---|
| Job reliability | Job Success Rate (24h) | Heaviest weight: a failed scheduled run means a broken data pipeline, the most direct business impact. |
| Query errors | SQL Query Error Rate % | Heavy weight: failing queries break dashboards and downstream consumers immediately. |
| Latency | SQL Query Latency p95 (ms) | Medium weight: slow but working is less severe than failing, but still degrades the user experience. |
| Saturation | SQL Warehouse Saturation % and Avg Cluster CPU Utilisation % | Medium weight: a leading indicator that reliability and latency are about to degrade. |
| Cost efficiency | DBU Burned (24h) | Lighter weight: cost matters but rarely constitutes an outage on its own. |
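
One way to picture the mapping onto a 0 to 100 sub-score is a clamped linear scale between an "ideal" value and a "worst tolerable" value. The sketch below is an illustration only: the source does not state the actual normalisation curve, and the function name `sub_score` and every threshold in the usage lines are hypothetical.

```python
def sub_score(value: float, ideal: float, worst: float) -> float:
    """Linearly map a raw metric onto 0 to 100 and clamp the result.

    `ideal` maps to 100 and `worst` maps to 0. The same formula covers
    lower-is-better metrics (ideal < worst, e.g. latency) and
    higher-is-better metrics (ideal > worst, e.g. success rate).
    """
    if ideal == worst:
        raise ValueError("ideal and worst must differ")
    fraction = (value - worst) / (ideal - worst)
    return 100.0 * min(1.0, max(0.0, fraction))


if __name__ == "__main__":
    # Hypothetical thresholds, chosen only to show the shape of the mapping.
    print(sub_score(3000, ideal=1000, worst=5000))  # p95 latency in ms
    print(sub_score(90, ideal=100, worst=80))       # job success rate in %
    print(sub_score(9000, ideal=1000, worst=5000))  # beyond worst clamps to 0
```

Clamping keeps a single runaway metric from pushing a sub-score outside the 0 to 100 range before the weights are applied.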
Worked example
A platform lead checks the executive overview on 14 Apr 26 at 08:15 BST. The gauge reads 64, in the degraded band, and the 7-day trend shows it slid from 92 over the prior 36 hours.

| Pillar | Live sub-score | Weight | Contribution |
|---|---|---|---|
| Job reliability (success rate 88%) | 45 | 0.30 | 13.5 |
| Query errors (error rate 0.4%) | 92 | 0.25 | 23.0 |
| Latency (p95 3,800 ms) | 80 | 0.20 | 16.0 |
| Saturation (warehouse 72%) | 78 | 0.15 | 11.7 |
| Cost efficiency (DBU flat) | 95 | 0.10 | 9.5 |
- Open Job Success Rate (24h) and Failed Jobs (24h). They reveal a cluster of failures concentrated on three downstream jobs that all depend on one upstream load that started timing out after a schema change.
- Confirm the blast radius with Top 10 Failing Workflows (7d). The same parent workflow tops the list, confirming a single root cause rather than scattered flakiness.
- Watch the trend, not the instant. Once the upstream load is fixed and the dependent jobs backfill successfully, the reliability sub-score recovers and the composite climbs back through the watch band (70 to 89) into the 90s over the next day. The 7-day trend line is what tells the lead the fix actually held.
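
The Contribution column in the worked-example table is each live sub-score multiplied by its weight. The sketch below reproduces that arithmetic using the sub-scores and weights exactly as listed; the names `PILLARS`, `contributions` and `composite` are hypothetical. Taken at face value, the five listed rows sum to 73.7, whereas the gauge in the example reads 64. The source does not say how the two are reconciled, so treat this as a check of the table arithmetic rather than a reproduction of the headline number.

```python
# (live sub-score, weight) per pillar, copied from the worked-example table.
PILLARS = {
    "job_reliability": (45, 0.30),
    "query_errors": (92, 0.25),
    "latency": (80, 0.20),
    "saturation": (78, 0.15),
    "cost_efficiency": (95, 0.10),
}


def contributions(pillars: dict) -> dict:
    """Return each pillar's contribution: sub-score multiplied by weight."""
    return {name: sub * weight for name, (sub, weight) in pillars.items()}


def composite(pillars: dict) -> float:
    """Sum the contributions after checking that the weights add up to 1."""
    total_weight = sum(weight for _, weight in pillars.values())
    if abs(total_weight - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1, got {total_weight}")
    return sum(contributions(pillars).values())


if __name__ == "__main__":
    for name, value in contributions(PILLARS).items():
        print(f"{name}: {value:.1f}")
    print(f"composite: {composite(PILLARS):.1f}")
```

Job reliability contributes only 13.5 of a possible 30 points, which is why the triage steps above start with the job cards.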
Sibling cards to reference together
| Card | Why pair it with Databricks Health Score | What the combination tells you |
|---|---|---|
| Job Success Rate (24h) | The heaviest-weighted reliability pillar. | A degraded score with low success rate means broken pipelines are the cause. |
| SQL Query Error Rate % | The query-failure pillar. | Degraded score with high error rate points at broken queries or warehouses, not jobs. |
| SQL Query Latency p95 (ms) | The latency pillar. | Score in the watch band with high p95 but clean errors means slow, not broken. |
| SQL Warehouse Saturation % | The leading-indicator pillar. | Saturation rising before the score drops is the early warning of an incoming dip. |
| DBU Burned (24h) | The cost-efficiency pillar. | Score steady but burn rising means cost is the only soft spot, not reliability. |
| Active Clusters | The capacity context behind the score. | A drop in active clusters alongside a score dip can signal a workspace-wide problem. |
| Failed Jobs (24h) | The triage queue behind a reliability drop. | The specific failing runs to action when the reliability pillar pulls the score down. |
Reconciling against the source
Where to look in Databricks: Databricks has no native single “health score”, so this composite cannot be matched to one screen. Reconcile it pillar by pillar:
- Workflows → Jobs → Runs for the success/failure counts behind the reliability sub-score.
- SQL → Query History (or `system.query.history`) for the error-rate and latency sub-scores.
- SQL → SQL Warehouses → Monitoring and Compute → Clusters → Metrics for the saturation sub-scores.
- Settings → Usage (or `system.billing.usage`) for the cost-efficiency sub-score.

Why our number may legitimately differ from a manual estimate:

| Reason | Direction | Why |
|---|---|---|
| No native equivalent | N/A | There is nothing in Databricks to compare the composite against directly; only the components reconcile. |
| Weighting | Variable | The composite weights reliability and errors above latency and cost; a hand-rolled equal-weight average will land differently. |
| Normalisation bands | Variable | Each pillar maps onto a 0 to 100 sub-score against its own healthy band; the raw component values do not add up linearly. |
| RT vs trend blend | Marginal | The headline favours the live read while the trend line smooths over 7 days, so the instant value can sit slightly off the trend. |
| Time zone | Window alignment | Native screens use the account time zone; Vortex IQ stores UTC and renders in your profile time zone. |
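
The time-zone row is the easiest difference to check by hand: convert the local read time to UTC before comparing a 24-hour window against stored values. The sketch below uses the worked example's timestamp (14 Apr 2026, 08:15 BST). It is an illustration under stated assumptions: the helper name `utc_window` is hypothetical, and BST is modelled as a fixed UTC+1 offset to keep the example dependency-free. For real use, `zoneinfo.ZoneInfo("Europe/London")` would handle daylight-saving changes.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed UTC+1 offset stands in for British Summer Time.
BST = timezone(timedelta(hours=1), "BST")


def utc_window(local_end: datetime, hours: int = 24) -> tuple[datetime, datetime]:
    """Return (start, end) in UTC for a trailing window ending at `local_end`."""
    if local_end.tzinfo is None:
        raise ValueError("local_end must be timezone-aware")
    end_utc = local_end.astimezone(timezone.utc)
    return end_utc - timedelta(hours=hours), end_utc


if __name__ == "__main__":
    checked_at = datetime(2026, 4, 14, 8, 15, tzinfo=BST)
    start, end = utc_window(checked_at)
    print(start.isoformat(), "to", end.isoformat())
```

A 24-hour window read at 08:15 BST therefore covers 07:15 UTC on 13 April to 07:15 UTC on 14 April, so a manual count made against a screen in a different time zone can include or drop runs near the window edges.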