Databricks Health Score, Databricks - Vortex IQ Help Centre

Card class: Hero • Category: Executive Overview

At a glance

A single 0 to 100 composite that rolls the workspace’s most important operational signals (job reliability, query errors, latency, saturation, and cost efficiency) into one number a platform lead can read at a glance. It is the executive-overview answer to “is Databricks healthy right now?” without making anyone open five dashboards. A score in the 90s means the lakehouse is doing its job quietly; a score under 70 means at least one pillar has degraded enough to need attention today.


Data source	A weighted composite computed by Vortex IQ from the underlying Databricks connector cards: job success rate, SQL query error rate, query latency, warehouse and cluster saturation, and DBU efficiency. Each component is itself sourced from the Jobs Runs API, the query-history series, cluster metrics, and the billable usage API.
Metric basis	Weighted average of normalised component scores, each mapped onto a 0 to 100 sub-score, then combined. Reliability and error pillars carry the heaviest weight because they are the most directly tied to broken data and broken queries.
Aggregation window	A real-time read blended with a 7-day trend (`RT/7D`): the live score reflects current state, the trend line shows whether health is improving or sliding over the week.
Healthy band	90 to 100 healthy, 70 to 89 watch, below 70 degraded.
What pulls it down	Failed or timed-out jobs, sustained query errors, latency above target, warehouse or cluster saturation, and DBU burn rising out of step with workload.
What does NOT move it	Cosmetic or metadata-only changes, terminated idle clusters with no jobs queued, and one-off transient spikes that clear within a sample.
Time window	`RT/7D` (live read with a 7-day trend)
Alert trigger	`<70` (composite degraded, at least one operational pillar needs attention)
Roles	platform lead, data engineering, FinOps, executive
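
The bands and the alert trigger in the table above reduce to a simple threshold check. A minimal sketch, assuming a numeric score: the function names `health_band` and `should_alert` are illustrative and not part of any Vortex IQ API, and placing fractional scores between 89 and 90 in the watch band is an assumption.

```python
def health_band(score: float) -> str:
    """Map a 0 to 100 composite onto the three bands from the table."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 90:
        return "healthy"
    if score >= 70:
        return "watch"
    return "degraded"  # same boundary as the `<70` alert trigger


def should_alert(score: float, threshold: float = 70) -> bool:
    """True when the composite falls below the alert threshold."""
    return score < threshold


print(health_band(64), should_alert(64))  # degraded True
```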

Calculation

The score is a weighted blend of normalised component sub-scores. Each contributing card is mapped onto a 0 to 100 scale where 100 is ideal and 0 is the worst tolerable state, then the sub-scores are combined by weight:

health = Σ(component_subscore × component_weight) / Σ(component_weight)
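
The formula can be sketched in a few lines of Python. This is an illustration of the weighted average only, not Vortex IQ's implementation; the function name `composite_health` and the sample values are hypothetical. Dividing by the sum of the weights means the weights do not have to total 1.

```python
def composite_health(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of 0 to 100 sub-scores, normalised by total weight."""
    if subscores.keys() != weights.keys():
        raise ValueError("every sub-score needs exactly one weight")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(subscores[name] * weights[name] for name in subscores)
    return weighted / total_weight


# Hypothetical two-pillar example with weights that do not sum to 1.
print(composite_health({"a": 80, "b": 100}, {"a": 3, "b": 1}))  # 85.0
```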

The components and the intent behind their weighting:

Pillar	Sourced from	Why weighted as it is
Job reliability	Job Success Rate (24h)	Heaviest weight: a failed scheduled run means a broken data pipeline, the most direct business impact.
Query errors	SQL Query Error Rate %	Heavy weight: failing queries break dashboards and downstream consumers immediately.
Latency	SQL Query Latency p95 (ms)	Medium weight: slow but working is less severe than failing, but still degrades the user experience.
Saturation	SQL Warehouse Saturation % and Avg Cluster CPU Utilisation %	Medium weight: a leading indicator that reliability and latency are about to degrade.
Cost efficiency	DBU Burned (24h)	Lighter weight: cost matters but rarely constitutes an outage on its own.

Each sub-score is normalised against its own healthy band (for example, job success rate of 99% maps near 100; 90% maps far lower), so a single badly degraded pillar can drag the composite under 70 even while the others are green. That is by design: the score should go amber when any one thing is genuinely broken, not only when everything is.
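
One way to picture the normalisation step is a clamped linear map between a worst tolerable value and an ideal value. This is a sketch under stated assumptions: the real mapping may not be linear, and the `worst` and `ideal` bounds below are hypothetical, chosen only to show the shape of the idea.

```python
def normalise(value: float, worst: float, ideal: float) -> float:
    """Linearly map `value` onto 0 to 100, where `worst` scores 0 and `ideal` scores 100.

    Works for lower-is-better metrics too: pass worst > ideal.
    Values outside the range are clamped.
    """
    if worst == ideal:
        raise ValueError("worst and ideal must differ")
    raw = 100.0 * (value - worst) / (ideal - worst)
    return max(0.0, min(100.0, raw))


# Hypothetical bounds: success rate of 80% scores 0, 100% scores 100.
print(normalise(99, worst=80, ideal=100))  # 95.0
print(normalise(90, worst=80, ideal=100))  # 50.0
# Hypothetical lower-is-better metric: error rate of 5% scores 0, 0% scores 100.
print(normalise(1.0, worst=5.0, ideal=0.0))  # 80.0
```

With bounds like these, a success rate of 99% lands near 100 while 90% lands far lower, which is the behaviour the paragraph above describes.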

Worked example

A platform lead checks the executive overview on 14 Apr 26 at 08:15 BST. The gauge reads 64, in the degraded band, and the 7-day trend shows it slid from 92 over the prior 36 hours.

Pillar	Live sub-score	Weight	Contribution
Job reliability (success rate 88%)	45	0.30	13.5
Query errors (error rate 0.4%)	92	0.25	23.0
Latency (p95 3,800 ms)	80	0.20	16.0
Saturation (warehouse 72%)	78	0.15	11.7
Cost efficiency (DBU flat)	95	0.10	9.5

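health = 13.5 + 23.0 + 16.0 + 11.7 + 9.5 = 73.7 (plain weighted sum; the weights total 1.00)

The gauge reads 64 rather than 73.7, so the headline is not the plain weighted sum alone. Rounding cannot account for a 9.7-point gap; the difference comes from a cap applied to the live read, and the cap rule is not specified on this page.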
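
The contribution column and its total can be re-derived from the table with a short script. The sub-scores and weights are copied from the table; the variable names are illustrative. It reproduces only the weighted sum, not the headline gauge value.

```python
# (pillar, live sub-score, weight) copied from the worked-example table
rows = [
    ("Job reliability", 45, 0.30),
    ("Query errors", 92, 0.25),
    ("Latency", 80, 0.20),
    ("Saturation", 78, 0.15),
    ("Cost efficiency", 95, 0.10),
]

contributions = {name: subscore * weight for name, subscore, weight in rows}
total_weight = sum(weight for _, _, weight in rows)
weighted_sum = sum(contributions.values()) / total_weight

print(round(weighted_sum, 1))  # 73.7
```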

The reliability pillar is the obvious drag: job success has fallen to 88%, well below the 95% target, and because it carries the heaviest weight it pulls the whole composite down. The lead does not need to guess where to look; the score has already pointed at the pillar. The response:

Open Job Success Rate (24h) and Failed Jobs (24h). They reveal a cluster of failures concentrated on three downstream jobs that all depend on one upstream load that started timing out after a schema change.
Confirm the blast radius with Top 10 Failing Workflows (7d). The same parent workflow tops the list, confirming a single root cause rather than scattered flakiness.
Watch the trend, not the instant. Once the upstream load is fixed and the dependent jobs backfill successfully, the reliability sub-score recovers and the composite climbs back through the watch band (70 to 89) into the 90s over the next day. The 7-day trend line is what tells the lead the fix actually held.

The lesson: read the score as a router, not a diagnosis. A single number can never tell you what broke, but a weighted composite is excellent at telling you that something has and which pillar to open first. The value is in moving from “is everything OK?” to “open reliability” in one glance.
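
The "router" reading can be made concrete by ranking pillars by how many points each one costs the composite, that is, weight × (100 − sub-score). This is an illustrative sketch using the worked-example figures; the function name `weighted_shortfalls` is hypothetical and this ranking is one possible way to pick the pillar to open first, not a documented Vortex IQ feature.

```python
def weighted_shortfalls(rows):
    """Return (pillar, points lost) pairs, largest loss first."""
    shortfalls = [
        (name, weight * (100 - subscore)) for name, subscore, weight in rows
    ]
    return sorted(shortfalls, key=lambda item: item[1], reverse=True)


# (pillar, live sub-score, weight) from the worked example
rows = [
    ("Job reliability", 45, 0.30),
    ("Query errors", 92, 0.25),
    ("Latency", 80, 0.20),
    ("Saturation", 78, 0.15),
    ("Cost efficiency", 95, 0.10),
]

ranked = weighted_shortfalls(rows)
print(ranked[0][0])  # Job reliability
```

In this example reliability accounts for 16.5 of the 26.3 points lost against a perfect 100, which is why it is the first pillar to open.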

Sibling cards to reference together

Card	Why pair it with Databricks Health Score	What the combination tells you
Job Success Rate (24h)	The heaviest-weighted reliability pillar.	A degraded score with low success rate means broken pipelines are the cause.
SQL Query Error Rate %	The query-failure pillar.	Degraded score with high error rate points at broken queries or warehouses, not jobs.
SQL Query Latency p95 (ms)	The latency pillar.	Score amber with high p95 but clean errors means slow, not broken.
SQL Warehouse Saturation %	The leading-indicator pillar.	Saturation rising before the score drops is the early warning of an incoming dip.
DBU Burned (24h)	The cost-efficiency pillar.	Score steady but burn rising means cost is the only soft spot, not reliability.
Active Clusters	The capacity context behind the score.	A drop in active clusters alongside a score dip can signal a workspace-wide problem.
Failed Jobs (24h)	The triage queue behind a reliability drop.	The specific failing runs to action when the reliability pillar pulls the score down.

Reconciling against the source

Where to look in Databricks: Databricks has no native single “health score”, so this composite cannot be matched to one screen. Reconcile it pillar by pillar:

Workflows → Jobs → Runs for the success/failure counts behind the reliability sub-score.
SQL → Query History (or system.query.history) for the error-rate and latency sub-scores.
SQL → SQL Warehouses → Monitoring and Compute → Clusters → Metrics for the saturation sub-scores.
Settings → Usage (or system.billing.usage) for the cost-efficiency sub-score.
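
When reconciling the reliability pillar by hand, the success rate can be recomputed from an export of run records. The sketch below assumes each record has been flattened to a dict with a `result_state` key holding values such as `SUCCESS`, `FAILED`, or `TIMEDOUT`. Which states count as failures, and the exclusion of other states from the denominator, are assumptions rather than Vortex IQ's documented rule. The sample counts are made up.

```python
from collections import Counter

# Assumption: failed and timed-out runs are the failures that count.
FAILURE_STATES = {"FAILED", "TIMEDOUT"}


def success_rate(runs: list[dict]) -> float:
    """Percentage of completed runs that succeeded."""
    counts = Counter(run["result_state"] for run in runs)
    successes = counts["SUCCESS"]
    failures = sum(counts[state] for state in FAILURE_STATES)
    completed = successes + failures
    if completed == 0:
        raise ValueError("no completed runs in the window")
    return 100.0 * successes / completed


# Made-up export: 88 successes, 9 failures, 3 timeouts.
runs = (
    [{"result_state": "SUCCESS"}] * 88
    + [{"result_state": "FAILED"}] * 9
    + [{"result_state": "TIMEDOUT"}] * 3
)
print(success_rate(runs))  # 88.0
```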

Why our number may legitimately differ from a manual estimate:

Reason	Direction	Why
No native equivalent	N/A	There is nothing in Databricks to compare the composite against directly; only the components reconcile.
Weighting	Variable	The composite weights reliability and errors above latency and cost; a hand-rolled equal-weight average will land differently.
Normalisation bands	Variable	Each pillar maps onto a 0 to 100 sub-score against its own healthy band; the raw component values do not add up linearly.
RT vs trend blend	Marginal	The headline favours the live read while the trend line smooths over 7 days, so the instant value can sit slightly off the trend.
Time zone	Window alignment	Native screens use the account time zone; Vortex IQ stores UTC and renders in your profile time zone.
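
The time-zone row is the easiest difference to check by hand: convert the UTC window to the profile time zone before comparing it with a native screen. A minimal sketch using Python's standard `zoneinfo` module, assuming a profile time zone of `Europe/London` and a 24-hour window; the function name `window_in_profile_tz` is hypothetical.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def window_in_profile_tz(end_utc: datetime, profile_tz: str, hours: int = 24):
    """Return the (start, end) of a UTC window rendered in the profile time zone."""
    start_utc = end_utc - timedelta(hours=hours)
    tz = ZoneInfo(profile_tz)
    return start_utc.astimezone(tz), end_utc.astimezone(tz)


# 07:15 UTC on 14 Apr 2026 is 08:15 BST, the time used in the worked example.
end_utc = datetime(2026, 4, 14, 7, 15, tzinfo=timezone.utc)
start_local, end_local = window_in_profile_tz(end_utc, "Europe/London")
print(end_local.isoformat())  # 2026-04-14T08:15:00+01:00
```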

Cross-connector reconciliation: pair with DBU Burn vs Ecom Order Volume and Pipeline Lag vs Ecom Order Flow. A high health score while pipeline lag is climbing against live order flow means the lakehouse is internally healthy but falling behind the business, a gap the single composite alone will not surface.

Known limitations / FAQs

What exactly is in the score?
A weighted blend of job reliability, query error rate, query latency, warehouse and cluster saturation, and DBU efficiency. Reliability and errors carry the most weight because they map most directly to broken data and broken dashboards. The exact weights are tuned per profile and visible in the Sensitivity tab.

My score is 64 but every dashboard I check looks fine. Why?
The composite weights pillars you may not be looking at. A common case is job reliability: a batch of overnight job failures drags the heavily-weighted reliability sub-score down even though the interactive query experience you are checking feels normal. Open Failed Jobs (24h) before assuming the score is wrong.

Can one bad pillar really push the whole score under 70?
Yes, deliberately. The reliability and error pillars are weighted and normalised so that a genuinely broken pillar (success rate down to the high 80s, for instance) drags the composite into the degraded band even while everything else is green. The alternative, a score that only goes amber when everything breaks at once, would be useless as an early warning.

Why blend a real-time read with a 7-day trend?
The live value answers “is it healthy now?”; the trend answers “is it getting better or worse?”. A score of 75 means something different if it is climbing from 60 than if it is falling from 95. Read both: act on the instant, judge your fix on the trend.

Does Databricks provide this number natively?
No. There is no single native health score; this is a Vortex IQ composite built from native metrics. That is the point of the card, to give one number where Databricks gives several screens. Reconcile it pillar by pillar rather than expecting a matching figure in the workspace.

The score recovered the moment I restarted a warehouse. Is it that sensitive?
Saturation and latency pillars respond quickly because they are near-real-time. Reliability moves more slowly because it is a 24-hour rate. So a fast recovery usually means the soft pillar was saturation or latency, not reliability; if reliability was the cause, expect the score to climb over hours as successful runs accumulate, not instantly.

Can I reweight the pillars for my team?
Yes. The component weights and the alert threshold are configurable per profile in the Sensitivity tab. A cost-focused FinOps team may lift the DBU-efficiency weight; a team with strict SLAs may lift latency. The default weighting prioritises reliability and errors.
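
The difference between reading the instant and reading the trend can be sketched as a comparison between the oldest and newest points in the 7-day series. This is an illustration only: the function name `trend_direction`, the 2-point tolerance, and the sample series are assumptions, and the trend line in the product may be computed differently.

```python
def trend_direction(scores: list[float], tolerance: float = 2.0) -> str:
    """Classify a series of scores (oldest first) as improving, sliding, or flat."""
    if len(scores) < 2:
        raise ValueError("need at least two points to judge a trend")
    delta = scores[-1] - scores[0]
    if delta > tolerance:
        return "improving"
    if delta < -tolerance:
        return "sliding"
    return "flat"


# Both series end at 75, but they tell opposite stories.
print(trend_direction([60, 66, 71, 75]))  # improving
print(trend_direction([95, 88, 80, 75]))  # sliding
```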

Tracked live in Vortex IQ Nerve Centre

Databricks Health Score is one of hundreds of KPI pulses Vortex IQ tracks across Databricks and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
