# Databricks Health Score, Databricks

> Databricks Health Score for Databricks workspaces. Tracked live in Vortex IQ Nerve Centre. How to read it, why it matters, and how to act on it.

**Card class:** [Hero](/nerve-centre/overview#card-classes-explained)  •  **Category:** [Executive Overview](/nerve-centre/connectors#connectors-by-type)

## At a glance

> A single 0 to 100 composite that rolls the workspace's most important operational signals (job reliability, query errors, latency, saturation, and cost efficiency) into one number a platform lead can read at a glance. It is the executive-overview answer to "is Databricks healthy right now?" without making anyone open five dashboards. A score in the 90s means the lakehouse is doing its job quietly; a score under 70 means at least one pillar has degraded enough to need attention today.

|                           |                                                                                                                                                                                                                                                                                                                                         |
| ------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Data source**           | A weighted composite computed by Vortex IQ from the underlying Databricks connector cards: job success rate, SQL query error rate, query latency, warehouse and cluster saturation, and DBU efficiency. Each component is itself sourced from the Jobs Runs API, the query-history series, cluster metrics, and the billable usage API. |
| **Metric basis**          | Weighted average of normalised component scores, each mapped onto a 0 to 100 sub-score, then combined. Reliability and error pillars carry the heaviest weight because they are the most directly tied to broken data and broken queries.                                                                                               |
| **Aggregation window**    | A real-time read blended with a 7-day trend (`RT/7D`): the live score reflects current state, the trend line shows whether health is improving or sliding over the week.                                                                                                                                                                |
| **Healthy band**          | 90 to 100 healthy, 70 to 89 watch, below 70 degraded.                                                                                                                                                                                                                                                                                   |
| **What pulls it down**    | Failed or timed-out jobs, sustained query errors, latency above target, warehouse or cluster saturation, and DBU burn rising out of step with workload.                                                                                                                                                                                 |
| **What does NOT move it** | Cosmetic or metadata-only changes, terminated idle clusters with no jobs queued, and one-off transient spikes that clear within a sample.                                                                                                                                                                                               |
| **Time window**           | `RT/7D` (live read with a 7-day trend)                                                                                                                                                                                                                                                                                                  |
| **Alert trigger**         | `<70` (composite degraded, at least one operational pillar needs attention)                                                                                                                                                                                                                                                             |
| **Roles**                 | platform lead, data engineering, FinOps, executive                                                                                                                                                                                                                                                                                      |
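
The band and alert rows above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and treating fractional scores between 89 and 90 as "watch" is an assumption, since the card's own implementation is not documented here.

```python
def health_band(score: float) -> str:
    """Map a 0-100 composite onto the healthy / watch / degraded bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be within 0-100, got {score}")
    if score >= 90:
        return "healthy"
    if score >= 70:  # assumption: anything from 70 up to (not including) 90 is watch
        return "watch"
    return "degraded"


def should_alert(score: float, threshold: float = 70) -> bool:
    """Mirror the `<70` trigger: fire only when strictly below the threshold."""
    return score < threshold


for reading in (92, 70, 64):
    print(reading, health_band(reading), should_alert(reading))
```

Note that a reading of exactly 70 sits in the watch band and does not fire the alert, because the trigger is a strict "less than".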

## Calculation

The score is a weighted blend of normalised component sub-scores. Each contributing card is mapped onto a 0 to 100 scale where 100 is ideal and 0 is the worst tolerable state, then the sub-scores are combined by weight:

```text theme={null}
health = Σ(component_subscore × component_weight) / Σ(component_weight)
```

The components and the intent behind their weighting:

| Pillar          | Sourced from                                                                                                                                                                                  | Why weighted as it is                                                                                  |
| --------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ |
| Job reliability | [Job Success Rate (24h)](/nerve-centre/kpi-cards/databricks/job-success-rate-24h)                                                                                                             | Heaviest weight: a failed scheduled run means a broken data pipeline, the most direct business impact. |
| Query errors    | [SQL Query Error Rate %](/nerve-centre/kpi-cards/databricks/sql-query-error-rate)                                                                                                             | Heavy weight: failing queries break dashboards and downstream consumers immediately.                   |
| Latency         | [SQL Query Latency p95 (ms)](/nerve-centre/kpi-cards/databricks/sql-query-latency-p95-ms)                                                                                                     | Medium weight: slow but working is less severe than failing, but still degrades the user experience.   |
| Saturation      | [SQL Warehouse Saturation %](/nerve-centre/kpi-cards/databricks/sql-warehouse-saturation) and [Avg Cluster CPU Utilisation %](/nerve-centre/kpi-cards/databricks/avg-cluster-cpu-utilisation) | Medium weight: a leading indicator that reliability and latency are about to degrade.                  |
| Cost efficiency | [DBU Burned (24h)](/nerve-centre/kpi-cards/databricks/dbu-burned-24h)                                                                                                                         | Lighter weight: cost matters but rarely constitutes an outage on its own.                              |

Each sub-score is normalised against its own healthy band (for example, a job success rate of 99% maps near 100, while 90% maps far lower), so a single badly degraded pillar can drag the composite under 70 even while the others are green. That is by design: the score should drop out of the healthy band when any one thing is genuinely broken, not only when everything is.
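
One way to picture the normalisation step is a piecewise-linear curve per pillar. The sketch below is an assumption throughout: the breakpoints in `JOB_SUCCESS_CURVE` and the `subscore` helper are hypothetical, chosen only so that 99% lands at 100 and 90% lands far lower. The real mapping used by the card is not documented on this page.

```python
from bisect import bisect_right

# Hypothetical breakpoints: (raw job success %, sub-score).
# Only the shape matters; these are not the card's real values.
JOB_SUCCESS_CURVE = [(80.0, 0.0), (88.0, 45.0), (95.0, 80.0), (99.0, 100.0)]


def subscore(raw: float, curve: list[tuple[float, float]]) -> float:
    """Piecewise-linear map from a raw metric onto 0-100, clamped at both ends."""
    xs = [x for x, _ in curve]
    if raw <= xs[0]:
        return curve[0][1]
    if raw >= xs[-1]:
        return curve[-1][1]
    i = bisect_right(xs, raw)  # index of the first breakpoint strictly above raw
    (x0, y0), (x1, y1) = curve[i - 1], curve[i]
    return y0 + (y1 - y0) * (raw - x0) / (x1 - x0)


for rate in (99.0, 90.0, 88.0):
    print(rate, subscore(rate, JOB_SUCCESS_CURVE))
```

A steep curve like this is what lets a few points of lost success rate cost tens of points of sub-score.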

## Worked example

A platform lead checks the executive overview on 14 Apr 26 at 08:15 BST. The gauge reads **64**, in the degraded band, and the 7-day trend shows it slid from 92 over the prior 36 hours.

| Pillar                             | Live sub-score | Weight | Contribution |
| ---------------------------------- | -------------- | ------ | ------------ |
| Job reliability (success rate 88%) | 45             | 0.30   | 13.5         |
| Query errors (error rate 0.4%)     | 92             | 0.25   | 23.0         |
| Latency (p95 3,800 ms)             | 80             | 0.20   | 16.0         |
| Saturation (warehouse 72%)         | 78             | 0.15   | 11.7         |
| Cost efficiency (DBU flat)         | 95             | 0.10   | 9.5          |

```text theme={null}
weighted sum = 13.5 + 23.0 + 16.0 + 11.7 + 9.5 = 73.7
live read    = 64  (the weighted sum after capping; rounding alone does not explain the gap)
```
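
The arithmetic in the table can be reproduced with a short Python sketch. The sub-scores and weights are the ones from the worked example; the `PILLARS` layout and the `weighted_health` name are hypothetical. It computes only the plain weighted blend, so it returns the uncapped figure rather than the live read of 64, because the capping rule is not described on this page.

```python
# (sub-score, weight) per pillar, copied from the worked-example table.
PILLARS = {
    "job_reliability": (45, 0.30),
    "query_errors": (92, 0.25),
    "latency": (80, 0.20),
    "saturation": (78, 0.15),
    "cost_efficiency": (95, 0.10),
}


def weighted_health(pillars: dict[str, tuple[float, float]]) -> float:
    """Weighted average of sub-scores: sum(score * weight) / sum(weight)."""
    total_weight = sum(weight for _, weight in pillars.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(score * weight for score, weight in pillars.values()) / total_weight


print(round(weighted_health(PILLARS), 1))
```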

The reliability pillar is the obvious drag: job success has fallen to 88%, well below the 95% target, and because it carries the heaviest weight it pulls the whole composite down. The lead does not need to guess where to look; the score has already pointed at the pillar. The response:

1. **Open [Job Success Rate (24h)](/nerve-centre/kpi-cards/databricks/job-success-rate-24h) and [Failed Jobs (24h)](/nerve-centre/kpi-cards/databricks/failed-jobs-24h).** They reveal a cluster of failures concentrated on three downstream jobs that all depend on one upstream load that started timing out after a schema change.
2. **Confirm the blast radius with [Top 10 Failing Workflows (7d)](/nerve-centre/kpi-cards/databricks/top-10-failing-workflows-7d).** The same parent workflow tops the list, confirming a single root cause rather than scattered flakiness.
3. **Watch the trend, not the instant.** Once the upstream load is fixed and the dependent jobs backfill successfully, the reliability sub-score recovers and the composite climbs back through the 70 to 89 watch band into the 90s over the next day. The 7-day trend line is what tells the lead the fix actually held.

The lesson: read the score as a router, not a diagnosis. A single number can never tell you what broke, but a weighted composite is excellent at telling you that something has and which pillar to open first. The value is in moving from "is everything OK?" to "open reliability" in one glance.
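
The "router" reading can be made concrete by ranking pillars by how many composite points each one is costing. The sketch below uses the worked-example numbers and a simple heuristic, weight × (100 − sub-score); both the heuristic and the `pillar_to_open_first` name are illustrative assumptions, not a documented Nerve Centre feature.

```python
# (sub-score, weight) per pillar, copied from the worked-example table.
PILLARS = {
    "job_reliability": (45, 0.30),
    "query_errors": (92, 0.25),
    "latency": (80, 0.20),
    "saturation": (78, 0.15),
    "cost_efficiency": (95, 0.10),
}


def pillar_to_open_first(pillars: dict[str, tuple[float, float]]) -> tuple[str, dict[str, float]]:
    """Return the pillar with the largest weighted shortfall, plus all shortfalls."""
    shortfall = {name: weight * (100 - score) for name, (score, weight) in pillars.items()}
    worst = max(shortfall, key=shortfall.get)
    return worst, shortfall


worst, shortfall = pillar_to_open_first(PILLARS)
print(worst)
for name, points in sorted(shortfall.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {points:.1f} points lost")
```

With these inputs, job reliability accounts for 16.5 of the 26.3 points missing from the weighted sum, which is why it is the first card to open.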

## Sibling cards to reference together

| Card                                                                                      | Why pair it with Databricks Health Score    | What the combination tells you                                                        |
| ----------------------------------------------------------------------------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------- |
| [Job Success Rate (24h)](/nerve-centre/kpi-cards/databricks/job-success-rate-24h)         | The heaviest-weighted reliability pillar.   | A degraded score with low success rate means broken pipelines are the cause.          |
| [SQL Query Error Rate %](/nerve-centre/kpi-cards/databricks/sql-query-error-rate)         | The query-failure pillar.                   | Degraded score with high error rate points at broken queries or warehouses, not jobs. |
| [SQL Query Latency p95 (ms)](/nerve-centre/kpi-cards/databricks/sql-query-latency-p95-ms) | The latency pillar.                         | Score amber with high p95 but clean errors means slow, not broken.                    |
| [SQL Warehouse Saturation %](/nerve-centre/kpi-cards/databricks/sql-warehouse-saturation) | The leading-indicator pillar.               | Saturation rising before the score drops is the early warning of an incoming dip.     |
| [DBU Burned (24h)](/nerve-centre/kpi-cards/databricks/dbu-burned-24h)                     | The cost-efficiency pillar.                 | Score steady but burn rising means cost is the only soft spot, not reliability.       |
| [Active Clusters](/nerve-centre/kpi-cards/databricks/active-clusters)                     | The capacity context behind the score.      | A drop in active clusters alongside a score dip can signal a workspace-wide problem.  |
| [Failed Jobs (24h)](/nerve-centre/kpi-cards/databricks/failed-jobs-24h)                   | The triage queue behind a reliability drop. | The specific failing runs to action when the reliability pillar pulls the score down. |

## Reconciling against the source

**Where to look in Databricks:**

Databricks has no native single "health score", so this composite cannot be matched to one screen. Reconcile it pillar by pillar:

> - **Workflows → Jobs → Runs** for the success/failure counts behind the reliability sub-score.
> - **SQL → Query History** (or `system.query.history`) for the error-rate and latency sub-scores.
> - **SQL → SQL Warehouses → Monitoring** and **Compute → Clusters → Metrics** for the saturation sub-scores.
> - **Settings → Usage** (or `system.billing.usage`) for the cost-efficiency sub-score.
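
When reconciling the query pillars by hand, the raw inputs are an error rate and a p95 latency computed over exported query-history rows. The Python sketch below shows one way to do that. The row keys `status` and `duration_ms` are hypothetical stand-ins for whatever your export contains, and the nearest-rank percentile is an assumption: the card's exact percentile method is not documented here.

```python
def error_rate_pct(rows: list[dict]) -> float:
    """Share of queries that failed, as a percentage of all rows."""
    if not rows:
        raise ValueError("no query rows to evaluate")
    failed = sum(1 for row in rows if row["status"] == "FAILED")
    return 100.0 * failed / len(rows)


def p95_ms(rows: list[dict]) -> int:
    """Nearest-rank 95th percentile of query duration in milliseconds."""
    durations = sorted(row["duration_ms"] for row in rows)
    if not durations:
        raise ValueError("no query rows to evaluate")
    rank = (95 * len(durations) + 99) // 100  # integer ceil(0.95 * n)
    return durations[rank - 1]


# Hypothetical sample: 20 queries, 100 ms to 2,000 ms, one of them failed.
sample = [{"status": "FINISHED", "duration_ms": 100 * (i + 1)} for i in range(20)]
sample[3]["status"] = "FAILED"
print(error_rate_pct(sample), p95_ms(sample))
```

These are raw component values, not sub-scores; they still have to pass through the normalisation bands before they are comparable with the composite.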

**Why our number may legitimately differ from a manual estimate:**

| Reason                   | Direction        | Why                                                                                                                               |
| ------------------------ | ---------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| **No native equivalent** | N/A              | There is nothing in Databricks to compare the composite against directly; only the components reconcile.                          |
| **Weighting**            | Variable         | The composite weights reliability and errors above latency and cost; a hand-rolled equal-weight average will land differently.    |
| **Normalisation bands**  | Variable         | Each pillar maps onto a 0 to 100 sub-score against its own healthy band; the raw component values do not add up linearly.         |
| **RT vs trend blend**    | Marginal         | The headline favours the live read while the trend line smooths over 7 days, so the instant value can sit slightly off the trend. |
| **Time zone**            | Window alignment | Native screens use the account time zone; Vortex IQ stores UTC and renders in your profile time zone.                             |

**Cross-connector reconciliation:** pair with [DBU Burn vs Ecom Order Volume](/nerve-centre/kpi-cards/databricks/dbu-burn-vs-ecom-order-volume) and [Pipeline Lag vs Ecom Order Flow](/nerve-centre/kpi-cards/databricks/pipeline-lag-vs-ecom-order-flow). A high health score while pipeline lag is climbing against live order flow means the lakehouse is internally healthy but falling behind the business, a gap the single composite alone will not surface.

## Known limitations / FAQs

**What exactly is in the score?**
A weighted blend of job reliability, query error rate, query latency, warehouse and cluster saturation, and DBU efficiency. Reliability and errors carry the most weight because they map most directly to broken data and broken dashboards. The exact weights are tuned per profile and visible in the Sensitivity tab.

**My score is 64 but every dashboard I check looks fine. Why?**
The composite weights pillars you may not be looking at. A common case is job reliability: a batch of overnight job failures drags the heavily-weighted reliability sub-score down even though the interactive query experience you are checking feels normal. Open [Failed Jobs (24h)](/nerve-centre/kpi-cards/databricks/failed-jobs-24h) before assuming the score is wrong.

**Can one bad pillar really push the whole score under 70?**
Yes, deliberately. The reliability and error pillars are weighted and normalised so that a genuinely broken pillar (success rate down to the high 80s, for instance) drags the composite into the degraded band even while everything else is green. The alternative, a score that only goes amber when everything breaks at once, would be useless as an early warning.

**Why blend a real-time read with a 7-day trend?**
The live value answers "is it healthy now?"; the trend answers "is it getting better or worse?". A score of 75 means something different if it is climbing from 60 than if it is falling from 95. Read both: act on the instant, judge your fix on the trend.
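
As a rough illustration of reading the trend alongside the instant value, the Python sketch below labels a 7-day series by comparing its newest reading with its oldest. The `trend_direction` name, the 2-point tolerance, and the sample series are all hypothetical; the card's own trend calculation is not documented here.

```python
def trend_direction(daily_scores: list[float], tolerance: float = 2.0) -> str:
    """Label a series (oldest first) as improving, sliding, or flat."""
    if len(daily_scores) < 2:
        raise ValueError("need at least two readings to judge a trend")
    change = daily_scores[-1] - daily_scores[0]
    if change > tolerance:
        return "improving"
    if change < -tolerance:
        return "sliding"
    return "flat"


climbing = [60, 63, 67, 70, 72, 74, 75]  # hypothetical week ending at 75
falling = [95, 93, 90, 86, 82, 78, 75]   # hypothetical week ending at 75
print(trend_direction(climbing), trend_direction(falling))
```

Both weeks end on the same live score of 75, yet they call for opposite reactions.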

**Does Databricks provide this number natively?**
No. There is no single native health score; this is a Vortex IQ composite built from native metrics. That is the point of the card, to give one number where Databricks gives several screens. Reconcile it pillar by pillar rather than expecting a matching figure in the workspace.

**The score recovered the moment I restarted a warehouse. Is it that sensitive?**
Saturation and latency pillars respond quickly because they are near-real-time. Reliability moves more slowly because it is a 24-hour rate. So a fast recovery usually means the soft pillar was saturation or latency, not reliability; if reliability was the cause, expect the score to climb over hours as successful runs accumulate, not instantly.

**Can I reweight the pillars for my team?**
Yes. The component weights and the alert threshold are configurable per profile in the Sensitivity tab. A cost-focused FinOps team may lift the DBU-efficiency weight; a team with strict SLAs may lift latency. The default weighting prioritises reliability and errors.
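
To see what a reweighting does, the Python sketch below recomputes the worked-example composite with the cost-efficiency weight lifted from 0.10 to 0.30. The new weight is a hypothetical FinOps choice, and the sketch assumes the blend divides by the sum of weights, as in the formula under Calculation, so the result stays on a 0 to 100 scale even when the weights no longer add up to 1.

```python
# Sub-scores and default weights copied from the worked example.
SUBSCORES = {
    "job_reliability": 45,
    "query_errors": 92,
    "latency": 80,
    "saturation": 78,
    "cost_efficiency": 95,
}
DEFAULT_WEIGHTS = {
    "job_reliability": 0.30,
    "query_errors": 0.25,
    "latency": 0.20,
    "saturation": 0.15,
    "cost_efficiency": 0.10,
}


def weighted_health(subscores: dict[str, float], weights: dict[str, float]) -> float:
    """sum(score * weight) / sum(weight), so weights need not add up to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(subscores[name] * weights[name] for name in weights) / total


# Hypothetical FinOps profile: cost efficiency weighted as heavily as reliability.
finops_weights = {**DEFAULT_WEIGHTS, "cost_efficiency": 0.30}
print(f"{weighted_health(SUBSCORES, DEFAULT_WEIGHTS):.2f}")
print(f"{weighted_health(SUBSCORES, finops_weights):.2f}")
```

Because the cost pillar is green in this example, giving it more weight lifts the blend from about 73.7 to about 77.25, which shows how a reweighting can soften a reliability problem. That trade-off is worth checking before changing a profile.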

***

### Tracked live in Vortex IQ Nerve Centre

*Databricks Health Score* is one of hundreds of KPI pulses Vortex IQ tracks across Databricks and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English.

[Start for free](https://app.vortexiq.ai/login) or [book a demo](https://www.vortexiq.ai/contact-us) to see this metric running on your own data.
