Card class: Non-Hero
Category: Jobs & Workflows

At a glance

Top 10 Failing Workflows (7d) ranks the ten Databricks Jobs (workflows) with the most failed runs over the trailing 7 days, one row per Job. It is the triage shortlist for a platform or data-engineering team: rather than wading through every job, you see which scheduled pipelines fail most often, so you can fix the chronic offenders first.
What it tracks: The ten Jobs with the highest count of runs ending in result_state = FAILED or TIMEDOUT over the last 7 days, one row per Job, ordered by failure count descending. Sourced from the Jobs API /api/2.1/jobs/runs/list, filtered to terminal failed states.
Time window: 7d (trailing 7 days, refreshed on the standard data refresh).
Alert trigger: None. This is a ranking table for triage, not a threshold alert. For the live failure signal use the Failed Job Burst and Failed Jobs (24h) cards.
Roles: engineering, operations

What it tracks

Each row is a single Databricks Job (a named workflow that may chain several tasks) together with how many of its runs failed in the trailing 7 days. The engine reads run history from /api/2.1/jobs/runs/list, counts the runs whose result_state is FAILED or TIMEDOUT, groups by job_id, and returns the top ten by count. The ranking is by absolute failure count, not failure rate, so a workflow that runs hourly and fails twice a day will out-rank a daily workflow that fails once. That is intentional: the card surfaces the volume of failure, which is what erodes trust in your data freshness. Read it weekly alongside Job Success Rate (24h) to separate one bad day from a structural problem, and use Top 10 Slowest SQL Queries when failures are actually timeouts in disguise.
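
The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not Vortex IQ's actual engine: it assumes each run is a dict shaped like an entry in the runs array returned by /api/2.1/jobs/runs/list (job_id, start_time in epoch milliseconds, and a nested state.result_state), and it assumes the 7-day window is applied to the run's start_time. The function name top_failing_jobs and its parameters are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Terminal states the card treats as failures (as named in the Jobs API).
FAILED_STATES = {"FAILED", "TIMEDOUT"}


def top_failing_jobs(runs, now, days=7, limit=10):
    """Return [(job_id, failed_run_count), ...], highest count first.

    runs : iterable of dicts shaped like Jobs API 2.1 run objects
           (assumed fields: job_id, start_time in epoch ms, state.result_state).
    now  : timezone-aware datetime marking the end of the window.
    """
    # Assumption: the window is measured against the run's start_time.
    cutoff_ms = int((now - timedelta(days=days)).timestamp() * 1000)

    counts = Counter()
    for run in runs:
        result_state = run.get("state", {}).get("result_state")
        if result_state not in FAILED_STATES:
            continue  # successful, cancelled, or still-running runs are ignored
        if run.get("start_time", 0) < cutoff_ms:
            continue  # older than the trailing window
        counts[run["job_id"]] += 1

    # Sort by count descending, then job_id ascending so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

Because the result is ordered by raw count, a frequently scheduled job naturally rises above a rarely scheduled one with the same failure rate. Dividing each count by the job's total runs in the window would give a rate-based ranking instead, which is a different question from the one this card answers.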

Reconciling against the source

To verify in Databricks natively, open Workflows → Jobs and sort the job list by recent run status, or query the system.lakeflow.job_run_timeline system table (Unity Catalog) filtered to result_state IN ('FAILED','TIMED_OUT') over the last 7 days and grouped by job_id. Note that the system table spells the timeout state TIMED_OUT, whereas the Jobs API returns TIMEDOUT. Counts can differ slightly because Vortex IQ uses a 7-day rolling window in your reporting time zone, while the Workflows UI shows a fixed number of recent runs per page in workspace time.
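
The system-table check can be written as a single query. The sketch below is a small Python helper that builds the SQL text to paste into a Databricks SQL editor or pass to whichever client you already use. It rests on several assumptions to verify against your own workspace: the column names run_id and period_end_time, the use of period_end_time as the window boundary, and the system table's spelling of the timeout state as TIMED_OUT (the Jobs API spells it TIMEDOUT). The helper name build_reconciliation_sql is hypothetical.

```python
def build_reconciliation_sql(days=7, limit=10, failed_states=("FAILED", "TIMED_OUT")):
    """Build a query that ranks jobs by failed runs in the trailing window.

    Assumed columns on system.lakeflow.job_run_timeline: job_id, run_id,
    result_state, period_end_time. Check them against your workspace schema.
    """
    # Quote each state for the IN (...) list; values come from code, not user input.
    states = ", ".join(f"'{state}'" for state in failed_states)

    # COUNT(DISTINCT run_id) guards against a run appearing on more than one row.
    # In a multi-workspace account, also group by workspace_id.
    return f"""
SELECT job_id, COUNT(DISTINCT run_id) AS failed_runs
FROM system.lakeflow.job_run_timeline
WHERE result_state IN ({states})
  AND period_end_time >= current_timestamp() - INTERVAL {int(days)} DAYS
GROUP BY job_id
ORDER BY failed_runs DESC
LIMIT {int(limit)}
""".strip()
```

Calling build_reconciliation_sql() with no arguments mirrors the card's defaults (7 days, top ten). Expect small differences from the card near the window edges, since this query measures the window from the current timestamp in the SQL session rather than in your reporting time zone.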

Tracked live in Vortex IQ Nerve Centre

Top 10 Failing Workflows (7d) is one of hundreds of KPI pulses Vortex IQ tracks across Databricks and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.