At a glance
An alert pulse that fires when more than 1% of database transactions roll back or error, sustained over a 5-minute window. This is the database-layer counterpart to the PostgREST 5xx alert: where that card measures failures at the API surface, this one measures failures inside Postgres itself. A query error spike means transactions are aborting (constraint violations, deadlocks, statement timeouts, permission failures, or a database that has gone read-only). When this fires alongside a PostgREST 5xx spike, the database is the cause and the API is just relaying it.
| Data source | Postgres statistics views (pg_stat_database commit/rollback counters) and the project metrics endpoint. The card tracks the rate of rolled-back / errored transactions against total transactions for the project database. |
| Metric basis | Query error rate = errored or rolled-back transactions divided by total transactions (commit + rollback), as a percentage, over the window. Database-side failures, not API status codes. |
| Aggregation window | 5m rolling. Evaluated over the trailing 5 minutes; the alert requires the breach to be sustained across the window. |
| Alert threshold | > 1% sustained for 5m. Occasional rollbacks are normal (every retried optimistic-lock conflict is a rollback); a sustained elevation above 1% is the fault signal. |
| Why it matters | A spike means transactions are failing at the database. Causes include a bad migration, a missing or renamed object, a permissions change, deadlock storms, statement timeouts, or the database entering read-only mode after hitting the disk cap. These are the failures that data integrity and write availability depend on. |
| What counts | Transactions that roll back or error, as reflected in the database statistics counters for the project database. |
| What does NOT count | The eventual successful retry of a failed transaction counts as a commit, not an error; only the failed attempt counts as a rollback (correctly, since that attempt did fail). Successful read-only queries are not errors, although they do add to the total transaction count. |
| Time window | 5m (rolling 5-minute window) |
| Alert trigger | > 1% sustained 5m |
| Roles | owner, platform, sre |
Calculation
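The card divides errored or rolled-back transactions by total transactions for the project database over the trailing 5-minute window:

$$\text{error rate (\%)} = \frac{\Delta\,\text{rolled back}}{\Delta\,\text{committed} + \Delta\,\text{rolled back}} \times 100$$

where each Δ is the change in the corresponding counter across the window. The counters come from the xact_commit and xact_rollback columns of the pg_stat_database view, which Postgres maintains as monotonically increasing totals of committed and rolled-back transactions. Vortex IQ samples those counters and works with the delta across the window rather than the lifetime totals, so the rate reflects what is happening now, not the database's entire history since the last statistics reset.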
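A minimal Python sketch of the windowed calculation, assuming the card has two scrapes of the cumulative commit and rollback counters, one at the start of the window and one at the end. `CounterSample` and `windowed_error_rate` are hypothetical names for illustration, not part of Vortex IQ or Supabase. The sample values are chosen so the deltas match the 02:35 to 02:40 row of the worked example, and the last lines show how far the lifetime ratio can sit from the windowed one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CounterSample:
    """One scrape of the cumulative transaction counters (hypothetical shape)."""
    xact_commit: int
    xact_rollback: int


def windowed_error_rate(start: CounterSample, end: CounterSample) -> Optional[float]:
    """Rollback percentage over the window between two scrapes.

    Works on the deltas of the cumulative counters, not their lifetime totals.
    Returns None when no transactions completed in the window.
    """
    commits = end.xact_commit - start.xact_commit
    rollbacks = end.xact_rollback - start.xact_rollback
    total = commits + rollbacks
    if total <= 0:
        return None
    return 100.0 * rollbacks / total


# Deltas: 81,760 commits + 3,140 rollbacks = 84,900 transactions in the window.
start = CounterSample(xact_commit=1_000_000, xact_rollback=2_000)
end = CounterSample(xact_commit=1_081_760, xact_rollback=5_140)
print(f"window: {windowed_error_rate(start, end):.2f}%")  # window: 3.70%

# The lifetime ratio over the same counters hides the spike.
lifetime = 100.0 * end.xact_rollback / (end.xact_commit + end.xact_rollback)
print(f"lifetime: {lifetime:.2f}%")  # lifetime: 0.47%
```

Working on deltas is what lets a 3.70% spike show up at all; the lifetime ratio over the same counters stays below the 1% threshold.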
The alert is sustained, not instantaneous. A baseline rollback rate is normal and healthy: every optimistic-concurrency retry, every deadlock the application recovers from, and every constraint a write deliberately tests produces a rollback. Paging on those would be useless. The pulse raises only when the error rate stays above 1% across the full 5-minute window, which separates a genuine fault (a broken migration, a permissions change, a deadlock storm, a read-only database) from the routine background of recoverable rollbacks.
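One way to implement the sustained condition, sketched in Python under an assumption the card definition does not spell out: the trailing window is split into per-scrape error rates, and the alert fires only when every one of them exceeds 1% and the samples reach back to the start of the window. The 60-second scrape interval and the `is_sustained_breach` name are hypothetical.

```python
from typing import Sequence, Tuple

THRESHOLD_PCT = 1.0   # "> 1%" from the card definition
WINDOW_SECONDS = 300  # trailing 5-minute window


def is_sustained_breach(
    samples: Sequence[Tuple[float, float]],
    now: float,
    scrape_interval: float = 60.0,  # assumption: hypothetical scrape cadence
) -> bool:
    """Return True only if every per-scrape error rate in the trailing window exceeds the threshold.

    `samples` holds (timestamp_seconds, error_rate_pct) pairs, one per scrape.
    The oldest sample in the window must fall within one scrape interval of the
    window start, so a window with data only for its last minute cannot fire.
    """
    in_window = [(ts, rate) for ts, rate in samples if now - WINDOW_SECONDS < ts <= now]
    if not in_window:
        return False
    oldest_ts = min(ts for ts, _ in in_window)
    covers_window = oldest_ts <= now - WINDOW_SECONDS + scrape_interval
    return covers_window and all(rate > THRESHOLD_PCT for _, rate in in_window)


# Five one-minute scrapes, all above 1%: a sustained breach.
spike = [(60.0, 3.1), (120.0, 3.5), (180.0, 3.7), (240.0, 3.6), (300.0, 3.7)]
print(is_sustained_breach(spike, now=300.0))  # True
```

Requiring every sample to breach, rather than the average, is what keeps a single burst of recoverable rollbacks from paging anyone.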
Worked example
A platform team ships a schema migration to a Supabase-backed application during a low-traffic window. Snapshot taken on 03 Jun 26 at 02:40 BST, minutes after the migration ran.

| Window (BST) | Total transactions | Rolled back | Error rate |
|---|---|---|---|
| 02:30 to 02:35 | 88,400 | 71 | 0.08% |
| 02:35 to 02:40 | 84,900 | 3,140 | 3.70% |
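The error-rate column can be reproduced from the other two. This Python check assumes, consistent with the metric definition, that "Total transactions" already includes the rolled-back ones (commit + rollback); `error_rate_pct` is a hypothetical helper name.

```python
THRESHOLD_PCT = 1.0  # alert threshold from the card definition

# Rows from the worked example table: (window, total transactions, rolled back)
ROWS = [
    ("02:30 to 02:35", 88_400, 71),
    ("02:35 to 02:40", 84_900, 3_140),
]


def error_rate_pct(total: int, rolled_back: int) -> float:
    """Rolled-back transactions as a percentage of all transactions in the window."""
    return 100.0 * rolled_back / total


for window, total, rolled_back in ROWS:
    rate = error_rate_pct(total, rolled_back)
    status = "BREACH" if rate > THRESHOLD_PCT else "ok"
    print(f"{window}: {rate:.2f}% {status}")
# 02:30 to 02:35: 0.08% ok
# 02:35 to 02:40: 3.70% BREACH
```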
- The timing points straight at the migration. The spike began within minutes of the deploy. The most common cause of a sudden, sustained database error rate immediately after a migration is a structural change the application still violates: a renamed or dropped column the app still writes to, a new NOT NULL or CHECK constraint that existing writes fail, or a row-level-security policy change that rejects updates.
- These are failing writes, which are worse than failing reads. Rollbacks mean transactions did not commit. If the failing transactions are on the write path (orders, cart updates, inventory decrements), data is silently not being saved. Failed reads degrade the experience; failed writes lose business state. Treat a write-path error spike as higher severity than a read-path one.
- The fastest safe action is usually to roll the migration back. Rather than debug forward under failing writes, reverting the schema change restores the contract the running application expects. Confirm the spike clears after rollback, then reproduce and fix the migration in a non-production environment before re-shipping. If rollback is not possible, identify the specific failing statement from the Postgres logs and patch the offending constraint or grant.
Sibling cards merchants should reference together
| Card | Why pair it with Database Query Error Rate Spike | What the combination tells you |
|---|---|---|
| Database Query Error Rate % | The continuous gauge this alert is built on. | The alert says the line was crossed; the gauge shows the shape of the spike over time. |
| PostgREST 5xx Error Spike (>1% in 5m) | The API layer above the database. | Both firing means a database fault is being relayed to clients as 5xx; this card firing alone means app retries are catching the errors before they reach the API. |
| Deadlocks (last 5m) | Deadlocks are a specific cause of rolled-back transactions. | A deadlock storm shows up here as an error spike; the deadlock card isolates that cause. |
| Database Disk Usage % | A full disk forces the database into read-only mode. | Disk near 100% plus an error spike means writes are failing because the database has gone read-only. |
| Slow-Query Rate % | Statement timeouts turn slow queries into errors. | A slow-query rise followed by an error spike means queries are timing out into rollbacks. |
| Supavisor Pool at >90% Saturation | Connection failures can present as transaction errors. | Pool saturated plus error spike points at connection-level failure, not query logic. |
| Supabase Health Score | The composite this alert feeds. | An open query error spike pulls the composite down and frames it against other live signals. |
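The pairing rules in this table can be read as a triage checklist. The Python sketch below encodes them under stated assumptions: `SiblingSignals` is a hypothetical snapshot of the sibling cards at the moment the spike fires, and the 95% cut-off for "disk near 100%" is an illustrative choice, not a documented threshold.

```python
from dataclasses import dataclass
from typing import List

DISK_NEAR_FULL_PCT = 95.0  # assumption: illustrative cut-off for "disk near 100%"
POOL_SATURATED_PCT = 90.0  # matches the Supavisor ">90% saturation" card


@dataclass(frozen=True)
class SiblingSignals:
    """Hypothetical snapshot of sibling cards taken when the error spike fires."""
    postgrest_5xx_firing: bool
    deadlocks_last_5m: int
    disk_usage_pct: float
    slow_query_rate_rising: bool
    pool_saturation_pct: float


def triage(s: SiblingSignals) -> List[str]:
    """Map the sibling-card pairing rules to candidate explanations, in table order."""
    findings: List[str] = []
    if s.postgrest_5xx_firing:
        findings.append("database fault relayed to the API as 5xx")
    else:
        findings.append("errors caught by app retries before reaching the API")
    if s.deadlocks_last_5m > 0:
        findings.append("deadlocks present: check for a deadlock storm")
    else:
        findings.append("deadlocking ruled out")
    if s.disk_usage_pct >= DISK_NEAR_FULL_PCT:
        findings.append("disk nearly full: database may be read-only")
    if s.slow_query_rate_rising:
        findings.append("slow queries rising: statement timeouts likely")
    if s.pool_saturation_pct > POOL_SATURATED_PCT:
        findings.append("pool saturated: connection-level failure, not query logic")
    return findings


print(triage(SiblingSignals(True, 0, 97.0, False, 40.0)))
```

The output is a list of leads to check, not a diagnosis; the Postgres error stream still names the actual failing statement.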
Reconciling against the source
Where to look in Supabase’s own tooling:
- Logs → Postgres in the managed-service console for the per-statement error stream; the error messages name the exact failing constraint, object, or permission.
- Project metrics endpoint (/customer/v1/privileged/metrics, Prometheus format) for the commit and rollback counters Vortex IQ reads.
- Reports → Database for the transaction and error graphs over time.
- Database → Migrations to confirm which migration ran and when, the prime suspect when a spike follows a deploy.
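The metrics endpoint returns Prometheus text format, so the commit and rollback counters can be pulled out of a payload you have already fetched. This Python sketch is deliberately simplified. It assumes postgres_exporter-style metric names (`pg_stat_database_xact_commit`, `pg_stat_database_xact_rollback`) and a `datname` label, so check the names your endpoint actually exposes. `read_counters` is a hypothetical helper, and a production client should use a full Prometheus parser.

```python
import re
from typing import Dict

# Assumption: postgres_exporter-style names; verify against your endpoint's output.
COMMIT_METRIC = "pg_stat_database_xact_commit"
ROLLBACK_METRIC = "pg_stat_database_xact_rollback"

# metric_name{labels} value [timestamp]; simplified: no '}' allowed inside label values.
SAMPLE_LINE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)"
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def read_counters(exposition: str, datname: str = "postgres") -> Dict[str, float]:
    """Return the commit and rollback counters for one database from a text payload."""
    counters: Dict[str, float] = {}
    for line in exposition.splitlines():
        if not line or line.startswith("#"):  # skip blank lines, HELP and TYPE comments
            continue
        m = SAMPLE_LINE.match(line)
        if not m or m["name"] not in (COMMIT_METRIC, ROLLBACK_METRIC):
            continue
        labels = dict(LABEL.findall(m["labels"] or ""))
        if labels.get("datname") == datname:
            counters[m["name"]] = float(m["value"])
    return counters


SAMPLE = "\n".join([
    "# TYPE pg_stat_database_xact_commit counter",
    'pg_stat_database_xact_commit{datid="5",datname="postgres"} 1081760',
    'pg_stat_database_xact_rollback{datid="5",datname="postgres"} 5140',
    'pg_stat_database_xact_commit{datid="1",datname="template1"} 12',
])
print(read_counters(SAMPLE))
```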
Confirm the picture with native SQL:
Why the SQL totals and the card can disagree:

| Reason | Direction | Why |
|---|---|---|
| Lifetime vs windowed | SQL higher or lower | pg_stat_database shows totals since the last statistics reset; the card uses the 5-minute delta, so the live rate differs from the lifetime percentage. |
| Statistics reset | SQL drops | If the database statistics were reset, the SQL totals restart while the card’s windowed delta is unaffected. |
| Window alignment | Variable | The card uses a rolling 5-minute window; a console graph on calendar buckets can split a spike across two bars. |
| Sampling cadence | Brief lag | The metrics endpoint is scraped on an interval; a value at the exact moment of a spike may lag the live console graph by one scrape. |
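The statistics-reset row relies on the windowed delta tolerating a counter that restarts from zero. A minimal Python sketch of one way to do that, assuming the delta is built by summing per-scrape increases and treating any drop as a reset. This is the same reset rule Prometheus applies in `rate()` and `increase()`; it is not a confirmed description of how Vortex IQ computes its delta, and `counter_increase` is a hypothetical name.

```python
from typing import Sequence


def counter_increase(values: Sequence[float]) -> float:
    """Total increase of a cumulative counter across consecutive samples.

    A drop between two samples is treated as a counter reset (for example a
    statistics reset), and the post-reset value is counted as the increase
    since the reset.
    """
    total = 0.0
    for prev, curr in zip(values, values[1:]):
        total += (curr - prev) if curr >= prev else curr
    return total


# Rollback counter scraped across a window with a statistics reset in the middle.
rollbacks = [5_000, 5_600, 40, 700]
print(counter_increase(rollbacks))   # 1300.0
print(rollbacks[-1] - rollbacks[0])  # -4300 (naive end-minus-start)
```

The naive end-minus-start delta goes negative across the reset, while the summed increases still give a usable window count.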

Expected relationship to sibling cards:

| Card | Expected relationship | What causes divergence |
|---|---|---|
| PostgREST 5xx Error Spike (>1% in 5m) | Usually co-occurs when the database fault reaches the API. | This alone, with clean PostgREST, means app retries are absorbing the failures. |
| Deadlocks (last 5m) | A deadlock storm raises the error rate. | An error spike with zero deadlocks rules deadlocking out as the cause. |