At a glance
Slow-Query Rate % is the share of SQL statements over the last 15 minutes whose service latency crossed the slow threshold (200ms by default, the same line the p95 card uses). It answers a question a raw latency percentile cannot: “what fraction of my workload is actually slow right now?” A p99 can look ugly while only a sliver of traffic is affected, and a healthy-looking average can hide a steady drip of slow statements. This card collapses that into one number a DBA or on-call SRE can read at a glance. At or under 1% is a comfortable cluster; 1 to 5% is worth watching; above 5% fires an alert because a meaningful slice of your workload is now slow enough for users to feel.
| Field | Detail |
|---|---|
| What it tracks | Slow-Query Rate % for the selected period: slow statements divided by total statements, expressed as a percentage. |
| Data source | Computed by Vortex IQ from CockroachDB statement statistics: the per-statement latency distribution in crdb_internal.node_statement_statistics (and the cluster-wide crdb_internal.cluster_statement_statistics), counting statements whose service latency exceeds the slow threshold against the total statement count. The same sql.service.latency time-series that backs the latency-percentile cards underpins this rate. On CockroachDB Cloud the equivalent statement-stats and metrics are read via the Cloud metrics API and the SQL Activity page. |
| Time window | 15m (a rolling 15-minute window, refreshed in near real time). |
| Alert trigger | > 5%. When more than one statement in twenty is slow over the 15-minute window, the card turns red. |
| Roles | DBA, platform, SRE, engineering |
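
The bands described above (at or under 1%, 1 to 5%, above 5%) can be sketched as a small classifier. This is an illustration only: the function name and the labels `healthy`, `watch`, and `alert` are hypothetical, not names used by the card itself.

```python
def classify_slow_query_rate(rate_pct: float) -> str:
    """Map a Slow-Query Rate % to the bands described in the text.

    The band labels are illustrative assumptions, not product terminology.
    """
    if rate_pct < 0:
        raise ValueError("rate cannot be negative")
    if rate_pct <= 1.0:
        return "healthy"   # at or under 1%: comfortable cluster
    if rate_pct <= 5.0:
        return "watch"     # 1 to 5%: worth watching
    return "alert"         # above 5%: alert trigger, card turns red
```

Note that exactly 5% stays in the `watch` band here, because the documented trigger is strictly greater than 5%.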
Calculation
The rate is a simple ratio computed over the rolling 15-minute window:

Slow-Query Rate % = (slow statements ÷ total statements) × 100

- What “slow” means. The slow threshold defaults to 200ms of service latency, the time CockroachDB spends planning and executing the statement (it excludes network round-trip to the client). This is the same threshold the Statement Latency p95 (ms) card alerts on, so the two read consistently. The threshold is configurable per profile.
- Service latency, not full round-trip. Because the measure is server-side service latency, a slow client or a saturated network link will not inflate this number. That keeps it a clean signal of database-side slowness.
- Count-weighted, not time-weighted. Every statement counts once regardless of how slow it was. A statement at 205ms and a statement at 9 seconds each add one to the numerator. This is deliberate: the card answers “how much of my workload is slow”, and the Top Contended Statements and latency-percentile cards answer “how slow, and why”.
- Internal statements excluded. Background and internal SQL (schema jobs, statistics collection, internal range housekeeping) is filtered out so the rate reflects application traffic, not the cluster talking to itself.
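
The rules above can be put together in a minimal sketch. It assumes per-statement samples are already available as records; `StatementSample` and `slow_query_rate_pct` are hypothetical names, and returning 0.0 for an empty window is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from typing import Iterable

SLOW_THRESHOLD_MS = 200.0  # default from the text; configurable per profile


@dataclass(frozen=True)
class StatementSample:
    """One executed statement (hypothetical record shape)."""
    service_latency_ms: float  # server-side service latency only
    is_internal: bool = False  # background/internal SQL is excluded


def slow_query_rate_pct(
    samples: Iterable[StatementSample],
    threshold_ms: float = SLOW_THRESHOLD_MS,
) -> float:
    """Count-weighted share of application statements over the threshold."""
    total = 0
    slow = 0
    for sample in samples:
        if sample.is_internal:
            continue  # internal statements do not count in either term
        total += 1
        if sample.service_latency_ms > threshold_ms:
            slow += 1  # each slow statement adds exactly one, however slow
    if total == 0:
        return 0.0  # assumption: an empty window reads as 0%
    return 100.0 * slow / total
```

With this sketch a 205ms statement and a 9-second statement contribute equally to the numerator, which mirrors the count-weighted design described above.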
Worked example
A platform team runs a 5-node CockroachDB cluster (v23.2) backing the cart, catalogue, and order services for an ecommerce stack. Snapshot taken on 14 Apr 26 at 12:05 BST during the lunchtime peak.

| Window | Total statements (15m) | Slow statements (> 200ms) | Slow-Query Rate % | Reading |
|---|---|---|---|---|
| 11:30 to 11:45 | 1,420,000 | 8,520 | 0.6% | Healthy baseline. |
| 11:45 to 12:00 | 1,510,000 | 22,650 | 1.5% | Drifting, worth a glance. |
| 11:50 to 12:05 | 1,380,000 | 91,080 | 6.6% | Above the 5% trigger, card red. |
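
The percentages in the table can be re-derived from the two count columns. The snippet below is just that arithmetic; the variable and function names are illustrative.

```python
# (window, total statements, slow statements) copied from the table above
windows = [
    ("11:30 to 11:45", 1_420_000, 8_520),
    ("11:45 to 12:00", 1_510_000, 22_650),
    ("11:50 to 12:05", 1_380_000, 91_080),
]


def rate_pct(total: int, slow: int) -> float:
    """Slow statements as a percentage of total, rounded to one decimal."""
    return round(100.0 * slow / total, 1)


rates = {label: rate_pct(total, slow) for label, total, slow in windows}

for label, value in rates.items():
    print(f"{label}: {value}%")
```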
Drilling into the red window shows an UPDATE inventory SET qty = qty - $1 WHERE sku = $2 pattern accounting for the bulk of the slow events: a flash sale on one hot SKU is serialising writes to the same range, and the contention is dragging the slow rate up. Statement Latency p99 (ms) confirms the tail has blown out to 1,400ms.
- The rate tells you how much, not why. A high Slow-Query Rate is the prompt; the Top Contended Statements and latency-percentile cards tell you which statements and how slow. Never act on the headline alone.
- Always check the denominator. On a quiet cluster a tiny number of slow analytical queries can push the percentage high without any user impact. Pair it with statements-per-second so you know whether 6% means “60 slow queries” or “60,000”.
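
The denominator check can be sketched as a conversion from the headline percentage back to absolute numbers. This assumes the 15-minute window from the card; `describe_volume` is a hypothetical helper name.

```python
WINDOW_SECONDS = 15 * 60  # the card's rolling 15-minute window


def describe_volume(
    rate_pct: float,
    total_statements: int,
    window_seconds: int = WINDOW_SECONDS,
) -> tuple[int, float]:
    """Return (approximate slow statement count, statements per second)."""
    slow_count = round(total_statements * rate_pct / 100.0)
    statements_per_second = total_statements / window_seconds
    return slow_count, statements_per_second


# The same 6% headline on two very different clusters.
quiet = describe_volume(6.0, 1_000)
busy = describe_volume(6.0, 1_000_000)
```

Here `quiet` works out to 60 slow statements at roughly one statement per second, while `busy` is 60,000 slow statements at over a thousand per second, which is the distinction the bullet above asks you to make before acting.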
Sibling cards
| Card | Why pair it with Slow-Query Rate | What the combination tells you |
|---|---|---|
| Statement Latency p95 (ms) | Shares the 200ms slow threshold. | A high slow rate with a p95 above 200ms confirms the slowness is broad, not a tail artefact. |
| Statement Latency p99 (ms) | The tail view of the same latency distribution. | High slow rate plus extreme p99 means a subset of statements is very slow, often contention. |
| Statement Latency p50 (ms) | The median, your typical statement. | If p50 is also rising, the whole workload is slowing, not just the tail. |
| Top Contended Statements | The “why” behind a slow-rate spike. | Contention on one range frequently drives the slow rate up without any capacity shortfall. |
| Statements per Second (live) | The denominator sanity check. | A high rate on low throughput is usually a thin-traffic artefact, not an incident. |
| Statement Error Rate % | The error companion to slowness. | Slow plus erroring usually means contention or retries; slow but clean means pure latency. |
| Connection Pool Saturation % | The capacity angle. | A saturated pool can make statements queue and read as slow even when the cluster is healthy. |
| CockroachDB Health Score | The composite that takes latency as an input. | A slow-rate spike is one of the signals that can pull the overall health score down. |
Reconciling against the source
CockroachDB does not print a single “slow-query rate” figure, so reconcile it from the statement statistics that feed it:

- DB Console SQL Activity. The Statements page lets you sort by latency and filter by a latency floor; counting statements above 200ms against the total over a 15-minute window reproduces the rate. The Statements page also exposes the per-statement execution counts and latency percentiles used in the calculation.
- crdb_internal tables. SELECT * FROM crdb_internal.node_statement_statistics (and the cluster-wide crdb_internal.cluster_statement_statistics) expose per-statement counts and latency, so you can compute the ratio directly in SQL.
- Slow-query logging. If you have enabled the SQL slow-query log (the sql.log.slow_query.latency_threshold cluster setting), the log captures every statement over the configured threshold; the count of log lines over a window divided by total statements is another way to sanity-check the rate.
- Time-series. The sql.service.latency histogram in the DB Console Metrics dashboard underpins the percentile cards and the slow threshold.
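
The slow-query log check can be sketched as a comparison between the card value and a rate derived from log line counts. This assumes the log threshold is set to the same 200ms as the card and that you have already counted the log lines and total statements for the same window. The function name and the 0.5 percentage-point tolerance are hypothetical.

```python
def reconcile_with_slow_log(
    card_rate_pct: float,
    slow_log_lines: int,
    total_statements: int,
    tolerance_pp: float = 0.5,  # hypothetical tolerance, in percentage points
) -> tuple[float, bool]:
    """Return (log-derived rate %, whether it agrees with the card)."""
    if total_statements <= 0:
        raise ValueError("total_statements must be positive")
    log_rate_pct = 100.0 * slow_log_lines / total_statements
    agrees = abs(log_rate_pct - card_rate_pct) <= tolerance_pp
    return log_rate_pct, agrees


# Counts from the red window in the worked example.
log_rate, agrees = reconcile_with_slow_log(6.6, 91_080, 1_380_000)
```

A disagreement beyond the tolerance is a prompt to check that both sides use the same window, the same threshold, and the same exclusion of internal statements.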