Card class: Hero
Category: Cross-Channel: Revenue at Risk

At a glance

CRDB Pool Saturation vs Traffic Burst lays your CockroachDB connection-pool saturation alongside front-of-store traffic, row by row, over a rolling 15-minute window. It answers the one question a DBA cannot answer from the database alone: “is the pool filling up because real shoppers arrived, or because something is leaking sessions?” When saturation climbs in lockstep with a genuine traffic burst, the cluster is doing its job and you scale. When saturation climbs while traffic is flat, you have a connection leak or a runaway client pool, and adding capacity just delays the wall. This is the card that stops a team raising the connection ceiling against a leak.

| Field | Detail |
| --- | --- |
| What it tracks | CRDB Pool Saturation vs Traffic Burst, broken down by row: each row pairs a time bucket’s pool-saturation percentage with the matching front-end traffic reading (sessions or requests per minute) from the linked ecommerce or analytics connector. |
| Data source | CockroachDB side: the same saturation series behind Connection Pool Saturation %, the ratio of sql.conns (summed across live nodes) to the configured connection ceiling (server.max_connections_per_gateway across gateways, or the CockroachDB Cloud plan limit). Traffic side: live sessions or request rate from the joined storefront connector (Shopify / BigCommerce / Adobe Commerce, or Google Analytics). |
| Metric basis | Correlation, not a single number. The card is a two-series table so you can read saturation and traffic in the same buckets. Saturation is a percentage of the ceiling; traffic is sessions or requests per minute. |
| Time window | 15m, a rolling 15-minute window bucketed so each row covers a short interval (typically 1 minute). Short enough to catch a flash-sale ramp, long enough to show the shape of the climb. |
| Alert trigger | >90% during traffic burst: saturation above 90% at the same time as an elevated traffic reading. A breach with traffic flat is surfaced separately as a likely leak rather than a capacity event. |
| What this distinguishes | (1) Real demand: saturation and traffic rise together; (2) Connection leak: saturation rises, traffic flat; (3) Over-provisioned headroom: traffic bursts, saturation barely moves (pooler working well). |
| What does NOT fire | Saturation spikes shorter than the bucket, traffic bursts that the pool absorbs comfortably under 90%, and BI / batch session churn that the engine can attribute to a non-storefront pool. |
| Roles | DBA, platform, SRE |

Calculation

The card joins two independent series on a shared time axis and renders them per bucket:
row[t] = {
  bucket:       t  (rolling 15m window, ~1-minute buckets)
  saturation%:  (open SQL connections at t / configured connection ceiling) * 100
  traffic:      front-end sessions or requests per minute at t (joined connector)
  correlated?:  saturation breach AND traffic elevated in the same bucket
}
The CockroachDB numerator is the cluster-wide sum of the sql.conns gauge; the denominator is the connection ceiling (server.max_connections_per_gateway applied across gateway nodes on self-hosted clusters, or the plan limit on CockroachDB Cloud). The traffic series comes from whichever storefront or analytics connector is linked in the same Nerve Centre profile, normalised to a per-minute rate so the two series share a cadence. The alert opens only when a saturation breach above 90% coincides with an elevated traffic reading in the same bucket: that pairing is the expected, scalable case and tells the team capacity is the lever. The more interesting signal is the anti-correlation: saturation above 90% with traffic flat. The engine surfaces that row distinctly because it points at a leak or a misbehaving client pool, where raising the ceiling is the wrong move. Each correlated row carries peak saturation, the busiest gateway nodes, and the matching traffic figure so the on-call engineer can size the response in one read.
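
The per-bucket rule described above can be sketched in Python. This is a minimal illustration, not the engine's actual implementation: the source does not say how an "elevated" traffic reading is determined, so the `ELEVATED_FACTOR` multiplier over a `baseline_traffic` figure is an assumption, and the `Bucket` type, function names, and label strings are hypothetical.

```python
from dataclasses import dataclass

SATURATION_THRESHOLD = 90.0  # from the card's alert trigger (>90%)
ELEVATED_FACTOR = 2.0        # ASSUMPTION: "elevated" means at least 2x baseline traffic


@dataclass
class Bucket:
    label: str              # bucket start time, e.g. "20:03"
    open_conns: int         # cluster-wide sum of the sql.conns gauge
    traffic_per_min: float  # sessions or requests per minute from the joined connector


def saturation_pct(open_conns: int, ceiling: int) -> float:
    """Open SQL connections as a percentage of the configured ceiling."""
    return open_conns / ceiling * 100


def classify(bucket: Bucket, ceiling: int, baseline_traffic: float) -> str:
    """Label one bucket by pairing the saturation breach with the traffic reading."""
    breach = saturation_pct(bucket.open_conns, ceiling) > SATURATION_THRESHOLD
    elevated = bucket.traffic_per_min >= ELEVATED_FACTOR * baseline_traffic
    if breach and elevated:
        return "correlated_breach"      # real demand: capacity is the lever
    if breach:
        return "likely_leak"            # saturation up, traffic flat
    if elevated:
        return "burst_under_threshold"  # traffic burst, pool still under 90%
    return "normal"
```

With a 2,500-connection ceiling, a bucket holding 2,300 open connections and 6,050 sessions/min against a baseline of 1,180 is labelled `correlated_breach`, while 2,275 open connections at 238 sessions/min against a baseline of 240 is labelled `likely_leak`.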

Worked example

A platform team runs a 5-node CockroachDB self-hosted cluster behind the order and inventory APIs for a high-traffic retailer on Shopify. server.max_connections_per_gateway is 500 across all 5 gateways, giving a 2,500-connection ceiling. The storefront connector feeds live session rate. Snapshot taken on 14 Apr 26 at 20:00 BST, during the opening minutes of a scheduled flash sale.

| Bucket (BST) | Saturation | Open conns | Storefront sessions/min | Correlated? |
| --- | --- | --- | --- | --- |
| 19:56 | 56% | 1,400 | 1,180 | no (healthy) |
| 19:59 | 71% | 1,775 | 2,640 | no (climbing) |
| 20:01 | 88% | 2,200 | 4,910 | no (just under) |
| 20:03 | 92% | 2,300 | 6,050 | yes (real burst) |
| 20:04 | 94% | 2,360 | 6,400 | yes (real burst) |

Saturation and sessions rise together: every percentage point of pool fill is matched by more shoppers arriving. This is the scalable case. The card flags the 20:03 to 20:04 rows as a correlated breach, and the team reads it as “we are genuinely out of headroom because the sale worked”, not “something is broken”.

What the on-call SRE does with this:
  1. Confirm the correlation is tight. Because saturation tracks traffic almost linearly, this is real demand. Contrast with the alternative below, where the same 94% would mean a leak.
  2. Scale the right lever. Real demand means widen capacity: raise server.max_connections_per_gateway if node memory allows (Memory Usage % is comfortable here), add a gateway node, or, structurally, front the cluster with a connection pooler so app threads multiplex onto a bounded server pool.
  3. Watch the downstream pulse. Cross-read Statement Latency p95 (ms): if it has not moved, the pool is full but not yet hurting users, and you have a few minutes to act cleanly.
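
The arithmetic behind the flash-sale table, and one way to put a number on "tracks traffic almost linearly", can be sketched as follows. The Pearson coefficient is used here purely as an illustration; the source does not state which measure, if any, the card applies, and the `pearson` helper is a hypothetical name.

```python
from math import sqrt

NODES = 5
MAX_CONNS_PER_GATEWAY = 500              # server.max_connections_per_gateway
CEILING = NODES * MAX_CONNS_PER_GATEWAY  # 5 * 500 = 2,500 connections

# Rows from the flash-sale table, 19:56 to 20:04 BST.
open_conns = [1400, 1775, 2200, 2300, 2360]
sessions_per_min = [1180, 2640, 4910, 6050, 6400]

# Saturation per bucket, rounded to whole percentage points as in the table.
saturation = [round(c / CEILING * 100) for c in open_conns]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


r = pearson(saturation, sessions_per_min)
```

Here `saturation` reproduces the table's 56, 71, 88, 92 and 94, and `r` comes out close to 1, which is consistent with reading the breach as real demand.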
Now the contrast that makes this card valuable. Same cluster, a different evening:

| Bucket (BST) | Saturation | Open conns | Storefront sessions/min | Correlated? |
| --- | --- | --- | --- | --- |
| 02:10 | 61% | 1,525 | 240 | no |
| 02:25 | 78% | 1,950 | 235 | no (traffic flat) |
| 02:40 | 91% | 2,275 | 238 | leak signal |
| 02:55 | 96% | 2,400 | 241 | leak signal |

Saturation marches to 96% while traffic sits at roughly 240 sessions/min all night. This is not demand; it is a service opening connections and never returning them to its pool. Raising the ceiling here would simply move the wall a few hours later. The fix is to find and restart the leaking client (or fix its pool configuration), confirmed by pairing with Connections In Use showing a flat-traffic climb.
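
The leak signature (saturation climbing while traffic stays flat) can also be checked across a whole run of buckets rather than one row at a time. The sketch below is a hedged illustration only: the two thresholds (`min_rise_points`, `max_traffic_change_pct`) and the function names are assumptions, not values or APIs taken from the product.

```python
def window_trend(open_conns, traffic, ceiling):
    """Return (saturation rise in percentage points, traffic change in percent)
    between the first and last bucket. Assumes a non-zero first traffic reading."""
    sat_first = open_conns[0] / ceiling * 100
    sat_last = open_conns[-1] / ceiling * 100
    traffic_change_pct = (traffic[-1] - traffic[0]) / traffic[0] * 100
    return sat_last - sat_first, traffic_change_pct


def looks_like_leak(open_conns, traffic, ceiling,
                    min_rise_points=10.0, max_traffic_change_pct=10.0):
    """True when saturation rose noticeably while traffic barely moved.
    Both thresholds are illustrative assumptions."""
    rise, change = window_trend(open_conns, traffic, ceiling)
    return rise >= min_rise_points and abs(change) <= max_traffic_change_pct


CEILING = 2500
overnight = looks_like_leak([1525, 1950, 2275, 2400], [240, 235, 238, 241], CEILING)
flash_sale = looks_like_leak([1400, 1775, 2200, 2300, 2360],
                             [1180, 2640, 4910, 6050, 6400], CEILING)
```

For the overnight rows, saturation rises 35 points while traffic changes by well under 1%, so `overnight` is `True`; for the flash-sale rows traffic grows more than fivefold, so `flash_sale` is `False`.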
Why the comparison matters financially:
  - Real burst at 94%: scaling now keeps a flash sale converting; doing nothing risks refused checkouts.
  - Leak at 96% with flat traffic: scaling wastes money AND hides the bug; the wall returns.
  - Same saturation number, opposite correct action. Only the traffic pairing tells them apart.
Three takeaways for the team:
  1. Saturation alone is ambiguous; saturation paired with traffic is a decision. 94% means “scale” or “find the leak” depending entirely on whether traffic moved with it.
  2. A correlated breach is a capacity win, not a failure. It means the sale drove real load. Plan capacity ahead of known peaks so you are not scaling reactively at 94%.
  3. The anti-correlated row is the one to fear. Saturation up, traffic flat, is the leak signature, and it is the case a database-only view would misread as “we need more capacity”.

Sibling cards

| Card | Why pair it with CRDB Pool Saturation vs Traffic Burst | What the combination tells you |
| --- | --- | --- |
| Connection Pool Saturation % | The database-only saturation gauge this card joins to traffic. | The gauge gives the live percentage; this card tells you whether traffic explains it. |
| Connection Pool at >90% Saturation | The alert that fires on the saturation series alone. | The alert says “we crossed 90%”; this card says whether real demand or a leak drove it. |
| Connections In Use | The raw numerator behind saturation. | A climb with flat storefront traffic confirms a leak rather than load. |
| Statement Latency p95 (ms) | Where a full pool first hurts users. | p95 rising during a correlated breach means workers are already queuing for sessions. |
| Memory Usage % | Each connection consumes memory. | High saturation plus high memory means raising the ceiling is unsafe; add nodes instead. |
| Statements per Second (live) | The workload the connections are carrying. | QPS rising with saturation equals active load; QPS flat while saturation climbs equals idle, leaked sessions. |
| CRDB Statements Spike vs Ecom Order Rate | The sibling cross-channel card on the query-volume axis. | Saturation and statements both tracking order rate confirms healthy end-to-end scaling. |
| CockroachDB Health Score | The executive composite a sustained breach feeds. | A correlated breach during peak is expected; a leak-driven breach drags the score down without a business reason. |

Reconciling against the source

Where to look natively:
  - DB Console SQL dashboard (“Open SQL Sessions” panel) for the live sql.conns series per node, the database side of this card.
  - SELECT count(*) FROM crdb_internal.cluster_sessions; for the exact open-connection count at a moment.
  - SHOW CLUSTER SETTING server.max_connections_per_gateway; to confirm the ceiling the saturation percentage divides by.
  - CockroachDB Cloud Metrics tab plots the same connection series; the cluster Overview shows the plan connection limit.
  - The traffic side has no CockroachDB equivalent: confirm it against your storefront or analytics connector’s own session / request reports.
Why our number may legitimately differ from the native view:
| Reason | Direction | Why |
| --- | --- | --- |
| Two-connector join | N/A | The DB Console shows only the saturation series. The traffic pairing exists only in Vortex IQ because it joins a second connector; there is no native panel to reconcile the correlation against. |
| Bucket alignment | Brief lag | Saturation and traffic are polled independently and snapped to shared buckets. Near a sharp ramp the two series can land a bucket apart before they realign. |
| Ceiling source | Either way | If max_connections_per_gateway was changed but not reloaded, the native panel may divide by a stale denominator while Vortex IQ uses the configured value. |
| Per-node vs cluster | Vortex IQ may read lower | This card uses cluster-wide saturation; the DB Console can show one hot gateway at a higher local percentage. |
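
The bucket-alignment effect can be illustrated with a small sketch. It assumes each connector delivers time-stamped samples and that "snapping" means truncating the timestamp to the minute; the real polling and snapping logic is not described in the source, and the function names and sample values here are hypothetical.

```python
from datetime import datetime


def bucket_start(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its 1-minute bucket."""
    return ts.replace(second=0, microsecond=0)


def join_on_buckets(saturation_samples, traffic_samples):
    """Join two lists of (timestamp, value) samples on shared 1-minute buckets.

    If a series has several samples in one bucket, the last one listed wins.
    A bucket missing from one series gets None for that series.
    """
    sat = {bucket_start(ts): value for ts, value in saturation_samples}
    traffic = {bucket_start(ts): value for ts, value in traffic_samples}
    buckets = sorted(sat.keys() | traffic.keys())
    return [(b, sat.get(b), traffic.get(b)) for b in buckets]


# Hypothetical polls four seconds apart that straddle a minute boundary.
saturation_samples = [(datetime(2026, 4, 14, 20, 2, 58), 88.0)]
traffic_samples = [(datetime(2026, 4, 14, 20, 3, 2), 4910.0)]
rows = join_on_buckets(saturation_samples, traffic_samples)
```

Although the two polls are only four seconds apart, the saturation sample lands in the 20:02 bucket and the traffic sample in the 20:03 bucket, so `rows` holds two half-filled rows. That is the one-bucket offset described in the table, and it disappears once both series have a sample in each bucket.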
Cross-connector reconciliation:
| Card | Expected relationship | What causes divergence |
| --- | --- | --- |
| CRDB Statements Spike vs Ecom Order Rate | A correlated saturation breach should accompany a statements and order-rate rise. | Saturation up with statements flat means idle sessions accumulating, not active queries: a leak. |
| Slow Statements During Checkout Window (5m) | A correlated breach plus slow statements during checkout points at the same pressure. | Slow checkout statements with no saturation breach means the bottleneck is query plans or contention, not connections. |

Known limitations / FAQs

Saturation hit 94% but the card did not raise the correlated alert. Why?
The alert fires on saturation above 90% paired with an elevated traffic reading in the same bucket. If traffic was flat when saturation climbed, the engine surfaces the row as a likely leak rather than a capacity alert, because the correct action is different. Read the traffic column: if it is flat, treat it as a connection leak and pair with Connections In Use.

Traffic clearly spiked but saturation barely moved. Is the card broken?
No, that is the healthiest possible reading. It means your connection pooler (or a generous ceiling) absorbed the burst without the server-side pool filling. A storefront burst that does not move saturation is the goal; it tells you that you have real headroom for the next, larger peak.

Which traffic series does the card use?
Whichever storefront or analytics connector is linked in the same Nerve Centre profile: storefront sessions or request rate from Shopify, BigCommerce, or Adobe Commerce, or session rate from Google Analytics. If no front-end connector is linked, the card shows the saturation series alone and cannot classify breaches as demand vs leak; link a storefront connector to unlock the comparison.

The two series look offset by a minute near a sharp ramp. Should I worry?
No. Saturation and traffic are polled independently and snapped to shared buckets, so during a steep climb one can lead the other by a single bucket before they realign. The shape of the correlation over the 15-minute window is what matters, not a single-bucket offset.

On CockroachDB Cloud I cannot set max_connections_per_gateway. Does the comparison still work?
Yes. On Cloud the connection limit is set by your plan and enforced by the managed proxy; Vortex IQ divides by that plan limit instead of the cluster setting. The traffic pairing is unchanged, so the demand-vs-leak distinction works identically.

A correlated breach fired during a flash sale. Do I need to do anything?
A correlated breach means real demand, so the lever is capacity, not bug-hunting. If Statement Latency p95 (ms) has not climbed, the pool is full but not yet hurting users and you have a short window to widen the ceiling (memory permitting) or add a gateway. The durable answer for recurring peak breaches is a connection pooler in front of the cluster.

Can a single hot gateway create a false leak signal?
The card uses cluster-wide saturation, so a single hot gateway that averages out below 90% will not register at all. If you suspect uneven distribution, check the per-node spread on Connection Pool Saturation % and confirm your load balancer is spreading connections evenly across gateways.
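
The difference between cluster-wide and per-gateway saturation is easy to see with numbers. The sketch below uses hypothetical per-node connection counts (not taken from the worked example) and hypothetical function names, and assumes the same per-gateway ceiling of 500 on every node.

```python
def cluster_saturation(conns_by_node, per_gateway_ceiling):
    """Cluster-wide saturation: summed connections over the summed ceiling."""
    total_conns = sum(conns_by_node.values())
    total_ceiling = per_gateway_ceiling * len(conns_by_node)
    return total_conns / total_ceiling * 100


def node_saturation(conns_by_node, per_gateway_ceiling):
    """Per-gateway saturation, useful for spotting one hot node."""
    return {node: conns / per_gateway_ceiling * 100
            for node, conns in conns_by_node.items()}


# HYPOTHETICAL distribution: one hot gateway, four lightly loaded ones.
conns_by_node = {"n1": 480, "n2": 330, "n3": 330, "n4": 330, "n5": 330}

cluster_pct = cluster_saturation(conns_by_node, 500)
per_node_pct = node_saturation(conns_by_node, 500)
hot_nodes = [node for node, pct in per_node_pct.items() if pct > 90]
```

In this made-up case the cluster sits at 72% while gateway `n1` is at 96%: the card would stay quiet even though one node is close to refusing connections, which is why the per-node spread is worth checking when the load balancer is suspect.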

Tracked live in Vortex IQ Nerve Centre

CRDB Pool Saturation vs Traffic Burst is one of hundreds of KPI pulses Vortex IQ tracks across CockroachDB and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.