At a glance
Per-node connection pool saturation across every Galera node, laid alongside live storefront traffic so you can see whether a node is filling up because of genuine shopper demand or because of an imbalance, a stuck thread, or a hot node. Each row is one Galera node: its used vs available connections, its saturation percentage, and the share of cluster traffic landing on it. The danger reading is one node above 90% while siblings sit comfortably; that means your load balancer or proxy is hot-spotting, and a single node rejecting connections can stall checkout even though the cluster has spare capacity.
| What it tracks | Connection pool saturation (Threads_connected / max_connections) per Galera node, joined row-by-row against the live traffic share for that node, over a rolling 15-minute window. |
| Data source | Per-node SHOW GLOBAL STATUS LIKE 'Threads_connected' and Max_used_connections against @@max_connections, correlated with storefront sessions/orders from the connected ecommerce platform (Shopify, BigCommerce or Adobe Commerce). |
| Time window | 15m rolling, refreshed on the live poll. |
| Alert trigger | >90% sustained during burst: any node holding above 90% saturation while traffic is spiking. |
| Roles | owner, engineering, operations |
Calculation
For each Galera node n, saturation is Threads_connected(n) / max_connections(n), expressed as a percentage. The traffic column is that node’s share of cluster query volume in the window or, where a storefront link exists, the share of active shopper sessions routed to it. The card overlays the two so the question “is this node busy because customers are busy?” can be answered at a glance. The alert fires only when saturation stays above 90% for the duration of a traffic burst rather than on a momentary spike, because brief peaks during flash-sale onset are expected and self-clear. In other words, the rule from the card’s alert definition (>90% sustained during burst, over the 15-minute window) is sustained rather than instantaneous, and burst-qualified so that a quiet-hour blip does not page anyone.
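The calculation and the sustained-during-burst rule can be sketched in a few lines of Python. This is a minimal illustration, not the card's actual implementation: the Sample shape, the in_burst flag, and the reading of "sustained" as "every burst-time poll in the window is above the threshold" are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of one node (hypothetical shape, not the product's schema)."""
    threads_connected: int
    max_connections: int
    in_burst: bool  # assumed flag: was traffic spiking at this poll?


def saturation_pct(threads_connected: int, max_connections: int) -> float:
    """Threads_connected / max_connections, expressed as a percentage."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * threads_connected / max_connections


def sustained_during_burst(samples: list[Sample], threshold: float = 90.0) -> bool:
    """True only if the window contains a burst and every burst-time
    sample is above the threshold (assumed meaning of 'sustained')."""
    burst = [s for s in samples if s.in_burst]
    if not burst:
        return False  # quiet-hour blips never qualify
    return all(
        saturation_pct(s.threads_connected, s.max_connections) > threshold
        for s in burst
    )


# Usage: one quiet poll, then two burst polls that both sit above 90%.
window = [Sample(300, 500, False), Sample(470, 500, True), Sample(481, 500, True)]
print(sustained_during_burst(window))  # True
```

A single burst-time poll at or below the threshold is enough to suppress the alert under this reading, which is what keeps a momentary spike from paging anyone.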
Worked example
A 3-node Galera cluster backs the Adobe Commerce store of a UK homeware brand. A 20%-off email goes out at 19:00 on 14 Apr 26 and traffic triples within four minutes. Snapshot taken at 19:06 BST, 15-minute window.

| Node | Threads_connected | max_connections | Saturation | Traffic share | Reading |
|---|---|---|---|---|---|
| galera-1 | 142 | 500 | 28% | 33% | Healthy, proportional |
| galera-2 | 138 | 500 | 28% | 32% | Healthy, proportional |
| galera-3 | 481 | 500 | 96% | 35% | Saturated, disproportionate |
- Capacity is not the issue. The cluster is holding 761 connections against a combined ceiling of 1,500. There is plenty of headroom in aggregate; the problem is distribution.
- Connections are not closing on galera-3. Equal traffic but 3.4x the open connections means threads are accumulating: likely a long-running query holding connections open, an application pool that opened persistent connections to galera-3 and never recycled them, or a proxy that pinned sticky sessions to one node.
- The clock is ticking on checkout. Once galera-3 hits max_connections it returns “Too many connections” and every shopper routed there by the proxy fails at the worst moment. Pair with Connection Pool at >90% Saturation to see if the alert has already fired.
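The immediate action: drain or rebalance galera-3 at the proxy, find and end whatever is holding connections open (SHOW PROCESSLIST, then KILL), and confirm the application’s connection pool is recycling. Capacity planning comes later; right now it is a hot-node rebalance.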
Three takeaways:
- Saturation alone lies; saturation versus traffic tells the truth. A node at 90% during a genuine 3x traffic burst that is shared evenly is fine. A node at 90% while peers sit at 28% is a routing or leak problem, and the difference is invisible without the traffic overlay.
- Aggregate headroom hides per-node danger. Total cluster utilisation can read 50% while one node is about to reject connections. Always read the per-node rows, not a cluster average.
- In Galera every node serves writes, so a hot node is a checkout risk. Unlike async primary/replica, there is no read-only node to sacrifice; a saturated Galera node failing connections directly blocks order writes routed to it.
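The arithmetic behind the worked example and the first two takeaways can be checked with a short Python sketch. The node figures are copied from the table above; the skew ratio (a node's share of open connections divided by its share of traffic) and the 1.5 cut-off are illustrative assumptions, not settings the card exposes.

```python
# Figures copied from the worked example table.
nodes = {
    "galera-1": {"threads": 142, "max": 500, "traffic_share": 0.33},
    "galera-2": {"threads": 138, "max": 500, "traffic_share": 0.32},
    "galera-3": {"threads": 481, "max": 500, "traffic_share": 0.35},
}

total_threads = sum(n["threads"] for n in nodes.values())  # 761
total_ceiling = sum(n["max"] for n in nodes.values())      # 1500


def skew(name: str) -> float:
    """Share of open connections divided by share of traffic (1.0 = proportional)."""
    n = nodes[name]
    return (n["threads"] / total_threads) / n["traffic_share"]


SKEW_LIMIT = 1.5  # hypothetical cut-off, chosen only for this example

print(f"aggregate: {total_threads}/{total_ceiling} connections in use")
for name, n in nodes.items():
    sat = 100 * n["threads"] / n["max"]
    hot = sat > 90 and skew(name) > SKEW_LIMIT
    reading = "saturated, disproportionate" if hot else "healthy"
    print(f"{name}: {sat:.0f}% saturated, skew {skew(name):.2f} -> {reading}")
```

Only galera-3 is flagged: it is above 90% and holds far more of the cluster's open connections than its traffic share would justify, while the aggregate figure still shows roughly half the ceiling free.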
Sibling cards
| Card | Why pair it with Pool Saturation vs Traffic | What the combination tells you |
|---|---|---|
| Connection Pool Saturation % | The single-instance saturation gauge this card spreads across nodes. | If the aggregate gauge looks fine but this card shows one hot node, you have a distribution problem, not a capacity one. |
| Connection Pool at >90% Saturation | The real-time alert that fires when a node crosses the line. | This card explains which node and why (traffic-justified or not) once that alert pages you. |
| Galera Cluster Size | Confirms how many nodes should be sharing the load. | A node missing from the cluster concentrates traffic onto survivors, driving the remaining nodes toward saturation. |
| Galera Cluster Status | Confirms the cluster is in Primary state and accepting writes. | A non-Primary node refuses writes, so the proxy piles load on the rest. |
| Galera Flow Control Paused % | Detects a slow node throttling the whole cluster. | A node under flow control may hold connections longer, inflating its saturation. |
| Aborted Connects (24h) | Counts connections that failed to establish. | Rising aborts on the saturated node confirm it is rejecting at the ceiling. |
| Connections In Use | The raw open-connection count. | Read together to see whether the percentage is rising from more connections or a lowered max_connections. |
| MariaDB QPS Spike vs Ecom Order Rate | The query-volume twin of this traffic overlay. | If QPS spikes without an order spike, the connection pressure may be bots, not shoppers. |
Reconciling against the source
Where to look in MariaDB’s own tooling:

- Per node, run SHOW GLOBAL STATUS LIKE 'Threads_connected'; and SHOW GLOBAL STATUS LIKE 'Max_used_connections'; then divide by SELECT @@max_connections;.
- SHOW STATUS LIKE 'wsrep_%'; shows the Galera node’s cluster view; wsrep_cluster_size and wsrep_local_state_comment confirm which nodes are live.
- SHOW PROCESSLIST; (or SELECT * FROM information_schema.PROCESSLIST) reveals what is holding connections open on the hot node.
- On managed platforms: AWS RDS / Aurora “DatabaseConnections” per-instance CloudWatch metric, or SkySQL / MariaDB Enterprise monitoring per-node panels.

Why our number may legitimately differ from a manual reading:

| Reason | Direction | Why |
|---|---|---|
| Poll timing | Snapshot vs sustained | Vortex IQ reports the sustained 15-minute view; a single SHOW STATUS is one instant and may catch a trough or a peak. |
| Proxy-counted connections | Our number can be higher | If MaxScale / ProxySQL multiplexes, the proxy’s view of pooled connections can differ from the node’s raw Threads_connected. |
| max_connections per node | Variable | Nodes may have different max_connections; the percentage normalises for that, the raw count does not. |
| Traffic attribution | Approximate | Traffic share is inferred from query volume or storefront sessions; sticky-session proxies can skew which node “owns” a shopper. |

| Card | Expected relationship | What causes divergence |
|---|---|---|
| shopify.total_revenue / bigcommerce.total_revenue / adobe_commerce.total_revenue | A saturated node rejecting connections should show as a dip in order rate during the same window. | If revenue holds while a node is saturated, the proxy successfully rerouted away from the hot node. |
| google_analytics.ga_property_health | Independent demand-side traffic peer. | GA4 sessions spiking confirms a genuine traffic burst, validating that saturation is demand-driven rather than a leak. |
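The manual per-node reading described in this section can be scripted. The sketch below assumes a Python DB-API style cursor already opened against a single node and returning plain tuples (the default for common MariaDB/MySQL drivers; dictionary cursors would need different unpacking). The function name and the returned keys are hypothetical.

```python
def read_node_saturation(cursor) -> dict:
    """Read one node's connection saturation through a DB-API style cursor.

    Assumes `cursor` is open on that node and returns tuple rows.
    """
    cursor.execute("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
    _, threads = cursor.fetchone()  # row is (Variable_name, Value)
    cursor.execute("SHOW GLOBAL STATUS LIKE 'Max_used_connections'")
    _, peak = cursor.fetchone()
    cursor.execute("SELECT @@max_connections")
    (ceiling,) = cursor.fetchone()

    # Status values may arrive as strings, so normalise to int.
    threads, peak, ceiling = int(threads), int(peak), int(ceiling)
    return {
        "threads_connected": threads,
        "max_used_connections": peak,
        "max_connections": ceiling,
        "saturation_pct": 100.0 * threads / ceiling,  # the instant reading
        "peak_pct": 100.0 * peak / ceiling,           # high-water mark since startup
    }
```

A reading taken this way is a single instant, so expect it to differ from the sustained 15-minute figure for the reasons listed in the table above.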
Known limitations / FAQs
One node is at 95% but the storefront seems fine. Do I still need to act?
Yes, treat it as urgent even if shoppers are unaffected yet. “Fine” usually means the proxy is still finding capacity on the healthy nodes. The moment the saturated node hits max_connections it returns “Too many connections” and any shopper routed there fails. Drain or rebalance the hot node before it reaches the ceiling; do not wait for the revenue dip.
Why does one node fill up faster than the others if Galera is multi-primary?
Galera replicates writes everywhere, but client connections are distributed by your proxy or load balancer, not by Galera itself. Hot-spotting comes from sticky sessions pinned to one node, a deployment that hard-codes one node’s address, persistent application pools that never rebalance, or a long-running query holding threads open on that node. The fix is at the proxy and application layer, not in Galera config.
The card shows all three nodes near 90% during a sale. Is that the same alert?
No. Even saturation that tracks an evenly shared traffic burst is a capacity signal, not a distribution fault. The dangerous pattern this card highlights is disproportionate saturation (one node hot, peers cool). If all nodes saturate together under genuine demand, you need more capacity (higher max_connections, more RAM per node, or another node), which is a planning decision, not an incident.
How is “traffic” measured for the overlay?
Where the MariaDB connector is linked to a storefront connector, traffic share uses active shopper sessions or order rate routed to each node. Where no storefront link exists, it falls back to per-node query volume. Sticky-session proxies can make session attribution approximate, so treat the traffic column as directional, not exact.
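The fallback described in this answer can be illustrated with a small Python sketch. The function name and input shapes are hypothetical: per-node counts are passed in as dictionaries, and session counts are used only when a storefront link supplies them.

```python
from typing import Optional


def traffic_share(
    queries: dict[str, int],
    sessions: Optional[dict[str, int]] = None,
) -> dict[str, float]:
    """Per-node share of traffic: storefront sessions if available,
    otherwise per-node query volume."""
    source = sessions if sessions else queries
    total = sum(source.values())
    if total == 0:
        return {node: 0.0 for node in source}
    return {node: count / total for node, count in source.items()}


# No storefront link: falls back to query volume.
print(traffic_share({"galera-1": 100, "galera-2": 300}))
```

Either way the result is a directional share rather than an exact attribution, since sticky sessions can blur which node a shopper is really using.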
Can I change the 90% threshold?
Yes. The 90% sustained-during-burst trigger is the default; adjust it per profile in the Sensitivity tab. Instances with very high max_connections headroom may prefer 85%; tightly provisioned single-node setups may want a lower bar to get earlier warning.
Does this card work on a single-node MariaDB instance?
It renders, but with one row it is effectively Connection Pool Saturation % with a traffic overlay. The per-node comparison value only appears with two or more Galera nodes. For async primary/replica topologies use the per-instance saturation card instead.
A node dropped out of the cluster and the survivors spiked. Is that this card or a Galera card?
Both, read together. Galera Cluster Size tells you a node left; this card shows the consequence (the remaining nodes absorbing its share and climbing toward saturation). Restore the node or add capacity before the survivors hit their ceiling.