At a glance
A real-time alert that fires when the live connection count climbs above 90% of max_connections and stays there for at least one minute. This is the early-warning band before total exhaustion. At 90% you still have headroom, but you are close enough that one traffic burst, one slow query holding threads, or one runaway batch job can push you to the wall, at which point MariaDB returns ERROR 1040: Too many connections and refuses new clients outright. For a DBA this is the “act now, do not wait” signal: you have minutes, not hours, to shed load or add capacity before connections start being rejected.
| Field | Details |
|---|---|
| What it tracks | An alert list of moments when Threads_connected / max_connections exceeded 90% sustained for one minute. Each entry records the timestamp, the peak ratio, and the duration above threshold. |
| Data source | Threads_connected from SHOW GLOBAL STATUS divided by the max_connections system variable (SHOW VARIABLES LIKE 'max_connections'), sampled in real time. |
| Time window | RT (real-time). The card evaluates every sample and raises an alert entry when the sustained condition is met. |
| Alert trigger | > 90% sustained 1m. Saturation must hold above 90% for a full minute to fire, which suppresses transient single-sample spikes. |
| Severity | High. This is a Hero card because exhaustion directly causes connection refusals, which break the application tier and, downstream, the storefront. |
| Roles | DBA, platform, SRE, on-call |
Calculation
The card computes a live saturation ratio on every sample:
saturation % = (Threads_connected / max_connections) × 100
Threads_connected is the number of currently open connections (both running queries and idle-but-open sessions), read from SHOW GLOBAL STATUS LIKE 'Threads_connected'. max_connections is the hard ceiling configured for the server. The “sustained 1m” requirement means a single sample above 90% does not raise an alert; the ratio must remain above 90% across consecutive samples spanning at least a minute. This deliberately ignores brief, self-correcting spikes (a deploy, a cache flush, a momentary thundering herd) and only escalates genuine pressure.
Note that MariaDB silently reserves one extra connection above max_connections for a user with the SUPER/CONNECTION ADMIN privilege, so an administrator can always log in to remediate even when ordinary clients are being refused. The card measures against the ordinary ceiling, which is what application traffic competes for.
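The sustained-threshold rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the card's actual implementation: the `Sample` type, the `sustained_alerts` function, and the choice to fire once per breach are all hypothetical, and the samples are assumed to arrive in time order.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Tuple

THRESHOLD = 0.90                # the 90% band described above
SUSTAIN = timedelta(minutes=1)  # how long the ratio must stay above it


@dataclass(frozen=True)
class Sample:
    """One reading of the two inputs (hypothetical container type)."""
    at: datetime
    threads_connected: int
    max_connections: int  # read per sample, since SET GLOBAL can change it

    @property
    def ratio(self) -> float:
        return self.threads_connected / self.max_connections


def sustained_alerts(samples: Iterable[Sample]) -> List[Tuple[datetime, float]]:
    """Return (fired_at, peak_ratio_at_firing), once per sustained breach."""
    alerts: List[Tuple[datetime, float]] = []
    breach_start: Optional[datetime] = None
    peak = 0.0
    fired = False
    for s in samples:
        if s.ratio > THRESHOLD:
            if breach_start is None:  # first sample of a new breach
                breach_start, peak, fired = s.at, 0.0, False
            peak = max(peak, s.ratio)
            if not fired and s.at - breach_start >= SUSTAIN:
                alerts.append((s.at, peak))
                fired = True
        else:
            breach_start = None  # any dip to 90% or below resets the timer
    return alerts
```

Fed the five samples from the worked example (410, 458, 471, 489, 312 against a ceiling of 500), this sketch records a single alert at the 20:01 sample, the first one a full minute after the ratio crossed 90%.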
Worked example
A retail platform runs MariaDB 10.11 with max_connections = 500 behind an application tier that auto-scales during promotions. Snapshot taken on 22 Apr 26 during a flash-sale window.
| Time (BST) | Threads_connected | Saturation | Sustained? | Alert |
|---|---|---|---|---|
| 19:58 | 410 | 82% | no | clear |
| 20:00 | 458 | 91.6% | started | watching |
| 20:01 | 471 | 94.2% | 1m elapsed | FIRED |
| 20:03 | 489 | 97.8% | yes | escalating |
| 20:05 | 312 | 62.4% | no | cleared |
The processlist showed two groups of threads: connections in Sleep state from the reporting application (idle pool connections holding slots) and a cluster of long-running Sending data threads from an unindexed analytics query launched at 19:55. The DBA killed the analytics query (KILL <id>), which freed threads as the pool recycled, and the ratio fell back below 90% by 20:05.
Three takeaways:
- Idle connections count too. Threads_connected includes sessions in Sleep state. A leaky or oversized application pool can hold MariaDB near the ceiling even when almost nothing is actually executing. Pair with Connections In Use to separate active from idle.
- Slow queries pull saturation up indirectly. A handful of long-running queries hold their threads, so new requests stack up behind them. Always check Query Latency p95 (ms) and the processlist together; the fix is often killing one query, not adding capacity.
- Raising max_connections is the last resort, not the first. Each connection costs memory (thread stack plus per-connection buffers). Bumping the ceiling without checking RAM can trade connection refusals for an OOM kill. Confirm Memory Usage % headroom before increasing the limit.
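To make the idle-versus-active point concrete, here is a rough Python sketch that splits processlist rows by their COMMAND column. It assumes the rows have already been fetched from information_schema.PROCESSLIST as dictionaries keyed by column name; the function names and the sample rows are hypothetical, and system threads are simply counted as "active" here.

```python
from collections import Counter
from typing import Dict, List, Mapping


def _is_idle(row: Mapping[str, object]) -> bool:
    # A session is treated as idle when its COMMAND column reads "Sleep".
    return str(row.get("COMMAND", "")).strip().lower() == "sleep"


def split_idle_active(rows: List[Mapping[str, object]]) -> Dict[str, int]:
    """Count sleeping sessions versus everything else."""
    idle = sum(1 for row in rows if _is_idle(row))
    return {"idle": idle, "active": len(rows) - idle}


def idle_by_user(rows: List[Mapping[str, object]]) -> Counter:
    """Show which accounts are holding the idle slots."""
    return Counter(str(row.get("USER", "?")) for row in rows if _is_idle(row))


# Hypothetical rows shaped like information_schema.PROCESSLIST output.
rows = [
    {"ID": 11, "USER": "reporting", "COMMAND": "Sleep", "TIME": 840},
    {"ID": 12, "USER": "reporting", "COMMAND": "Sleep", "TIME": 795},
    {"ID": 13, "USER": "analytics", "COMMAND": "Query", "TIME": 310},
]
print(split_idle_active(rows))
print(idle_by_user(rows).most_common(3))
```

If most of the slots turn out to be idle and belong to one account, the pool behind that account is the first thing to tune.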
Sibling cards
| Card | Why pair it with this alert | What the combination tells you |
|---|---|---|
| Connection Pool Saturation % | The continuous gauge behind this alert. | This alert is the threshold event; the gauge shows the trend leading up to it and how fast it is climbing. |
| Connections In Use | The raw live thread count. | Separates active connections from idle pool slots holding the ceiling near full. |
| Connection Errors (24h) | Counts refusals once the ceiling is hit. | If this alert fired and connection errors then rose, clients were actually turned away. |
| Aborted Connects (24h) | Pre-auth failures during pressure. | High aborts during saturation means the server is too busy to complete handshakes. |
| Query Latency p95 (ms) | Slow queries that hold threads. | Rising p95 alongside saturation points at long queries as the root cause, not raw traffic. |
| Memory Usage % | The constraint on raising the ceiling. | Tells you whether you can safely increase max_connections or must shed load instead. |
| Pool Saturation Across Galera Nodes vs Traffic | The cluster-wide and revenue view. | Shows whether saturation is one hot node or the whole cluster, and what it costs the storefront. |
| MariaDB Health Score | The composite health roll-up. | A sustained saturation alert drives the composite down hard. |
Reconciling against the source
Where to look in MariaDB’s own tooling:
- SHOW GLOBAL STATUS LIKE 'Threads_connected'; for the live connection count.
- SHOW VARIABLES LIKE 'max_connections'; for the configured ceiling (compute the ratio yourself).
- SHOW GLOBAL STATUS LIKE 'Max_used_connections'; for the high-water mark since startup, and Max_used_connections_time for when it occurred.
- SELECT * FROM information_schema.PROCESSLIST; (or SHOW FULL PROCESSLIST) for who is holding each connection right now.
Why our number may legitimately differ from a raw SHOW STATUS:
| Reason | Direction | Why |
|---|---|---|
| Real-time vs point-in-time | Marginal | SHOW STATUS is an instant; our card evaluates a sustained one-minute condition, so a single high sample you catch by hand may not have fired an alert. |
| max_connections changed at runtime | Ratio shifts | If max_connections was raised with SET GLOBAL, the denominator changed; our card uses the value live at each sample. |
| Reserved SUPER connection | +1 vs ceiling | MariaDB allows one extra connection for an admin above max_connections; we measure against the ordinary ceiling. |
| Per-user connection limits | Lower effective ceiling | max_user_connections or account-level MAX_CONNECTIONS can cap a user below the global ceiling; the global ratio will not reflect that. |
On managed services, check the provider's effective settings first. Amazon RDS sets max_connections from a formula based on instance memory (DBInstanceClassMemory), so confirm the effective value before reasoning about the ratio; CloudWatch exposes DatabaseConnections for the live count. SkySQL and Azure Database for MariaDB surface connection metrics in their own consoles. Once you align on the same max_connections value, the ratio should match.
Known limitations / FAQs
Q: The alert fired but I could still connect fine. Was it a false alarm?
No. 90% is the warning band, not the failure point. You hit refusals at 100%. The alert exists precisely so you act before clients are turned away. You could connect because you used a SUPER-privileged account, which gets the one reserved connection above the ceiling, or because saturation dipped between the alert and your test. Treat a fired alert as a near-miss to investigate, not a non-event.
Q: Most of my connections are in Sleep state. Should I still worry?
Yes, idle connections still occupy slots and count toward Threads_connected. An oversized or leaky application pool can pin the server near the ceiling with almost no real work happening. Tune the pool’s maxPoolSize and idle-timeout so it returns connections, and consider a connection proxy (MaxScale, ProxySQL) to multiplex. Pair with Connections In Use to see the active-versus-idle split.
Q: Should I just raise max_connections to make the alert stop?
Only after checking memory. Every connection consumes a thread stack plus per-connection buffers (sort_buffer_size, join_buffer_size, read_buffer_size, and so on), so raising the ceiling raises peak memory. If you bump it past what RAM allows, you trade Too many connections for an OOM kill, which is far worse. Check Memory Usage % first, and prefer fixing the source of the connections (pool sizing, slow queries) over raising the limit.
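A back-of-the-envelope check for the memory question might look like the Python sketch below. It is a deliberately pessimistic estimate under stated assumptions: every connection is assumed to allocate each listed buffer once at full size, which real workloads rarely do, and a single query can also allocate some buffers more than once. The function names, the 80% safety factor, and the byte values are illustrative, not MariaDB defaults; read the real values from SHOW VARIABLES.

```python
from typing import Mapping

KiB = 1024
MiB = 1024 * KiB
GiB = 1024 * MiB


def worst_case_connection_bytes(per_conn: Mapping[str, int], max_connections: int) -> int:
    """Upper-bound estimate: every connection uses every buffer once."""
    return sum(per_conn.values()) * max_connections


def ceiling_fits(ram_bytes: int, global_bytes: int,
                 per_conn: Mapping[str, int], max_connections: int,
                 safety: float = 0.8) -> bool:
    """True if global buffers plus worst-case connection memory stay
    within a safety fraction of RAM (the 0.8 default is an assumption)."""
    needed = global_bytes + worst_case_connection_bytes(per_conn, max_connections)
    return needed <= ram_bytes * safety


# Illustrative per-connection settings (hypothetical values).
per_conn = {
    "thread_stack": 256 * KiB,
    "sort_buffer_size": 2 * MiB,
    "join_buffer_size": 256 * KiB,
    "read_buffer_size": 128 * KiB,
}
# global_bytes stands in for the buffer pool and other shared allocations.
for ceiling in (500, 2000):
    print(ceiling, ceiling_fits(16 * GiB, 10 * GiB, per_conn, ceiling))
```

The point of the sketch is the shape of the check, not the exact numbers: the per-connection term scales linearly with max_connections, so doubling the ceiling doubles the worst-case exposure.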
Q: Why the one-minute sustained requirement? I want to know about every spike.
Brief spikes above 90% are common and self-correcting: a deploy reconnecting pools, a cache stampede, a momentary burst. Alerting on every single sample would bury you in noise. The one-minute hold ensures the alert reflects genuine, persistent pressure. If you genuinely want a tighter trigger, the Connection Pool Saturation % gauge shows every sample without the sustain filter.
Q: On a Galera cluster, does this measure one node or the whole cluster?
This card measures the selected MariaDB instance. In a Galera cluster each node has its own max_connections and its own Threads_connected, and traffic is rarely balanced perfectly, so one node can saturate while others have headroom. For the cluster-wide picture use Pool Saturation Across Galera Nodes vs Traffic, which compares saturation across all nodes against incoming traffic.
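The per-node nature of the ratio can be illustrated with a short Python sketch that computes saturation for each node separately and lists the ones above the band. The node names and readings are hypothetical, and `hot_nodes` is an illustrative helper, not part of any card.

```python
from typing import Dict, List, Tuple


def hot_nodes(nodes: Dict[str, Tuple[int, int]],
              threshold: float = 0.90) -> List[Tuple[str, float]]:
    """nodes maps a node name to (Threads_connected, max_connections).

    Returns the nodes whose ratio exceeds the threshold, hottest first.
    """
    ratios = [(name, connected / ceiling)
              for name, (connected, ceiling) in nodes.items()]
    over = [(name, ratio) for name, ratio in ratios if ratio > threshold]
    return sorted(over, key=lambda item: item[1], reverse=True)


# Hypothetical three-node cluster with unbalanced traffic.
cluster = {
    "node-a": (470, 500),
    "node-b": (210, 500),
    "node-c": (455, 500),
}
for name, ratio in hot_nodes(cluster):
    print(f"{name}: {ratio:.1%}")
```

In this made-up reading, two nodes are above 90% while the third sits well below half, which is the "one hot node, not the whole cluster" pattern the cluster-wide card is meant to expose.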
Q: What happens at exactly 100%?
New non-SUPER connections are refused with ERROR 1040: Too many connections, and Connection_errors_max_connections increments. Existing connections keep working. The application sees connection failures and, if it does not degrade gracefully, user-facing errors. This is why the 90% alert matters: it gives you the lead time to avoid 100% entirely.