At a glance
Connection Pool Saturation % is the share of the instance’s connection ceiling that is currently in use: Threads_connected / max_connections. It answers a single operational question for the on-call DBA: how close is this server to refusing new connections? At 100%, MySQL returns ERROR 1040 (HY000): Too many connections, and every new client (application worker, cron job, read replica health check) is rejected until a slot frees up. That failure mode is abrupt, not gradual, so this card is treated as a Hero capacity signal and watched in real time.
| What it tracks | Live connection-slot utilisation expressed as a percentage. Threads_connected is the count of currently open connections (active plus sleeping); max_connections is the hard ceiling configured for the instance. |
| Data source | Threads_connected from SHOW GLOBAL STATUS, divided by the max_connections system variable from SHOW GLOBAL VARIABLES. Both are read on every poll so a runtime change to max_connections is picked up automatically. |
| Time window | RT/1m (real-time gauge, sampled continuously and evaluated against the alert threshold on a sustained 1-minute basis). |
| Alert trigger | > 90%. A sustained reading above 90% means the instance is roughly one connection storm away from Too many connections rejections. |
| Aggregation | Point-in-time gauge. The headline shows the latest sample; the sparkline shows the trailing window so a slow climb is distinguishable from a momentary spike. |
| Units | Percentage (0 to 100). The card also exposes the raw Threads_connected and max_connections numbers on hover so you can see the absolute headroom, not just the ratio. |
| Roles | owner, engineering, operations |
Calculation
The card computes a straight ratio on each poll: Threads_connected / max_connections, expressed as a percentage.
- Threads_connected is a global status counter (SHOW GLOBAL STATUS LIKE 'Threads_connected'). It includes connections that are actively running a query and connections that are idle in the Sleep command state. It does NOT include the small number of internal background threads (purge, IO, page cleaner), so it maps cleanly onto application-facing capacity.
- max_connections is a server variable (SHOW GLOBAL VARIABLES LIKE 'max_connections'). On a managed service such as Amazon RDS or Aurora it is frequently set by a formula in the parameter group (for example LEAST({DBInstanceClassMemory/12582880}, 5000)), so the effective ceiling moves with the instance class.
MySQL permits one extra connection above max_connections for an account holding the SUPER (or CONNECTION_ADMIN) privilege, so a root session can still get in to investigate at 100%. The card reports saturation against the published max_connections ceiling, which is the number that matters for ordinary application traffic.
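The ratio described above can be sketched in a few lines. This is a minimal illustration, not the card's actual implementation; the function name `saturation_pct` is hypothetical and the input values are illustrative rather than read from a live server.

```python
def saturation_pct(threads_connected: int, max_connections: int) -> float:
    """Return connection-slot utilisation as a percentage of the ceiling."""
    if max_connections <= 0:
        raise ValueError("max_connections must be positive")
    return 100.0 * threads_connected / max_connections


# Illustrative values only (assumed, not read from a server).
print(round(saturation_pct(1_844, 2_000), 1))

# One privileged session above the ceiling pushes the ratio just past 100.
print(saturation_pct(2_001, 2_000) > 100.0)
```

Because the ratio is taken against the published ceiling, a reading marginally above 100 is possible when the extra privileged connection is in use.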
The RT/1m window means the alert does not fire on a single transient sample. The threshold is evaluated against a sustained 1-minute reading, which filters out the brief spikes that occur normally during deploys, connection-pool warm-up, or a batch job opening connections in a burst.
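One way to picture the sustained evaluation is a detector that only fires once the reading has stayed above the threshold for a full minute. The sketch below assumes that "sustained" means every sample over a continuous 60-second stretch exceeds 90%; the card's exact rule is not spelled out here, and the names `SustainedBreach` and `hold_s` are hypothetical.

```python
from typing import Optional


class SustainedBreach:
    """Fire only after saturation has stayed above the threshold for hold_s seconds."""

    def __init__(self, threshold_pct: float = 90.0, hold_s: float = 60.0):
        self.threshold_pct = threshold_pct
        self.hold_s = hold_s
        self._breach_started: Optional[float] = None  # timestamp of first breaching sample

    def observe(self, ts: float, pct: float) -> bool:
        """Feed one (timestamp in seconds, saturation %) sample; return True if alerting."""
        if pct <= self.threshold_pct:
            # Any sample at or below the threshold resets the breach clock.
            self._breach_started = None
            return False
        if self._breach_started is None:
            self._breach_started = ts
        return ts - self._breach_started >= self.hold_s


detector = SustainedBreach()
for ts, pct in [(0, 95.0), (10, 85.0), (20, 92.0), (50, 93.0), (80, 94.0)]:
    print(ts, pct, detector.observe(ts, pct))
```

In this run the spike at t=0 is discarded when the reading drops at t=10, and the detector only fires at t=80, once the breach that began at t=20 has lasted 60 seconds.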
Worked example
A platform team runs a primary MySQL 8.0 instance on a db.r6g.2xlarge RDS instance class backing the order and catalogue services for a mid-size retailer. The parameter group resolves max_connections to 2,000. Snapshot taken on 14 Apr 26 at 19:42 BST, during an evening promotional push.
| Sample time | Threads_connected | max_connections | Saturation % | State |
|---|---|---|---|---|
| 19:38 | 980 | 2,000 | 49% | Healthy |
| 19:40 | 1,510 | 2,000 | 76% | Climbing |
| 19:42 | 1,844 | 2,000 | 92% | Alert |
| 19:43 | 1,961 | 2,000 | 98% | Critical |
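The table shows why the ratio alone understates the urgency: the absolute headroom and the rate of climb matter too. The sketch below extrapolates linearly from the last two samples to estimate how long the remaining slots would last. Linear growth is an assumption made for illustration, and `seconds_to_exhaustion` is a hypothetical helper, not something the card exposes.

```python
from typing import Optional


def seconds_to_exhaustion(prev_connected: int, curr_connected: int,
                          max_connections: int, interval_s: float) -> Optional[float]:
    """Linearly extrapolate when Threads_connected would reach max_connections.

    Returns None when the count is flat or falling.
    """
    delta = curr_connected - prev_connected
    if delta <= 0:
        return None
    headroom = max(max_connections - curr_connected, 0)
    return headroom * interval_s / delta


# 19:42 -> 19:43 from the table: 1,844 -> 1,961 against a ceiling of 2,000.
print(2_000 - 1_961)                                   # slots left
print(seconds_to_exhaustion(1_844, 1_961, 2_000, 60))  # seconds at the current rate
```

With 39 slots left and 117 connections added in the last minute, the extrapolation gives about 20 seconds of headroom, which is why a 98% reading is treated as critical rather than merely high.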
The on-call DBA runs SHOW PROCESSLIST to see where the connections are going:
- Reclaim idle connections. wait_timeout is set to the default 28,800 seconds (8 hours), so idle pool connections never get reaped. Lowering it to 600 seconds and lowering the application pool’s max-idle setting frees hundreds of slots without touching live traffic.
- Kill the offending query, not the pool. The 114 connections stacked on the un-indexed report query are the immediate accelerant. Identify it via Top 10 Slowest Queries (digest), KILL the runaway, and route the report to a read replica.
- Raise the ceiling only as a last resort. SET GLOBAL max_connections = 2500 buys headroom instantly, but each connection costs memory (per-thread buffers), so raising it on a memory-constrained instance trades a connection error for an out-of-memory kill. Check Memory Usage % before turning this dial.
- Saturation is about headroom, not load. A server at 92% saturation with 1,420 idle connections is not busy, it is leaking slots. Read this card alongside Connections In Use and SHOW PROCESSLIST to separate “working hard” from “holding slots”.
- The cure is usually the application pool, not the database. Most saturation events trace back to an oversized or mis-tuned client-side pool (HikariCP, PgBouncer-style proxies, PHP persistent connections) rather than genuine demand.
- Raising max_connections is a sedative, not a fix. It removes the symptom and adds memory pressure. Treat it as a bridge while you fix the real cause.
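Separating “working hard” from “holding slots” comes down to counting how many SHOW PROCESSLIST rows sit in the Sleep command. The sketch below assumes the rows have already been fetched as dictionaries keyed by the standard column names (such as Command); the helper `split_idle_active` is hypothetical, and treating every non-Sleep row as active is a simplification, since rows such as Daemon or Binlog Dump also appear in the list.

```python
from collections import Counter


def split_idle_active(rows):
    """Count idle (Command == 'Sleep') versus all other SHOW PROCESSLIST rows."""
    counts = Counter(
        "idle" if row["Command"] == "Sleep" else "active" for row in rows
    )
    return counts["idle"], counts["active"]


# Made-up rows for illustration; a real list would come from SHOW PROCESSLIST.
rows = [
    {"Id": 101, "Command": "Sleep", "Time": 5400},
    {"Id": 102, "Command": "Sleep", "Time": 7200},
    {"Id": 103, "Command": "Query", "Time": 42},
    {"Id": 104, "Command": "Sleep", "Time": 300},
]
idle, active = split_idle_active(rows)
print(f"idle={idle} active={active} idle_share={100 * idle / len(rows):.0f}%")
```

A high idle share at high saturation points at the client-side pool rather than at genuine demand.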
Sibling cards
| Card | Why pair it with Connection Pool Saturation % | What the combination tells you |
|---|---|---|
| Connections In Use | The raw Threads_connected number behind the ratio. | High saturation with low active connections equals idle-connection leak, not real demand. |
| Connection Pool at >90% Saturation | The alert-list card that fires off this exact metric. | The gauge shows the level; the alert card shows when and for how long it breached. |
| Aborted Connects (24h) | Counts connections that failed to establish. | Saturation at 100% drives Aborted_connects up as new clients are rejected. |
| Connection Errors (24h) | The error-side view of refused connections. | A spike here during a saturation event confirms Too many connections is hitting the app. |
| Memory Usage % | Each connection consumes per-thread memory. | Check before raising max_connections; high memory plus high saturation is a trap. |
| Queries per Second (live) | Distinguishes genuine traffic from idle-slot hoarding. | Saturation rising while QPS is flat means the pool is leaking, not the workload growing. |
| MySQL Pool Saturation vs Traffic Burst | The cross-channel revenue framing. | Ties a saturation breach to a live traffic burst and the revenue at risk if connections start failing. |
| MySQL Health Score | The composite that weights saturation as an input. | A single sustained saturation breach pulls the health score down. |
Reconciling against the source
Where to look on the instance:
To reproduce the card’s exact number at the prompt:
- SHOW GLOBAL STATUS LIKE 'Threads_connected'; for the live numerator.
- SHOW GLOBAL VARIABLES LIKE 'max_connections'; for the denominator.
- SHOW STATUS LIKE 'Max_used_connections'; for the high-water mark since the last restart, which tells you the worst saturation the instance has seen.
- SELECT * FROM performance_schema.threads; or SHOW PROCESSLIST; to see what each connection is actually doing.
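The first two statements can be combined into a small script to reproduce the card's number. The sketch below assumes a Python DB-API style cursor that returns each SHOW row as a (Variable_name, Value) tuple, which is the usual shape for a default cursor; dictionary cursors would need different unpacking. `read_saturation` and `FakeCursor` are hypothetical names, and the fake cursor exists only so the example runs without a server.

```python
def read_saturation(cursor):
    """Run the two SHOW statements and return (connected, ceiling, saturation %)."""
    cursor.execute("SHOW GLOBAL STATUS LIKE 'Threads_connected'")
    _, connected = cursor.fetchone()
    cursor.execute("SHOW GLOBAL VARIABLES LIKE 'max_connections'")
    _, ceiling = cursor.fetchone()
    # Drivers commonly return SHOW values as strings, so convert explicitly.
    connected, ceiling = int(connected), int(ceiling)
    return connected, ceiling, 100.0 * connected / ceiling


class FakeCursor:
    """Stand-in for a real cursor (assumption: tuple rows, one row per SHOW)."""

    def __init__(self, values):
        self._values = values
        self._row = None

    def execute(self, sql):
        name = sql.split("'")[1]  # the quoted LIKE pattern is the variable name
        self._row = (name, self._values[name])

    def fetchone(self):
        return self._row


cursor = FakeCursor({"Threads_connected": "1844", "max_connections": "2000"})
connected, ceiling, pct = read_saturation(cursor)
print(f"{connected}/{ceiling} = {pct:.1f}%")
```

Against a real instance, the same function would take a cursor from whichever MySQL driver is in use; reading both values in quick succession keeps the numerator and denominator as close to the same instant as possible.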
| Service | Where to confirm |
|---|---|
| Amazon RDS / Aurora | The DatabaseConnections CloudWatch metric is the numerator; the parameter group holds the max_connections formula. The RDS Performance Insights “DB Load” view shows connection waits. |
| Google Cloud SQL | The database/mysql/connections Cloud Monitoring metric; max_connections is in the instance flags. |
| Azure Database for MySQL | The active_connections metric in Azure Monitor; the connection limit is tied to the pricing tier and vCore count. |
| Reason | Direction | Why |
|---|---|---|
| Sampling moment | Either way | The gauge is a point-in-time read; SHOW PROCESSLIST run a second later catches a different instant. Connection counts move fast during bursts. |
| max_connections changed at runtime | Either way | SET GLOBAL max_connections takes effect immediately. The card re-reads the variable each poll, but a managed-service console may cache the parameter-group value and lag. |
| CloudWatch granularity | Smoother | RDS DatabaseConnections is a 1-minute average by default; a sub-minute spike the card catches will be flattened in CloudWatch. |
| Reserved SUPER slot | Marginal | MySQL allows one connection above max_connections for a SUPER user. The card reports against the published ceiling, so a root session at the ceiling can read 100%+ briefly. |
Known limitations / FAQs
The card shows 92% but the server feels fine. Is this a false alarm?
Not a false alarm, an early warning. Saturation measures headroom, not load. A server can sit at 92% with most connections idle and respond instantly to the queries that are running. The risk is not current slowness; it is that the next connection burst (a deploy, a retry storm, a cron fan-out) has nowhere to land and starts getting ERROR 1040. Investigate now while it is cheap, using SHOW PROCESSLIST to see how many connections are in Sleep.
Most of my connections are idle (Sleep state). Should they count?
Yes, because they occupy a slot. An idle connection still consumes a max_connections slot and per-thread memory; it just is not running a query. This is the single most common cause of saturation: an application connection pool that opens connections it never closes. The fix is on the client side (lower the pool’s max-idle and max-lifetime) plus a shorter server-side wait_timeout to reap abandoned connections.
Can I just raise max_connections to make the alert go away?
You can, and sometimes you should as a bridge, but it is not free. Each connection can allocate per-thread buffers (read_buffer_size, sort_buffer_size, join_buffer_size, and others) as it runs queries. On a memory-constrained instance, doubling max_connections can push the instance into swap or trigger the OOM killer, which is a far worse failure than a connection error. Always check Memory Usage % first, and prefer fixing the client pool.
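A rough upper bound on the memory at stake can be worked out before changing the ceiling. The sketch below multiplies the connection ceiling by the sum of a few per-thread buffer sizes. The buffer values are placeholders chosen for illustration, not read from any server, and the result is a simplification: MySQL allocates most of these buffers on demand, and a single query can use more than one of some of them, so real usage can be lower or higher. `worst_case_thread_memory_mib` is a hypothetical helper.

```python
def worst_case_thread_memory_mib(max_connections: int, buffer_bytes: dict) -> float:
    """Estimate per-thread buffer memory if every connection allocated each buffer once."""
    per_thread = sum(buffer_bytes.values())
    return max_connections * per_thread / (1024 * 1024)


# Placeholder sizes in bytes (assumed for illustration; check the instance's own values).
buffers = {
    "read_buffer_size": 128 * 1024,
    "sort_buffer_size": 256 * 1024,
    "join_buffer_size": 256 * 1024,
}

for ceiling in (2_000, 2_500):
    print(ceiling, worst_case_thread_memory_mib(ceiling, buffers), "MiB")
```

Comparing the two figures against the instance's free memory gives a quick sense of whether raising the ceiling is a safe bridge or a step towards an out-of-memory kill.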
Does Threads_connected include the replica IO threads?
The connections a replica opens to its source do appear in the source’s Threads_connected (each replica holds one connection for the binlog dump). On the replica side, the internal SQL and IO threads are background threads and are not application connections. For a server with many replicas, account for one source-side slot per replica when sizing max_connections.
What is the difference between this card and Max_used_connections?
This card is live saturation right now. Max_used_connections is the high-water mark since the last restart: the most connections ever held at once. They answer different questions. Use this card for “are we about to run out?” and Max_used_connections for “how close have we ever come?”. If Max_used_connections is at or near max_connections, you have already hit the ceiling at least once and likely dropped connections.
On RDS, the console connection graph does not match the card. Why?
RDS DatabaseConnections in CloudWatch is, by default, a 1-minute average. The card samples in real time. A 15-second spike to 98% that the card catches and alerts on will appear as a much lower average in CloudWatch. For matching, switch the CloudWatch statistic to Maximum over a 1-minute period; that will line up far better with the card’s peak readings.
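The flattening effect is easy to see with a little arithmetic. The sketch below assumes one saturation sample per second over a single minute, with a 60% baseline and a 15-second spike to 98%; the baseline and the sampling rate are assumptions made for illustration, and `summarise_minute` is a hypothetical helper.

```python
from statistics import mean


def summarise_minute(samples):
    """Return (average, maximum) for one minute of saturation samples."""
    return mean(samples), max(samples)


# 45 seconds at an assumed 60% baseline, then a 15-second spike to 98%.
samples = [60.0] * 45 + [98.0] * 15
average, peak = summarise_minute(samples)
print(f"average={average:.1f}% peak={peak:.1f}%")
```

The average for the minute stays well under the 90% threshold while the maximum matches the spike, which is why the Maximum statistic reconciles better with the card's peak readings.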
Will the alert fire on a brief deploy-time spike?
No, that is what the 1m in the RT/1m window prevents. A single transient sample above 90% (common when a fresh app instance warms its pool) does not trip the alert. The threshold is evaluated against a sustained 1-minute reading, so only a genuine, persistent breach pages the on-call.