At a glance
Connection-pool saturation expressed as a percentage: connected_clients / maxclients. When that ratio reaches 100%, Redis stops accepting new connections and returns ERR max number of clients reached, which immediately drops every downstream service that needs a fresh connection (a restarting app pod, a cron worker, a new web node scaling in). For a platform team this is “how close am I to the moment Redis refuses to talk to anyone new?” A healthy steady state sits well under 50%; sustained readings above 90% mean a connection leak, an undersized maxclients, or a thundering herd of short-lived clients.
| What it tracks | The live ratio of open client connections to the configured connection ceiling. connected_clients and maxclients both come from INFO clients. Rendered as a gauge from 0 to 100%. |
| Data source | Redis INFO clients section: connected_clients (current open connections) divided by the maxclients config value (CONFIG GET maxclients). The detail line: connected_clients / maxclients. Hitting cap rejects new connections, drops downstream services. |
| Time window | RT/1m (real-time, sampled and smoothed over a rolling 1-minute window so a single noisy poll does not flip the gauge). |
| Alert trigger | > 90%. At 90% the gauge turns red and pages the on-call: you are one traffic burst away from rejected connections. |
| Roles | owner, engineering, operations |
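The time window and alert trigger can be illustrated with a minimal sketch. It assumes the smoothing is a plain rolling mean over the last 60 seconds of polls; the card's actual smoothing method is not specified here, and the class name, poll interval, and sample values are hypothetical.

```python
from collections import deque


class SmoothedGauge:
    """Rolling mean of saturation samples over a fixed time window (assumed method)."""

    def __init__(self, window_seconds=60.0, alert_pct=90.0):
        self.window_seconds = window_seconds
        self.alert_pct = alert_pct
        self._samples = deque()  # (timestamp_seconds, percent)

    def add(self, timestamp, percent):
        """Record one poll and return the smoothed value."""
        self._samples.append((timestamp, percent))
        # Drop samples that have aged out of the window.
        while timestamp - self._samples[0][0] >= self.window_seconds:
            self._samples.popleft()
        return self.value()

    def value(self):
        if not self._samples:
            return 0.0
        return sum(p for _, p in self._samples) / len(self._samples)

    def alerting(self):
        # Mirrors the "> 90%" trigger from the table above.
        return self.value() > self.alert_pct


gauge = SmoothedGauge()
# Hypothetical polls every 10 s: one noisy spike among steady readings.
for t, pct in [(0, 40.0), (10, 41.0), (20, 97.0), (30, 40.0), (40, 42.0), (50, 40.0)]:
    gauge.add(t, pct)
print(round(gauge.value(), 1), gauge.alerting())  # 50.0 False
```

A single 97% poll does not trip the alert because the window mean stays at 50%. The same smoothing is why a very short spike can be missed by the gauge.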
Calculation
The card runs INFO clients and reads connected_clients, then reads the ceiling from CONFIG GET maxclients (cached and re-read whenever the config changes). The gauge value is:

gauge % = connected_clients / maxclients × 100
- maxclients is not always what you set. Redis reserves around 32 file descriptors for its own use (cluster bus, replicas, persistence). If the OS ulimit -n is lower than maxclients + 32, Redis lowers the effective maxclients at start-up and only logs a warning. The card reads the effective value Redis reports through CONFIG GET maxclients, not the value in redis.conf, so the percentage reflects reality.
- Replica and cluster-bus connections count toward connected_clients on some versions. The engine uses the headline connected_clients from INFO clients as Redis reports it; for cluster nodes this can include a handful of internal links, which is why the denominator matters more than a couple of connections.
- Managed services cap maxclients for you. On ElastiCache and MemoryDB the maxclients value is fixed per node type (for example 65,000 on most node sizes) and cannot be raised through CONFIG SET. The card still reads the live effective value, so the gauge is accurate even when you cannot change the ceiling.
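The calculation can be sketched as follows. This is an illustration, not the card's implementation: it parses text shaped like redis-cli INFO clients output and takes the ceiling as a plain number. The function names and sample values are assumptions.

```python
def parse_info(text):
    """Parse INFO output ("key:value" lines) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Clients"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def saturation_pct(connected_clients, maxclients):
    """Return connected_clients / maxclients as a percentage."""
    if maxclients <= 0:
        raise ValueError("maxclients must be positive")
    return 100.0 * connected_clients / maxclients


# Sample text shaped like `redis-cli INFO clients` output (illustrative values).
sample_info = "# Clients\r\nconnected_clients:9100\r\nblocked_clients:12\r\n"
info = parse_info(sample_info)

# In practice the ceiling would come from `CONFIG GET maxclients`; hard-coded here.
effective_maxclients = 65000

pct = saturation_pct(int(info["connected_clients"]), effective_maxclients)
print(f"{pct:.0f}%")  # 14%
```

Keeping the parser separate from the ratio makes it easy to feed the same function from CloudWatch or another source when reconciling.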
Worked example
A platform team runs a single-primary Redis 7.2 instance on an r6g.large ElastiCache node backing session storage and a job queue for a high-traffic storefront. maxclients is fixed at 65,000 by the node type. Snapshot taken on 14 Apr 26 at 20:05 BST during an evening traffic peak.
| Reading | Value |
|---|---|
| connected_clients | 9,100 |
| effective maxclients | 65,000 |
| Gauge | 14% |
| Trend over prior hour | flat around 8,800 to 9,200 |
Now compare a second instance, an r6g.medium cache node fronting product pages, where the application uses a per-request connection pattern instead of a pool:
| Reading | Value |
|---|---|
| connected_clients | 58,900 |
| effective maxclients | 65,000 |
| Gauge | 91% |
| Trend over prior hour | climbing from 41,000 |
- Is the denominator wrong (too-small maxclients)? No, 65,000 is the node ceiling.
- Is this real demand or a leak? The climb from 41,000 to 58,900 in one hour with no matching traffic increase is the tell. A connection leak: the application opens a connection per request and never returns it to a pool, so connections accumulate until idle ones are reaped (or never are, if timeout 0 is set).
- What happens at 100%? New web nodes scaling in for the peak cannot connect and crash-loop, which paradoxically makes the team scale out further, opening even more connections and accelerating the climb.
- The percentage hides the headroom in connection count. 91% of 65,000 still leaves 6,100 connections, but at a 300/min climb that headroom is 20 minutes, not comfort. Always read the gauge and the slope.
- The fix is almost always client-side. Raising maxclients treats the symptom. A connection pool or a non-zero idle timeout treats the cause. Pair with Connected Clients to watch the raw count after the fix.
- Rejected connections are the lagging confirmation. Once you cross 100%, Rejected Connections (24h) starts incrementing. If this gauge is green but rejections are non-zero, you had a transient spike that the 1-minute smoothing missed.
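The headroom arithmetic in the worked example can be checked with a short sketch. It assumes the climb over the prior hour stays linear, which is a simplification; the function name is hypothetical.

```python
def minutes_to_cap(current, maxclients, earlier, elapsed_minutes):
    """Estimate minutes until the cap, assuming the recent climb stays linear."""
    slope = (current - earlier) / elapsed_minutes  # connections per minute
    if slope <= 0:
        return None  # flat or falling: no projected saturation
    return (maxclients - current) / slope


# Figures from the second snapshot: 41,000 -> 58,900 over 60 minutes, cap 65,000.
headroom = 65000 - 58900
slope = (58900 - 41000) / 60
eta = minutes_to_cap(58900, 65000, 41000, 60)
print(headroom, round(slope), round(eta))  # 6100 298 20
```

A flat trend, as in the first snapshot, returns None, which is the "gauge and slope" reading in practice: the same percentage is urgent or harmless depending on the slope.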
Sibling cards
| Card | Why pair it with Clients vs maxclients | What the combination tells you |
|---|---|---|
| Connected Clients | The raw numerator without the ratio. | The gauge gives proximity to the cap; this gives the absolute count for trending and leak detection. |
| Rejected Connections (24h) | The lagging confirmation that you crossed 100%. | Gauge under 90% but rejections non-zero equals a transient burst smoothing missed. |
| Connections Rejected Due to maxclients | The real-time alert version of rejections. | Gauge climbing plus this alert firing equals the cap has been hit right now. |
| Blocked Clients (BLPOP / BRPOP / WAIT) | Blocked clients still hold a connection slot. | A queue consumer storm inflates both blocked clients and total connections at once. |
| Operations per Second (live) | Throughput context for the connection count. | Many connections but flat ops equals idle/leaked connections, not real load. |
| Connected Clients Saturation vs Traffic Burst | The cross-channel view tying saturation to storefront traffic. | Confirms whether the climb is genuine demand or a leak unrelated to traffic. |
| Redis Health Score | The composite that weights pool saturation. | A 90%+ gauge alone can pull the composite below its threshold. |
Reconciling against the source
Where to look in Redis’s own tooling:
- redis-cli INFO clients returns connected_clients, blocked_clients, and cluster_connections. This is the numerator straight from the source.
- redis-cli CONFIG GET maxclients returns the effective ceiling (the denominator). Compare it against the value in redis.conf; if they differ, the OS file-descriptor limit clipped it.
- redis-cli CLIENT LIST enumerates every open connection with its idle time, address, and last command. It is the definitive way to find a leak (look for thousands of connections with growing idle and cmd=NULL).
- redis-cli INFO stats exposes total_connections_received and rejected_connections for the historical view.

For managed services:
- ElastiCache / MemoryDB: the CloudWatch metric CurrConnections is the numerator; the maxclients ceiling is fixed per node type and documented in the AWS node-type reference. Divide to reproduce the gauge.
- Azure Cache for Redis: the Connected Clients metric in Azure Monitor; the connection limit is tier-dependent.
- Redis Cloud (Redis Enterprise): the conns metric in the database metrics view; the limit is set per database subscription.

Why our number may legitimately differ:
| Reason | Direction | Why |
|---|---|---|
| 1-minute smoothing | Gauge lower than a raw poll | A momentary spike is averaged out; CLIENT LIST run at the peak instant shows more. |
| Effective vs configured maxclients | Denominator differs | We read the effective value Redis reports; redis.conf may say 100,000 while the OS clipped it to 10,000. |
| Internal connections | Numerator slightly higher | Replica links and the cluster bus can count toward connected_clients on some node roles. |
| CloudWatch granularity | Cross-tool variance | CurrConnections is a 1-minute datapoint on ElastiCache; the gauge polls more often. |
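The CLIENT LIST leak check described in this section can be sketched as below. The sample lines are abbreviated and illustrative (real output carries more fields), and the 300-second idle threshold and function names are assumptions rather than anything the card uses.

```python
def parse_client_list(text):
    """Parse CLIENT LIST output (space-separated key=value pairs) into dicts."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = {}
        for token in line.split():
            key, _, value = token.partition("=")
            fields[key] = value
        clients.append(fields)
    return clients


def idle_by_host(clients, min_idle_seconds=300):
    """Count connections idle for at least min_idle_seconds, grouped by client host."""
    counts = {}
    for c in clients:
        if int(c.get("idle", 0)) >= min_idle_seconds:
            host = c.get("addr", "?").rsplit(":", 1)[0]  # strip the port
            counts[host] = counts.get(host, 0) + 1
    return counts


# Abbreviated, made-up sample in the shape of CLIENT LIST output.
sample = (
    "id=7 addr=10.0.1.15:50112 fd=9 name= age=4200 idle=4190 flags=N db=0 cmd=NULL\n"
    "id=8 addr=10.0.1.15:50113 fd=10 name= age=3900 idle=3890 flags=N db=0 cmd=NULL\n"
    "id=9 addr=10.0.2.20:41877 fd=11 name= age=120 idle=0 flags=N db=0 cmd=get\n"
)
leaky = idle_by_host(parse_client_list(sample))
print(leaky)  # {'10.0.1.15': 2}
```

Grouping by host points at the application instance that is holding idle connections, which is usually the quickest route to the leaking client.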
Known limitations / FAQs
The gauge says 91% but I have plenty of memory and CPU. Why is this a problem?
Connection saturation is independent of memory and CPU pressure. Redis can be almost idle on commands yet still refuse new connections because the slots are full. The danger is operational, not throughput: the next pod, cron job, or scaled-in node cannot connect at all. Treat a sustained 90%+ as urgent regardless of how quiet the instance feels.
Should I just raise maxclients to make the alert go away?
On self-hosted Redis you can (CONFIG SET maxclients 100000), but only if the OS file-descriptor limit allows it; raise ulimit -n first or Redis will silently clip the value. On managed services (ElastiCache, MemoryDB, Azure) the ceiling is fixed by node type and you cannot raise it; the fix is to use a connection pool or a larger node. Either way, raising the cap is a stopgap; a leak will refill any headroom you add.
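As a rough illustration of the clipping rule, the sketch below estimates the effective ceiling from a configured maxclients and a file-descriptor limit. It uses the reserve of about 32 descriptors mentioned in the Calculation notes and ignores any attempt by Redis to raise the limit itself, so treat it as an approximation; the function name is hypothetical.

```python
RESERVED_FDS = 32  # descriptors Redis keeps for its own use (approximate)


def effective_maxclients(configured_maxclients, nofile_limit, reserved=RESERVED_FDS):
    """Estimate the ceiling Redis can honour under a given file-descriptor limit."""
    if nofile_limit >= configured_maxclients + reserved:
        return configured_maxclients  # the OS limit leaves enough room
    return max(nofile_limit - reserved, 0)  # clipped by the OS limit


print(effective_maxclients(100000, 10032))  # 10000
print(effective_maxclients(10000, 65536))   # 10000
```

The first call mirrors a redis.conf that asks for 100,000 on a host whose limit only supports 10,000; the second shows a limit generous enough to leave the configured value untouched.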
What is the difference between this card and Rejected Connections?
This card is the leading indicator (proximity to the cap, in real time). Rejected Connections is the lagging confirmation (you actually hit the cap and turned someone away). You want to act on this gauge before the rejection counter ever moves.
Why might the gauge sit at a non-zero floor even with no real traffic?
Replicas, Sentinel connections, monitoring agents (your APM, this connector itself), and the cluster bus all hold connections. A baseline of a few dozen is normal. Use CLIENT LIST to see who they are if the floor looks high.
Does timeout 0 in my config matter here?
Yes, a great deal. With timeout 0 Redis never closes an idle client connection, so a leaky application accumulates connections forever until it hits maxclients. Setting a sensible timeout (for example 300 seconds) lets Redis reap abandoned connections and is the single most effective guard against slow connection leaks.
My cluster has six nodes. Is the gauge per-node or fleet-wide?
Per node. Each node has its own maxclients and its own connected_clients. A hot node (one owning a popular slot range) can saturate while the rest sit idle. Read the gauge per instance, and pair with Cluster Slots Assigned (of 16384) to understand slot distribution.