At a glance
Redis accepts at most maxclients simultaneous connections (default 10000). Once that ceiling is hit, every new connection attempt is refused: the client gets ERR max number of clients reached and Redis bumps its rejected_connections counter. This card watches that counter for any upward movement. A rising rejected_connections means application servers cannot get a connection to Redis right now, which translates directly to failed reads, failed writes, and errors in front of users. For a platform or SRE team this is a saturation alarm: the instance is full and turning callers away.
| Field | Detail |
|---|---|
| Data source | INFO clients and INFO stats: connected_clients and maxclients for the ceiling, and the cumulative rejected_connections counter for refusals. |
| Metric basis | Movement of the rejected_connections counter, not its absolute value (the counter only resets on restart). Any sustained increase fires. |
| Why the ceiling exists | maxclients protects the instance from running out of file descriptors. Redis reserves ~32 descriptors for internal use, so the effective client limit may be slightly below the configured value if the OS ulimit is lower. |
| Aggregation window | RT (real-time). The card raises the alert as soon as rejected_connections starts climbing between polls. |
| Alert trigger | rejected_connections increasing. The headline shows the count of new refusals since the previous poll plus the current connected_clients against maxclients. |
| What does NOT count | (1) Connections closed normally by the client; (2) connections dropped for idle timeout (timeout config); (3) connections killed by CLIENT KILL; (4) auth failures, those are refused for a different reason and counted elsewhere. Only refusals caused by hitting maxclients increment this counter. |
| Topology scope | Per node. On a cluster each node has its own maxclients and its own counter; the card reads the worst node and can break down per node. |
| Time window | RT (real-time, evaluated on every poll) |
| Roles | owner, engineering, operations |
Calculation
Redis exposes a monotonic counter, rejected_connections, in the # Stats section of INFO. The card samples it each poll and watches the delta:
new_rejections = rejected_connections (current poll) - rejected_connections (previous poll)
new_rejections > 0 between consecutive polls fires the alert, because in a healthy steady state this counter never moves: an instance that is not at its ceiling never refuses a connection on maxclients grounds. The card also reads connected_clients and maxclients from INFO clients so the headline can show how close the instance is to the cap (connected_clients / maxclients) and confirm that the refusals are saturation, not a transient.
Because the counter resets to zero on restart, a delta across a restart would be negative; the card detects the reset (current < previous) and treats it as no new rejections rather than reporting a nonsensical value.
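The delta and reset handling described above can be sketched as follows. This is a minimal illustration, not the card's actual implementation: the function names and the choice to report zero on the very first poll (when there is no baseline yet) are assumptions made for the example.

```python
from typing import Optional


def new_rejections(previous: Optional[int], current: int) -> int:
    """Refusals added since the previous poll (hypothetical helper)."""
    if previous is None:
        # Assumption: with no baseline on the first poll, report nothing.
        return 0
    if current < previous:
        # The counter went backwards, so Redis restarted and reset it.
        # Treat this as "no new rejections" instead of a negative delta.
        return 0
    return current - previous


def should_alert(previous: Optional[int], current: int) -> bool:
    """Fire when the counter moved upward between two consecutive polls."""
    return new_rejections(previous, current) > 0
```

For example, a counter that goes from 0 to 740 and then to 3,160 yields deltas of 740 and 2,420, while a drop from 3,160 to 12 after a restart yields 0.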
Worked example
A platform team runs a Redis primary backing session storage and a rate-limiter for a storefront fleet of application servers. maxclients is left at the default 10000. Each app server runs a connection pool of up to 50 connections. Normally 30 app servers are online, so connection use sits around 1500, comfortably under the cap. Snapshot taken on 03 Jun 26 from 12:00 to 12:08 BST during an autoscaling event triggered by a flash sale.
| Time (BST) | App servers | connected_clients | rejected_connections (cumulative) |
|---|---|---|---|
| 12:00 | 30 | 1,490 | 0 |
| 12:03 | 120 (autoscaled) | 6,200 | 0 |
| 12:05 | 200 (autoscaled) | 9,980 | 0 |
| 12:06 | 220 | 10,000 (at cap) | 0 -> 740 |
| 12:08 | 240 | 10,000 (at cap) | 740 -> 3,160 |
- The instance hit its ceiling. At 12:06 connected_clients reached maxclients and Redis began refusing every connection beyond the cap.
- App servers cannot reach Redis. Every refused connection is an app server that could not open (or re-open) its pool. Requests on those servers that need a session lookup or a rate-limit check fail or fall back to an error path. During a flash sale this is the worst possible time to start failing.
- The cause is more clients than the cap allows, not a slow Redis. Latency and OPS may look fine; the instance is healthy but full. The mismatch is between the fleet’s total connection demand (240 servers x 50 = up to 12000) and the 10000 ceiling.
- The fix is raising the cap or shrinking pools, both fast. CONFIG SET maxclients 20000 takes effect immediately (provided the OS file-descriptor ulimit allows it). Equally, reducing each app server’s max pool size from 50 to 30 brings peak demand down to 7200 (240 x 30), well under the cap. The durable fix is sizing pools to the cap on purpose.
- A rejection is a hard failure, not a slowdown. Unlike a slow command, a refused connection gives the client nothing; the request fails outright. That is why any movement on this counter pages immediately rather than waiting for a sustained window.
- Saturation is a sizing problem, not a performance problem. Redis can be perfectly fast and still refuse connections. The fix lives in maxclients, the OS ulimit, and your client pool sizes, not in query tuning.
- Watch the OS file-descriptor limit too. Raising maxclients above what ulimit -n permits will not help: Redis caps the effective limit to the descriptors it actually has, reserving ~32 for itself. A maxclients bump that does not stop rejections almost always means the OS limit is the real ceiling.
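The sizing arithmetic behind these points can be checked with a short sketch. The function names are hypothetical, and the model assumes every app server fills its pool and that no other clients (replicas, monitoring agents, admin sessions) hold connections, so real headroom will be somewhat smaller.

```python
def peak_connection_demand(app_servers: int, pool_size: int) -> int:
    """Worst-case connections if every server fills its pool."""
    return app_servers * pool_size


def headroom(maxclients: int, app_servers: int, pool_size: int) -> int:
    """Slots left at peak; a negative value means refusals are possible."""
    return maxclients - peak_connection_demand(app_servers, pool_size)


def max_safe_pool_size(maxclients: int, app_servers: int) -> int:
    """Largest per-server pool that keeps the fleet at or under the cap."""
    return maxclients // app_servers


# Figures from the worked example: 240 servers against the default cap.
print(headroom(10000, 240, 50))        # pools of 50 overshoot the cap
print(headroom(10000, 240, 30))        # pools of 30 fit
print(max_safe_pool_size(10000, 240))  # largest pool that still fits
```

Running the same check against the planned autoscaling maximum, rather than today's fleet size, is what turns this into sizing "on purpose".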
Sibling cards to read alongside this one
| Card | Why pair it with Connections Rejected | What the combination tells you |
|---|---|---|
| Clients vs maxclients % | The saturation gauge behind this alert. | 100% saturation together with rising rejections confirms a connection-ceiling outage. |
| Connected Clients | The raw count climbing toward the cap. | A steep climb here precedes the first rejection; an early-warning leading indicator. |
| Rejected Connections (24h) | The trended daily total this alert thresholds. | Recurring daily spikes indicate a chronic sizing problem, not a one-off. |
| Blocked Clients (BLPOP / BRPOP / WAIT) | Blocked clients hold connection slots open. | Many blocked clients can consume the slot budget and bring on the ceiling faster. |
| Operations per Second (live) | Throughput when callers are being refused. | OPS plateauing while rejections climb confirms the instance is full, not idle. |
| Redis Health Score | The executive composite this alert hits. | Rising rejections drag the health score down sharply; this card is the why. |
Reconciling against the source
Where to look in Redis itself:
- INFO clients shows connected_clients and maxclients: redis-cli INFO clients. This tells you how close to the ceiling you are right now.
- INFO stats holds the cumulative rejected_connections counter: redis-cli INFO stats | grep rejected_connections.
- CONFIG GET maxclients confirms the configured ceiling, and CONFIG SET maxclients <n> raises it live.
- CLIENT LIST enumerates every open connection (address, age, idle time, last command) so you can see which app servers or which command types are holding slots.
- ulimit -n on the host (or /proc/<pid>/limits) confirms the OS file-descriptor cap, the true upper bound on maxclients.

Why our number may legitimately differ from a raw counter read:
| Reason | Direction | Why |
|---|---|---|
| Delta vs total | We show new rejections; INFO shows cumulative | rejected_connections only grows. Our headline is the increase since the last poll, so it will be smaller than the raw counter. |
| Restart reset | Our delta ignores the reset | The counter resets to 0 on restart; we detect that and report no new rejections rather than a negative number. |
| Effective vs configured cap | Saturation may hit below maxclients | If ulimit -n is below maxclients, Redis refuses connections before reaching the configured number; our saturation gauge reflects the effective limit, not just the config. |
| Per-node view | Cluster totals differ | On a cluster we report the worst node, not the cluster sum; adding every node’s counter exceeds our headline. |
Amazon ElastiCache exposes CurrConnections and a per-node connection limit in CloudWatch; Azure Cache for Redis surfaces Connected Clients and a tier-based maximum in Azure Monitor; Redis Cloud shows connection counts and the plan’s connection limit in its console. Managed tiers often set maxclients according to the plan and may not allow CONFIG SET maxclients, in which case the fix is scaling the plan or shrinking client pools. Reconcile our rejection count against the console’s connection-limit metric for the same node and minute.
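When reconciling by hand, the three fields named under "Where to look in Redis itself" can be pulled out of raw INFO text with a small parser. The sketch below is illustrative only: the function names are hypothetical, and it assumes maxclients is present in the INFO output as described above (if your Redis version does not report it there, CONFIG GET maxclients is the fallback).

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse INFO output (key:value lines, '# Section' headers) into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():  # splitlines also handles \r\n endings
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def saturation_snapshot(info_text: str) -> Dict[str, float]:
    """Extract the fields this card relies on and compute the saturation %."""
    info = parse_info(info_text)
    connected = int(info["connected_clients"])
    ceiling = int(info["maxclients"])
    return {
        "connected_clients": connected,
        "maxclients": ceiling,
        "rejected_connections": int(info["rejected_connections"]),
        "saturation_pct": 100.0 * connected / ceiling,
    }
```

The input would be the combined text of redis-cli INFO clients and redis-cli INFO stats captured for the same node at the same moment.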
Known limitations / FAQs
I raised maxclients but rejections kept happening. Why?
Almost always the OS file-descriptor limit. Redis cannot accept more connections than it has descriptors, and it reserves about 32 for its own use. If ulimit -n on the host is 10240, setting maxclients 20000 will not help: the effective ceiling is still ~10208. Raise the OS limit (ulimit -n for the process, plus systemd LimitNOFILE or the container’s nofile setting) and then raise maxclients. Check /proc/<pid>/limits to confirm what the running process actually has.
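The arithmetic in this answer can be written as a one-line rule. This is a simplified model of the behaviour described above, not Redis's own code; the function name is hypothetical and the reserve of 32 descriptors is taken from the text.

```python
RESERVED_FDS = 32  # descriptors Redis keeps for its own use, per the text above


def effective_client_ceiling(maxclients: int, nofile_limit: int) -> int:
    """Clients Redis can actually serve given the process's descriptor limit."""
    return max(0, min(maxclients, nofile_limit - RESERVED_FDS))


# The case from the answer: the OS limit, not maxclients, sets the ceiling.
print(effective_client_ceiling(20000, 10240))  # 10208
# With a generous OS limit, the configured maxclients is the ceiling.
print(effective_client_ceiling(10000, 65536))  # 10000
```

The nofile_limit argument should be the limit of the running Redis process (the "Max open files" row in /proc/<pid>/limits), not that of your own shell.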
My connection count is well below maxclients but I still saw rejections. How?
Two common causes. First, a brief burst pushed you to the cap momentarily between polls, then connections closed, so the live count looks fine but the counter moved. Second, the OS descriptor limit is below maxclients, so the effective ceiling is lower than the configured one. Use CLIENT LIST during the event and check ulimit -n to tell them apart.
What is the difference between a rejected connection and a dropped connection?
A rejected connection never gets established: Redis refuses it at accept time with ERR max number of clients reached because it is at the ceiling. A dropped connection was established and then closed later, by the client, by an idle timeout, or by CLIENT KILL. Only ceiling refusals increment rejected_connections; drops do not. This card is specifically the saturation signal.
Should I just set maxclients very high to be safe?
Not blindly. Each connection costs memory (an output buffer and bookkeeping) and a file descriptor. Setting maxclients far above what your host can support means Redis advertises a ceiling it cannot honour, and you hit the OS limit instead, with worse error behaviour. Size maxclients to comfortably exceed your fleet’s peak pool demand, ensure ulimit -n exceeds maxclients + 32, and leave headroom for autoscaling, rather than picking an arbitrarily large number.
My connection pool churns constantly. Could that cause rejections without my fleet actually being large?
Yes. A pool that opens and closes connections rapidly (or short-lived clients that do not pool at all) can keep many half-closed connections in TIME_WAIT and momentarily exceed the cap even though steady-state demand is modest. The fix is persistent pooling: open a bounded pool once and reuse it, rather than connecting per request. CLIENT LIST showing thousands of very young connections is the tell.
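One way to look for that tell is to measure how many connections in a CLIENT LIST capture are very young. The sketch below is a rough illustration: the function names and the 5-second threshold are assumptions, and it relies only on the space-separated key=value layout of CLIENT LIST output, where age is the connection age in seconds.

```python
from typing import List


def connection_ages(client_list_text: str) -> List[int]:
    """Extract the age (seconds) of each connection from CLIENT LIST output."""
    ages: List[int] = []
    for line in client_list_text.splitlines():
        # Each line is a run of key=value tokens; values may be empty (name=).
        fields = dict(tok.split("=", 1) for tok in line.split() if "=" in tok)
        if "age" in fields:
            ages.append(int(fields["age"]))
    return ages


def young_fraction(client_list_text: str, max_age_seconds: int = 5) -> float:
    """Share of connections no older than max_age_seconds (assumed threshold)."""
    ages = connection_ages(client_list_text)
    if not ages:
        return 0.0
    young = sum(1 for age in ages if age <= max_age_seconds)
    return young / len(ages)
```

A persistently high fraction across several captures points to churn; a healthy pooled fleet tends to show mostly long-lived connections.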
On a cluster, only one node rejects connections. Why just one?
Connection demand is not evenly spread across a cluster. Clients connect to the node that owns the slots they need, so a hot key prefix or uneven client-side routing can concentrate connections on one node while others sit idle. The card reports the worst node so the hot one surfaces. The fix is raising that node’s maxclients, smoothing the key distribution, or balancing client connections across nodes.
Does an idle-connection timeout help here?
It can. Setting the timeout config closes connections that have been idle for that many seconds, freeing slots held by app servers that opened a connection and went quiet. This is useful when a large fleet keeps many mostly-idle connections open. Redis exempts Pub/Sub subscribers and clients currently blocked in a blocking command from this timeout, so it will not disconnect them mid-wait. Be careful, though, with ordinary clients that rely on long-lived connections and do not reconnect gracefully; a too-aggressive timeout will close their connections between requests.