At a glance
Connected Replicas is the number of replica nodes currently attached to and streaming from the primary, read from connected_slaves in INFO replication. It is a small number with outsized importance: it is your live answer to “if this primary dies right now, is there a copy ready to take over?” A value of zero means there is no automatic failover target. Whatever the topology was designed to be, a drop below the expected replica count is the early warning that your redundancy has quietly evaporated, usually well before anything visibly breaks.
| Field | Detail |
|---|---|
| What it tracks | The count of replicas currently connected to the primary and receiving the replication stream (connected_slaves). |
| Data source | INFO replication → connected_slaves, read from the primary node. The same section lists each replica’s IP, port, state, and offset. |
| Time window | RT (real-time snapshot). |
| Alert trigger | < 1 (no failover available). A primary with zero connected replicas has no hot standby; an unplanned failure means downtime and potential data loss back to the last backup. |
| Why it matters | Replicas serve two jobs: they are the failover target (Sentinel or the managed control plane promotes one when the primary dies) and they offload read traffic. Lose them and you lose both, silently. |
| What does NOT count | A replica that is configured but not connected (network partition, auth failure, blocked replication port) does not count. Note that a replica still doing its initial full sync does count, even though it is not a usable standby until its state reaches online. |
| Roles | engineering, operations |
Calculation
The card reads connected_slaves from the Replication section of INFO on the primary. Redis counts every replica link the primary currently holds: a replica is added once it connects and requests synchronization, and removed when that connection closes or when the primary drops it for not acknowledging within repl-timeout. A replica that has fallen off the network is therefore excluded, but one still receiving its initial full sync is included even though it cannot yet take over; its per-replica line shows state=wait_bgsave or send_bulk until it reaches state=online. The alert fires below 1 because a primary with no connected replica has no failover target at all. The same INFO replication block exposes each replica line (slave0, slave1, and so on) with its state, offset, and lag, which is what the related Replica Lag (seconds) card reads to judge whether the connected replicas are actually keeping up rather than merely attached.
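A minimal Python sketch of the calculation, assuming you already have the raw text of INFO replication from the primary (for example, the output of redis-cli INFO replication). The function names and the sample output, including its IP addresses, are hypothetical. The check refuses to report unless role is master, and it also counts replicas whose line reports state=online as a cross-check on the headline figure.

```python
def parse_info_replication(text: str) -> dict:
    """Split raw INFO replication text into scalar fields plus a list of
    per-replica dicts built from the slave0, slave1, ... lines."""
    fields: dict = {"replicas": []}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and the "# Replication" section header.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        if key.startswith("slave") and key[5:].isdigit():
            # e.g. "ip=10.0.0.11,port=6379,state=online,offset=9001,lag=0"
            fields["replicas"].append(
                dict(pair.split("=", 1) for pair in value.split(","))
            )
        else:
            fields[key] = value
    return fields


def connected_replicas(text: str) -> dict:
    info = parse_info_replication(text)
    # connected_slaves is only meaningful on the primary; a replica would
    # report its own downstream count instead.
    if info.get("role") != "master":
        raise ValueError(f"expected role:master, got role:{info.get('role')}")
    connected = int(info.get("connected_slaves", "0"))
    # Replicas whose line reports state=online; other states such as
    # wait_bgsave or send_bulk mean a full sync is still running.
    online = sum(1 for r in info["replicas"] if r.get("state") == "online")
    return {"connected": connected, "online": online, "alert": connected < 1}


# Hypothetical sample captured from a primary with two replicas.
SAMPLE_PRIMARY = """# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.11,port=6379,state=online,offset=9001,lag=0
slave1:ip=10.0.0.12,port=6379,state=online,offset=9001,lag=1
master_repl_offset:9001
"""

print(connected_replicas(SAMPLE_PRIMARY))
```

Keeping the parser separate from the check makes it easy to feed captured INFO output into tests without a live server.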
Worked example
A platform team runs a Redis 7.2 primary with two replicas behind Sentinel, the standard “one to promote, one to spare” pattern for a session and cache tier. The expected reading is 2. Snapshot taken on 03 Jun 26 at 16:40 BST.
| Time | Connected Replicas | Replica Lag | What happened |
|---|---|---|---|
| 16:00 | 2 | 0s / 0s | Normal, full redundancy |
| 16:35 | 1 | 0s | replica-2 dropped off |
| 16:40 | 1 | 0s | Still 1, alert active |
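To see how the shipped trigger and a tighter per-profile threshold treat this timeline, here is a small Python replay. The readings come from the table; the profile threshold of 2 (alert when below the designed count) is an assumption, since the table does not say which threshold produced the 16:40 alert.

```python
# Readings from the worked example: (time, connected replicas).
READINGS = [("16:00", 2), ("16:35", 1), ("16:40", 1)]

SHIPPED_MIN = 1  # shipped trigger: alert when connected < 1
PROFILE_MIN = 2  # hypothetical profile threshold: alert below the designed count


def evaluate(readings, minimum):
    """Return the times at which the reading falls below `minimum`."""
    return [t for t, count in readings if count < minimum]


print("shipped:", evaluate(READINGS, SHIPPED_MIN))
print("profile:", evaluate(READINGS, PROFILE_MIN))
```

Under the shipped < 1 trigger this incident never alerts; only a threshold tied to the designed count of 2 catches the drop at 16:35.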
Sibling cards
| Card | Why pair it with Connected Replicas | What the combination tells you |
|---|---|---|
| Replica Lag (seconds) | Connected is necessary but not sufficient. | A replica can be connected yet lagging badly; both cards green means the standby is genuinely ready to promote. |
| Cluster Slots Assigned (of 16384) | Cluster-mode redundancy peer. | In cluster mode, replica health per shard plus full slot coverage is the complete availability picture. |
| Cluster Slot Coverage Gap | The cluster-level failure this prevents. | A primary failing with no replica creates a slot coverage gap; healthy replicas avoid it. |
| Last Successful Backup (hours ago) | The fallback when replicas are gone. | Zero replicas plus a stale backup is the worst redundancy posture: no hot standby and a lossy restore. |
| Last RDB Save (minutes ago) | Recovery point if failover is impossible. | If you must restart the primary from disk, this tells you how much data the restart would lose. |
| Memory Used vs Maxmemory % | A common cause of replica drops. | A replica that OOMs during resync drops off; high memory on replicas predicts the drop. |
| Redis Health Score | The composite. | A replica drop pulls the health score down via its availability weighting. |
Reconciling against the source
Where to look in Redis’s own tooling:
- INFO replication on the primary: read connected_slaves, then the per-replica lines (slave0:ip=...,port=...,state=online,offset=...,lag=...). role:master confirms you are querying the right node.
- ROLE for a compact one-line view of role plus the list of connected replicas and their offsets.
- If you run Sentinel, SENTINEL replicas <master-name> lists every replica Sentinel knows about and its flags, which is useful when a replica is up but Sentinel has marked it s_down (subjectively down).
- CLUSTER NODES (cluster mode) shows the master-replica relationships across the whole cluster, with each replica’s master ID.

On a managed service, cross-check the console: ElastiCache and MemoryDB show the node group (shard) membership and each replica’s status in the cluster view, and CloudWatch exposes replication metrics. Note the managed control plane handles failover itself, so the console’s notion of “replicas” is the authoritative redundancy view there, and a healthy console with a transiently odd INFO reading usually means a node is mid-recovery.
Why our number may legitimately differ:
| Reason | Direction | Why |
|---|---|---|
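| Resync in progress | Either | connected_slaves counts a replica as soon as its sync begins (its slaveN line shows state=wait_bgsave or send_bulk), so the card can count it before it is usable as a standby. If the full sync keeps failing and restarting, the replica drops out between attempts and the card can read lower than the console’s membership list. |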
| Which node is queried | Either | connected_slaves is meaningful only on the primary. Query a replica and you get its own (usually zero) downstream replica count, not the topology total. |
| Sentinel vs INFO | Either | Sentinel may flag a replica down before INFO drops it, or vice versa, during a partition; the two converge once the partition clears. |
| Chained replication | Either | If replicas replicate from other replicas, the primary’s connected_slaves counts only its direct downstream, not the full tree. |
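When replicas replicate from other replicas, the primary’s connected_slaves covers only the first hop. A minimal sketch of totalling the whole tree, assuming you have already collected each node’s direct replicas (for example by reading connected_slaves and the slaveN lines on every node); the node names and topology are hypothetical.

```python
from collections import deque


def replica_counts(topology: dict, primary: str) -> tuple:
    """Return (direct, total) replica counts below `primary`.

    `topology` maps each node to the replicas attached directly to it,
    i.e. what that node's own slaveN lines report.
    """
    direct = len(topology.get(primary, []))
    total = 0
    seen = {primary}
    queue = deque(topology.get(primary, []))
    while queue:  # breadth-first walk down the replication tree
        node = queue.popleft()
        if node in seen:  # guard against a malformed, cyclic map
            continue
        seen.add(node)
        total += 1
        queue.extend(topology.get(node, []))
    return direct, total


# Hypothetical chain: primary -> r1 -> r2, and primary -> r3.
TOPOLOGY = {"primary": ["r1", "r3"], "r1": ["r2"]}
print(replica_counts(TOPOLOGY, "primary"))
```

Here the primary’s connected_slaves would read 2 while three replicas exist in total.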
Known limitations / FAQs
The card shows 0 but I definitely configured a replica. Where is it?
Three usual causes. (1) The replica’s initial full sync keeps failing and restarting (for example, it runs out of memory loading the snapshot, or the primary cuts it off for exceeding client-output-buffer-limit replica), so it drops out of the count between attempts; check the replica’s log for MASTER <-> REPLICA sync progress. (2) The replica cannot authenticate (wrong masterauth after a password rotation); its log will show auth failures. (3) A network or security-group rule is blocking the replication port. Run INFO replication on the replica itself to see what it reports for master_link_status and master_sync_in_progress.
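A sketch of that replica-side triage in Python, assuming the raw INFO replication text from the replica, where role reads slave. master_link_status and master_sync_in_progress are real fields there; the order of checks and the messages are a hypothetical heuristic, not an exhaustive diagnosis.

```python
def diagnose_replica(text: str) -> str:
    """Give a first-pass reading of INFO replication taken on the replica."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            key, _, value = line.partition(":")
            fields[key] = value
    if fields.get("role") != "slave":
        return "not a replica: check the replicaof configuration"
    # Checked first: the link reads down while a full sync is still running.
    if fields.get("master_sync_in_progress") == "1":
        return "full sync in progress: watch the MASTER <-> REPLICA log lines"
    if fields.get("master_link_status") == "down":
        return "link down: check masterauth, network, and the replication port"
    return "link up: the replica should appear on the primary"


# Hypothetical output from a replica that cannot reach its primary.
SAMPLE_REPLICA = """# Replication
role:slave
master_host:10.0.0.10
master_port:6379
master_link_status:down
master_sync_in_progress:0
"""

print(diagnose_replica(SAMPLE_REPLICA))
```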
Connected Replicas looks healthy. Should I still worry?
Possibly, because “connected” does not mean “in sync”. A replica can be attached but lagging by minutes if it cannot keep up with the write rate or is starved of resources. Always read this card alongside Replica Lag (seconds): a connected-but-lagging replica will promote with stale data, which can be worse than an obvious failure.
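A sketch that counts only replicas that look promotable rather than merely attached. It assumes the per-replica dicts parsed from the slaveN lines and the primary’s master_repl_offset; lag there is the number of seconds since the replica last acknowledged, and the offset gap is how many replication-stream bytes it has not yet acknowledged. Both thresholds are hypothetical defaults. Redis’s own min-replicas-max-lag setting applies a similar online-and-recently-acknowledged test when deciding whether enough good replicas exist to accept writes.

```python
def ready_standbys(replicas, master_offset, max_lag_s=10, max_offset_gap=1_000_000):
    """Count replicas that look safe to promote.

    `replicas` are dicts parsed from the slaveN lines of INFO replication.
    The thresholds are hypothetical defaults; tune them per workload.
    """
    ready = 0
    for r in replicas:
        if r["state"] != "online":
            continue  # still syncing, not a usable standby
        lag = int(r["lag"])  # seconds since the replica last acknowledged
        gap = master_offset - int(r["offset"])  # bytes not yet acknowledged
        if lag <= max_lag_s and gap <= max_offset_gap:
            ready += 1
    return ready


# Hypothetical: two connected replicas, one far behind.
REPLICAS = [
    {"state": "online", "offset": "9001", "lag": "0"},
    {"state": "online", "offset": "1200", "lag": "45"},
]
print(ready_standbys(REPLICAS, master_offset=9001))
```

Both replicas count toward Connected Replicas here, but only one passes the readiness test.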
Why does the alert fire below 1 rather than below my designed replica count?
The shipped threshold treats zero as the universal danger line because zero replicas means no automatic failover under any topology. If your design calls for two or three replicas, set a tighter sensitivity threshold per profile so the card alerts when you drop below your own redundancy target, not just when you hit zero. Catching the 2-to-1 transition is usually where the real early warning lives.
Does this work the same way on a managed service like ElastiCache?
The metric still comes from INFO replication if you query the node directly, but on a managed service the control plane owns failover, so the console’s node-group view is the authoritative redundancy picture. During a managed failover or scaling event, connected_slaves can read oddly for a short window while nodes are added or promoted; let the console settle before treating a transient reading as a real drop.
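One way to let readings settle in your own alerting is to debounce them: report a drop only once it has held for several consecutive readings. A minimal sketch, assuming a stream of connected_slaves values; the class name and the window of 3 readings are hypothetical choices, not product settings.

```python
from collections import deque
from typing import Optional


class SettledReading:
    """Debounce Connected Replicas readings.

    A drop is reported only once it has held for `window` consecutive
    readings, so a short failover or scaling blip does not page anyone.
    """

    def __init__(self, window: int = 3) -> None:
        self.recent: deque = deque(maxlen=window)

    def update(self, connected: int) -> Optional[int]:
        self.recent.append(connected)
        if len(self.recent) < self.recent.maxlen:
            return None  # not enough history to judge yet
        # The highest count in the window: a lower value surfaces only
        # after every reading in the window agrees on it.
        return max(self.recent)


settled = SettledReading(window=3)
for reading in [2, 1, 2, 1, 1, 1]:
    print(reading, settled.update(reading))
```

The flapping 2, 1, 2 pattern keeps the settled value at 2; only the sustained run of 1s brings it down.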
A replica dropped and came back on its own. Was that a problem?
A brief drop with automatic reconnection is often a partial resync after a short network blip, which is normal and harmless. The concern is a replica that drops and then loops on full resyncs (each one reloads the whole dataset), which hammers the primary and can OOM the replica. If you see repeated drops, check the replica’s maxmemory and the primary’s repl-backlog-size: too small a backlog forces full resyncs instead of cheap partial ones.
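A back-of-the-envelope sizing sketch for repl-backlog-size, assuming the common rule of thumb that the backlog should hold every replication-stream byte written during the longest disconnect you want to survive with a partial resync. master_repl_offset grows by the bytes written to the replication stream, so two INFO replication samples give the rate; the sample values and the 2x headroom factor are hypothetical. On the primary, INFO stats also reports sync_full and sync_partial_ok, and a sync_full counter that keeps climbing is the full-resync loop described above.

```python
def suggested_backlog_bytes(offset_t0: int, offset_t1: int, seconds: float,
                            outage_s: float, headroom: float = 2.0) -> int:
    """Estimate repl-backlog-size from two master_repl_offset samples.

    (offset_t1 - offset_t0) / seconds approximates the replication stream
    rate in bytes per second. The backlog must hold that many bytes for the
    longest disconnect a partial resync should survive; `headroom` is a
    hypothetical safety factor for write bursts.
    """
    rate = (offset_t1 - offset_t0) / seconds
    return int(rate * outage_s * headroom)


# Hypothetical: offset grew by 60 MB over 60 s; tolerate a 30 s disconnect.
print(suggested_backlog_bytes(1_000_000_000, 1_060_000_000, 60, 30))
```

With these numbers the stream runs at about 1 MB/s, so the default 1 MB backlog would cover only about one second of disconnection.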
In cluster mode, what does this card mean per shard?
Each shard (hash-slot range) has its own primary and its own replicas, so connected-replica health is a per-shard property. One shard losing its only replica is a single point of failure for that slice of the keyspace even if every other shard is fully redundant. Read this card together with Cluster Slots Assigned (of 16384) to confirm both that slots are covered and that the primary owning each slot range has a standby.
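A sketch of the per-shard check against CLUSTER NODES output, assuming the documented line format (node ID, address, flags, master ID, ping and pong times, config epoch, link state, then slot ranges). The node IDs below are shortened, hypothetical placeholders for the real 40-character IDs. It lists slot-owning masters that have no replica free of the fail or fail? flags.

```python
def shards_without_replicas(cluster_nodes: str) -> list:
    """Return IDs of slot-owning masters that have no healthy replica."""
    masters = {}   # master id -> owns at least one slot range?
    replicas = {}  # master id -> count of replicas not flagged as failing
    for line in cluster_nodes.strip().splitlines():
        parts = line.split()
        node_id, flags, master_id = parts[0], parts[2].split(","), parts[3]
        if "master" in flags:
            masters[node_id] = len(parts) > 8  # fields after link-state are slots
        elif "slave" in flags and "fail" not in flags and "fail?" not in flags:
            replicas[master_id] = replicas.get(master_id, 0) + 1
    return [m for m, owns_slots in masters.items()
            if owns_slots and replicas.get(m, 0) == 0]


# Hypothetical three-shard cluster whose third replica has failed.
SAMPLE_NODES = """
aaa 10.0.0.1:6379@16379 myself,master - 0 0 1 connected 0-5460
bbb 10.0.0.2:6379@16379 master - 0 1700000000000 2 connected 5461-10922
ccc 10.0.0.3:6379@16379 master - 0 1700000000000 3 connected 10923-16383
ddd 10.0.0.4:6379@16379 slave aaa 0 1700000000000 1 connected
eee 10.0.0.5:6379@16379 slave bbb 0 1700000000000 2 connected
fff 10.0.0.6:6379@16379 slave,fail ccc 0 1700000000000 3 disconnected
"""

print(shards_without_replicas(SAMPLE_NODES))
```

Every slot is still assigned in this sample, yet the shard owned by ccc has no standby, which is exactly the single point of failure described above.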