At a glance
Redis evicts keys when it runs out of room: once used_memory hits maxmemory, the configured eviction policy (typically allkeys-lru or volatile-lru) starts deleting keys to make space for new writes. A trickle of evictions is normal on a cache. A storm (more than 1000 evicted keys per minute, sustained) means Redis is shedding data faster than your application expects, and the downstream symptoms are cache misses, recomputation, and database load. For a platform or SRE team this card is the early-warning siren that an instance is memory-bound and the cache hit rate is about to collapse.
| Property | Detail |
|---|---|
| Data source | INFO stats: the evicted_keys cumulative counter, sampled each poll. The card computes a per-minute delta and watches the sustained rate. |
| Metric basis | Rate of change of evicted_keys, not its absolute value: the counter is cumulative since the server started (it resets only on restart), so its raw size says nothing about current pressure. A storm is a high derivative held over the window, not a single spike. |
| What triggers eviction | used_memory reaching maxmemory under any policy other than noeviction. With noeviction set, Redis rejects writes with OOM errors instead of evicting, which this card cannot see; watch Memory Used vs Maxmemory % for that case. |
| Aggregation window | 5m. The rate must stay above threshold across the rolling 5-minute window to fire, so a one-off burst from a bulk load does not page anyone. |
| Alert trigger | evicted_keys/min > 1000 sustained. The headline shows the current per-minute rate and the active eviction policy. |
| What does NOT count | (1) expired_keys: keys removed because their TTL elapsed, which is healthy housekeeping tracked separately on Expired Keys / minute; (2) explicit DEL/UNLINK by the application; (3) a single bulk import that briefly spikes evictions and then settles. |
| Topology scope | Per primary node. On a cluster each shard has its own maxmemory and its own eviction counter; the card reads the worst-offending shard and can break down per node. |
| Roles | owner, engineering, operations |
Calculation
Redis exposes a monotonic counter, evicted_keys, in the # Stats section of INFO. The card samples it on each poll and derives a per-minute rate from consecutive samples: rate = (current evicted_keys - previous evicted_keys) × 60 ÷ seconds between the two samples.
Alongside the rate, the card reads maxmemory_policy from INFO memory so the on-call engineer immediately knows which policy is doing the evicting (allkeys-lru, allkeys-lfu, volatile-lru, volatile-lfu, volatile-ttl, allkeys-random, volatile-random).
Because the counter resets to zero on restart, a delta computed across a restart boundary would be negative; the card detects the reset (current < previous) and skips that interval rather than reporting a nonsensical rate.
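A minimal Python sketch of the rate calculation and reset handling described above, assuming each sample is an (evicted_keys value, Unix timestamp in seconds) pair. The function name per_minute_rate is hypothetical and this is not the card's actual implementation; it only illustrates the delta-times-60-over-elapsed arithmetic and the "skip the interval when the counter goes backwards" rule.

```python
from typing import Optional


def per_minute_rate(prev_count: int, prev_ts: float,
                    curr_count: int, curr_ts: float) -> Optional[float]:
    """Evictions per minute between two samples of the evicted_keys counter.

    Returns None when the interval should be skipped: the counter went
    backwards (a restart reset it to zero) or the timestamps are not
    strictly increasing.
    """
    if curr_count < prev_count:
        # Counter reset: skip this interval instead of reporting a negative rate.
        return None
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        # Duplicate or out-of-order sample: no meaningful rate.
        return None
    # Multiply before dividing so whole-number inputs give exact results.
    return (curr_count - prev_count) * 60.0 / elapsed
```

For example, two samples 60 seconds apart that read 5,000 and then 6,840 give 1840.0 evictions per minute, while a second reading of 120 after a restart returns None and the interval is dropped.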
Worked example
A platform team runs a 4 GB Redis primary as a read-through cache for product detail and pricing, with maxmemory 4gb and maxmemory-policy allkeys-lru. Normal eviction rate sits around 50 to 120 keys/min as cold keys age out. Snapshot taken on 22 May 26 from 19:40 to 19:55 BST during an evening traffic ramp plus a marketing email send.
| Time (BST) | used_memory | evicted_keys/min | Keyspace hit rate |
|---|---|---|---|
| 19:40 | 3.78 GB | 95 | 97.1% |
| 19:45 | 3.96 GB | 410 | 95.8% |
| 19:50 | 4.00 GB (at cap) | 1,840 | 91.2% |
| 19:55 | 4.00 GB (at cap) | 2,310 | 88.4% |
As used_memory climbed to the 4 GB cap, allkeys-lru began evicting aggressively to admit the new working set. The eviction rate crossed 1000/min by 19:50 and stayed there.
- The working set has outgrown the cache. At the cap with sustained eviction, Redis is constantly throwing out keys that are about to be requested again. This is a thrashing pattern: evict a key, get a miss seconds later, refetch from the database, re-cache it, evict something else. The hit rate falling from 97% to 88% is the visible cost.
- The database is absorbing the misses. Every percentage point of hit-rate loss on a busy cache can mean thousands of extra queries per minute hitting the primary database. A storm here often shows up as elevated DB CPU and slow queries minutes later.
- The fix is capacity or TTL, not a Redis restart. A restart only resets the counter and masks the symptom; once the working set refills the cache, the storm returns. The durable fixes are: raise maxmemory (or scale the node/shard), shorten TTLs on low-value keys so they expire before they have to be evicted, or move large, rarely read values out of the cache.
- Eviction is not the disease but the fever. The storm tells you memory pressure has hit the cap; the cure is more room or a smaller working set, not silencing the alert.
- Read evictions with hit rate, always. Evictions only hurt when they cause misses. Pair this card with Keyspace Hit Rate %: a storm with a stable hit rate means you are evicting genuinely cold keys (fine); a storm with a falling hit rate means thrashing (act now).
- Distinguish evicted from expired. A high expired_keys rate is healthy TTL housekeeping; a high evicted_keys rate is memory pressure. They look similar on a key-count chart but mean opposite things.
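The database cost of a hit-rate drop is simple arithmetic: misses per minute = lookups per minute × (1 - hit rate). The sketch below applies it to the 19:40 and 19:55 hit rates from the worked example. The load of 100,000 cache lookups per minute is a hypothetical figure, not part of the example; at that load each percentage point of hit rate is worth 1,000 database queries per minute.

```python
def misses_per_minute(lookups_per_minute: float, hit_rate_pct: float) -> float:
    """Cache misses per minute, i.e. lookups that fall through to the database."""
    return lookups_per_minute * (100.0 - hit_rate_pct) / 100.0


# Hypothetical load: 100,000 cache lookups per minute (assumed, not measured).
LOOKUPS = 100_000
before = misses_per_minute(LOOKUPS, 97.1)  # hit rate at 19:40
after = misses_per_minute(LOOKUPS, 88.4)   # hit rate at 19:55
print(f"{before:.0f} -> {after:.0f} misses/min ({after / before:.1f}x)")
# prints: 2900 -> 11600 misses/min (4.0x)
```

A nine-point fall in hit rate roughly quadruples the miss traffic here, which is why the database usually feels a storm before anyone looks at Redis.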
Sibling cards to read alongside this one
| Card | Why pair it with Eviction Storm | What the combination tells you |
|---|---|---|
| Memory Used vs Maxmemory % | The cause: evictions start when this hits 100%. | At the cap plus a storm equals a memory-bound instance needing headroom. |
| Keyspace Hit Rate % | Tells you whether the storm actually hurts. | Storm with falling hit rate equals thrashing; storm with stable hit rate equals evicting cold keys. |
| Evicted Keys / minute | The continuous gauge this alert thresholds. | Same evicted_keys counter; this card is the sustained-rate alarm. |
| Expired Keys / minute | The healthy cousin to rule out confusion. | High expired plus low evicted equals normal TTL churn, not pressure. |
| Total Keys (db0) | Key count falling during a storm confirms shedding. | A dropping keyspace size during the window confirms eviction, not just slow growth. |
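| Memory Fragmentation Ratio | Fragmentation inflates the process's real memory (used_memory_rss) above used_memory. | Eviction is driven by used_memory, so high fragmentation means the host can run short of RAM before maxmemory is reached; leave headroom when sizing maxmemory. |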
Reconciling against the source
Where to look in Redis itself:
- INFO stats reports the cumulative evicted_keys counter. Sample it twice, a minute apart, and subtract the first reading from the second to get the per-minute rate: redis-cli INFO stats | grep evicted_keys.
- INFO memory confirms maxmemory, used_memory, and the active maxmemory_policy, so you know the cap and the eviction strategy.
- redis-cli --bigkeys scans for the largest keys, the usual culprits behind sudden memory pressure.
- MEMORY STATS and MEMORY USAGE <key> break down where the memory is going at the data-structure level.
Why our number may legitimately differ from a raw counter read:
| Reason | Direction | Why |
|---|---|---|
| Rate vs total | We show per-minute; INFO shows cumulative | evicted_keys only ever grows. Our card differentiates it into a rate, so a casual look at the raw counter will not match our headline. |
| Restart reset | Our rate skips one interval | The counter resets to 0 on restart. We detect the reset and skip that sample rather than report a negative rate. |
| Window smoothing | Our number lags a raw spike | The card requires the rate to hold across a 5-minute window, so a momentary burst you see in a live INFO loop may not yet show as a storm. |
| Per-shard view | Cluster totals differ | On a cluster we report the worst shard, not the cluster sum; adding every shard’s counter will exceed our headline. |
Managed services expose the same signal: Amazon ElastiCache publishes an Evictions CloudWatch metric per node; Azure Cache for Redis exposes Evicted Keys in Azure Monitor; Redis Cloud shows evictions in its metrics panel. Reconcile our per-minute rate against those: CloudWatch’s Evictions is typically a per-minute sum already, so it should align closely with our headline for the same node. If they diverge, check that you are comparing the same shard and the same minute boundary.
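To reconcile by hand from raw output, a minimal sketch, assuming you have saved the text of two redis-cli INFO stats calls taken a known number of seconds apart. INFO returns "# Section" header lines followed by field:value lines; parse_info and evictions_per_minute are hypothetical helper names, and the two snapshot strings are made-up values, not readings from a real instance.

```python
def parse_info(text: str) -> dict:
    """Parse INFO output into {field: value}; '# Section' headers are skipped."""
    fields = {}
    for line in text.splitlines():  # also handles the \r\n line endings INFO uses
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        fields[key] = value
    return fields


def evictions_per_minute(first: str, second: str, seconds_apart: float) -> float:
    """Per-minute eviction rate from two INFO stats snapshots."""
    a = int(parse_info(first)["evicted_keys"])
    b = int(parse_info(second)["evicted_keys"])
    if b < a:
        raise ValueError("evicted_keys went backwards: the server restarted between samples")
    return (b - a) * 60.0 / seconds_apart


# Made-up snapshots, 60 seconds apart, for illustration only.
snap1 = "# Stats\r\nexpired_keys:48211\r\nevicted_keys:1204337\r\n"
snap2 = "# Stats\r\nexpired_keys:48950\r\nevicted_keys:1206647\r\n"
print(evictions_per_minute(snap1, snap2, 60))  # 2310.0
```

Reading expired_keys from the same snapshots is a quick way to confirm you are not mistaking TTL churn for eviction.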
Known limitations / FAQs
My eviction rate is high but my hit rate is fine. Should I worry?
Less so. A high eviction rate with a stable, high hit rate means Redis is correctly evicting genuinely cold keys to admit a hot working set; the cache is doing its job. The dangerous pattern is a high eviction rate alongside a falling hit rate, which signals thrashing: you are evicting keys you are about to need again. Always read this card next to Keyspace Hit Rate %.
What is the difference between evicted and expired keys?
Expired keys are removed because their TTL elapsed; this is intentional, healthy housekeeping. Evicted keys are removed under memory pressure because Redis hit maxmemory, regardless of whether they still had time to live. A storm of expirations is normal; a storm of evictions means you are out of room. They are reported as separate counters and on separate cards.
I set maxmemory-policy noeviction. Why does this card stay quiet during memory pressure?
With noeviction, Redis does not evict at all; once at the cap it rejects write commands with an OOM command not allowed error. So evicted_keys stays flat and this card stays silent even though the instance is in trouble. For noeviction setups, monitor Memory Used vs Maxmemory % and the error cards instead; a quiet eviction card is not the same as a healthy instance.
A bulk import spiked evictions for thirty seconds but no alert fired. Why?
By design. The card requires the rate to stay above 1000/min across a rolling 5-minute window, so a short burst from a one-off bulk load or cache warm-up is filtered out. This prevents routine maintenance from paging the on-call. A genuine storm sustains the rate; a bulk job tails off.
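The sustained-window rule can be sketched as a small rolling buffer. This is a hypothetical illustration, not the card's code: it assumes one rate sample per minute (so a 5-minute window holds 5 samples) and reads "sustained" as every sample in the window exceeding the threshold, which is one reasonable interpretation. The rate sequences are made-up values.

```python
from collections import deque


class SustainedRateAlarm:
    """Fires only when every per-minute rate in the rolling window exceeds the threshold."""

    def __init__(self, threshold: float = 1000.0, window_samples: int = 5):
        self.threshold = threshold
        # deque with maxlen drops the oldest sample automatically.
        self.window = deque(maxlen=window_samples)

    def observe(self, rate_per_minute: float) -> bool:
        """Record one per-minute rate; return True if the alarm should fire."""
        self.window.append(rate_per_minute)
        full = len(self.window) == self.window.maxlen
        return full and all(r > self.threshold for r in self.window)


# Made-up rates: a one-minute bulk-load spike versus a genuine storm.
burst = [110, 4200, 130, 95, 120, 100]
storm = [1200, 1840, 2100, 2310, 2250]

burst_alarm = SustainedRateAlarm()
fired_burst = [burst_alarm.observe(r) for r in burst]
storm_alarm = SustainedRateAlarm()
fired_storm = [storm_alarm.observe(r) for r in storm]
print(any(fired_burst), fired_storm[-1])  # False True
```

The spike never fires because the other minutes in its window are below 1000; the storm fires on the fifth consecutive minute above threshold.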
Will raising maxmemory stop the storm permanently?
It stops it until the working set grows to fill the new ceiling. Raising maxmemory buys headroom and is the right immediate mitigation, but if the working set keeps growing you will be back at the cap. The durable fixes are shorter TTLs on low-value keys, moving large blobs out of the cache, and right-sizing the node or adding shards. Treat a capacity bump as breathing room, not a cure.
Does an eviction storm cause data loss?
For a pure cache, no: evicted keys can be refetched from the source of truth (slowly, hence the database load). But if you store anything in Redis that is not backed elsewhere (sessions, rate-limit counters, queues) under an allkeys-* policy, those keys can be evicted and genuinely lost. Use a volatile-* policy and set TTLs only on disposable keys, so non-disposable data is never an eviction candidate.
On a cluster, the per-shard rates look uneven. Is that a problem?
Often, yes: it points to a hot shard. If one shard evicts heavily while others are calm, your key distribution is skewed (a handful of hot keys, or many keys sharing a {hash tag}, landing in slots owned by the same shard, or one very large key on that shard). Use Cluster Slots Assigned (of 16384) and redis-cli --bigkeys per node to find the imbalance. The card reports the worst shard so the hot one surfaces first.
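To check whether suspect keys share a shard, you can compute their hash slots. Redis Cluster maps a key to slot CRC16(key) mod 16384, hashing only the text inside the first non-empty {...} hash tag when one is present. A minimal pure-Python sketch of that mapping follows; the function names are hypothetical, and against a live cluster CLUSTER KEYSLOT <key> returns the same value from the server.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT, XMODEM variant (poly 0x1021, init 0), as used by Redis Cluster."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Cluster hash slot (0..16383) for a key, respecting {hash tags}."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only what is between the braces
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_slot("foo"))  # 12182
print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))  # True
```

Grouping your hottest keys (for example from --bigkeys output or application logs) by key_slot quickly shows whether they cluster on slots owned by the shard that is evicting.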