At a glance
Keyspace Hit Rate % is the share of key lookups that Redis served from memory rather than missing. It is computed as keyspace_hits / (keyspace_hits + keyspace_misses) from INFO stats. This is the Redis-defining metric: a cache exists to answer “do you have this?” with “yes” as often as possible. A high hit rate means Redis is absorbing read load that would otherwise fall on your primary database; a falling hit rate means more of that load is leaking through to the backend, which is where latency spikes and database overload usually begin.
| Field | Detail |
|---|---|
| Data source | keyspace_hits / (keyspace_hits + keyspace_misses) from INFO stats. Both are cumulative counters since restart. The card differences consecutive polls so the displayed rate reflects the current window, not the all-time average since boot. |
| Metric basis | Ratio (percentage), derived from two cumulative counters. The Redis-defining metric. |
| Aggregation window | RT/1h: a live real-time gauge plus a one-hour trend so a momentary dip is visible separately from a sustained decline. |
| What is a “hit” | A read command (GET, HGET, MGET, EXISTS, etc.) that found the requested key in the keyspace. keyspace_hits increments. |
| What is a “miss” | A read command for a key that does not exist (never written, already expired, or evicted). keyspace_misses increments. |
| What does NOT count | (1) Write commands (SET, DEL); (2) Administrative commands (INFO, CONFIG); (3) SCAN/KEYS iteration. Only key-level read lookups move the counters. |
| Counter window | Because the underlying counters are cumulative since restart, the lifetime ratio is heavily smoothed; the card reports the per-window rate so a fresh problem is not hidden behind months of good history. |
| Time window | RT/1h (live gauge plus a one-hour trend) |
| Alert trigger | <80%. Below 80%, more than one in five lookups is missing and leaking to the backend; the card turns amber/red and warns the on-call. |
| Roles | owner, platform, sre, dba |
Calculation
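The card reads two fields from INFO stats and computes the ratio over the current window: hit rate % = Δkeyspace_hits / (Δkeyspace_hits + Δkeyspace_misses) × 100, where Δ is the change in each counter between consecutive polls.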
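The windowed calculation can be sketched in a few lines of Python. This is a minimal illustration, not the card's actual implementation: the function name `windowed_hit_rate` and the choice to return `None` on a counter reset or an empty window are assumptions made for the example.

```python
from typing import Optional


def windowed_hit_rate(prev: dict, curr: dict) -> Optional[float]:
    """Hit rate % between two polls of INFO stats, or None if undefined.

    `prev` and `curr` are dicts holding the cumulative keyspace_hits and
    keyspace_misses counters from two consecutive polls.
    """
    d_hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    d_misses = curr["keyspace_misses"] - prev["keyspace_misses"]

    # A negative delta means the counters were zeroed between polls
    # (restart or CONFIG RESETSTAT); skip this window and rebaseline.
    if d_hits < 0 or d_misses < 0:
        return None

    total = d_hits + d_misses
    if total == 0:
        # No key-level reads in the window: the ratio is undefined.
        return None

    return 100.0 * d_hits / total


# Hypothetical cumulative counters at two consecutive polls.
previous = {"keyspace_hits": 10_000_000, "keyspace_misses": 500_000}
current = {"keyspace_hits": 11_420_000, "keyspace_misses": 558_000}
print(round(windowed_hit_rate(previous, current), 1))
```

Returning `None` rather than 0% for an empty window keeps a quiet period from looking like a total cache failure.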
Worked example
A platform team uses Redis 7.2 to cache product and pricing lookups in front of MySQL. Healthy steady state is around 96%. Snapshot over one hour on 03 Jun 26 after a catalogue re-import.

| Time (BST) | hits delta | misses delta | Hit rate | Note |
|---|---|---|---|---|
| 10:00 | 1,420,000 | 58,000 | 96.1% | Steady state |
| 10:30 | 1,390,000 | 61,000 | 95.8% | Calm |
| 11:02 | 980,000 | 410,000 | 70.5% | Re-import flushed cached product keys |
| 11:10 | 1,050,000 | 360,000 | 74.5% | Cache re-warming, still leaking to MySQL |
| 11:30 | 1,330,000 | 120,000 | 91.7% | Most hot keys back in cache |
| 12:00 | 1,410,000 | 60,000 | 95.9% | Recovered |
- A hit-rate dip after a deploy or import is usually a cold cache, not a broken Redis. The metric is doing its job: it is telling you the backend is suddenly carrying read load it normally would not. The action is to warm the cache, not to resize Redis.
- The 80% line is a leak threshold, not a quality grade. At 96% the backend handles 4% of reads; at 80% it handles 20%, a 5x increase in backend read load. A 16-point drop in hit rate multiplies the load on the database fivefold, which is why the alert sits at 80 and not lower.
- Distinguish a dip from a decline. A sharp dip that recovers within the hour is a cache-warming event. A hit rate that has drifted from 96% to 88% to 82% over days is a structural problem: the working set is outgrowing memory and eviction is dropping hot keys (pair with Evicted Keys / minute and Memory Used vs Maxmemory %).
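The arithmetic behind the table and the 5x figure can be checked with a short Python sketch. The helper names `hit_rate_pct` and `backend_load_multiplier` are illustrative only; the input values are taken from the worked example above.

```python
def hit_rate_pct(hits_delta: int, misses_delta: int) -> float:
    """Windowed hit rate as a percentage."""
    return 100.0 * hits_delta / (hits_delta + misses_delta)


def backend_load_multiplier(baseline_pct: float, current_pct: float) -> float:
    """How many times more reads reach the backend than at the baseline."""
    return (100.0 - current_pct) / (100.0 - baseline_pct)


# (time, hits delta, misses delta) rows from the worked example.
rows = [
    ("10:00", 1_420_000, 58_000),
    ("11:02", 980_000, 410_000),
    ("11:30", 1_330_000, 120_000),
]
for time, hits, misses in rows:
    print(time, round(hit_rate_pct(hits, misses), 1))

# Falling from 96% to 80% takes the miss share from 4% to 20%.
print(backend_load_multiplier(96.0, 80.0))
```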
Sibling cards
| Card | Why pair it with Keyspace Hit Rate % | What the combination tells you |
|---|---|---|
| Evicted Keys / minute | Eviction of hot keys is the leading cause of a structural hit-rate decline. | Hit rate down plus eviction up equals an undersized cache dropping wanted data. |
| Memory Used vs Maxmemory % | Memory pressure forces the eviction that lowers hit rate. | Hit rate down plus memory over 90% equals the eviction spiral. |
| Expired Keys / minute | TTL churn turns into misses when expired keys are re-read. | A jump in expiry followed by a hit-rate dip equals TTLs set too short for the read pattern. |
| Total Keys (db0) | The keyspace size context for a hit-rate move. | Hit rate down while key count falls equals a flush/invalidation; down with key count flat equals a read-pattern change. |
| Operations per Second (live) | The throughput context for the miss rate. | Hit rate down with ops up equals more cold traffic; down with ops flat equals the same traffic now missing. |
| Command Latency p95 (ms) | Misses themselves are cheap in Redis; the cost is downstream. | A hit-rate dip with a backend latency spike confirms misses are leaking to the database. |
| Redis Health Score | The composite that weights hit rate at 20%. | A sustained hit-rate decline is one of the biggest single contributors to a falling score. |
| Redis OPS Spike vs Ecom Order Rate | Separates genuine demand from a stampede. | A hit-rate collapse with an ops spike but flat orders equals a cache stampede or bot. |
Reconciling against the source
Where to look in Redis:
- Run INFO stats and read keyspace_hits and keyspace_misses: redis-cli INFO stats | grep -E 'keyspace_hits|keyspace_misses'. Difference two readings to get the windowed rate the card reports.
- INFO keyspace for per-database key counts, useful when a hit-rate drop coincides with a key-count drop (an invalidation).
- redis-cli --stat for a live rolling view of key count, memory, clients, and request throughput. It does not break out hits and misses, so pair it with INFO stats.
- CONFIG RESETSTAT resets the cumulative counters to zero; useful before a controlled test, but note it also resets every other INFO stats figure.

For ElastiCache or MemoryDB, the same fields are available via INFO; the CloudWatch metrics CacheHits and CacheMisses are the managed-service equivalents and let you compute the same ratio at one-minute granularity.
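A manual reconciliation amounts to capturing the INFO stats text twice and differencing the two counters. The sketch below parses the raw `field:value` text that INFO returns; the function name `parse_info_stats` and the sample counter values are hypothetical, chosen only to illustrate the steps.

```python
def parse_info_stats(raw: str) -> dict:
    """Extract keyspace_hits and keyspace_misses from raw INFO stats text."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Stats".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep and key in ("keyspace_hits", "keyspace_misses"):
            fields[key] = int(value)
    return fields


# Two hypothetical captures of `redis-cli INFO stats`, taken some time apart.
first = "# Stats\r\ntotal_commands_processed:900\r\nkeyspace_hits:1500\r\nkeyspace_misses:100\r\n"
second = "# Stats\r\ntotal_commands_processed:1400\r\nkeyspace_hits:1950\r\nkeyspace_misses:150\r\n"

a, b = parse_info_stats(first), parse_info_stats(second)
d_hits = b["keyspace_hits"] - a["keyspace_hits"]
d_misses = b["keyspace_misses"] - a["keyspace_misses"]
print(100.0 * d_hits / (d_hits + d_misses))
```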
Why our number may legitimately differ from a manual reading:
| Reason | Direction | Why |
|---|---|---|
| Lifetime vs window. A manual INFO shows the cumulative-since-restart ratio; the card shows the current window. | Manual often higher | Months of good history smooth out a fresh dip in the lifetime figure. |
| Counter reset. CONFIG RESETSTAT or a restart zeroes the counters. | Card rebaselines | The card detects the reset and resumes from the new baseline; a naive manual delta would read oddly across the reset. |
| Poll spacing. The windowed rate depends on poll cadence. | Either | A short dip between polls is averaged across the interval. |
| CloudWatch granularity (managed). CacheHits/CacheMisses are one-minute aggregates. | Marginal | The native counters are exact; CloudWatch is binned. |
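The lifetime-versus-window gap is easy to see with numbers. In this sketch the lifetime totals are invented for illustration (an instance that has run at 96% for a long time); the window deltas are the 11:02 row of the worked example.

```python
def ratio_pct(hits: int, misses: int) -> float:
    """Hits as a percentage of all key lookups."""
    return 100.0 * hits / (hits + misses)


# Hypothetical cumulative counters before the dip: 96% over 10 billion lookups.
boot_hits, boot_misses = 9_600_000_000, 400_000_000

# Deltas for the bad window (11:02 row of the worked example).
window_hits, window_misses = 980_000, 410_000

lifetime_pct = ratio_pct(boot_hits + window_hits, boot_misses + window_misses)
window_pct = ratio_pct(window_hits, window_misses)

# The lifetime figure barely moves while the window shows the collapse.
print(round(lifetime_pct, 1), round(window_pct, 1))
```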
Known limitations / FAQs
My hit rate is 99% but the application still feels slow. How?
Hit rate measures whether Redis had the key, not how fast it returned it. A 99% hit rate with high Command Latency p95 (ms) usually means the values themselves are large (multi-megabyte hashes, big sorted sets) or a slow Lua script is in the path. The cache is hitting; the commands are just expensive. Check SLOWLOG GET and key sizes with MEMORY USAGE <key>.
Is a low hit rate always bad?
No. Some workloads are legitimately low-hit by design: a deduplication set, a one-shot idempotency-key check, or a rate limiter where most keys are first-seen. For those, a “miss” is the expected and correct answer. The 80% alert assumes a read-through cache; if your Redis is not a read cache, raise or disable the threshold in the Sensitivity tab so a healthy low-hit workload does not page you.
My hit rate dropped right after a deploy. Is Redis broken?
Almost certainly not. The most common cause is a cache invalidation: the deploy changed a key format, flushed a namespace, or re-imported data, so every read missed until the cache re-warmed. It recovers as hot keys come back. To avoid the dip entirely, pre-warm the cache or use versioned key prefixes so old keys keep serving until new ones exist.
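One way to picture versioned key prefixes is a read path that tries the new version first and falls back to the previous one. This is a sketch under stated assumptions: a plain dict stands in for Redis, the `v1`/`v2` prefixes and the helper names `cache_key` and `read_with_fallback` are hypothetical, and falling back is only safe if the new code can still read the old value format.

```python
from typing import Optional, Sequence


def cache_key(entity: str, entity_id: int, version: str) -> str:
    """Build a key whose prefix carries the cache schema version."""
    return f"{version}:{entity}:{entity_id}"


def read_with_fallback(cache: dict, entity: str, entity_id: int,
                       versions: Sequence[str]) -> Optional[str]:
    """Try each version in order (newest first) and return the first value found."""
    for version in versions:
        value = cache.get(cache_key(entity, entity_id, version))
        if value is not None:
            return value
    return None  # a genuine miss: the caller would go to the backend


# A dict stands in for Redis; only the old-version key exists right after deploy.
cache = {"v1:product:42": "old-format value"}
print(read_with_fallback(cache, "product", 42, ["v2", "v1"]))

# Once the new-version key is written, it takes precedence.
cache["v2:product:42"] = "new-format value"
print(read_with_fallback(cache, "product", 42, ["v2", "v1"]))
```

Note that in a real Redis each fallback attempt is its own lookup, so the first probe for a not-yet-written new key would still register as a miss.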
The lifetime hit rate in redis-cli INFO is higher than the card. Which is right?
Both; they measure different windows. INFO shows the cumulative ratio since the instance started, which can be smoothed by months of good history. The card shows the current window so a fresh problem is visible. For “is the cache healthy right now”, trust the card; for “how has it done overall since boot”, read the raw INFO totals.
Counters look stuck at the same numbers. Why?
Either the instance has no read traffic in the window (check Operations per Second (live)), or you are reading a replica that serves no reads, or CONFIG RESETSTAT was run and the deltas are starting from zero again. The counters only move on key-level read commands; pure write traffic does not change them.
Does an MGET for 100 keys count as one hit or many?
The counters increment per key looked up, not per command. An MGET of 100 keys where 90 exist records 90 hits and 10 misses. This is why batch-read patterns move the counters in large steps and why the ratio is robust even though one command touched many keys.
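The per-key accounting can be modelled in plain Python. This is a toy model of the counting rule described above, not Redis code: a set stands in for the keyspace and `mget_counts` is a hypothetical helper.

```python
def mget_counts(keyspace: set, requested: list) -> tuple:
    """Return (hits, misses) for one batch read, counted per key."""
    hits = sum(1 for key in requested if key in keyspace)
    return hits, len(requested) - hits


# 100 requested keys, of which the first 90 exist in the keyspace.
requested = [f"product:{i}" for i in range(100)]
keyspace = set(requested[:90])

hits, misses = mget_counts(keyspace, requested)
print(hits, misses)
```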
Can I improve hit rate by raising maxmemory?
If the decline is caused by eviction of hot keys, yes: more memory means fewer evictions means more hits. Confirm first with Evicted Keys / minute. If eviction is zero, the misses are coming from cold keys or short TTLs, and more memory will not help. In that case the lever is longer TTLs, cache pre-warming, or a better key-coverage strategy, not a bigger node.
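The triage above reduces to a single branch on the eviction rate. The sketch below is only an illustration of that decision; the function name `suggest_lever` and the returned wording are assumptions, and a real check would look at the eviction trend rather than one reading.

```python
def suggest_lever(evicted_per_min: float) -> str:
    """Pick the lever for a declining hit rate from the eviction rate."""
    if evicted_per_min > 0:
        # Hot keys are being dropped under memory pressure.
        return "eviction present: consider raising maxmemory"
    # No eviction: misses come from cold keys or TTLs that are too short.
    return "no eviction: look at TTLs, pre-warming, or key coverage"


print(suggest_lever(1200.0))
print(suggest_lever(0.0))
```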