At a glance
This card plots Redis command throughput (instantaneous_ops_per_sec) on one axis against the storefront order rate on the other, over the same minutes. In a healthy store the two lines move together: more shoppers means more orders means more Redis operations. The dangerous pattern is divergence. When Redis operations spike but orders stay flat, work is being done that produces no revenue. That is the fingerprint of a cache stampede (many workers recomputing the same expired key at once) or a bot crawl hammering the cache. The card surfaces that gap while it is open so you can act before it tips into a latency incident.
| What it tracks | Two synchronised series: Redis operations per second and the linked storefront’s order rate, plotted on a dual axis so divergence is visible at a glance. |
| Data source | Redis side: instantaneous_ops_per_sec from INFO stats. Ecommerce side: the live order rate from the linked Shopify, BigCommerce or Adobe Commerce connector, windowed to the same minutes. |
| Time window | 15m rolling. |
| Alert trigger | ops spike with no order spike (a Redis throughput surge with no matching rise in orders), which points to a cache stampede or bot traffic. |
| Roles | owner, engineering, operations |
Calculation
The card samples instantaneous_ops_per_sec from INFO stats (Redis’s own rolling estimate of commands per second) and the storefront order rate from the linked connector at the same cadence across the 15-minute window. It then looks for divergence: a statistically meaningful spike in operations that is not accompanied by a comparable spike in orders. The alert fires on that asymmetry, not on high ops alone. High ops during a genuine sales surge is exactly what you want to see; high ops with flat orders is wasted or hostile work. The join is what makes the card actionable: by itself, instantaneous_ops_per_sec cannot tell a Black Friday surge apart from a stampede, because both look like “Redis is busy”.
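The asymmetry check described above can be sketched in a few lines. This is a minimal illustration only: the card's actual statistical test is not documented here, so the function name is_divergent, the median-based trailing baseline, and the 1.5x / 1.2x factors are all assumptions chosen for readability.

```python
from statistics import median

def is_divergent(ops, orders, baseline_len=10, ops_factor=1.5, orders_factor=1.2):
    """Return True when the latest ops sample spikes but orders do not.

    ops, orders: equal-length lists of per-minute samples, oldest first.
    The baseline length and both factors are illustrative assumptions,
    not the card's real settings.
    """
    if len(ops) != len(orders) or len(ops) < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    # Trailing baseline: median of the samples that precede the latest one.
    ops_base = median(ops[-(baseline_len + 1):-1])
    orders_base = median(orders[-(baseline_len + 1):-1])
    ops_spiked = ops[-1] > ops_factor * ops_base
    orders_spiked = orders[-1] > orders_factor * orders_base
    # Fire only on the asymmetry: ops up, orders not up.
    return ops_spiked and not orders_spiked

# Ops jump while orders stay flat: flagged.
print(is_divergent([40000, 41000, 42000, 90000], [38, 40, 41, 40]))  # True
# Ops and orders jump together (a genuine surge): not flagged.
print(is_divergent([40000, 41000, 42000, 90000], [38, 40, 41, 95]))  # False
```

Comparing each series against its own trailing baseline, rather than using an ops-to-orders ratio, is what keeps a quiet overnight period from looking like a divergence.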
Worked example
A homeware retailer on BigCommerce caches its category and product pages in Redis with a 60-second TTL. On 22 Apr 26 a popular product page’s cache entry expires at 13:00 just as an influencer post drives a crawl of bots and curious browsers to that exact URL. The platform team has the dual-axis card open.
| Minute (BST) | Redis ops/sec | Orders/min | Pattern |
|---|---|---|---|
| 12:58 | 41,000 | 38 | tracking |
| 12:59 | 43,500 | 41 | tracking |
| 13:00 | 96,200 | 40 | ops spike, orders flat |
| 13:01 | 118,400 | 39 | ops spike, orders flat |
| 13:02 | 121,000 | 37 | ops spike, orders flat |
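To put numbers on the gap in the table, the sketch below expresses each spike minute as a multiple of the two preceding "tracking" minutes. Using the mean of 12:58 and 12:59 as the baseline is an assumption made for this illustration; it is not necessarily how the card computes its own baseline.

```python
# Rows copied from the worked-example table: (minute, ops_per_sec, orders_per_min)
rows = [
    ("12:58", 41_000, 38),
    ("12:59", 43_500, 41),
    ("13:00", 96_200, 40),
    ("13:01", 118_400, 39),
    ("13:02", 121_000, 37),
]

# Assumed baseline: mean of the two "tracking" minutes before the 13:00 expiry.
base_ops = (rows[0][1] + rows[1][1]) / 2      # 42,250 ops/sec
base_orders = (rows[0][2] + rows[1][2]) / 2   # 39.5 orders/min

def relative_change(rows, base_ops, base_orders):
    """Return (minute, ops as multiple of baseline, orders as multiple of baseline)."""
    return [(m, ops / base_ops, orders / base_orders) for m, ops, orders in rows]

changes = relative_change(rows[2:], base_ops, base_orders)
for minute, ops_x, orders_x in changes:
    print(f"{minute}: ops x{ops_x:.2f}, orders x{orders_x:.2f}")
# 13:00: ops x2.28, orders x1.01
# 13:01: ops x2.80, orders x0.99
# 13:02: ops x2.86, orders x0.94
```

On that baseline, operations climb to roughly 2.3 to 2.9 times their earlier level while orders stay within about 7% of theirs, which is the "ops spike, orders flat" pattern labelled in the table.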
Sibling cards
| Card | Why pair it with this card | What the combination tells you |
|---|---|---|
| Operations per Second (live) | The raw throughput series without the order join. | Establishes the baseline ops level so a spike is unmistakable. |
| Keyspace Hit Rate % | The miss-rate view of a stampede. | A hit-rate dip co-occurring with an ops spike confirms a stampede over a bot crawl. |
| Command Latency p95 (ms) | The latency consequence of an ops spike. | Rising p95 during the spike means the herd is already hurting real requests. |
| SLOWLOG Entries (15m) | The slow-command count during the burst. | Slow commands plus ops spike points to expensive recompute or a large key. |
| Connected Clients Saturation vs Traffic Burst | The connection-side cross-channel join. | Together they show whether the spike is also exhausting the connection pool. |
| Redis Session Keys vs Active Ecom Users | The session-side cross-channel join. | Confirms whether the spike traffic is real logged-in users or anonymous bots. |
| Evicted Keys / minute | The memory-pressure view. | An ops spike that also drives evictions can knock out unrelated cached data. |
Reconciling against the source
Where to look natively:
- redis-cli INFO stats for instantaneous_ops_per_sec, plus the cumulative total_commands_processed, keyspace_hits and keyspace_misses that explain a stampede.
- redis-cli --stat for a live rolling view of ops/sec, memory, and clients refreshed every second.
- redis-cli MONITOR (use briefly, it is costly) to see the actual commands flooding in and identify the hot key.
- redis-cli INFO commandstats for a per-command call count, useful for proving which command type dominated the spike.
- On Amazon ElastiCache: the GetTypeCmds, SetTypeCmds, and CacheHits / CacheMisses CloudWatch metrics let you reconstruct the ops mix; aggregate command rate is visible via the engine metrics.
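When reconciling by hand, it can help to turn the text that redis-cli INFO stats prints into numbers. The sketch below parses that "key:value" output and derives a hit rate from keyspace_hits and keyspace_misses. The helper names parse_info and hit_rate are hypothetical, and the sample values are made up for illustration.

```python
def parse_info(text):
    """Parse `redis-cli INFO` text ("key:value" lines) into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Stats"
        key, _, value = line.partition(":")
        fields[key] = value
    return fields

def hit_rate(fields):
    """Keyspace hit rate in [0, 1], or None when there were no lookups."""
    hits = int(fields["keyspace_hits"])
    misses = int(fields["keyspace_misses"])
    total = hits + misses
    return hits / total if total else None

# Illustrative sample only; real output contains many more fields.
sample = """# Stats
total_commands_processed:1250000
instantaneous_ops_per_sec:96200
keyspace_hits:900000
keyspace_misses:100000
"""
stats = parse_info(sample)
print(int(stats["instantaneous_ops_per_sec"]), hit_rate(stats))  # 96200 0.9
```

Because keyspace_hits and keyspace_misses are cumulative counters, a stampede shows up most clearly in the difference between two snapshots taken before and during the spike, not in a single reading.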
Why our number may legitimately differ:
| Reason | Direction | Why |
|---|---|---|
| Estimate basis | Marginal | instantaneous_ops_per_sec is Redis’s own short-window estimate, not an exact count; brief sub-second spikes are smoothed. |
| Order-rate latency | Our orders lag | Order events from the storefront connector can arrive a few seconds behind the matching Redis ops, so the very leading edge of a divergence may look slightly offset. |
| Cluster aggregation | Variable | On Redis Cluster, ops are summed across shards while a stampede may concentrate on one shard; the per-shard view in native tooling can read higher than the aggregate. |
| Time-zone alignment | Marginal | Confirm the storefront connector and Redis sampling share a reporting time zone before treating a small horizontal offset as real divergence. |
Known limitations / FAQs
Ops and orders both spiked together. Should I worry?
No, that is the healthy pattern and the alert will not fire. Operations rising in step with orders is exactly what a busy, well-behaved store looks like during a genuine surge. The card only flags divergence, where ops climb without a matching order rise.
How is a cache stampede different from a bot crawl on this card?
Both show ops up with orders flat, so this card flags both. To tell them apart, check siblings: a stampede usually drags Keyspace Hit Rate % down (lots of misses recomputing the same key), while a bot crawl can keep hit rate high but inflates Connected Clients and edge request counts. The fixes differ: TTL jitter and single-flight for stampedes, edge rate-limiting for bots.
Why use instantaneous_ops_per_sec rather than counting commands myself?
instantaneous_ops_per_sec is the value Redis itself maintains and the one operators recognise, so the card matches what you would see in redis-cli --stat. Computing your own rate from total_commands_processed deltas is possible but introduces sampling-window differences that make reconciliation harder.
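For comparison, the delta-based alternative mentioned in the last answer looks roughly like this. It is a sketch, not something the card does: the function name ops_rate is hypothetical, and the sampling interval is whatever you choose, which is exactly the source of the window mismatch.

```python
def ops_rate(prev, curr):
    """Average commands/sec between two samples.

    Each sample is a (timestamp_seconds, total_commands_processed) pair.
    """
    (t0, c0), (t1, c1) = prev, curr
    if t1 <= t0:
        raise ValueError("samples must be in increasing time order")
    if c1 < c0:
        # The counter went backwards, e.g. after a restart or CONFIG RESETSTAT.
        return None
    return (c1 - c0) / (t1 - t0)

# 410,000 commands over a 10-second gap averages 41,000 ops/sec.
print(ops_rate((0.0, 1_000_000), (10.0, 1_410_000)))  # 41000.0
```

The result is an average over your own sampling gap, so a burst shorter than that gap is flattened differently from how Redis's built-in estimate would show it.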
We do not cache pages in Redis, only sessions and queues. Is this card still useful?
Yes. An ops spike with flat orders on a session/queue instance often means a runaway job, a retry storm, or a misbehaving consumer rather than a page stampede. The divergence signal is the same; the root cause is different. Pair it with Blocked Clients (BLPOP / BRPOP / WAIT) to spot queue-side trouble.
Can the order rate be zero legitimately while ops are normal?
Yes, overnight or in quiet hours orders can be near zero while background jobs keep Redis ticking over. The alert is tuned to fire on a spike in ops above the trailing baseline, not on a high ops-to-orders ratio in quiet periods, so steady low-order overnight traffic will not trip it.
Can I tune the spike sensitivity?
Yes. The divergence threshold is configurable per profile in the Sensitivity tab. Stores with naturally spiky cache traffic may widen it to reduce noise, while stores that have been bitten by stampedes may tighten it to catch the herd earlier.