Card class: Hero
Category: Performance

At a glance

Command Latency p99 (ms) is the 99th-percentile server-side command execution time, in milliseconds: the experience of the slowest 1% of commands. On a single-threaded engine that slowest 1% matters far more than the fraction suggests, because every slow command stalls the thread and delays everything queued behind it. This is a Hero, sensitivity-tier card. A clean p95 next to a blown p99 is the signature of a small number of genuinely pathological events (an oversized key, a long Lua script, an RDB fork pause, or a swap stall) rather than a broad slowdown.
What it tracks: The 99th-percentile server-side command execution latency, in milliseconds. The slowest 1 command in every 100 takes at least this long.
Data source: LATENCY HISTORY (Redis 7+ per-event latency monitoring) where enabled, otherwise client-side sampling of command round-trips with the network baseline subtracted. Engine-side time only.
Time window: RT/5m (real-time, computed over a trailing 5-minute window).
Alert trigger: > 50ms. The p99 threshold is deliberately higher than p95 (10ms): the deep tail tolerates more, but 50ms is where the slowest 1% starts visibly hurting application response times.
Why it matters: Tail latency on a shared, single-threaded resource is head-of-line blocking. A 50ms p99 command does not just affect that one request; it delays the next commands in the queue too.
What does NOT count: Network round-trip, connection setup, and client-side work. This is the time spent inside the Redis server.
Roles: engineering, operations
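
The head-of-line blocking described under "Why it matters" can be illustrated with a minimal single-worker queue sketch. The arrival pattern and service times below are hypothetical values chosen for illustration (one command per millisecond, 100us per command, one 50ms outlier); they are not measurements from any real instance.

```python
def queue_waits_us(arrivals_us, service_us):
    """Wait time of each command on a single-threaded, first-in-first-out server.

    arrivals_us: arrival time of each command, in microseconds, ascending.
    service_us:  execution time of each command, in microseconds.
    """
    free_at = 0  # time at which the single thread next becomes idle
    waits = []
    for arrival, service in zip(arrivals_us, service_us):
        start = max(arrival, free_at)   # a command cannot start while another runs
        waits.append(start - arrival)   # time spent queued behind earlier commands
        free_at = start + service
    return waits


# Hypothetical workload: 60 commands, one per millisecond.
arrivals = [k * 1000 for k in range(60)]
# The first command takes 50ms; every other command takes 100us.
services = [50_000] + [100] * 59

waits = queue_waits_us(arrivals, services)
delayed = sum(1 for w in waits if w > 0)
```

Under these assumptions a single 50ms command leaves 55 later commands waiting in the queue, even though each of them is fast on its own.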

Calculation

The card computes the 99th percentile of per-command execution timings across the trailing 5-minute window and converts to milliseconds. Where Redis 7+ latency monitoring is active, timings are sourced from the server’s LATENCY HISTORY event log; otherwise Vortex IQ samples command round-trips client-side and subtracts the measured network baseline to estimate the server-side figure. The 5-minute window matters more for p99 than for p50: the deep tail is where rare events live, and a window short enough to react but long enough to be statistically meaningful keeps a single spike from either dominating or vanishing. Because Redis runs commands on one thread, a p99 of 50ms means that each command in the slowest 1% held the entire instance for at least 50ms at a stretch, which is why the deep tail is treated as a Hero signal rather than a curiosity.
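
The calculation can be sketched as follows. This is a minimal illustration, not Vortex IQ's implementation: the class name `TrailingP99`, the nearest-rank percentile method, and the fixed network baseline are all assumptions made for the example.

```python
import math
from collections import deque

WINDOW_SECONDS = 300  # trailing 5-minute window from the card definition


def percentile_nearest_rank(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples in window")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


class TrailingP99:
    """Hypothetical helper: p99 of estimated server-side timings over a trailing window."""

    def __init__(self, window_seconds=WINDOW_SECONDS, network_baseline_us=0.0):
        self.window_seconds = window_seconds
        self.network_baseline_us = network_baseline_us  # assumed constant here
        self._samples = deque()  # (timestamp_seconds, server_side_us)

    def record(self, timestamp, round_trip_us):
        # Subtract the network baseline to estimate engine-side time; clamp at zero.
        server_us = max(round_trip_us - self.network_baseline_us, 0.0)
        self._samples.append((timestamp, server_us))

    def p99_ms(self, now):
        # Drop samples that have aged out of the window, then take the percentile.
        cutoff = now - self.window_seconds
        while self._samples and self._samples[0][0] < cutoff:
            self._samples.popleft()
        timings = [us for _, us in self._samples]
        return percentile_nearest_rank(timings, 99) / 1000.0


tracker = TrailingP99(network_baseline_us=200)
for i in range(98):
    tracker.record(1000 + i, 270)   # 70us of estimated server time each
tracker.record(1100, 88_200)        # two hypothetical 88ms stalls
tracker.record(1101, 88_200)
p99 = tracker.p99_ms(now=1200)
```

With 98 fast samples and two 88ms stalls in the window, the 99th-ranked sample is one of the stalls, so the sketch reports a p99 of 88ms even though the median is 70us.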

Worked example

An SRE team runs a Redis 7.0 primary with RDB persistence enabled for a cart and session store. Snapshot taken on 22 May 26 at 02:05 BST, during an overnight batch import.
Reading | Value | Status
Command Latency p50 | 70us | Healthy
Command Latency p95 | 2.1ms | Within threshold (10ms)
Command Latency p99 | 88ms | Breached (threshold 50ms)
Last RDB Save (minutes ago) | 3 | Recent save
Memory Used vs Maxmemory % | 64% | Comfortable
The shape is unusual: p50 and even p95 look fine, but p99 has blown out to 88ms. That p95-to-p99 cliff (2.1ms to 88ms) is the giveaway. It is not a hot key dragging a slice of traffic; it is a small number of multi-tens-of-millisecond stalls. The recent RDB save (3 minutes ago) is the clue. On RDB persistence, Redis forks a child process to write the snapshot, and on a large dataset that fork() has to copy the parent’s page tables, which can pause the main thread for tens of milliseconds. The batch import inflated the dataset and the write rate, so the save took longer and the fork pause was bigger.
Diagnosis trail for the 22 May p99 breach:
  - p50 70us, p95 2.1ms -> everyday traffic is healthy
  - p99 88ms -> a few commands stalled hard
  - p95->p99 cliff (2.1ms -> 88ms) -> rare, large stalls, not a hot key
  - RDB save 3 min ago + heavy import -> fork() / copy-on-write pause is the suspect

Fix applied:
  - Move the heavy import to a replica or off-peak window
  - Tune save points so snapshots do not coincide with write bursts
  - Confirm THP (transparent huge pages) is disabled (it worsens fork latency)
  Result the next night (23 May): p99 back to 9ms during the import
The lesson: a p99-only breach with healthy p95 almost never means “the instance is overloaded”. It means something rare and heavy happened a handful of times. The usual culprits are persistence forks, a swap stall, or a single very large key being touched. Find the rare event, do not scale the node blindly.
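
The reading of the p95-to-p99 cliff can be written down as a small decision rule. This is a sketch only: the 10ms and 50ms thresholds come from this card, while the function name `classify_tail` and the returned labels are hypothetical.

```python
P95_THRESHOLD_MS = 10.0  # p95 alert threshold quoted on this card
P99_THRESHOLD_MS = 50.0  # p99 alert threshold quoted on this card


def classify_tail(p95_ms, p99_ms):
    """Map a (p95, p99) pair to the interpretation used in the worked example."""
    p95_breached = p95_ms > P95_THRESHOLD_MS
    p99_breached = p99_ms > P99_THRESHOLD_MS
    if p99_breached and not p95_breached:
        # Healthy p95 with a blown p99: a handful of rare, heavy events.
        return "rare heavy stalls"
    if p99_breached and p95_breached:
        # Both tails high: the slowdown is broad, not a few outliers.
        return "broad slowdown"
    if p95_breached:
        # Not discussed in the text above; labelled separately for completeness.
        return "p95 breach only"
    return "healthy"


night_of_breach = classify_tail(p95_ms=2.1, p99_ms=88.0)
night_after_fix = classify_tail(p95_ms=2.1, p99_ms=9.0)
```

For the 22 May readings the rule returns "rare heavy stalls", which is the cue to look for a fork, swap stall, or large key rather than to scale the node. The p95 value used for the night after the fix is assumed unchanged, since only the p99 was reported.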

Sibling cards

Card | Why pair it with Command Latency p99 | What the combination tells you
Command Latency p95 (ms) | The shallower tail. | A big p95-to-p99 cliff means rare, heavy stalls; both high means a broad slowdown.
Command Latency p50 (us) | The median baseline. | Healthy p50 with blown p99 confirms the problem is the deep tail only.
SLOWLOG Entries (15m) | Counts commands over the slowlog threshold. | Confirms whether the tail is named commands or invisible engine pauses (forks, swap).
Top 10 SLOWLOG Commands | Names the offenders. | If SLOWLOG is empty but p99 is high, the stall is engine-level (fork/swap), not a command.
Last RDB Save (minutes ago) | Persistence-fork timing. | A p99 spike coinciding with a recent save points straight at fork latency.
Memory Fragmentation Ratio | Detects swap (ratio < 1). | A p99 breach with fragmentation below 1 means swap stalls, fixable only with more RAM.
Redis Health Score | The composite. | Deep-tail breaches drag the composite; this card explains the cause.

Reconciling against the source

Where to look in Redis’s own tooling:
  - LATENCY HISTORY, LATENCY LATEST, and LATENCY DOCTOR (Redis 7+). LATENCY DOCTOR is especially useful for p99 work: it names probable causes (fork, expire cycle, AOF rewrite) for the spikes it has recorded.
  - SLOWLOG GET 25 for named slow commands. If SLOWLOG is empty while p99 is high, the stall is an engine-level pause (fork, swap, expire cycle), not a command, which is the key branch point in the diagnosis.
  - INFO stats for latest_fork_usec, which reports how long the most recent fork operation took (in microseconds), and INFO persistence for rdb_last_bgsave_status and rdb_last_bgsave_time_sec, which report whether the last snapshot succeeded and how long it ran.
  - redis-cli --intrinsic-latency 60 to measure host scheduling jitter and rule out a CPU-starved or noisy-neighbour VM.
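
The fork check can be scripted against raw INFO output. The sketch below parses the plain "key:value" text that INFO returns and reads latest_fork_usec from whichever section it appears in; the helper names and the sample values are hypothetical, and a real script would obtain the text from redis-cli or a client library.

```python
def parse_info(text):
    """Parse raw INFO output ("key:value" lines, "# Section" headers) into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def last_fork_ms(info_text):
    """Duration of the most recent fork in milliseconds, or None if not reported."""
    raw = parse_info(info_text).get("latest_fork_usec")
    return None if raw is None else int(raw) / 1000.0


# Hypothetical sample; the values are illustrative, not from a real instance.
sample = (
    "# Persistence\r\n"
    "rdb_last_bgsave_status:ok\r\n"
    "rdb_last_bgsave_time_sec:14\r\n"
    "\r\n"
    "# Stats\r\n"
    "latest_fork_usec:61250\r\n"
)

fork_ms = last_fork_ms(sample)
```

A fork duration in the tens of milliseconds that lines up with a p99 spike supports the fork-pause diagnosis; a value near zero points elsewhere.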
On a managed service, cross-check the engine-side latency metrics in CloudWatch (SuccessfulReadRequestLatency, SuccessfulWriteRequestLatency) for ElastiCache and MemoryDB, and on ElastiCache watch EngineCPUUtilization around the spike, since fork and snapshot work shows up there.
Why our number may legitimately differ:
Reason | Direction | Why
Sampling vs native monitor | Either | Without the Redis 7+ latency monitor, the deep tail is harder to capture by sampling; rare stalls may be under- or over-counted depending on sample timing.
Window length | Either | A 5-minute percentile smooths a single fork spike that a momentary LATENCY LATEST check would show at full height.
Per-node vs aggregate | Either | Cluster per-shard reporting versus a console-wide aggregate. A single bad shard can dominate one and be averaged away in the other.
Network included | CLI higher | redis-cli --latency includes the round-trip; this card is server-side only.

Known limitations / FAQs

My p99 spikes but SLOWLOG is empty. What does that mean?
That is the classic engine-level-pause signature. SLOWLOG only records command execution time, so it misses stalls that happen between commands: persistence forks, the expire cycle, AOF rewrites, or swap. Check Last RDB Save (minutes ago) and INFO stats for latest_fork_usec, and check Memory Fragmentation Ratio for swap. An empty SLOWLOG with a high p99 almost always points one of these ways.

Why is the p99 threshold (50ms) higher than the p95 threshold (10ms)?
Because the deep tail naturally tolerates more. Even a healthy instance will occasionally see a fork pause or a one-off large command, and you do not want to page on every rare event. The 50ms line is set where the slowest 1% becomes large enough to be felt in application response times, rather than just statistically present.

Should I worry about p99 if p50 and p95 are healthy?
It depends what is driving it. If the cause is a periodic fork pause that you can move off-peak, it is a tuning task, not an emergency. If the cause is a single very large key that any request can hit, it is a latent landmine: today only 1% hit it, but a traffic shift could make it 10%. Use Top 10 SLOWLOG Commands to tell which.

How does persistence cause p99 spikes specifically?
Both RDB snapshots and AOF rewrites fork a child process. The fork itself copies the parent’s page tables, and on a large dataset that copy can block the main thread for tens of milliseconds. After the fork, copy-on-write means write-heavy workloads keep duplicating pages, adding memory pressure. The fix is to schedule saves away from write bursts, keep the dataset sized so forks are cheap, and disable transparent huge pages, which dramatically worsen fork latency.

Can a single large key really move the p99 on its own?
Yes. If one HGETALL against a 500,000-field hash takes 80ms and those calls make up more than about 1% of the commands in the window, they land in the slowest 1% and set the p99. On an otherwise idle instance, a few calls a minute is enough. This is why p99 is a Hero card: it surfaces individually rare but individually severe events that averages hide completely.

Does AOF (append-only file) persistence help or hurt p99?
AOF with appendfsync everysec is gentle on the common path, but AOF rewrites still fork and can cause the same tail spikes as RDB. appendfsync always is the dangerous setting: it fsyncs on every write, which can add disk latency to the deep tail directly. Watch Last AOF Rewrite Status and correlate rewrite windows with p99 spikes.
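
How many slow calls it takes for a large key to set the p99 depends on how busy the instance is. The sketch below works this out under an assumed nearest-rank percentile definition; the function names and the per-window command counts are hypothetical.

```python
def slow_calls_needed_for_p99(total_commands):
    """Smallest number of slow commands that puts a slow one at the nearest-rank p99.

    With n samples the p99 is the sample at rank ceil(0.99 * n), so the slow
    commands must number more than floor(n / 100).
    """
    if total_commands < 1:
        raise ValueError("total_commands must be at least 1")
    return total_commands // 100 + 1


def slow_calls_set_p99(total_commands, slow_calls):
    """True if slow_calls slow commands among total_commands determine the p99."""
    if not 0 <= slow_calls <= total_commands:
        raise ValueError("slow_calls must be between 0 and total_commands")
    return slow_calls >= slow_calls_needed_for_p99(total_commands)


# Hypothetical 5-minute windows with 15 slow HGETALL calls (three a minute).
quiet_instance = slow_calls_set_p99(total_commands=1_000, slow_calls=15)
busy_instance = slow_calls_set_p99(total_commands=500_000, slow_calls=15)
```

On the quiet instance the 15 slow calls exceed 1% of traffic and set the p99; on the busy one the same calls fall below the 99th percentile, which is why pairing this card with Top 10 SLOWLOG Commands still matters when the p99 looks healthy.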

Tracked live in Vortex IQ Nerve Centre

Command Latency p99 (ms) is one of hundreds of KPI pulses Vortex IQ tracks across Redis and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English.