At a glance
The fraction of WiredTiger’s cache that holds modified (“dirty”) pages waiting to be written to disk by a checkpoint or eviction. Every write in MongoDB first lands in the in-memory cache and is marked dirty; the storage engine then flushes those pages to disk in the background. This gauge shows how much of the cache is currently dirty. A small dirty fraction is healthy and normal; a large one means writes are arriving faster than WiredTiger can flush them. The card turns red at >20% because once dirty content climbs past the engine’s eviction trigger, MongoDB starts forcing application threads to help evict, which stalls writes and spikes latency.
| Field | Detail |
|---|---|
| What it tracks | The proportion of the WiredTiger cache occupied by dirty (modified, not yet written to disk) pages, shown as a live gauge from 0% to 100%. |
| Data source | Derived from the wiredTiger.cache sub-document of serverStatus: cache."tracked dirty bytes in the cache" / cache."maximum bytes configured". This is the MongoDB-distinctive write-pressure surface inside the storage engine. |
| Time window | RT (real-time gauge). The value reflects the live dirty fraction at each poll; sustained elevation, not a momentary blip, is the signal that matters. |
| Alert trigger | >20%. A dirty fraction above 20% raises a sensitivity alert because that is roughly WiredTiger’s default dirty eviction trigger: past it, application threads are pulled into eviction and write throttling becomes likely. |
| What counts | Modified collection and index pages held in the WiredTiger cache that have not yet been persisted to disk by a checkpoint or eviction. |
| What does NOT count | Clean (unmodified) cached pages, which are tracked by the read-side hit-rate metric instead. The figure is per-mongod; a secondary applying its own writes has its own dirty fraction. |
| Roles | owner, platform, sre, dba |
Calculation
The gauge is a ratio of two fields in the `wiredTiger.cache` sub-document of `serverStatus`:
- `tracked dirty bytes in the cache` is the volume of modified pages currently held in the cache that still need to be flushed to disk. Writes increase it; checkpoints and eviction decrease it.
- `maximum bytes configured` is the configured WiredTiger cache size (by default 50% of (RAM minus 1 GB), or 256 MB, whichever is larger). Using the configured maximum as the denominator, rather than the currently used bytes, gives a stable ceiling so the gauge is comparable over time.
- WiredTiger has its own internal dirty thresholds. By default the engine begins background eviction of dirty pages at roughly 5% dirty and starts forcing application threads to participate in eviction at roughly 20% dirty. The card’s `>20%` alert is aligned with that second threshold, the point where writes start paying an eviction tax.
- A high dirty fraction is a write-flush problem, not a write-volume problem in isolation. It rises when incoming write rate outpaces the engine’s ability to flush, which can be caused by slow disk, an oversized write burst, an infrequent checkpoint cadence, or contention from background eviction.
- Per-member. On a replica set the primary carries the write load and usually shows the highest dirty fraction; secondaries dirty their cache as they apply the oplog.
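
The calculation and thresholds above can be sketched in a few lines of JavaScript. This is an illustration rather than the card's actual implementation. The two field names come from the `wiredTiger.cache` sub-document described above. The function names, the state labels, and the handling of a reading exactly on a threshold are assumptions, and the 5% and 20% figures are the approximate defaults quoted in the text.

```javascript
// Sketch only: function names and state labels are hypothetical.
// Thresholds are the approximate WiredTiger defaults quoted in the text.
const BACKGROUND_EVICTION_PCT = 5; // background eviction of dirty pages begins
const FORCED_EVICTION_PCT = 20; // application threads start helping to evict

// `cache` is the wiredTiger.cache sub-document of serverStatus,
// assumed here to hold plain numeric values.
function dirtyCachePct(cache) {
  const dirty = Number(cache["tracked dirty bytes in the cache"]);
  const max = Number(cache["maximum bytes configured"]);
  if (!Number.isFinite(dirty) || !Number.isFinite(max) || max <= 0) {
    throw new Error("cache sub-document is missing the expected fields");
  }
  return (dirty / max) * 100;
}

// Maps a dirty percentage onto the engine behaviour described above.
// Treating a reading exactly on a threshold as "reached" is an assumption;
// the card's own alert is documented as strictly >20%.
function dirtyCacheState(pct) {
  if (pct >= FORCED_EVICTION_PCT) return "forced-eviction";
  if (pct >= BACKGROUND_EVICTION_PCT) return "background-eviction";
  return "quiet";
}
```

For example, a cache configured at 15 GB holding 3 GB of dirty pages gives `dirtyCachePct(...)` of 20, which `dirtyCacheState` reports as `"forced-eviction"`.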
Worked example
A platform team runs a MongoDB 6.0 primary with a 15 GB WiredTiger cache, backing an orders and events-ingest workload. A bulk event-replay job is launched to backfill a new collection. Readings taken on 12 Jun 26.

| Time (UTC) | dirty bytes | dirty % | Eviction behaviour | State |
|---|---|---|---|---|
| 10:00 | 0.45 GB | 3.0% | Background eviction idle | Normal write load |
| 11:20 | 1.35 GB | 9.0% | Background eviction active | Replay ramping |
| 11:45 | 3.00 GB | 20.0% | Forced app-thread eviction begins | Red, alert fires |
| 12:05 | 4.05 GB | 27.0% | App threads stalling on eviction | Write latency spiking |
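
The dirty % column in the table is each dirty-bytes reading divided by the 15 GB configured cache. A short sketch that reproduces it, assuming readings are supplied in GB; the function name `dirtyPctTable` and the input shape are hypothetical:

```javascript
// Recomputes the "dirty %" column from the worked example.
// The readings are copied from the table; rounding to one decimal place
// matches how the table presents the values.
function dirtyPctTable(readings, cacheGB) {
  return readings.map((r) => ({
    time: r.time,
    dirtyPct: Math.round((r.dirtyGB / cacheGB) * 1000) / 10,
  }));
}

const workedExample = dirtyPctTable(
  [
    { time: "10:00", dirtyGB: 0.45 },
    { time: "11:20", dirtyGB: 1.35 },
    { time: "11:45", dirtyGB: 3.0 },
    { time: "12:05", dirtyGB: 4.05 },
  ],
  15
);

console.log(workedExample);
```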
- Immediate: throttle or pause the bulk replay so the incoming write rate drops below the flush rate; the dirty fraction drains as checkpoints catch up, usually within a checkpoint interval or two (default checkpoints run roughly every 60 seconds).
- Structural: if dirty pressure recurs under normal load, the flush path is the bottleneck. Move to faster disk (dirty cache is acutely sensitive to write IOPS and latency), increase cache size so there is more room to absorb bursts, or reshape the workload to spread writes rather than batching them into spikes.
- Dirty cache is the write-side mirror of cache hit rate. WiredTiger Cache Hit Rate % tells you whether reads fit in memory; this card tells you whether writes can be flushed fast enough. Read them together to understand total cache pressure.
- The 20% line maps to a real engine behaviour, not an arbitrary number. Below it, eviction is a quiet background activity; above it, your application’s own threads are forced to evict, which is exactly when users feel write latency. That is why it is a “fix it now” line.
- Slow disk is the usual root cause. A dirty fraction that climbs under ordinary write load almost always points to the storage layer not keeping up. Check disk write latency and IOPS before assuming the workload is at fault.
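
Because sustained elevation matters more than a momentary blip, a consumer of this gauge may want to require several consecutive readings above the line before acting. The sketch below is one hypothetical way to express that. The function name, the default of three samples, and the strict `>` comparison are assumptions, not a description of how the card itself alerts.

```javascript
// Hypothetical helper: returns true only when the most recent `minSamples`
// readings are all above `thresholdPct`. With one reading per checkpoint
// interval (roughly 60 seconds by default), three samples spans a few
// checkpoints; pick a count that suits the actual polling cadence.
function isSustainedBreach(samplesPct, thresholdPct = 20, minSamples = 3) {
  if (samplesPct.length < minSamples) return false;
  return samplesPct.slice(-minSamples).every((pct) => pct > thresholdPct);
}
```

A series such as `[3, 9, 21, 27, 25]` counts as sustained, while `[3, 25, 9, 27]` does not, because the dirty fraction drained below the line between the two high readings.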
Sibling cards to read alongside
| Card | Why pair it with WiredTiger Dirty Cache | What the combination tells you |
|---|---|---|
| WiredTiger Cache Hit Rate % | The read-side companion to this write-side gauge. | Low hit rate plus high dirty fraction means the cache is squeezed from both directions and needs more headroom. |
| Query Latency p95 (ms) | The user-facing symptom of forced eviction. | A dirty fraction past 20% rising in step with p95 confirms writes are stalling on eviction. |
| Operations per Second (live) | Separates a write burst from a flush bottleneck. | High dirty with high write ops points to a write burst outrunning the flush rate; high dirty with modest ops points to slow disk. |
| Slow Ops (15m, >100ms) | Catches the writes that turn slow when eviction stalls them. | A jump in slow ops coinciding with a dirty-cache breach pinpoints the affected operations. |
| Memory Resident (MB) | The RAM available to the cache. | A capped resident set alongside high dirty means the cache cannot grow to buffer write bursts. |
| MongoDB Health Score | The composite that weights cache and write health. | A sustained dirty-cache breach should pull the health score down. |
Reconciling against the source
Where to confirm the number in MongoDB’s own tooling:
- mongosh: `db.serverStatus().wiredTiger.cache` returns `"tracked dirty bytes in the cache"`, `"maximum bytes configured"`, `"bytes currently in the cache"`, and the eviction counters. Divide tracked dirty bytes by the configured maximum to reproduce this gauge.
- mongostat: the `dirty` column shows the dirty-cache percentage directly, refreshed each interval, and the `used` column shows total cache usage; these are the quickest live confirmation.
- Atlas: the Metrics tab has a Cache Activity chart, and the Cache Dirty Bytes / cache-fill series track the same pressure this card reports.
- `db.serverStatus().wiredTiger`: the broader document exposes eviction counters (pages evicted by application threads vs background workers) that explain why a high dirty fraction is hurting latency.

Why our number may legitimately differ from the native view:
| Reason | Direction | Why |
|---|---|---|
| Denominator choice | Either | Vortex IQ divides by maximum bytes configured for a stable ceiling, as mongostat’s dirty column does; a dashboard or script that divides by bytes currently in the cache instead reads higher whenever the cache is not full. |
| Member polled | Either | The gauge reads one mongod (normally the primary). A native tool pointed at a secondary shows that member’s separate dirty fraction. |
| Sampling instant | Either | Dirty content swings between checkpoints; a sample taken just before a checkpoint reads higher than one just after, so two tools sampling at different instants can disagree. |
| Cache size changes | Either | If the configured cache size changed recently, the denominator shifts; confirm "maximum bytes configured" matches expectations. |
| Time zone | Axis only | Chart axes use your profile zone; the ratio itself is zone-independent. |
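
The mongosh check described above can be wrapped in a small function. This is a sketch: the three field names are the ones listed in the reconciliation steps, while the function name `reconcileDirtyCache` and the shape of the returned object are hypothetical. It also reports the dirty fraction against `bytes currently in the cache`, to show how the denominator choice in the table shifts the reading.

```javascript
// Sketch for mongosh: pass the shell's `db` handle.
// Assumes the cache fields are returned as plain numbers.
function reconcileDirtyCache(db) {
  const cache = db.serverStatus().wiredTiger.cache;
  const dirty = cache["tracked dirty bytes in the cache"];
  const max = cache["maximum bytes configured"];
  const used = cache["bytes currently in the cache"];
  return {
    // Same ratio as this card: dirty bytes over the configured ceiling.
    dirtyPctOfConfiguredMax: (dirty / max) * 100,
    // Alternative denominator, shown only for comparison.
    dirtyPctOfUsed: used > 0 ? (dirty / used) * 100 : 0,
  };
}
```

In mongosh, `reconcileDirtyCache(db)` returns both figures for the member the shell is connected to, so run it against the same mongod the gauge polls.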
Related cards, the expected relationship, and what causes divergence:

| Card | Expected relationship | What causes divergence |
|---|---|---|
| MongoDB OPS Spike vs Ecom Order Rate | A genuine write-heavy traffic spike can raise the dirty fraction. | A dirty-cache climb with no matching ecom write activity points to a background job (import, replay, migration) flooding writes rather than real demand. |
| MongoDB Health Score | A sustained dirty-cache breach should pull the health score down. | If the score stays green during a breach, check its write-health weighting in the sensitivity profile. |