At a glance
How many chunk migrations the balancer still has queued but has not yet completed on a sharded cluster. When the balancer decides a shard is carrying more than its fair share, it schedules chunk moves to even things out; each scheduled-but-not-finished move counts here. A small, transient number is healthy: it means the balancer is doing its job and the queue is draining. A large or stuck number is the warning sign: the balancer cannot keep up with the rate of imbalance, migrations are failing or being throttled, or the cluster is under enough write pressure that moves cannot complete cleanly. The card raises a sensitivity alert at >5 sustained: a backlog that size that does not clear means the cluster is drifting toward an uneven, hot-shard state faster than the balancer can correct it.
| Field | Detail |
|---|---|
| What it tracks | The count of chunk migrations the balancer has queued or is actively running but has not yet committed, on a sharded cluster. Sharded clusters only; a single replica set has no balancer and reports nothing here. |
| Data source | Pending chunk migrations from the balancer, derived from the config server’s balancer activity and the config.migrations / active-migration state. Each entry is a chunk that has been selected to move from a source shard to a destination shard but whose move has not yet finished. |
| Time window | RT (real-time). The value reflects the live migration queue as the balancer reports it; it rises when new moves are scheduled and falls as each move commits. |
| Alert trigger | >5 sustained. More than five pending migrations that persist (rather than a momentary spike during a balancing round) raises a sensitivity alert: the balancer is overloaded and the backlog is not draining. |
| What counts | Chunks selected for migration that are queued or in flight: waiting for the source shard to be ready, copying documents to the destination, in the catch-up / commit phase, or blocked waiting for a lock. |
| What does NOT count | Completed migrations (they leave the queue), chunk splits (splitting is metadata-only and does not move data), and migrations that have been aborted (they are removed, not held pending). Jumbo chunks that the balancer refuses to move are never queued in the first place. |
| Roles | owner, platform, sre, dba |
Calculation
The value is a direct count of in-flight and queued chunk migrations as reported by the balancer:
Chunks Pending Migration = queued migrations + in-flight migrations not yet committed
Worked example
A platform team runs a six-shard cluster supporting a session store and an event-ingest pipeline. They added two new shards on 12 Apr 26 to add capacity ahead of a campaign. Snapshot taken on 13 Apr 26 at 15:40 BST, roughly a day later.

| Time | Pending migrations | Shard balance skew | Reading |
|---|---|---|---|
| 12 Apr 18:00 | 14 | 31% | Two empty shards just added; large queue is expected |
| 13 Apr 06:00 | 9 | 24% | Queue draining overnight, skew falling |
| 13 Apr 12:00 | 7 | 21% | Slowing during business-hours write load |
| 13 Apr 15:40 | 8 | 20% | Queue stuck above 5 and skew flat for three hours |
- Are migrations failing, or just slow? They check the balancer activity. The moves are starting but the catch-up phase is timing out because the campaign warm-up has pushed write volume high. Each move copies a chunk, then cannot catch up to the live write stream fast enough to commit, so it retries. The queue is not failing; it is being out-paced by writes.
- Is the balancing window too narrow? They confirm there is no restrictive balancing window: the balancer is allowed to run all day, so the window is not the bottleneck.
- Is write load the real constraint? Operations per Second (live) shows write ops up 3x on the source shards versus the prior week. The conclusion: the balancer is healthy but the cluster is too busy for migrations to commit cleanly during peak write hours.
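
The "stuck above 5" reading in the table can be checked mechanically. The following is a minimal sketch, not the card's actual alert logic: the card says ">5 sustained" without defining the window, so `windowMs`, the sample shape `{ at, pending }`, and the function name `isSustainedBacklog` are all assumptions made for illustration.

```javascript
// Sketch only: the card does not define how long "sustained" is,
// so windowMs is an assumed parameter. Samples are { at: ms, pending: n }.
const MINUTE_MS = 60 * 1000;

function isSustainedBacklog(samples, threshold, windowMs) {
  if (samples.length === 0) return false;
  const sorted = [...samples].sort((a, b) => a.at - b.at);
  const latest = sorted[sorted.length - 1].at;
  const cutoff = latest - windowMs;
  // Require history that reaches back to the window start; otherwise
  // a single high reading would count as "sustained".
  if (sorted[0].at > cutoff) return false;
  // Every reading inside the window must be above the threshold.
  return sorted
    .filter((s) => s.at >= cutoff)
    .every((s) => s.pending > threshold);
}

// The four readings from the table, as minutes after 12 Apr 18:00.
const workedExample = [
  { at: 0 * MINUTE_MS, pending: 14 },
  { at: 720 * MINUTE_MS, pending: 9 },
  { at: 1080 * MINUTE_MS, pending: 7 },
  { at: 1300 * MINUTE_MS, pending: 8 },
];

// Assumed four-hour window, so both the 12:00 and 15:40 readings are included.
console.log(isSustainedBacklog(workedExample, 5, 240 * MINUTE_MS)); // true
```

A queue that spikes and then drains below the threshold inside the window returns false, which matches the distinction the card draws between a healthy burst and a backlog that does not clear.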
Sibling cards
| Card | Why pair it with Chunks Pending Migration | What the combination tells you |
|---|---|---|
| Shard Balance Skew % | The static measure of how uneven the cluster is. | High skew with a high pending count equals “balancer catching up”. High skew with a zero pending count equals “balancer stuck on a structural problem (jumbo chunks or shard key)”. |
| Operations per Second (live) | The write pressure that gates migration commits. | A high pending count during a write spike means moves cannot catch up, not that the balancer is broken. |
| Replica Lag (seconds) | Replication health on the shards involved in moves. | Migrations add write load to the destination shard; if its secondaries fall behind, the move can stall in catch-up. |
| WiredTiger Cache Hit Rate % | Cache pressure during data copy. | Migrations churn the cache on both source and destination; a falling hit rate during a backlog shows the copy is evicting the working set. |
| Query Latency p95 (ms) | The user-facing cost of migrations in flight. | Active migrations add latency to the source and destination shards; a backlog can correlate with elevated tail latency. |
| MongoDB Health Score | The composite that factors sharding health. | A sustained migration backlog drags the composite down alongside skew. |
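
The Shard Balance Skew % pairing in the table reduces to a small decision rule. The sketch below is illustrative only: the function name `readSkewAndPending`, the 15% cut-off for "high" skew, and the two low-skew labels are assumptions, not values or wording defined by either card.

```javascript
// Sketch only: highSkewPct is an assumed cut-off for "high" skew,
// and the low-skew labels are invented for completeness.
function readSkewAndPending(skewPct, pending, highSkewPct = 15) {
  if (skewPct < highSkewPct) {
    return pending > 0 ? "minor rebalancing" : "balanced";
  }
  // High skew: the pending count separates the two cases in the table.
  return pending > 0
    ? "balancer catching up"
    : "balancer stuck on a structural problem";
}

console.log(readSkewAndPending(20, 8)); // balancer catching up
console.log(readSkewAndPending(20, 0)); // balancer stuck on a structural problem
```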
Reconciling against the source
Where to look in MongoDB’s own tooling:
- sh.status() in mongosh against a mongos router shows recent balancer activity and whether migrations are happening. The tail of the output lists active and recent moves.
- sh.isBalancerRunning() returns whether the balancer is currently executing a round, and sh.getBalancerState() returns whether it is enabled at all. A pending count that never moves while the balancer is disabled is expected, not a fault.
- db.getSiblingDB("config").migrations.find() against the config database lists the active migration documents directly, which is the closest raw equivalent to what this card counts.
- use config; db.changelog.find({what: /moveChunk/}).sort({time:-1}) shows the history of completed and failed chunk moves, useful for confirming whether moves are committing or repeatedly aborting.

On MongoDB Atlas, the balancer state is visible in the cluster configuration and the per-shard metrics on the Metrics tab will show the data movement, though Atlas does not surface the raw pending count as a single figure.

Why our number may legitimately differ from sh.status():
| Reason | Direction | Why |
|---|---|---|
| Sampling instant | Brief discrepancy | The queue changes as each move commits; a value read a second apart can differ by the number of moves that committed in between. |
| Active vs queued | Variable | Some tooling shows only the actively-running move; this card counts both running and queued moves selected in the current round. |
| Balancer disabled | Our value may sit non-zero | If the balancer was disabled mid-round, in-flight moves finish but no new ones start; a small residual count can persist until they commit. |
| Aborted moves | Our value lower | Moves that abort are removed from the count immediately; a tool reading the changelog may still show them as recent entries. |
Known limitations / FAQs
My cluster is a single replica set. Why is this card empty?
Because there is no balancer. Chunk migrations only exist on sharded clusters with two or more shards; a single replica set holds all its data in one place and never moves chunks. If you expect data here, confirm the connector points at a mongos router and that sh.status() reports more than one shard.
The pending count spiked to 12 right after I added shards, then fell. Was that a problem?
No, that is the healthy pattern. New shards start empty, so the balancer schedules a burst of moves to fill them. The queue is at its largest immediately after the topology change and drains over the following hours. A spike that falls is the balancer working correctly. Only a spike that does not fall is a concern.
Pending is stuck above 5 but Shard Balance Skew % is not improving. What now?
This means moves are being scheduled but not committing. The three usual causes are: write pressure too high for the catch-up phase to complete (check Operations per Second (live)), a balancing window too narrow to finish the queue, or repeated aborts (check config.changelog for moveChunk.error entries). Diagnose which before acting; adding more shards will make a stuck queue worse, not better.
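
To tell repeated aborts apart from slow commits, the changelog entries can be tallied. This is a minimal sketch under stated assumptions: it reads only the `what` and `time` fields, it assumes successful moves are logged as `moveChunk.commit` alongside the `moveChunk.error` entries mentioned above, and the function name `summariseMoves` and the sample data are hypothetical.

```javascript
// Sketch only: entries mimic documents from config.changelog.
// Assumes commits appear as "moveChunk.commit" and failures as "moveChunk.error".
function summariseMoves(entries, since) {
  const summary = { committed: 0, errored: 0, mostlyAborting: false };
  for (const entry of entries) {
    if (entry.time < since) continue; // ignore history before the window
    if (entry.what === "moveChunk.commit") summary.committed += 1;
    else if (entry.what === "moveChunk.error") summary.errored += 1;
  }
  summary.mostlyAborting = summary.errored > summary.committed;
  return summary;
}

// Illustrative data; the dates and counts are made up for the example.
const since = new Date("2024-01-01T12:00:00Z");
const sampleEntries = [
  { what: "moveChunk.commit", time: new Date("2024-01-01T09:00:00Z") },
  { what: "moveChunk.start", time: new Date("2024-01-01T12:05:00Z") },
  { what: "moveChunk.error", time: new Date("2024-01-01T12:20:00Z") },
  { what: "moveChunk.error", time: new Date("2024-01-01T13:10:00Z") },
  { what: "moveChunk.commit", time: new Date("2024-01-01T14:00:00Z") },
];

console.log(summariseMoves(sampleEntries, since));

// In mongosh the same function could be fed live documents, for example:
// summariseMoves(db.getSiblingDB("config").changelog.find({ what: /moveChunk/ }).toArray(), since)
```

If errors outnumber commits in the window, repeated aborts are the likelier cause; if both are near zero while the queue stays high, write pressure or the balancing window is the better suspect.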
Pending is zero but the cluster is clearly unbalanced. Is the card broken?
No, this is the structural-imbalance signature. Before MongoDB 6.0.3 the balancer evens out chunk counts, so if counts are even but one shard’s chunks are larger or busier, the balancer considers its work done and queues nothing. Later versions balance by data size instead, but still take no account of how busy a chunk is. The usual culprit is jumbo chunks (too large to move) or a shard key that concentrates data. The card is correct; the imbalance is one the balancer cannot fix without a schema change.
Do active migrations slow down my queries?
Yes, modestly. A migration copies documents off the source shard and applies them to the destination, which adds read load to the source and write load to the destination, and briefly takes a distributed lock at commit. Well-provisioned clusters absorb this with no visible effect; busy or under-provisioned clusters can see a small latency bump on the shards involved. If a large backlog coincides with elevated Query Latency p95 (ms), the migration load is a contributing factor.
Can I change the alert threshold of 5?
Yes. Five sustained pending migrations is the generic default. Sensitivity thresholds are configurable per profile in the Sensitivity tab. Large clusters that routinely run many parallel migrations during scheduled rebalancing windows may want a higher threshold so the alert reflects a genuine stall rather than normal balancing volume.