Chunks Pending Migration, MongoDB - Vortex IQ Help Centre

Card class: Sensitivity • Category: Replication & Sharding

At a glance

How many chunk migrations the balancer has queued but not yet completed on a sharded cluster. When the balancer decides a shard is carrying more than its fair share, it schedules chunk moves to even things out, and each scheduled-but-unfinished move counts here. A small, transient number is healthy: the balancer is doing its job and the queue is draining. A large or stuck number is the warning sign. It means the balancer cannot keep up with the rate of imbalance, migrations are failing or being throttled, or the cluster is under enough write pressure that moves cannot complete cleanly. The card raises a sensitivity alert at >5 sustained, because a backlog of that size that does not clear means the cluster is drifting toward an uneven, hot-shard state faster than the balancer can correct it.


What it tracks	The count of chunk migrations the balancer has queued or is actively running but has not yet committed, on a sharded cluster. Sharded clusters only; a single replica set has no balancer and reports nothing here.
Data source	Pending chunk migrations from the balancer, derived from the config server’s balancer activity and the `config.migrations` / active-migration state. Each entry is a chunk that has been selected to move from a source shard to a destination shard but whose move has not yet finished.
Time window	`RT` (real-time). The value reflects the live migration queue as the balancer reports it; it rises when new moves are scheduled and falls as each move commits.
Alert trigger	`>5 sustained`. More than five pending migrations that persist (rather than a momentary spike during a balancing round) raises a sensitivity alert: the balancer is overloaded and the backlog is not draining.
What counts	Chunks selected for migration that are queued or in flight: waiting for the source shard to be ready, copying documents to the destination, in the catch-up / commit phase, or blocked waiting for a lock.
What does NOT count	Completed migrations (they leave the queue), chunk splits (splitting is metadata-only and does not move data), and migrations that have been aborted (they are removed, not held pending). Jumbo chunks that the balancer refuses to move are never queued in the first place.
Roles	owner, platform, sre, dba
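The `>5 sustained` trigger can be sketched as a check over the most recent readings. This is a minimal illustration, not Vortex IQ's detection logic: the card does not say how long "sustained" lasts, so `minSamples` (how many consecutive readings must exceed the threshold) is a hypothetical parameter, as is the function name.

```javascript
// Hypothetical sketch of a "sustained" threshold check. The sustain window
// Vortex IQ actually uses is not stated; minSamples is an assumption.
function isSustainedAbove(samples, threshold = 5, minSamples = 3) {
  // Too little history: a single reading can never count as sustained.
  if (samples.length < minSamples) {
    return false;
  }
  // Every one of the most recent readings must be strictly above the threshold.
  return samples.slice(-minSamples).every((value) => value > threshold);
}

// A balancing round that spikes and then drains does not alert.
console.log(isSustainedAbove([2, 9, 3, 1])); // false
// A backlog that stays above 5 does.
console.log(isSustainedAbove([4, 7, 8, 8])); // true
```

Requiring several consecutive readings, rather than one, is what separates a normal burst at the start of a balancing round from a backlog that is not draining.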

Calculation

The value is a direct count of in-flight and queued chunk migrations as reported by the balancer:

chunks_pending_migration = count( migrations where state != committed and state != aborted )
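The formula can be expressed as a filter over migration documents. A minimal sketch, assuming each document carries a `state` string whose terminal values are `committed` and `aborted`, as in the formula above. The field name, the non-terminal state values in the sample, and `countPendingMigrations` are illustrative only and are not MongoDB's actual `config.migrations` schema.

```javascript
// Sketch of the card's formula. Assumption: every migration document has a
// `state` field; "committed" and "aborted" are the only terminal states.
const TERMINAL_STATES = new Set(["committed", "aborted"]);

function countPendingMigrations(migrations) {
  // Anything not yet in a terminal state is queued or in flight.
  return migrations.filter((m) => !TERMINAL_STATES.has(m.state)).length;
}

// Hypothetical documents: two in flight, one queued, two finished.
const sample = [
  { chunk: "A", state: "cloning" },
  { chunk: "B", state: "catchup" },
  { chunk: "C", state: "committed" },
  { chunk: "D", state: "aborted" },
  { chunk: "E", state: "queued" },
];
console.log(countPendingMigrations(sample)); // 3
```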

The balancer runs in rounds. In each round it inspects the chunk distribution across shards and, where the gap between the fullest and emptiest shard exceeds its migration threshold, schedules one or more chunk moves. The balancer can run several migrations in parallel, but each shard takes part in at most one migration at a time, so the queue at any instant is the set of moves selected this round that have not yet committed.

A single chunk migration is not instantaneous. It copies every document in the chunk from the source shard to the destination, runs a catch-up phase to capture writes that arrived during the copy, then commits the metadata change on the config servers. Under light load each move completes in seconds; under heavy write load the catch-up phase can stall, and the move sits in the queue longer. That is why a sustained high count is meaningful: moves are starting but not finishing.

This card is the dynamic companion to Shard Balance Skew %. Skew tells you how uneven the cluster is right now; pending migrations tell you whether the balancer is winning or losing the race to fix it.
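Parallelism puts a ceiling on how fast a queue can drain. MongoDB documents that a cluster with n shards can run at most n/2 (rounded down) migrations at once, because each shard takes part in only one migration at a time. The sketch below turns that into a lower bound on how many full "waves" of moves a backlog needs. The wave framing and both function names are hypothetical simplifications: real rounds rarely use every shard pair, so actual draining is slower.

```javascript
// Each shard joins at most one migration at a time, so n shards can run at
// most floor(n / 2) migrations concurrently.
function maxConcurrentMigrations(shardCount) {
  return Math.floor(shardCount / 2);
}

// Lower bound on the number of waves needed to clear a queue, assuming every
// wave runs at full concurrency (an optimistic, hypothetical simplification).
function minWavesToDrain(pending, shardCount) {
  const perWave = maxConcurrentMigrations(shardCount);
  if (perWave === 0) {
    // Fewer than two shards: nothing can migrate at all.
    return pending === 0 ? 0 : Infinity;
  }
  return Math.ceil(pending / perWave);
}

console.log(maxConcurrentMigrations(6)); // 3
console.log(minWavesToDrain(14, 6));     // 5
```

For a six-shard cluster, a queue of 14 needs at least five waves, which is why even a healthy balancer takes several rounds to clear the burst that follows adding shards.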

Worked example

A platform team runs a six-shard cluster supporting a session store and an event-ingest pipeline. They added two new shards on 12 Apr 26 to add capacity ahead of a campaign. Snapshot taken on 13 Apr 26 at 15:40 BST, roughly a day later.

Time	Pending migrations	Shard balance skew	Reading
12 Apr 18:00	14	31%	Two empty shards just added; large queue is expected
13 Apr 06:00	9	24%	Queue draining overnight, skew falling
13 Apr 12:00	7	21%	Slowing during business-hours write load
13 Apr 15:40	8	20%	Queue stuck above 5 and skew flat for three hours
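The Reading column can be made quantitative by computing the net change in the queue per hour between consecutive snapshots. A minimal sketch using the four readings above: `drainRates` is a hypothetical helper, the year 2026 is assumed from "26", and the timestamps are written in UTC purely for parsing, since only the gaps between them matter.

```javascript
// Net change in pending migrations per hour between consecutive snapshots.
// Negative means the queue is draining; zero or positive means stalled or growing.
function drainRates(snapshots) {
  const rates = [];
  for (let i = 1; i < snapshots.length; i++) {
    const prev = snapshots[i - 1];
    const curr = snapshots[i];
    const hours = (curr.time - prev.time) / (60 * 60 * 1000); // ms -> hours
    rates.push((curr.pending - prev.pending) / hours);
  }
  return rates;
}

// The worked-example readings (year and time zone assumed; see lead-in).
const snapshots = [
  { time: Date.parse("2026-04-12T18:00:00Z"), pending: 14 },
  { time: Date.parse("2026-04-13T06:00:00Z"), pending: 9 },
  { time: Date.parse("2026-04-13T12:00:00Z"), pending: 7 },
  { time: Date.parse("2026-04-13T15:40:00Z"), pending: 8 },
];
console.log(drainRates(snapshots).map((r) => r.toFixed(2)).join(", "));
// -0.42, -0.33, 0.27
```

The queue drained at about 0.42 migrations per hour overnight, slowed to about 0.33 through the morning, then started growing again in the afternoon, which is the stall the alert caught.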

The gauge reads 8 pending and has held above 5 for several hours, so the sensitivity alert is active. The team works through the likely causes in order:

Are migrations failing, or just slow? They check the balancer activity. The moves are starting, but the catch-up phase is timing out because the campaign warm-up has pushed write volume high. Each move copies a chunk, then cannot catch up with the live write stream fast enough to commit, so it retries. The queue is not failing; it is being outpaced by writes.
Is the balancing window too narrow? They confirm there is no restrictive balancing window: the balancer is allowed to run all day, so the window is not the bottleneck.
Is write load the real constraint? Operations per Second (live) shows write ops up 3x on the source shards versus the prior week. The conclusion: the balancer is healthy but the cluster is too busy for migrations to commit cleanly during peak write hours.

Decision the team takes:
  - Pending = 8, skew = 20%, both flat for 3h during a write spike.
  - Verdict: balancer is fine, write pressure is the bottleneck.
  - Action 1: let it ride; queue will drain as the warm-up traffic subsides.
  - Action 2: if the campaign sustains the load, schedule a balancing
              window for the quieter overnight hours so migrations commit
              against a lighter write stream.
  - Do NOT: keep adding shards. That widens the gap and lengthens the queue.

The key lesson: a stuck pending count is rarely a balancer bug. It almost always comes down to one of three things: the cluster is too busy for moves to commit, the balancing window is too narrow, or jumbo chunks cannot be moved (these show as persistent skew with an empty queue, the opposite signature). Reading this card alongside skew and operation rate tells you which.
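The reasoning in this example can be written down as a small triage function. It is a sketch, not a Vortex IQ feature: the input names, the function name and every cut-off (a 10% skew floor, a 2x write-ops ratio, a 24-hour window) are illustrative assumptions chosen to mirror the diagnosis above.

```javascript
// Hypothetical triage for a pending-migration alert. All cut-offs are
// illustrative assumptions, not Vortex IQ or MongoDB defaults.
const SKEW_FLOOR_PCT = 10;   // skew above this counts as "clearly unbalanced"
const WRITE_SPIKE_RATIO = 2; // write ops versus a baseline week
const FULL_DAY_HOURS = 24;   // a balancing window with no restriction

function triageStuckQueue({ pending, skewPct, skewFalling, writeOpsRatio, windowHours }) {
  // Empty queue but persistent skew: the opposite signature, a structural problem.
  if (pending === 0 && skewPct > SKEW_FLOOR_PCT) {
    return "structural: jumbo chunks or a shard key that concentrates data";
  }
  // Small queue, or skew still falling: the balancer is winning the race.
  if (pending <= 5 || skewFalling) {
    return "healthy: balancer catching up";
  }
  // Moves start but cannot finish catch-up against a heavy write stream.
  if (writeOpsRatio >= WRITE_SPIKE_RATIO) {
    return "write pressure: moves cannot catch up and commit";
  }
  // A restricted window leaves too little time to drain the queue.
  if (windowHours < FULL_DAY_HOURS) {
    return "narrow window: too little balancing time to drain the queue";
  }
  // Nothing obvious: look for repeated moveChunk.error entries.
  return "repeated aborts: check config.changelog";
}

// Worked-example readings: pending 8, skew 20% and flat, writes up 3x,
// balancer allowed to run all day.
console.log(triageStuckQueue({
  pending: 8, skewPct: 20, skewFalling: false, writeOpsRatio: 3, windowHours: 24,
})); // write pressure: moves cannot catch up and commit
```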

Sibling cards

Card	Why pair it with Chunks Pending Migration	What the combination tells you
Shard Balance Skew %	The static measure of how uneven the cluster is.	High skew with a high pending count equals “balancer catching up”. High skew with a zero pending count equals “balancer stuck on a structural problem (jumbo chunks or shard key)”.
Operations per Second (live)	The write pressure that gates migration commits.	A high pending count during a write spike means moves cannot catch up, not that the balancer is broken.
Replica Lag (seconds)	Replication health on the shards involved in moves.	Migrations add write load to the destination shard; if its secondaries fall behind, the move can stall in catch-up.
WiredTiger Cache Hit Rate %	Cache pressure during data copy.	Migrations churn the cache on both source and destination; a falling hit rate during a backlog shows the copy is evicting the working set.
Query Latency p95 (ms)	The user-facing cost of migrations in flight.	Active migrations add latency to the source and destination shards; a backlog can correlate with elevated tail latency.
MongoDB Health Score	The composite that factors sharding health.	A sustained migration backlog drags the composite down alongside skew.

Reconciling against the source

Where to look in MongoDB’s own tooling:

sh.status() in mongosh against a mongos router shows recent balancer activity and whether migrations are happening. The tail of the output lists active and recent moves.

sh.isBalancerRunning() returns whether the balancer is currently executing a round, and sh.getBalancerState() returns whether it is enabled at all. A pending count that never moves while the balancer is disabled is expected, not a fault.

db.getSiblingDB("config").migrations.find() against the config database lists the active migration documents directly, which is the closest raw equivalent to what this card counts.

use config; db.changelog.find({what: /moveChunk/}).sort({time:-1}) shows the history of completed and failed chunk moves, useful for confirming whether moves are committing or repeatedly aborting.
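To check whether moves are committing or repeatedly aborting, the changelog query can be summarised by event type. A minimal sketch: `summarizeMoveChunkHistory` is a hypothetical helper, and the exact `moveChunk.*` event names recorded vary by MongoDB version, so treat the sample entries as illustrative. In mongosh, pass it the array returned by `.toArray()` on the changelog cursor.

```javascript
// Tally moveChunk changelog events by type. Hypothetical helper; the event
// names recorded (e.g. moveChunk.commit, moveChunk.error) vary by version.
function summarizeMoveChunkHistory(entries) {
  const counts = {};
  for (const entry of entries) {
    // Skip anything that is not a moveChunk event (splits, DDL, etc.).
    if (typeof entry.what !== "string" || !entry.what.startsWith("moveChunk")) {
      continue;
    }
    counts[entry.what] = (counts[entry.what] || 0) + 1;
  }
  return counts;
}

// In mongosh the input would come from the changelog query, for example:
//   summarizeMoveChunkHistory(
//     db.getSiblingDB("config").changelog
//       .find({ what: /moveChunk/ }).sort({ time: -1 }).limit(200).toArray())

// Illustrative entries only, not real output:
const recent = [
  { what: "moveChunk.start" },
  { what: "moveChunk.error" },
  { what: "moveChunk.start" },
  { what: "moveChunk.error" },
  { what: "moveChunk.start" },
  { what: "moveChunk.commit" },
  { what: "split" },
];
console.log(summarizeMoveChunkHistory(recent));
```

Many starts paired with errors and few commits points at repeated aborts; starts that mostly end in commits, only slowly, point back at write pressure.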

On MongoDB Atlas, the balancer state is visible in the cluster configuration and the per-shard metrics on the Metrics tab show the data movement, though Atlas does not surface the raw pending count as a single figure.

Why our number may legitimately differ from sh.status():

Reason	Direction	Why
Sampling instant	Brief discrepancy	The queue changes as each move commits; a value read a second apart can differ by the number of moves that committed in between.
Active vs queued	Variable	Some tooling shows only the actively-running move; this card counts both running and queued moves selected in the current round.
Balancer disabled	Our value may sit non-zero	If the balancer was disabled mid-round, in-flight moves finish but no new ones start; a small residual count can persist until they commit.
Aborted moves	Our value lower	Moves that abort are removed from the count immediately; a tool reading the changelog may still show them as recent entries.

Cross-connector reconciliation: a migration backlog that coincides with an ecommerce traffic spike is the classic campaign pattern. Pair with MongoDB OPS Spike vs Ecom Order Rate to confirm the write pressure gating the migrations is driven by genuine front-end demand rather than a runaway batch job.

Known limitations / FAQs

My cluster is a single replica set. Why is this card empty? Because there is no balancer. Chunk migrations only exist on sharded clusters with two or more shards; a single replica set keeps all its data together and never moves chunks. If you expect data here, confirm the connector points at a mongos router and that sh.status() reports more than one shard.

The pending count spiked to 12 right after I added shards, then fell. Was that a problem? No, that is the healthy pattern. New shards start empty, so the balancer schedules a burst of moves to fill them. The queue is at its largest immediately after the topology change and drains over the following hours. A spike that falls is the balancer working correctly; only a spike that does not fall is a concern.

Pending is stuck above 5 but Shard Balance Skew % is not improving. What now? This means moves are being scheduled but not committing. The three usual causes are write pressure too high for the catch-up phase to complete (check Operations per Second (live)), a balancing window too narrow to finish the queue, or repeated aborts (check config.changelog for moveChunk.error entries). Diagnose which before acting; adding more shards will make a stuck queue worse, not better.

Pending is zero but the cluster is clearly unbalanced. Is the card broken? No, this is the structural-imbalance signature. The balancer evens out what it measures (chunk count in MongoDB versions before 6.0.3, data size from 6.0.3 onward), not traffic. If that measure already looks even, or the only chunks that could fix it are too large to move, the balancer considers its work done and queues nothing, even though one shard is carrying more than its share. The usual culprit is jumbo chunks or a shard key that concentrates data. The card is correct; the imbalance is one the balancer cannot fix without a shard key or schema change.

Do active migrations slow down my queries? Yes, modestly. A migration copies documents off the source shard and applies them to the destination, which adds read load to the source and write load to the destination, and briefly takes a distributed lock at commit. Well-provisioned clusters absorb this with no visible effect; busy or under-provisioned clusters can see a small latency bump on the shards involved. If a large backlog coincides with elevated Query Latency p95 (ms), the migration load is a contributing factor.

Can I change the alert threshold of 5? Yes. Five sustained pending migrations is the generic default, and sensitivity thresholds are configurable per profile in the Sensitivity tab. Large clusters that routinely run many parallel migrations during scheduled rebalancing windows may want a higher threshold so the alert reflects a genuine stall rather than normal balancing volume.

Tracked live in Vortex IQ Nerve Centre

Chunks Pending Migration is one of hundreds of KPI pulses Vortex IQ tracks across MongoDB and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
