At a glance
Elections (24h) counts how many primary elections your replica set has held in the trailing 24 hours. An election happens when the set loses or replaces its primary and the remaining members vote on who takes over. One election after a planned maintenance is fine. Repeated, unplanned elections mean the primary is flapping, and a flapping primary is the single most disruptive thing a replica set can do, because every election causes a brief write outage while no primary exists. This is a MongoDB-distinctive signal: frequent elections almost always trace back to network instability or hardware trouble underneath the database.
| Field | Detail |
|---|---|
| What it tracks | The number of primary elections held across the replica set in the trailing 24 hours. Frequent elections indicate primary flapping, which in turn points to network or hardware instability. |
| Data source | rs.status() member stateStr transitions and the election metrics in serverStatus().electionMetrics (notably numStepDownsCausedByHigherTerm, stepUpCmd, and the election term counter). Each term increment corresponds to an election. |
| Time window | 24h rolling. The headline is the count of elections observed across the window. |
| Alert trigger | > 1. More than one election in 24 hours escalates, because beyond a single planned event, repeated elections signal an unstable set. |
| Why it matters | Every election is a short write outage: for a few seconds to tens of seconds there is no primary, so writes are rejected or queued. Repeated elections multiply that outage and destabilise the application. |
| Reading the value | 0 is steady state. 1 is usually a planned maintenance, failover drill, or routine config change. 2 or more unplanned is flapping that needs root-cause investigation now. |
| Roles | owner, engineering, operations |
Calculation
In steady state a replica set has exactly one primary (the member that accepts writes) and one or more secondaries. An election is the protocol by which the members choose a new primary. It is triggered when:
- The current primary becomes unreachable (the secondaries stop receiving heartbeats within the configured timeout).
- The primary voluntarily steps down (a planned rs.stepDown(), a config change, or a rolling upgrade).
- A higher-priority member becomes available and forces a re-election.
- A network partition isolates the primary from a majority of voters.
The card derives its count from stateStr transitions in rs.status(), corroborated by serverStatus().electionMetrics. The 24-hour headline is the number of distinct elections (term advances) observed across the window.
The alert fires above 1 because a single election in 24 hours is almost always benign: a planned maintenance, a deliberate step-down, or a brief blip that the set recovered from cleanly. Two or more, especially unplanned, is the signature of flapping. The cause is rarely MongoDB itself; it is usually one layer down. Network jitter or packet loss between members makes heartbeats time out intermittently. An overloaded or swapping primary cannot service heartbeats in time. A failing disk or a noisy-neighbour VM causes pauses long enough to look like a dead node. Each apparent death triggers an election, the “dead” node recovers, and the cycle repeats. That is why this card is framed as a stability and infrastructure signal, not just a replication statistic.
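The counting rule and the > 1 trigger can be sketched as follows. This is a minimal illustration, not the card's actual implementation: it assumes a hypothetical poller that has recorded the top-level term from rs.status() as (timestamp, term) pairs, and it treats every term advance as one election.

```python
from datetime import datetime, timedelta, timezone

ALERT_THRESHOLD = 1  # the card escalates when the count is greater than 1


def count_elections(samples, now, window=timedelta(hours=24)):
    """Count term advances observed inside the trailing window.

    samples: iterable of (timestamp, term) pairs from a hypothetical poller
    that reads the top-level `term` field of rs.status().
    """
    start = now - window
    previous = None
    elections = 0
    for ts, term in sorted(samples):
        if ts > now:
            break
        # Only advances seen by a sample inside the window are counted;
        # earlier samples just establish the baseline term.
        if previous is not None and ts > start and term > previous:
            elections += term - previous
        previous = term if previous is None else max(previous, term)
    return elections


def should_alert(election_count):
    """True when the count exceeds the card's > 1 trigger."""
    return election_count > ALERT_THRESHOLD
```

An election that happens between the last sample before the window and the first sample inside it is attributed to the window, which is one way the boundary can shift depending on poll timing.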
Worked example
A platform team runs a 3-node replica set (rs0) across two availability zones backing a checkout and session store. The Elections (24h) card normally reads 0. Snapshot taken on 18 Mar 26 at 14:50 GMT: it reads 5, well past the > 1 line, and the application team has been reporting intermittent “not master” write errors all afternoon.
The DBA pulls the election timeline from rs.status() and electionMetrics:
| Time (GMT) | Event | New primary |
|---|---|---|
| 12:10 | Election (term +1) | node-b |
| 12:38 | Election (term +1) | node-a |
| 13:05 | Election (term +1) | node-b |
| 13:41 | Election (term +1) | node-a |
| 14:22 | Election (term +1) | node-b |
RECOVERING.
Two takeaways:
- Elections are an infrastructure signal wearing a database costume. The count is reported by MongoDB, but two or more unplanned elections almost always mean network or hardware instability underneath. Do not start by debugging queries; start by checking the links and the hosts.
- One election is not a problem; a cadence is. A single election after a deploy or a node reboot is normal and self-heals. What this card is really watching for is a rhythm, the same set electing repeatedly, which is the fingerprint of flapping.
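To make the idea of a cadence concrete, the sketch below computes the gaps between the five elections in the worked example. The 60-minute gap limit and the function names are illustrative assumptions, not thresholds the card uses.

```python
from datetime import datetime


def election_gaps_minutes(times):
    """Return the gaps, in minutes, between consecutive same-day HH:MM times."""
    parsed = sorted(datetime.strptime(t, "%H:%M") for t in times)
    return [int((b - a).total_seconds() // 60) for a, b in zip(parsed, parsed[1:])]


def looks_like_flapping(times, max_gap_minutes=60, min_elections=2):
    """Heuristic: two or more elections, each within max_gap_minutes of the last."""
    gaps = election_gaps_minutes(times)
    return len(times) >= min_elections and all(g <= max_gap_minutes for g in gaps)


# Election times from the worked example (GMT).
timeline = ["12:10", "12:38", "13:05", "13:41", "14:22"]
print(election_gaps_minutes(timeline))  # [28, 27, 36, 41]
print(looks_like_flapping(timeline))    # True
```

A single election produces no gaps and is never flagged, which matches the point that one election is not a problem.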
Sibling cards
| Card | Why pair it with Elections (24h) | What the combination tells you |
|---|---|---|
| Replica Set Members (state) | Shows which member holds which role right now. | Elections plus a member stuck in RECOVERING equals a node that cannot stably rejoin. |
| Replica Lag (seconds) | Lag often spikes around each election. | Elections plus high lag equals a secondary that keeps falling behind then triggering re-election. |
| Replica Set Member Lag >10s or in RECOVERING State | The real-time alert for an unhealthy member. | The alert flags the symptom; this card counts the resulting elections. |
| MongoDB Health Score | Elections drag the replication domain of the composite. | A health-score dip driven by elections equals a stability problem, not a load one. |
| Connection Errors (24h) | Each election causes a driver reconnection storm. | Elections plus connection-error spikes equals clients rediscovering the primary. |
| Operations per Second (live) | Write throughput dips during each no-primary window. | Periodic ops dips aligned with election times confirm the write-outage impact. |
| Instance Uptime | A short uptime explains a recent election. | A node that restarted recently legitimately caused one election. |
Reconciling against the source
Where to look in MongoDB’s own tooling:
- rs.status() is the canonical view. Read the members array for the current stateStr per node and the top-level term; each term increment is one election. The electionDate and electionId on the primary show when the current term began.
- db.serverStatus().electionMetrics exposes election counters directly, including step-down causes, which helps distinguish a higher-priority takeover from a heartbeat-timeout failover.
- rs.printReplicationInfo() and rs.printSecondaryReplicationInfo() show the oplog window and per-secondary lag, useful for confirming whether a struggling secondary is provoking elections.
- The mongod log records each election with the reason (heartbeat timeout, step-down, priority takeover); grepping for election lines gives the exact timeline.
- Atlas users see failover events on the cluster’s Activity feed and Alerts page.
Why our number may legitimately differ from a manual read:
| Reason | Direction | Why |
|---|---|---|
| Term vs election count | Usually equal | The engine counts term advances; a rare protocol edge case can advance the term without a full primary change, which we reconcile against stateStr transitions. |
| Window boundary | Edge cases shift | An election right on the 24h boundary may sit inside or outside our window depending on poll timing. |
| Restart-induced elections | Both count them | A planned rolling restart legitimately produces elections; the card counts them the same as unplanned ones, so cross-check with your maintenance log. |
| Time zone | Timeline shifts | The mongod log renders timestamps in node-local time by default, while rs.status() reports dates in UTC; this card aligns the window to your reporting time zone. |
| Per-set scope | Card aggregates the set | rs.status() is run from one member’s view; the card reports the set-level election count. |
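When reconciling by hand, the electionMetrics counters can be compared between two snapshots to see which election reasons advanced. The sketch below is one way to do that, under two assumptions: the snapshots were taken on the same member (the counters are per node and reset when that node restarts), and only the "successful" sub-counter matters. The function name is hypothetical.

```python
# Election reasons that serverStatus().electionMetrics reports as
# {"called": n, "successful": m} sub-documents.
REASONS = (
    "stepUpCmd",
    "priorityTakeover",
    "catchUpTakeover",
    "electionTimeout",
    "freezeTimeout",
)


def successful_elections_by_reason(before, after):
    """Return {reason: increase} for reasons whose successful count grew.

    before/after: two electionMetrics documents from the same member,
    for example fetched with pymongo via
    client.admin.command("serverStatus")["electionMetrics"].
    """
    deltas = {}
    for reason in REASONS:
        prev = before.get(reason, {}).get("successful", 0)
        curr = after.get(reason, {}).get("successful", 0)
        # A lower value means the node restarted and the counter reset;
        # that case is ignored here rather than guessed at.
        if curr > prev:
            deltas[reason] = curr - prev
    return deltas
```

An increase under electionTimeout points to heartbeat-timeout failovers, while priorityTakeover or stepUpCmd points to a higher-priority takeover or a deliberate step-up. Run the comparison on each member and sum the results for a set-level view.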
Known limitations / FAQs
The card shows 1 election but nothing seemed to break. Is that a problem?
Usually not. A single election in 24 hours is the expected outcome of any planned event: a rolling upgrade, a deliberate rs.stepDown(), a node reboot, or a priority change. The set elects a new primary cleanly in a few seconds and carries on. The alert is set at more than 1 precisely so a single planned election does not page anyone. If you cannot tie the one election to a known event, note it and watch for a pattern.
We had several elections but the application barely noticed. How?
Modern MongoDB drivers are retryable-writes aware: when an election briefly removes the primary, the driver waits, rediscovers the new primary, and retries the write transparently. If your application enables retryable writes and uses sensible timeouts, short elections can be largely invisible at the application layer. That is good engineering, but it does not make the elections harmless: the underlying instability is still there and will eventually produce an election long enough to break through. Treat the count as the real signal, not the application’s tolerance of it.
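As a rough illustration of enabling retryable writes with an explicit timeout, the sketch below builds a connection string using the standard replicaSet, retryWrites, and serverSelectionTimeoutMS URI options. The host names, ports, and the 15-second timeout are assumptions for the example, not recommendations.

```python
from urllib.parse import urlencode


def build_uri(hosts, replica_set, server_selection_timeout_ms=15000):
    """Build a replica-set connection string with retryable writes enabled."""
    options = {
        "replicaSet": replica_set,
        "retryWrites": "true",
        # How long the driver waits for a usable primary before giving up;
        # the value here is an assumed example, tune it to your failover time.
        "serverSelectionTimeoutMS": server_selection_timeout_ms,
    }
    return "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)


# Hypothetical member addresses for the rs0 set.
print(build_uri(["node-a:27017", "node-b:27017", "node-c:27017"], "rs0"))
```

The server selection timeout should comfortably exceed a typical election, otherwise the driver gives up before the new primary is discovered.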
Why does the card frame this as a network or hardware problem rather than a database problem?
Because that is almost always where the cause lives. MongoDB holds an election when members stop hearing each other’s heartbeats. The most common reasons are network jitter or packet loss between members, an overloaded or swapping primary that cannot answer heartbeats in time, or a failing disk or noisy-neighbour VM causing pauses long enough to look like death. The database is reporting the symptom faithfully; the root cause is in the layer beneath it.
Can a single slow or lagging secondary cause elections?
Indirectly, yes. A secondary that keeps falling behind and entering RECOVERING can churn the set’s view of who is eligible to vote, and on some configurations a struggling member contributes to instability that culminates in re-elections. Pair this card with Replica Lag (seconds) and Replica Set Members (state); if one member is consistently the troublemaker, fix or replace that node.
Does raising the heartbeat timeout fix flapping?
It can mask it, which is sometimes the right immediate mitigation. Increasing the election timeout (settings.electionTimeoutMillis in the replica set configuration, 10 seconds by default) makes the set more tolerant of transient network jitter, so it stops electing on brief blips. But it also slows down genuine failover when a node really does die, so it is a trade-off, not a cure. Use it to buy time while you fix the underlying network or hardware problem, then reconsider whether the higher timeout should stay.
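A minimal sketch of that mitigation, assuming the setting being raised is settings.electionTimeoutMillis: the helper below prepares a modified copy of a replica set configuration document. The helper name and the 15-second value in the comments are illustrative.

```python
import copy


def with_election_timeout(config, timeout_ms):
    """Return a copy of a replica set config with a new election timeout.

    The original document is left untouched, and the version is bumped
    because a reconfiguration needs a higher version number.
    """
    new_config = copy.deepcopy(config)
    new_config.setdefault("settings", {})["electionTimeoutMillis"] = timeout_ms
    new_config["version"] = config["version"] + 1
    return new_config


# Applying it with pymongo might look like this (not executed here):
#   cfg = client.admin.command("replSetGetConfig")["config"]
#   client.admin.command("replSetReconfig", with_election_timeout(cfg, 15000))
```

Keeping the original document makes it easy to restore the previous timeout once the underlying network or hardware problem is fixed.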
On a sharded cluster, which elections does this count?
Each shard is its own replica set with its own elections, and the config servers form a replica set too. By default the card reflects the deployment the connector is scoped to. For a full sharded cluster, scope per shard to see which shard’s replica set is unstable, because one flapping shard can degrade the whole cluster while the others are perfectly healthy.
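Once counts have been collected per shard, picking out the unstable replica set is a simple filter. The sketch below assumes a mapping of hypothetical shard names to their 24-hour election counts and reuses the card's > 1 trigger.

```python
def unstable_shards(elections_by_shard, threshold=1):
    """Return the names of replica sets whose election count exceeds threshold."""
    return sorted(
        name for name, count in elections_by_shard.items() if count > threshold
    )


# Hypothetical per-replica-set counts for one sharded cluster.
counts = {"shard0": 0, "shard1": 5, "configRS": 1}
print(unstable_shards(counts))  # ['shard1']
```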
The count went up right after a planned maintenance. Should I worry?
No. Rolling maintenance (upgrades, restarts, config changes that step the primary down) legitimately produces elections, often one per member touched. Cross-reference the timeline with your change log and Instance Uptime. Elections that line up exactly with planned work are expected; the ones to investigate are the unplanned ones with no corresponding change.
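The cross-reference against a change log can be sketched as below. It assumes you have election timestamps and a list of maintenance windows as (start, end) pairs; both inputs and the function name are hypothetical, since the card itself does not separate planned from unplanned elections.

```python
from datetime import datetime


def split_planned(elections, maintenance_windows):
    """Split election timestamps into (planned, unplanned) lists.

    An election is treated as planned when it falls inside any
    maintenance window, inclusive of the window's start and end.
    """
    planned, unplanned = [], []
    for ts in elections:
        in_window = any(start <= ts <= end for start, end in maintenance_windows)
        (planned if in_window else unplanned).append(ts)
    return planned, unplanned
```

Anything left in the unplanned list has no corresponding change and is the part worth investigating.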