Galera Cluster Status, MariaDB - Vortex IQ Help Centre

Card class: Hero • Category: Galera Cluster

At a glance

The state of the Galera cluster as the polled node sees it, read from the wsrep_cluster_status status variable. The healthy value is Primary, meaning the node is part of a quorum-holding majority and is allowed to accept writes. Any other value (Non-Primary or Disconnected) means the node has lost contact with a majority of the cluster and will refuse writes to protect data consistency. For a DBA this is the binary “can my application still write to the database?” verdict, and it is one of the most distinctive signals MariaDB Galera offers over standalone MySQL.


Status variable	`wsrep_cluster_status` from `SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'`. A string: `Primary`, `Non-Primary`, or `Disconnected`.
Metric basis	Galera Primary-Component membership verdict, NOT a connection test or ping. It reflects whether the node believes it is in the quorum-holding partition.
Aggregation window	Real-time, polled on the Nerve Centre refresh cycle. The value is instantaneous.
Healthy value	`Primary`. The node is in the majority partition and writes are accepted.
What it means	`Primary` = healthy quorum; `Non-Primary` = split-brain risk, node refuses writes; `Disconnected` = node cannot reach the group at all.
What does NOT change it	(1) High query latency; (2) a full disk (that is a separate failure mode); (3) async-replica lag; (4) router health. The variable is purely about Galera group membership.
Time window	`RT` (real-time, polled each refresh cycle)
Alert trigger	`!= Primary`. Any value other than `Primary` is an immediate write-availability incident.
Roles	owner, engineering, operations

Calculation

The card runs SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status' against the connected node and surfaces the string verbatim. There is no derivation; Galera sets this value itself as the group-communication layer evaluates quorum. The three possible values map to states as follows:

Primary       => healthy. Node is in the majority partition; writes accepted.
Non-Primary   => critical. Node sees the group but is NOT in the majority;
                 it rejects writes (and, by default, reads) to avoid split-brain.
Disconnected  => critical. Node cannot reach the Galera group at all
                 (network fault, all peers down, or it just started).

The alert fires on anything that is not Primary. This is deliberately strict: a Non-Primary node is not in a degraded-but-usable state; it has stopped serving writes entirely. The card therefore reads as a clean binary for dashboards: green when Primary, red otherwise. It pairs naturally with Galera Cluster Size, which explains why a node has gone Non-Primary (membership fell below the quorum floor).
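The green/red rule above can be sketched in a few lines. This is an illustrative helper, not Vortex IQ's implementation: the function name and the returned tuple are assumptions. The comparison is case-insensitive as a precaution, in case a server reports the value with different capitalisation from this article.

```python
# Hypothetical helper: names are illustrative, not part of any Vortex IQ API.
HEALTHY_STATUS = "primary"


def classify_cluster_status(status):
    """Map a raw wsrep_cluster_status string to (colour, alert_fires)."""
    value = status.strip().casefold()
    if value == HEALTHY_STATUS:
        return ("green", False)
    # Anything else, including a value we do not recognise, is treated as critical.
    return ("red", True)


for raw in ("Primary", "Non-Primary", "Disconnected"):
    print(raw, classify_cluster_status(raw))
```

Treating unknown strings as red keeps the check strict, in line with the "anything that is not Primary" rule.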

Worked example

A platform team runs a 3-node MariaDB Galera cluster split across two availability zones: db-galera-01 and db-galera-02 in zone A, db-galera-03 in zone B. On 22 May 26 at 14:05 BST a network partition severs zone A from zone B.

Node	Zone	Peers it can see	`wsrep_cluster_status`	Writes?
db-galera-01	A	db-galera-02 (2 of 3)	Primary	Yes
db-galera-02	A	db-galera-01 (2 of 3)	Primary	Yes
db-galera-03	B	none (1 of 3)	Non-Primary	No
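The "2 of 3" and "1 of 3" figures in the table follow from a strict-majority rule. The sketch below is a simplification: it assumes every node carries equal weight and that the partition was not a graceful shutdown. Galera's real quorum calculation also accounts for node weights and for members that left cleanly. The function name is hypothetical.

```python
def holds_quorum(visible_nodes, last_known_size):
    """Return True if a partition is a strict majority of the last known cluster.

    visible_nodes counts the node itself plus the peers it can still reach.
    Assumes equal node weights (a simplification of Galera's weighted quorum).
    """
    if not 0 <= visible_nodes <= last_known_size:
        raise ValueError("visible_nodes must be between 0 and last_known_size")
    return 2 * visible_nodes > last_known_size


print(holds_quorum(2, 3))  # zone A: two of three nodes
print(holds_quorum(1, 3))  # zone B: one of three nodes
```

Under this rule an even split (for example 2 of 4) leaves neither side with a majority, which is one reason odd-sized clusters are commonly recommended.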

The Vortex IQ headline for the zone-B connector turns red: Non-Primary, while the zone-A nodes stay green. The DBA reads three things:

Galera is doing exactly the right thing. The 2-node majority in zone A retained quorum and stays writable. The lone node in zone B correctly went Non-Primary rather than accept conflicting writes. This is split-brain prevention working as designed, not a database bug.
The application must route to the majority. If the load balancer or MaxScale is still sending writes to db-galera-03, those writes are now being rejected with WSREP has not yet prepared node for application use. The fix is routing, not the database: point writes at the zone-A Primary partition.
There is no data loss, only a stalled minority. When the network heals, db-galera-03 rejoins the Primary Component, catches up on the writes it missed (via IST if a donor still holds them in its write-set cache, otherwise via a full SST), and returns to Primary. The team should not force-bootstrap zone B as its own Primary; doing so would create a genuine split-brain with two divergent datasets.

Decision during the partition:
  - Majority partition (zone A): writable, keep serving traffic here.
  - Minority node (zone B): not writable, drain it from the write path.
  - DO NOT pc.bootstrap zone B (would create divergent histories).
  - On network recovery: zone B IST-rejoins automatically; status returns to Primary.
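The routing part of this decision can be sketched as a function that takes each node's reported status and splits the nodes into "keep serving writes" and "drain". The function name and the shape of the input dictionary are assumptions made for illustration. A real deployment would normally leave this to the router's own Galera monitor.

```python
def plan_write_routing(statuses):
    """Split nodes into write targets and nodes to drain.

    statuses maps node name -> wsrep_cluster_status as reported by that node.
    """
    route_writes_to, drain = [], []
    for node in sorted(statuses):
        if statuses[node].strip().casefold() == "primary":
            route_writes_to.append(node)
        else:
            drain.append(node)
    return {"route_writes_to": route_writes_to, "drain": drain}


partition_view = {
    "db-galera-01": "Primary",
    "db-galera-02": "Primary",
    "db-galera-03": "Non-Primary",
}
print(plan_write_routing(partition_view))
```

The sketch never attempts to promote a drained node, in keeping with the rule above that a minority node must not be bootstrapped.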

When the partition clears at 14:31, db-galera-03 reports Primary again, the card returns to green, and writes resume cluster-wide. The lesson the team should carry: a Non-Primary reading is a routing emergency, not a repair emergency; never force a minority node back to Primary to “fix” the card.

Sibling cards to reference together

Card	Why pair it with Galera Cluster Status	What the combination tells you
Galera Cluster Size	Explains why a node went Non-Primary.	Size below quorum floor is the usual cause of a Non-Primary status.
Galera Cluster Not in Primary State or Node Lost	The alert-list card that fires on this exact condition.	A Non-Primary reading should always appear as a row in this feed.
Galera Flow Control Paused %	Precursor signal: a struggling node before it drops out.	Sustained flow control can precede a node leaving and a status flip.
Failover Readiness	Whether a standby can take over writes.	Non-Primary plus no healthy standby equals a hard write outage.
MariaDB Health Score	The composite that takes cluster state as a major input.	A Non-Primary node sharply drops the composite.
Connection Errors (24h)	Apps hitting a Non-Primary node log write rejections.	A status flip often co-occurs with a spike in connection/write errors.
Async Replication Lag (seconds)	Downstream async replicas read from the cluster.	A Non-Primary source can stall async replicas feeding from it.

Reconciling against the source

Where to look in MariaDB’s own tooling:

On each node, run SHOW GLOBAL STATUS LIKE 'wsrep_cluster_status'; which is the exact variable the card reads. Run SHOW GLOBAL STATUS LIKE 'wsrep_ready'; (ON/OFF) and SHOW GLOBAL STATUS LIKE 'wsrep_connected'; for the companion readiness flags. Check wsrep_local_state_comment for the human-readable node state (Synced, Donor/Desynced, Joining, Initialized). On a managed service, the provider console (for example SkySQL or your cloud MariaDB cluster view) shows the same Primary/Non-Primary topology.
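The manual checks above can be gathered in one round trip. The sketch below assumes a Python DB-API style cursor that returns plain (Variable_name, Value) tuples; the function name is hypothetical. `FakeCursor` exists only so the example runs without a server. With a real driver you would pass its cursor instead.

```python
WSREP_VARIABLES = (
    "wsrep_cluster_status",
    "wsrep_ready",
    "wsrep_connected",
    "wsrep_local_state_comment",
)


def read_wsrep_status(cursor):
    """Fetch the Galera status variables named in this section as a dict."""
    names = ", ".join(f"'{name}'" for name in WSREP_VARIABLES)
    cursor.execute(f"SHOW GLOBAL STATUS WHERE Variable_name IN ({names})")
    return {name: value for name, value in cursor.fetchall()}


class FakeCursor:
    """Stand-in for a real driver cursor (illustration only)."""

    def __init__(self, rows):
        self._rows = rows
        self.last_query = None

    def execute(self, query):
        self.last_query = query

    def fetchall(self):
        return list(self._rows)


demo_cursor = FakeCursor([
    ("wsrep_cluster_status", "Primary"),
    ("wsrep_connected", "ON"),
    ("wsrep_local_state_comment", "Synced"),
    ("wsrep_ready", "ON"),
])
print(read_wsrep_status(demo_cursor))
```

Run it against every node rather than a load-balanced endpoint, since each node reports only its own view.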

Why our reading may legitimately differ between nodes:

Reason	Direction	Why
Which node you query	Can differ during a partition	Each node reports its own view. During a split, majority nodes read `Primary` while the minority reads `Non-Primary`. Vortex IQ polls the configured endpoint.
Poll timing	Brief lag	A status flip between polls is not reflected until the next refresh cycle.
Just-started node	Transient Disconnected	A booting node reads `Disconnected` until it connects to the group and reaches `Primary`. While it catches up, `wsrep_local_state_comment` shows `Joining`. This is normal startup, not a fault.
Router masking	No effect on the value	MaxScale may stop routing to a Non-Primary node, but the backend node still reports its true status.

Cross-source reconciliation:

Source	Expected relationship	What causes divergence
`wsrep_ready`	Should be `ON` whenever status is `Primary`.	If status is `Primary` but `wsrep_ready` is `OFF`, the node is in a transitional state (for example still receiving a state transfer as a joiner) and is not serving normally.
Provider console topology	Should agree on which partition is Primary.	A console may lag the live group-comms decision by a few seconds during a fast partition.
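The first row of this table can be expressed as a small consistency check. The function name and the verdict strings are illustrative. The source does not describe the Non-Primary with `wsrep_ready` = `ON` case, so the sketch labels it "unexpected" as an assumption.

```python
def reconcile_status_and_ready(cluster_status, wsrep_ready):
    """Compare wsrep_cluster_status with wsrep_ready and return a short verdict."""
    is_primary = cluster_status.strip().casefold() == "primary"
    is_ready = wsrep_ready.strip().upper() == "ON"
    if is_primary and is_ready:
        return "consistent: serving normally"
    if is_primary:
        return "transitional: Primary but not ready"
    if is_ready:
        # Assumption: the source does not cover this case; flag it for a human.
        return "unexpected: not Primary but ready"
    return "consistent: not serving"


print(reconcile_status_and_ready("Primary", "ON"))
print(reconcile_status_and_ready("Primary", "OFF"))
print(reconcile_status_and_ready("Non-Primary", "OFF"))
```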

Known limitations / FAQs

My node says Non-Primary but the database process is running fine. Is it broken?
The process is healthy; the node has simply lost quorum and is refusing writes on purpose to prevent split-brain. This is Galera protecting your data, not a crash. The fix is to restore connectivity so the node rejoins a majority, or to route your application at the majority partition. Never confuse “process up” with “writable”.

Why do reads still work on a Non-Primary node?
By default a Non-Primary node rejects both reads and writes (it returns WSREP has not yet prepared node for application use). If reads appear to work, you likely have wsrep_dirty_reads=ON set, which permits stale reads from a node that has fallen out of the cluster. That is acceptable for some reporting use cases but dangerous for anything that then writes back; understand the trade-off before relying on it.

Can I force a Non-Primary node back to Primary?
You can, with SET GLOBAL wsrep_provider_options='pc.bootstrap=YES', but you almost never should. Forcing a minority node to bootstrap creates a second independent Primary with a divergent write history, which is the genuine split-brain catastrophe Galera was preventing. Only bootstrap deliberately when you have confirmed all other nodes are truly dead and you are intentionally recovering from the most-advanced survivor.

What is the difference between Non-Primary and Disconnected?
Non-Primary means the node can talk to some peers but they do not form a majority. Disconnected means the node cannot reach the Galera group at all (every peer is unreachable, or the node has just started and not yet connected). Both are write-unavailable; Disconnected usually points at a network/firewall problem on the Galera ports, while Non-Primary points at a quorum split.

How fast does the card detect a flip?
As fast as the poll cycle. Galera itself decides quorum within its group-communication timeout (sub-second to a few seconds), and the card surfaces the new value on the next Nerve Centre refresh. For the strictest real-time signal, pair this card with the alert-list card, which is designed to page on the transition.

Does this card apply to a standalone MariaDB server?
No. wsrep_cluster_status only exists when the Galera (wsrep) provider is loaded. A standalone server has no concept of cluster status; for single-server write availability rely on uptime, disk, and connection-error cards instead.

During a rolling upgrade one node briefly shows Joining. Should I worry?
No. Joining is a node state reported by wsrep_local_state_comment, not a wsrep_cluster_status value. A rejoining node reads Disconnected until it reconnects to the group, then passes through Joining (during IST/SST) before it is fully synced. That sequence is the expected recovery path. Only worry if a node stays stuck in Joining for an unusually long time, which usually signals a slow or failing SST.

Tracked live in Vortex IQ Nerve Centre

Galera Cluster Status is one of hundreds of KPI pulses Vortex IQ tracks across MariaDB and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.