At a glance
The MariaDB Health Score is a single 0 to 100 composite that rolls up the instance’s most important operational signals (connection-pool saturation, query latency, error rate, replication lag, buffer-pool hit rate, disk usage, and Galera quorum where present) into one number a platform lead can read at a glance. It answers “is the database fundamentally healthy right now, and was it healthy over the last week?” A score of 100 means every input is inside its green band. A score below 70 means at least one critical input has degraded enough to warrant attention before it becomes an incident.
| Field | Detail |
|---|---|
| What it tracks | A weighted composite health index for the MariaDB instance (or Galera cluster) for the selected period. The score blends the sub-metrics that each have their own card, so a drop here always traces back to a specific input. |
| Data source | Derived inside Vortex IQ from the same SHOW GLOBAL STATUS, information_schema, and Galera wsrep_* counters that feed the underlying cards. No separate query: it reuses the polled inputs. |
| Time window | RT/7D. The gauge shows the live composite; the trend shows the rolling 7-day band so you can tell “always been like this” from “degraded this week”. |
| Alert trigger | <70. A composite below 70 flags amber on the Executive Overview and pages the on-call rota if sustained. |
| Calculation basis | Weighted average of normalised sub-scores. Each input is mapped to a 0 to 100 band against its own threshold (for example pool saturation 0% maps to 100, 90%+ maps to 0), then combined by weight. |
| Sensitivity | This is a sensitivity card: the thresholds and input weights are tunable per profile in the Sensitivity tab so the score reflects your own baseline rather than a generic default. |
| What does NOT move it | Cosmetic or non-operational counters (uptime in days, total queries served lifetime) are excluded; they do not indicate health. |
| Roles | owner, engineering, operations |
Calculation
The score is a weighted blend of the instance’s critical operational inputs, each first normalised to a 0 to 100 sub-score against its own threshold, then averaged by weight. The inputs and the direction that hurts the score are:

| Input (sibling card) | Source signal | Direction that lowers the score |
|---|---|---|
| Connection pool saturation | Threads_connected / max_connections | Saturation rising toward 90%+ |
| Query error rate | (Aborted_clients + Connection_errors) / Questions | Error rate above 1% |
| Query latency p95 / p99 | statement digests in performance_schema | p95 above 200ms, p99 above 500ms |
| Buffer-pool hit rate | Innodb_buffer_pool_read_requests vs ..._reads | Hit rate below 95% |
| Replication lag | Seconds_Behind_Master (async replication) or wsrep_local_recv_queue (Galera) | Lag above 10s |
| Disk usage | data directory free space | Usage above 90% |
| Galera quorum (clustered only) | wsrep_cluster_status, wsrep_cluster_size | Status not Primary, or node count below expected |
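
The normalise-then-weight step can be sketched in Python. This is a minimal illustration under stated assumptions, not the Vortex IQ implementation: the linear ramp stands in for the product's band curves (which need not be linear), and the function names and numeric weights are hypothetical.

```python
def linear_subscore(value, green, red):
    """Map a raw reading to 0-100: `green` scores 100, `red` scores 0.

    The ramp is defined by its two endpoints, so it works for inputs
    where lower is better (saturation) and where higher is better (hit rate).
    """
    fraction = (value - red) / (green - red)
    return 100.0 * min(1.0, max(0.0, fraction))


def composite(subscores, weights):
    """Weighted average over the inputs that are actually present.

    Inputs missing from `subscores` (for example Galera quorum on a
    standalone server) are skipped, so the remaining weights
    re-normalise automatically.
    """
    present = [name for name in weights if name in subscores]
    total_weight = sum(weights[name] for name in present)
    if total_weight == 0:
        raise ValueError("no weighted inputs available")
    return sum(subscores[name] * weights[name] for name in present) / total_weight


# Hypothetical weights, for illustration only.
weights = {"pool_saturation": 3, "disk_usage": 2, "galera_quorum": 4}

# Pool saturation band from the summary table: 0% maps to 100, 90%+ maps to 0.
pool = linear_subscore(45.0, green=0.0, red=90.0)  # 50.0

# Standalone server: no Galera input, so only two weights take part.
standalone = composite({"pool_saturation": pool, "disk_usage": 100.0}, weights)
print(round(standalone, 1))  # 70.0
```

Skipping absent inputs inside `composite` is one simple way to get the re-normalisation behaviour described for non-clustered instances; the real weights and curves live in the Sensitivity tab.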
Worked example
A platform team runs a 3-node Galera cluster behind a high-traffic Magento storefront. Snapshot taken on 14 Apr 26 at 19:40 BST during an evening promotional push.

| Input | Reading | Sub-score | Weight |
|---|---|---|---|
| Connection pool saturation | 78% (climbing) | 55 | high |
| Query error rate | 0.3% | 92 | high |
| Query latency p95 | 240ms (over 200ms band) | 60 | medium |
| Buffer-pool hit rate | 99.1% | 100 | medium |
| Replication / Galera lag | flow control paused 2% | 90 | high |
| Disk usage | 71% | 100 | medium |
| Galera quorum | Primary, size 3/3 | 100 | critical |
- The slip is real, not noise. The 7-day band makes clear this instance normally runs at 88, so a drop to 74 during the promo is a genuine degradation, not the usual evening shape.
- Two inputs are dragging the score. Pool saturation (55) and p95 latency (60) are the culprits; everything else is green. The story is “traffic is pushing the connection pool and queries are starting to queue”, a classic load-driven pattern rather than a fault.
- Action is preventative, not reactive. Because the score is still above 70, the team has runway: raise `max_connections` headroom or add a read replica before saturation crosses 90% and the cluster starts refusing connections at checkout.
- A composite is a starting point, never an endpoint. The number tells you “look”, the sub-metric cards tell you “where”. Always drill from the score into the red input before acting.
- Read the gauge with the 7-day trend. A score of 74 means very different things for an instance that normally runs at 75 versus one that normally runs at 92. The trend supplies the baseline.
- One critical input can dominate. A non-Primary Galera state or a disk above 90% can sink the headline on its own regardless of how green the rest is, by design, because those conditions are existential for the database.
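
The drill-down from headline to culprit can be sketched as a ranking of inputs by weighted shortfall. The sub-scores below come from the worked example table; the numeric values behind the weight labels are assumptions made for illustration, so this sketch is not expected to reproduce the headline score, only the ordering of the inputs.

```python
# Sub-scores and weight labels from the worked example table.
subscores = {
    "Connection pool saturation": 55,
    "Query error rate": 92,
    "Query latency p95": 60,
    "Buffer-pool hit rate": 100,
    "Replication / Galera lag": 90,
    "Disk usage": 100,
    "Galera quorum": 100,
}
labels = {
    "Connection pool saturation": "high",
    "Query error rate": "high",
    "Query latency p95": "medium",
    "Buffer-pool hit rate": "medium",
    "Replication / Galera lag": "high",
    "Disk usage": "medium",
    "Galera quorum": "critical",
}

# Assumption: the table gives labels only; these numbers are hypothetical.
LABEL_WEIGHT = {"medium": 2, "high": 3, "critical": 4}


def rank_by_drag(subscores, labels):
    """Return (input, weighted shortfall from 100) pairs, largest drag first."""
    drag = {
        name: (100 - score) * LABEL_WEIGHT[labels[name]]
        for name, score in subscores.items()
    }
    return sorted(drag.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_drag(subscores, labels)
for name, shortfall in ranking[:2]:
    print(name, shortfall)
# Connection pool saturation 135
# Query latency p95 80
```

Under these assumed weights the two largest contributors are pool saturation and p95 latency, matching the reading given in the bullets above.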
Sibling cards
| Card | Why pair it with MariaDB Health Score | What the combination tells you |
|---|---|---|
| Connection Pool Saturation % | Highest-weight load input into the composite. | A low score during a traffic peak almost always traces to rising saturation here. |
| Query Error Rate % | Error-side input. | Score down plus error rate up equals a fault, not just load; investigate failing statements. |
| Query Latency p95 (ms) | Latency input. | Score down plus p95 up equals queries queueing; check slow-query rate and buffer pool. |
| InnoDB / XtraDB Buffer Pool Hit Rate % | Memory-efficiency input. | A falling hit rate drags latency and the composite together; often a sizing problem. |
| Async Replication Lag (seconds) | Replication input. | Lag spikes pull the composite down and threaten read-after-write consistency. |
| Database Disk Usage % | Capacity input with hard ceiling. | Disk above 90% can sink the score alone; a full disk halts writes entirely. |
| Galera Cluster Status | Existential quorum input on clustered instances. | A non-Primary status collapses the composite because a non-Primary node stops accepting writes and, by default, refuses reads as well. |
| Queries per Second (live) | Load context (not a direct input). | Read the score against QPS to separate “healthy under load” from “unhealthy at rest”. |
Reconciling against the source
Where to look on the server: There is no single native command that emits a “health score”; it is a Vortex IQ composite. To reconcile, verify each input independently against MariaDB’s own tooling, then confirm the headline moves in step.

- `SHOW GLOBAL STATUS;` for the raw counters (`Threads_connected`, `Aborted_clients`, `Connection_errors_%`, `Innodb_buffer_pool_read_requests`, `Innodb_buffer_pool_reads`).
- `SHOW VARIABLES LIKE 'max_connections';` to confirm the saturation denominator.
- `SHOW ALL SLAVES STATUS\G` (or `SHOW REPLICA STATUS\G` on newer builds) for replication lag.
- `SHOW STATUS LIKE 'wsrep_%';` for Galera quorum and flow-control inputs.
- `SELECT DIGEST_TEXT, AVG_TIMER_WAIT FROM performance_schema.events_statements_summary_by_digest ORDER BY AVG_TIMER_WAIT DESC;` for the latency inputs.

Why our number may legitimately differ from a hand calculation:

| Reason | Direction | Why |
|---|---|---|
| Profile weights | Variable | The composite uses your configured input weights; a manual unweighted average will differ. |
| Normalisation curves | Variable | Each input is mapped through a band curve, not a linear scale; the midpoint is not 50 for every input. |
| Poll timing | Brief | The composite reuses the last polled value of each input; a sub-metric sampled seconds later can shift the score marginally. |
| Galera presence | Structural | On a non-clustered instance the Galera inputs are dropped and weights re-normalise across the remaining inputs. |
DatabaseConnections, ReplicaLag, CPU and memory) as console metrics. There is no native composite to compare against; reconcile input by input.
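
For an input-by-input check, the raw counters can be turned into the same ratios the cards report. The sketch below is a hand-reconciliation aid under stated assumptions: the function name and dictionary shape are hypothetical, the hit-rate formula is the conventional one (1 minus disk reads over read requests), and the counters are used as lifetime totals. Status counters accumulate since server start, so to compare against a live reading you would difference two samples taken a poll interval apart.

```python
def derived_inputs(status, max_connections):
    """Derive three composite inputs from raw SHOW GLOBAL STATUS counters.

    `status` maps counter names to values (strings or numbers), the way a
    client library typically returns the rows of SHOW GLOBAL STATUS.
    """
    def counter(name):
        return float(status.get(name, 0))

    # Connection_errors_% is a family of counters; add up every member present.
    connection_errors = sum(
        float(value)
        for name, value in status.items()
        if name.startswith("Connection_errors_")
    )
    questions = counter("Questions")
    read_requests = counter("Innodb_buffer_pool_read_requests")

    return {
        "pool_saturation_pct": 100.0 * counter("Threads_connected") / max_connections,
        "error_rate_pct": (
            100.0 * (counter("Aborted_clients") + connection_errors) / questions
            if questions else None
        ),
        "buffer_pool_hit_rate_pct": (
            100.0 * (1.0 - counter("Innodb_buffer_pool_reads") / read_requests)
            if read_requests else None
        ),
    }


# Made-up sample values, chosen only to exercise the arithmetic.
sample = {
    "Threads_connected": "78",
    "Aborted_clients": "20",
    "Connection_errors_internal": "3",
    "Connection_errors_max_connections": "7",
    "Questions": "10000",
    "Innodb_buffer_pool_read_requests": "1000000",
    "Innodb_buffer_pool_reads": "9000",
}
inputs = derived_inputs(sample, max_connections=100)
print({name: round(value, 1) for name, value in inputs.items()})
```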
Known limitations / FAQs
Why is my health score 74 when every alert is green?
The composite turns amber before individual cards cross their hard alert thresholds. A 74 means one or more inputs are in the warning zone (for example pool saturation at 78%, below the 90% alert but well off the green band). That is the point of the score: it gives you runway to act before a sub-metric trips its own alert.
The score dropped but I cannot tell which input caused it.
Open the Executive Overview and scan the sub-metric cards for the one in amber or red. The composite is always explainable from its inputs; if two inputs moved together (latency and buffer-pool hit rate, say) they usually share a root cause. Use Vortex Mind to trace the upstream cause.
Can I change which inputs count and how much they weigh?
Yes. This is a sensitivity card. In the Sensitivity tab you can adjust each input’s weight and its threshold band per profile. Teams that run read-heavy reporting replicas often raise the buffer-pool and latency weights; teams on Galera raise the quorum weight to make a non-Primary state dominate.
My instance is a single node with no replication or Galera. Does the score still work?
Yes. Replication and Galera inputs are dropped and the weights re-normalise across the remaining inputs (saturation, errors, latency, buffer pool, disk). The score is then a clean read on a standalone server.
Why does the score sometimes sit at 100 for days then drop sharply?
A healthy instance inside every green band scores 100 and stays there until an input crosses into its warning zone. The drop is sharp because crossing a threshold moves that input’s sub-score quickly through the band curve. The 7-day trend makes these step changes easy to spot.
Should I page on a score of 69?
The <70 trigger is the default amber boundary, not an automatic page. Whether 69 pages depends on your sustained-duration setting in the Sensitivity tab. A momentary dip to 69 during a deploy is normal; a score parked below 70 for several poll cycles is worth waking someone for. Tune the sustained window to your tolerance.
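
The difference between a momentary dip and a sustained breach can be sketched as a check over the most recent poll results. This is an illustration only: the function name, the default of three polls, and the list-of-scores input are assumptions, not the product's actual setting or data model.

```python
def should_page(recent_scores, threshold=70, sustained_polls=3):
    """True when the newest `sustained_polls` scores are all below `threshold`.

    `recent_scores` is ordered oldest to newest. Returns False while there
    is not yet enough history to fill the sustained window.
    """
    if sustained_polls <= 0 or len(recent_scores) < sustained_polls:
        return False
    window = recent_scores[-sustained_polls:]
    return all(score < threshold for score in window)


print(should_page([82, 69, 75, 78]))  # False: a single dip that recovered
print(should_page([74, 69, 68, 66]))  # True: three consecutive polls below 70
```

A longer window trades slower paging for fewer false alarms during deploys.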
Does a high score guarantee there is no problem?
No. The score only reflects the inputs it measures. A logical fault (a bad migration, a corrupt index, a runaway report query that has not yet pushed latency past its band) can exist at a score of 95. Treat the score as a strong negative signal (low score means definitely investigate) rather than an absolute all-clear.