At a glance
The number of connection-level errors MySQL has logged in the last 24 hours: clients that failed to establish or complete a connection before reaching the SQL layer. For a DBA, this is “how many times did something try to connect and get rejected, dropped, or timed out?” A handful per day is normal background noise (health checks, the occasional flaky client). A spike usually means one of three things: wrong credentials being retried in a loop, a network path that is dropping packets, or a host that has tripped max_connect_errors and been blocked.
| Data source | SHOW GLOBAL STATUS connection-error counters, summed and deltaed over the window: Aborted_connects, Connection_errors_max_connections, Connection_errors_internal, Connection_errors_peer_address, Connection_errors_accept, Connection_errors_tcpwrap, plus the Aborted_clients counter for connections dropped mid-session. |
| Metric basis | A delta, not a lifetime total. The engine snapshots the counters, stores them, and reports the increase over the trailing 24 hours. MySQL’s raw counters only ever climb until the server restarts, so the card never shows the since-boot figure. |
| Aggregation window | Trailing 24 hours, recomputed on the standard refresh. The headline is a single running 24h count, not a per-period bucket. |
| What counts as an error | (1) Connections aborted before the handshake completed (bad credentials, no database, denied host); (2) connections refused because max_connections was reached; (3) accept()/TCP-wrapper/peer-address failures at the network layer; (4) clients that connected then died without a clean mysql_close(). |
| What does NOT count | (1) Successful connections that later ran a failing query (that is the Query Error Rate card); (2) slow connections that eventually succeeded; (3) replica IO-thread reconnects, which are tracked under Replication Thread Health. |
| Managed-service note | On Amazon RDS / Aurora the same counters are exposed via SHOW GLOBAL STATUS and surfaced as the AbortedClients and LoginFailures CloudWatch metrics. On Cloud SQL they appear in the mysql.aborted_connects metric. The card reads the live counters directly so it matches the engine, not the cloud roll-up. |
| Time window | 24h (trailing 24 hours, recomputed on refresh) |
| Alert trigger | > 100 connection errors in the trailing 24h window pages the on-call DBA. |
| Roles | owner, engineering, operations |
Calculation
The engine polls SHOW GLOBAL STATUS and keeps a rolling store of the connection-error counters. Because each of these counters is monotonic (it resets only on server restart), the card reports a delta over the window rather than the raw value: the summed counters now minus the summed counters from the snapshot taken 24 hours earlier.
Aborted_clients (mid-session drops) is included because in practice it is the noisiest sub-signal: a client library that does not call mysql_close() cleanly increments it on every request. The breakdown by sub-counter is available on the expanded card so you can see whether the spike is auth, capacity, or network.
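The delta described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the engine's actual code: the snapshot dictionaries, the window_delta helper, and the sample values are hypothetical; only the counter names come from the Data source row.

```python
# Counter names come from the "Data source" row. The snapshot dicts are a
# hypothetical shape, not the engine's real storage format.
CONNECTION_ERROR_COUNTERS = (
    "Aborted_connects",
    "Aborted_clients",
    "Connection_errors_max_connections",
    "Connection_errors_internal",
    "Connection_errors_peer_address",
    "Connection_errors_accept",
    "Connection_errors_tcpwrap",
)


def window_delta(snapshot_then, snapshot_now):
    """Return (headline, per-counter breakdown) for two status snapshots."""
    breakdown = {}
    for name in CONNECTION_ERROR_COUNTERS:
        # SHOW GLOBAL STATUS returns values as strings, so coerce to int.
        # A counter missing from a snapshot is treated as 0 (assumption).
        then = int(snapshot_then.get(name, 0))
        now = int(snapshot_now.get(name, 0))
        breakdown[name] = now - then
    return sum(breakdown.values()), breakdown


# Made-up snapshots taken 24 hours apart.
then = {"Aborted_connects": "1000", "Aborted_clients": "500"}
now = {
    "Aborted_connects": "1372",
    "Aborted_clients": "541",
    "Connection_errors_peer_address": "5",
}
headline, breakdown = window_delta(then, now)
print(headline)  # 418
```

Keeping the per-counter breakdown next to the headline is what makes the expanded view possible: the same subtraction yields both the total and its components.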
Worked example
A platform team runs a primary MySQL 8.0 instance behind a PgBouncer-style proxy for a high-traffic order service. Snapshot taken on 14 Apr 26 at 09:15 BST. The card jumps from its usual baseline of around 12/day to 418 in 24h, well past the > 100 alert threshold.
The expanded breakdown shows where the errors came from:
| Sub-counter | 24h delta | Reading |
|---|---|---|
| Aborted_connects | 372 | The overwhelming majority: handshakes that failed before authentication completed. |
| Connection_errors_max_connections | 0 | Capacity is fine; nothing was turned away for being full. |
| Aborted_clients | 41 | Background noise, roughly the normal rate. |
| Connection_errors_peer_address | 5 | Marginal, a few reverse-DNS hiccups. |
The error log shows a repeating Access denied for user 'reporting'@'10.0.4.12' entry. A scheduled analytics job had its password rotated in the secrets store the night before, but the cron host was still using the cached old value and retrying every 30 seconds.
- Aborted connects are almost always a client problem, not a server problem. The server is doing exactly what it should (rejecting bad credentials). The fix lives in whatever is connecting, not in my.cnf.
- Always read the sub-counter breakdown. “418 errors” could be an auth loop, a capacity wall, or a flaky network, and the three demand completely different responses. Capacity errors point at Connection Pool Saturation; auth errors point at a client config; network errors point at the load balancer or security group.
- One misbehaving host can dominate the count. A single stale credential retrying in a tight loop produces hundreds of errors a day. The headline looks alarming but the blast radius is one host. Confirm scope before declaring an incident.
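Reading the breakdown can be expressed as a small triage helper. The sketch below is hypothetical: the grouping of sub-counters into causes is an assumption based on the descriptions on this page, not a mapping the product publishes, and dominant_cause is an illustrative name. The input is the 24h breakdown from the worked example.

```python
# Assumed grouping of sub-counters into the causes discussed above.
CAUSE_BY_COUNTER = {
    "Aborted_connects": "auth",
    "Connection_errors_max_connections": "capacity",
    "Connection_errors_peer_address": "network",
    "Connection_errors_accept": "network",
    "Connection_errors_tcpwrap": "network",
    "Aborted_clients": "client lifecycle",
    "Connection_errors_internal": "server internal",
}


def dominant_cause(breakdown):
    """Return (cause, share of total) for the cause with the largest delta."""
    totals = {}
    for counter, delta in breakdown.items():
        cause = CAUSE_BY_COUNTER.get(counter, "unknown")
        totals[cause] = totals.get(cause, 0) + delta
    total = sum(totals.values())
    if total == 0:
        return None, 0.0
    cause = max(totals, key=totals.get)
    return cause, totals[cause] / total


# 24h deltas from the worked example.
example = {
    "Aborted_connects": 372,
    "Connection_errors_max_connections": 0,
    "Aborted_clients": 41,
    "Connection_errors_peer_address": 5,
}
cause, share = dominant_cause(example)
print(cause, round(share, 2))  # auth 0.89
```

A share near 0.9 for a single cause, as here, suggests one failure mode rather than general instability, which is the cue to go looking for a single offending host.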
Sibling cards to reference together
| Card | Why pair it with Connection Errors (24h) | What the combination tells you |
|---|---|---|
| Aborted Connects (24h) | The single largest sub-counter, broken out on its own. | If this card is high and Aborted Connects is the bulk of it, the problem is authentication or denied hosts, not capacity. |
| Connection Pool Saturation % | The capacity-side cause. | Connection errors plus saturation near 100% equals max_connections rejections, not bad credentials. Raise the limit or fix the leak. |
| Connections In Use | The live count of established sessions. | Errors climbing while in-use sits flat equals attempts that never establish (auth/network), not a busy server. |
| Connection Pool at >90% Saturation | The real-time alert that fires when the pool is full. | If this alert is silent, your connection errors are not capacity-driven. |
| Query Error Rate % | The next layer up: errors after a connection succeeds. | Connection errors flat but query errors high equals the clients are getting in but their SQL is failing. Different team, different fix. |
| Instance Uptime | Detects a restart inside the window. | A recent restart resets the counters and can make the 24h delta look artificially small. |
| MySQL Health Score | The executive composite that folds connection errors into its weighting. | A connection-error spike alone can pull the composite down even when query latency is healthy. |
Reconciling against the source
Where to look in MySQL’s own tooling: Run SHOW GLOBAL STATUS LIKE 'Aborted%'; and SHOW GLOBAL STATUS LIKE 'Connection_errors%'; on the instance for the live counters. Remember these are since-restart totals, not 24h deltas, so they will be larger than the card. Check Uptime in the same status output to know how far back the counters reach: SHOW GLOBAL STATUS LIKE 'Uptime';. Tail the error log (SELECT @@log_error; to find the path) for the actual Access denied / Aborted connection lines, which name the offending user and host.
Why our number may legitimately differ from a raw status query:
| Reason | Direction | Why |
|---|---|---|
| Delta vs lifetime total | Card is much lower | SHOW GLOBAL STATUS reports the since-restart total; the card reports only the trailing 24h increase. On a long-running instance the raw figure can be in the millions. |
| Server restart in window | Card may look low | A restart zeroes the counters. The engine clamps the delta to the post-restart value so it never goes negative, which can understate the true 24h figure across the restart boundary. |
| Counter set | Card may be higher | The card sums several connection-error counters plus Aborted_clients; checking only Aborted_connects undercounts. |
| Refresh cadence | Brief lag | The card recomputes on the standard refresh, so a burst in the last few minutes may not yet be reflected. Force a refresh for the live figure. |
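The restart clamp in the table above can be sketched as follows. This is one plausible reading, not the engine's confirmed logic: the clamped_delta helper is hypothetical, and the use of the Uptime status value as a restart signal is an assumption.

```python
def clamped_delta(value_then, value_now, uptime_seconds, window_seconds=86400):
    """Delta for one counter that a restart inside the window may have zeroed."""
    # A restart is assumed if the server has been up for less than the
    # window, or if the counter went backwards between snapshots.
    restarted = uptime_seconds < window_seconds or value_now < value_then
    if restarted:
        # Only the post-restart count is known; errors logged before the
        # restart are lost, so this can understate the true 24h figure.
        return value_now
    return value_now - value_then


print(clamped_delta(1000, 1372, uptime_seconds=30 * 86400))  # 372
print(clamped_delta(1000, 40, uptime_seconds=3600))          # 40
```

Checking uptime as well as the counter direction covers the case where the post-restart count has already climbed past the old snapshot, which a simple "did it go backwards" test would miss.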
| Platform | Where to confirm | Note |
|---|---|---|
| Amazon RDS / Aurora | CloudWatch AbortedClients and LoginFailures metrics, plus the RDS error log in the console. | CloudWatch aggregates per-minute; sum across 24h to compare to the card’s delta. |
| Google Cloud SQL | Cloud Monitoring mysql.aborted_connects metric. | Same delta concept; align the window to the trailing 24h. |
| Self-managed | SHOW GLOBAL STATUS plus the error log directly. | The closest 1:1 source; just remember to subtract the 24h-ago snapshot yourself. |
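When comparing against a per-minute cloud metric, the datapoints need to be summed over the same trailing window as the card. A minimal sketch, assuming the datapoints have already been exported as (timestamp, value) pairs where each value is the count for that period; the tuple shape, the helper name, and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def sum_trailing_window(datapoints, now, window=timedelta(hours=24)):
    """Sum (timestamp, value) pairs with now - window < timestamp <= now."""
    start = now - window
    return sum(value for ts, value in datapoints if start < ts <= now)


now = datetime(2026, 4, 14, 8, 15, tzinfo=timezone.utc)
points = [
    (now - timedelta(hours=25), 7),    # outside the window, ignored
    (now - timedelta(hours=23), 300),
    (now - timedelta(minutes=1), 72),
]
print(sum_trailing_window(points, now))  # 372
```

Using timezone-aware timestamps avoids an off-by-an-hour window when the card's snapshot time is quoted in local time.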
Known limitations / FAQs
Why is my raw SHOW GLOBAL STATUS number millions but the card says 200?
The raw counter is a lifetime total since the last server restart. On an instance that has been up for months, Aborted_connects alone can reach seven figures. The card deliberately reports only the increase over the trailing 24 hours so the number stays actionable. To reconcile, take two snapshots 24 hours apart and subtract.
The card spiked but my application looks fine. Is this a false alarm?
Often it is, at least in part. A single misconfigured client (a stale cron credential, a health-check probe hitting the wrong port, a load balancer doing TCP-only checks) can generate hundreds of aborted connects a day without affecting real user traffic. Read the sub-counter breakdown and the error log to find the source host before treating it as an incident. That said, a sustained climb is worth fixing because it pollutes the log and masks real auth failures.
What is the difference between Aborted_connects and Aborted_clients?
Aborted_connects counts connections that failed before the handshake completed: bad password, no such database, host denied, protocol mismatch. Aborted_clients counts connections that succeeded but then died without a clean close: the client crashed, hit a network timeout, or its library skipped mysql_close(). The first points at credentials/permissions; the second points at client lifecycle or network stability.
Does hitting max_connections show up here?
Yes, via the Connection_errors_max_connections sub-counter. If that is the dominant component, the problem is capacity, not credentials. Pair with Connection Pool Saturation % and consider raising max_connections, adding a connection pooler, or fixing a leak that holds sessions open.
Could max_connect_errors be blocking legitimate hosts?
Yes. If a host accumulates more than max_connect_errors (default 100) successive failed connection attempts without a single success, MySQL blocks it until you clear the host cache with FLUSH HOSTS (deprecated since MySQL 8.0.23), mysqladmin flush-hosts, or TRUNCATE performance_schema.host_cache. A blocked legitimate host that keeps retrying will keep incrementing the error counters. If you see a host suddenly unable to connect at all, check whether it has been blocked, fix the underlying cause, then flush.
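To check for blocked hosts before flushing, the host cache can be inspected. The sketch below assumes the rows were fetched with a query such as SELECT IP, SUM_CONNECT_ERRORS FROM performance_schema.host_cache and passed in as dictionaries. The hosts_at_risk helper, the sample IPs, and the choice of >= as the comparison are assumptions for illustration.

```python
def hosts_at_risk(host_cache_rows, max_connect_errors=100):
    """Return IPs whose blocking-error count has reached the threshold."""
    return [
        row["IP"]
        for row in host_cache_rows
        # SUM_CONNECT_ERRORS is the per-host count that MySQL assesses
        # against max_connect_errors.
        if int(row["SUM_CONNECT_ERRORS"]) >= max_connect_errors
    ]


# Made-up rows in the shape of performance_schema.host_cache.
rows = [
    {"IP": "10.0.9.31", "SUM_CONNECT_ERRORS": 100},
    {"IP": "10.0.9.44", "SUM_CONNECT_ERRORS": 3},
]
print(hosts_at_risk(rows))  # ['10.0.9.31']
```

Listing the affected hosts first means the flush can be followed by a targeted fix rather than a blind reset that lets the same host block itself again.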
Does this include replica reconnections?
No. A replica’s IO thread reconnecting to its source is tracked separately and surfaced on Replication Thread Health (IO/SQL) and Replication Lag. Those reconnects can increment some counters at the network layer, but the card’s headline is dominated by client-side connection attempts, not replication internals.
Why did the count drop sharply with no action from me?
Most likely the server restarted, which zeroes every status counter, or the offending client was redeployed and stopped retrying. A restart also shows up on Instance Uptime resetting to a low value. After a restart the 24h delta rebuilds from zero, so a few hours of low readings is expected even if the underlying problem is unresolved.
Can I change the threshold from 100?
Yes. The > 100 trigger is the default sensitivity. A small instance with a couple of well-behaved clients might alert at 25; a large multi-tenant fleet with aggressive health checks might set it to 500 to cut noise. Adjust it per profile in the Sensitivity tab so the alert reflects your own baseline rather than the generic default.