At a glance
Database Disk Usage % is the proportion of the data volume that the MariaDB instance has consumed: data files, indexes, binary logs, the InnoDB redo log, temporary files, and the undo log, divided by the total size of the volume holding the data directory. It is the single most consequential capacity metric on the instance, because when a MariaDB data volume fills completely, the server cannot write. Writes fail, replication stalls, and on many builds the server stops accepting transactions entirely until space is freed. This card exists to make sure that never happens, by surfacing the risk while there is still time to act.
| Field | Detail |
|---|---|
| What it tracks | Used space as a percentage of total space on the volume that holds the MariaDB data directory (datadir), for the selected period. Includes data, indexes, binary logs, redo and undo logs, and temp files. |
| Data source | The filesystem free/used figures for the datadir volume, cross-referenced with information_schema.TABLES (DATA_LENGTH + INDEX_LENGTH) and binary-log sizing. On managed services, the provider’s storage metric. |
| Time window | RT (real-time, refreshed on each poll). |
| Alert trigger | >90%. Above 90% the card flags red on the Executive Overview, because the remaining headroom can vanish fast under write load or a large binary-log accumulation. |
| Why it matters | A full data volume is one of the few failures that takes a database fully offline for writes. There is no graceful degradation: at 100% the server stops writing. The 90% alert is an early-warning line, not a comfort zone. |
| Sensitivity | This is a sensitivity card: the alert threshold is tunable per profile in the Sensitivity tab. Lowering it below 90% is more common than raising it, because fast-growing instances need more runway. |
| Roles | owner, engineering, operations |
Calculation
The metric is `used bytes / total bytes` on the filesystem volume that contains the MariaDB `datadir`, expressed as a percentage. Crucially, this is measured at the volume level, not just as the size of the tables, because several things share that volume and grow independently:
| Consumer | What it is | Why it can grow unexpectedly |
|---|---|---|
| Table data + indexes | The .ibd files for InnoDB tables | Normal growth, plus bloat from deleted-but-not-reclaimed rows |
| Binary logs | The binlog files used for replication and point-in-time recovery | Accumulate until expire_logs_days / binlog_expire_logs_seconds purges them; a stalled replica can pin them indefinitely |
| InnoDB redo log | The fixed-size write-ahead log | Sized by config, usually stable |
| Undo log | Multi-version concurrency control rollback segments | Can balloon if a very long-running transaction prevents purge |
| Temp files | On-disk temporary tables and sort files spilled from memory | Spike during large sorts, ALTER TABLE, or schema migrations |
The >90% alert is set where it is because the last 10% can disappear in minutes during a bulk import, a long migration, or a replica outage that pins binary logs. The figure is calculated automatically from your MariaDB data; see the worked example for a typical reading.
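Here is a minimal sketch of the calculation. It assumes a Python 3 host with read access to the data volume. `shutil.disk_usage` reports on the filesystem that holds the datadir, so binary logs, redo and undo logs, and temp files on that volume are all counted. A sum over information_schema.TABLES would miss them. The function names are illustrative and are not part of any product API.

```python
import shutil

ALERT_THRESHOLD_PCT = 90.0  # the card's default red line (tunable per profile)


def used_pct(used_bytes: int, total_bytes: int) -> float:
    """Used space as a percentage of the whole volume."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return 100.0 * used_bytes / total_bytes


def volume_used_pct(datadir: str) -> float:
    """Measure the filesystem holding `datadir`, so binlogs, logs and temp files count too."""
    usage = shutil.disk_usage(datadir)  # named tuple: total, used, free (bytes)
    return used_pct(usage.used, usage.total)


def is_red(pct: float, threshold: float = ALERT_THRESHOLD_PCT) -> bool:
    """True only when usage is strictly above the threshold, matching '>90%'."""
    return pct > threshold
```

For example, call `volume_used_pct("/var/lib/mysql")` on a host where that path is the datadir. `SELECT @@datadir;` confirms the actual path.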
Worked example
A platform team runs a MariaDB primary on a 500 GB data volume backing a Shopify-connected order-history warehouse. Snapshot taken on 02 May 26 at 03:15 BST during an overnight batch load.
| Consumer | Size | % of volume |
|---|---|---|
| Table data + indexes | 360 GB | 72% |
| Binary logs | 78 GB | 16% |
| Undo + redo + temp | 22 GB | 4% |
| Free | 40 GB | 8% (used = 92%) |
At 92%, the card is in the red >90% band. The on-call engineer reads three things:
- It is the binary logs, not the tables, that crossed the line. Table growth is steady; the jump came from 78 GB of binary logs. A read replica went offline two days ago and has not reconnected, so the primary is retaining every binary log the replica has not yet consumed. The logs cannot be purged while the replica still needs them.
- The runway is short. At 8% free on a volume taking an overnight batch load, the headroom is hours, not days. If the batch writes another 50 GB before the logs are freed, the volume fills and the server stops writing, taking the order pipeline down.
- There are two valid fixes, one fast and one correct. Fast: extend the volume (trivial on a managed service, a resize on self-hosted). Correct: bring the dead replica back so binary logs purge naturally, or, if the replica is gone for good, drop it from the topology so `binlog_expire_logs_seconds` can reclaim the 78 GB. The team does both: extends the volume to buy time, then fixes the replica.
Key takeaways
- Disk usage is not the same as table size. Binary logs and undo logs can fill a volume while your tables are flat. Always check what is actually consuming the space before assuming you need a bigger database.
- A pinned binary log is the classic silent filler. An offline or lagging replica stops binary logs from purging, and they grow without bound. Pair this card with the replication cards: a disk climb plus replication lag is almost always binlog retention.
- The last 10% is the dangerous part. Below 90% you have planning time; above 90% you have response time. Treat the alert as “act now”, not “schedule a ticket”, because hitting 100% causes an outage for writes, not a slowdown.
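The runway reasoning in the worked example reduces to simple arithmetic. This sketch uses the example's 500 GB volume at 92% used, the 78 GB of pinned binary logs and the 50 GB batch. `WRITE_RATE_GB_PER_HOUR` is a hypothetical figure for illustration only, since the card does not report a write rate.

```python
VOLUME_GB = 500.0
USED_PCT = 92.0
BINLOG_GB = 78.0
BATCH_GB = 50.0
WRITE_RATE_GB_PER_HOUR = 10.0  # hypothetical; measure your own batch rate


def free_gb(volume_gb: float, used_pct: float) -> float:
    """Headroom left on the volume, in GB."""
    return volume_gb * (100.0 - used_pct) / 100.0


def hours_to_full(free: float, write_gb_per_hour: float) -> float:
    """Runway at a constant write rate; infinite if nothing is being written."""
    if write_gb_per_hour <= 0:
        return float("inf")
    return free / write_gb_per_hour


def used_pct_after(volume_gb: float, used_pct: float, delta_gb: float) -> float:
    """Usage after writing (positive delta) or reclaiming (negative delta) space."""
    used_gb = volume_gb * used_pct / 100.0
    return 100.0 * (used_gb + delta_gb) / volume_gb


headroom = free_gb(VOLUME_GB, USED_PCT)                         # 40 GB (8% of 500 GB)
runway_h = hours_to_full(headroom, WRITE_RATE_GB_PER_HOUR)      # 4 hours at 10 GB/h
batch_fills_volume = BATCH_GB >= headroom                       # True: 50 GB > 40 GB
after_purge = used_pct_after(VOLUME_GB, USED_PCT, -BINLOG_GB)   # about 76.4%
```

Purging the 78 GB of binary logs would bring the volume from 92% back to about 76%, well clear of the red band.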
Sibling cards
| Card | Why pair it with Database Disk Usage | What the combination tells you |
|---|---|---|
| Async Replication Lag (seconds) | Lagging or dead replicas pin binary logs. | Disk climbing plus replication lag equals binlog retention; fix the replica to reclaim space. |
| Last Successful Backup (hours ago) | Backups can need temporary space and a full disk blocks them. | A full disk often coincides with a failed backup; both are capacity emergencies. |
| Memory Usage % | Large on-disk temp files spill from memory pressure. | High memory plus disk growth equals queries spilling to disk; tune sorts and temp tables. |
| MariaDB Health Score | Disk above 90% can sink the composite on its own. | A health-score drop with everything else green usually points straight here. |
| Failover Readiness | A standby that is also low on disk cannot safely take over. | Primary disk high plus standby disk high equals no safe failover target. |
| Slow-Query Rate % | Disk-bound temp tables slow queries. | Disk pressure plus slow queries equals on-disk sorts; the volume is now a performance bottleneck too. |
| Galera Cluster Size | A node that fills its disk drops out of the cluster. | Disk full on one node plus shrinking cluster size equals a node evicted for being out of space. |
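The pairing rules in the table can be read as a small triage function. This is a hypothetical sketch. `CardSignals` and its fields stand in for whatever readings you pull from the sibling cards. The "high" and "climbing" judgements are assumed to be made elsewhere.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CardSignals:
    """Hypothetical snapshot of related card readings; field names are illustrative."""
    disk_pct: float
    disk_climbing: bool
    replication_lag_high: bool = False
    memory_high: bool = False
    slow_queries_high: bool = False
    standby_disk_pct: Optional[float] = None


def likely_causes(s: CardSignals, threshold: float = 90.0) -> List[str]:
    """Apply the sibling-card pairing rules and return every matching diagnosis."""
    causes = []
    if s.disk_climbing and s.replication_lag_high:
        causes.append("binlog retention: fix or remove the lagging replica")
    if s.disk_climbing and s.memory_high:
        causes.append("queries spilling to disk: tune sorts and temp tables")
    if s.disk_pct > threshold and s.slow_queries_high:
        causes.append("on-disk sorts: the volume is now a performance bottleneck")
    if (s.disk_pct > threshold and s.standby_disk_pct is not None
            and s.standby_disk_pct > threshold):
        causes.append("no safe failover target: standby is also short on disk")
    return causes
```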
Reconciling against the source
Where to look on the server:
- At the OS level, `df -h` on the volume holding `datadir` is the ground truth for used-versus-total. This is what the card reports against.
- `SELECT table_schema, ROUND(SUM(data_length + index_length)/1024/1024/1024, 1) AS gb FROM information_schema.TABLES GROUP BY table_schema ORDER BY gb DESC;` attributes table-and-index space by schema.
- `SHOW BINARY LOGS;` lists binary logs and their sizes; sum them to see how much of the volume they hold.
- `SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';` (or `expire_logs_days` on older builds) confirms the retention policy that should be purging them.
- `SELECT * FROM information_schema.INNODB_TRX ORDER BY trx_started;` finds a long-running transaction that may be inflating the undo log.
Why our number may legitimately differ:
| Reason | Direction | Why |
|---|---|---|
| Volume vs table sum | Card higher | The card reports filesystem usage (data + binlogs + logs + temp); summing information_schema.TABLES alone undercounts because it omits binary and transaction logs. |
| Reserved blocks | Card higher | Some filesystems reserve a percentage of blocks for root; df accounts for them, a raw table sum does not. |
| Sparse / fragmented files | Variable | InnoDB tablespace files can hold free pages from deleted rows; on-disk size exceeds live data until OPTIMIZE TABLE reclaims it. |
| Poll timing | Either (briefly) | A bulk load that runs between the card's poll and your manual df produces different figures. |
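The reserved-blocks row comes down to which denominator is used. The sketch below assumes a Unix host, because `os.statvfs` is not available on Windows. It computes both used ÷ total and the df-style used ÷ (used + available). Available space excludes blocks reserved for root, so the df-style figure is never lower than the raw one.

```python
import os
from typing import Tuple


def pct_views(blocks: int, bfree: int, bavail: int, frsize: int) -> Tuple[float, float]:
    """Return (used/total %, df-style used/(used+avail) %) from statvfs fields."""
    total = blocks * frsize
    used = (blocks - bfree) * frsize
    avail = bavail * frsize  # space a non-root user can still write
    raw = 100.0 * used / total if total else 0.0
    df_style = 100.0 * used / (used + avail) if (used + avail) else 0.0
    return raw, df_style


def statvfs_views(path: str) -> Tuple[float, float]:
    """Read the live filesystem that holds `path`."""
    st = os.statvfs(path)
    return pct_views(st.f_blocks, st.f_bfree, st.f_bavail, st.f_frsize)
```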
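On managed services, use the provider’s storage metric instead. On Amazon RDS for MariaDB it is the FreeStorageSpace CloudWatch metric, reported in bytes free. RDS storage autoscaling is optional, so check whether the allocated size can grow. On Azure Database for MariaDB it is the Storage percent metric. These provider metrics are the canonical figure on managed instances because you do not have OS shell access to run df.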
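FreeStorageSpace is a byte count rather than a percentage, so comparing it with this card takes one conversion. This is a minimal sketch. It assumes you already have the latest FreeStorageSpace datapoint and the instance's allocated storage in GiB; fetching them from CloudWatch and the RDS API is left out. With storage autoscaling on, re-read the allocated size rather than hard-coding it.

```python
GIB = 1024 ** 3


def rds_used_pct(free_storage_bytes: float, allocated_gib: float) -> float:
    """Convert a FreeStorageSpace reading (bytes) into a used-percentage."""
    allocated_bytes = allocated_gib * GIB
    if allocated_bytes <= 0:
        raise ValueError("allocated_gib must be positive")
    return 100.0 * (allocated_bytes - free_storage_bytes) / allocated_bytes
```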
Known limitations / FAQs
My tables only total 360 GB but the card shows 92% of a 500 GB volume. Where did the rest go?
The volume holds more than tables. Binary logs (used for replication and point-in-time recovery), the InnoDB redo and undo logs, and on-disk temporary files all share the data volume. The most common surprise is binary-log accumulation: run `SHOW BINARY LOGS;` and sum the sizes. If they are large, an offline or lagging replica is usually pinning them.
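Summing the sizes can be scripted. This is a minimal sketch. It assumes `rows` comes from any DB-API 2.0 cursor after `cursor.execute("SHOW BINARY LOGS")`, with the file size in the second column (`File_size`). The helper names are hypothetical.

```python
from typing import Iterable, Sequence


def binlog_total_bytes(rows: Iterable[Sequence]) -> int:
    """Sum the File_size column (index 1) over SHOW BINARY LOGS rows."""
    return sum(int(row[1]) for row in rows)


def share_pct(part_bytes: int, volume_bytes: int) -> float:
    """How much of the data volume the binary logs occupy, as a percentage."""
    return 100.0 * part_bytes / volume_bytes
```

Extra trailing columns, which some server versions return, are ignored because only the second column is read.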
The disk filled overnight with no change in traffic. How?
Three usual causes: (1) a replica went offline and binary logs stopped purging; (2) a very long-running transaction prevented InnoDB from purging undo-log history, which then grew; (3) a large ALTER TABLE or batch import wrote gigabytes of temporary files. Check SHOW BINARY LOGS;, information_schema.INNODB_TRX, and the temp directory in that order.
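For the second cause, here is a sketch that flags old transactions from `SELECT trx_id, trx_started FROM information_schema.INNODB_TRX;`. The 30-minute cutoff is an assumed example, not a MariaDB default. Pass `now` from the server (`SELECT NOW();`) so the comparison uses the same clock and time zone as `trx_started`.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple


def long_running(rows: Iterable[Tuple[str, datetime]], now: datetime,
                 max_age: timedelta = timedelta(minutes=30)) -> List[Tuple[str, timedelta]]:
    """Return (trx_id, age) for transactions older than max_age, oldest first.

    rows are (trx_id, trx_started) pairs from information_schema.INNODB_TRX.
    """
    old = [(trx_id, now - started) for trx_id, started in rows if now - started > max_age]
    return sorted(old, key=lambda item: item[1], reverse=True)
```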
What actually happens at 100%?
Writes fail. InnoDB cannot extend tablespaces or write to its logs, so transactions error out, replication on the primary stalls, and on many configurations the server effectively halts write activity until space is freed. Reads may continue for a while, but the instance is functionally down for the application. This is why the alert is at 90, not 98.
How do I free space quickly in an emergency?
In order of speed and safety: (1) extend the volume (instant on managed services, a resize on self-hosted); (2) purge old binary logs with PURGE BINARY LOGS BEFORE ... once you have confirmed no replica still needs them; (3) drop or truncate disposable staging tables; (4) OPTIMIZE TABLE to reclaim space from heavily deleted tables (but this needs temporary space, so do not run it at 99%). Extending the volume is almost always the right first move because it is reversible and buys time.
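Step (2) depends on knowing which file the slowest replica still needs. The sketch below uses `PURGE BINARY LOGS TO 'file'`, a sibling of the `BEFORE` form above. It deletes every binary log older than the named file and keeps the named file itself. It assumes you have collected, on each live replica, the `Master_Log_File` value from `SHOW SLAVE STATUS` (or `SHOW REPLICA STATUS` on newer versions). Use `Relay_Master_Log_File` instead to be more conservative. `purge_plan` and `log_index` are hypothetical helpers, and you should still check backup and point-in-time recovery needs before running the statement.

```python
import re
from typing import Iterable, List, Tuple

_SUFFIX = re.compile(r"\.(\d+)$")


def log_index(name: str) -> int:
    """Numeric suffix of a binlog file name, e.g. 'mariadb-bin.000123' -> 123."""
    match = _SUFFIX.search(name)
    if match is None:
        raise ValueError(f"not a binlog file name: {name!r}")
    return int(match.group(1))


def purge_plan(binlogs: Iterable[Tuple[str, int]],
               replica_files: List[str]) -> Tuple[str, int]:
    """Build a PURGE ... TO statement that keeps every file a replica still needs.

    binlogs: (Log_name, File_size) rows from SHOW BINARY LOGS on the primary.
    replica_files: for each live replica, the primary binlog file it still needs.
    Returns the statement and the number of bytes it would free.
    """
    if not replica_files:
        # No positions means no evidence that purging is safe; decide by hand.
        raise ValueError("no replica positions supplied")
    target = min(replica_files, key=log_index)
    cutoff = log_index(target)
    freed = sum(size for name, size in binlogs if log_index(name) < cutoff)
    return f"PURGE BINARY LOGS TO '{target}';", freed
```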
Can I just raise the alert above 90%?
You can in the Sensitivity tab, but think carefully. The 90% line exists because the last 10% can vanish in minutes under write load or binlog growth. Fast-growing instances usually want a lower threshold for more runway, not a higher one. Raise it only if the volume is large enough that 10% is still many hours of headroom.
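Whether 10% is "many hours" is a division. This sketch assumes a steady write rate, which you would estimate from your own history. The function names are illustrative.

```python
def runway_hours(volume_gb: float, threshold_pct: float, write_gb_per_hour: float) -> float:
    """Hours between crossing threshold_pct and reaching 100% at a steady write rate."""
    if write_gb_per_hour <= 0:
        return float("inf")
    headroom_gb = volume_gb * (100.0 - threshold_pct) / 100.0
    return headroom_gb / write_gb_per_hour


def threshold_for_runway(volume_gb: float, write_gb_per_hour: float, runway_h: float) -> float:
    """Highest alert threshold (%) that still leaves runway_h hours before the volume is full."""
    headroom_pct = 100.0 * write_gb_per_hour * runway_h / volume_gb
    return max(0.0, 100.0 - headroom_pct)
```

Take an assumed 10 GB/h. On a 500 GB volume, the default 90% leaves about five hours. A 5,000 GB volume at the same rate could hold a 95% threshold and still keep a day of runway.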
Does shrinking a table reclaim disk immediately?
Not necessarily. Deleting rows marks pages free inside the InnoDB tablespace file but does not return the space to the filesystem; the file stays the same size and reuses the free pages for new rows. To return space to the OS you must run OPTIMIZE TABLE (which rebuilds the file) or, for whole tables, drop them. This is why the card can show high usage even after a big delete.
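To decide which tables are worth rebuilding, `information_schema.TABLES` also exposes a `DATA_FREE` column alongside `DATA_LENGTH` and `INDEX_LENGTH`. This is a minimal sketch. It assumes rows from `SELECT table_schema, table_name, data_length, index_length, data_free FROM information_schema.TABLES WHERE engine = 'InnoDB';` with `innodb_file_per_table` enabled. For tables in a shared tablespace, `DATA_FREE` describes the shared file, not the table. The 25% cutoff is an arbitrary example.

```python
from typing import Iterable, List, NamedTuple, Tuple


class TableSpace(NamedTuple):
    """One row of information_schema.TABLES (sizes in bytes)."""
    schema: str
    name: str
    data_length: int
    index_length: int
    data_free: int


def optimize_candidates(tables: Iterable[TableSpace], volume_free_bytes: int,
                        min_free_fraction: float = 0.25) -> List[Tuple[str, str, int]]:
    """Tables whose files are largely reusable free pages, biggest reclaim first.

    A table is skipped if its live size does not fit in the current free space,
    because OPTIMIZE TABLE rebuilds a copy of the table before the old file goes.
    """
    picks = []
    for t in tables:
        live = t.data_length + t.index_length
        on_disk = live + t.data_free
        if on_disk == 0:
            continue
        if t.data_free / on_disk >= min_free_fraction and live < volume_free_bytes:
            picks.append((t.schema, t.name, t.data_free))
    return sorted(picks, key=lambda p: p[2], reverse=True)
```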
Why is the managed-service storage metric slightly different from this card?
On managed services, the provider’s storage metric is the canonical figure and the card reports against it. Small differences come from poll timing and from the provider counting some internal overhead that a raw datadir view would not see. Treat the provider metric as truth on managed instances and reconcile the card to it.