> ## Documentation Index
> Fetch the complete documentation index at: https://docs.vortexiq.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# PostgreSQL audit profile, Vortex IQ

> What the Vortex IQ PostgreSQL health audit checks: PostgreSQL: keep queries fast, replicas current, capacity ahead of demand

**[Nerve Centre KPIs](/nerve-centre/kpi-cards/postgresql) · [Audit Profile](/nerve-centre/kpi-cards/postgresql/audit) · [Sentiment Settings](/nerve-centre/kpi-cards/postgresql/sentiment)**

PostgreSQL-specific health audit for production database instances. Answers six questions: (1) is stats access correctly provisioned (read-only role, pg\_stat\_statements installed, SSL on managed endpoints); (2) is the instance up and reachable with connection errors bounded; (3) is query latency within the p95 / p99 bands and is the slow-query population trending up; (4) are streaming standbys keeping up and is WAL shipping faster than it accumulates; (5) is the instance ahead on disk, memory, and connection-pool capacity; (6) are backups current and durable enough to meet the recovery objective. Cross-channel area joins query and pool signals to commerce-sibling checkout funnel to size live revenue at risk.

## What this audit checks

### Authentication & access

* Connection string resolves and the monitoring role can read pg\_catalog (USAGE granted, pg\_read\_all\_stats present)
* pg\_stat\_statements extension is installed and present in shared\_preload\_libraries (slow-query KPIs depend on it)
* sslmode=require is set on managed endpoints (RDS / Aurora / Crunchy / Heroku); plaintext flagged
* Monitoring role is read-only - no write or superuser privileges beyond stats access

### Connection & availability

* Instance is reachable and pg\_postmaster\_start\_time() returns a recent, stable uptime (no unplanned restart)
* Connection error count over 24h below the alert band (transient auth / network failures bounded)
* PostgreSQL major version is supported and not past end-of-life from pg\_settings server\_version
* max\_connections from pg\_settings sized with headroom against observed peak backend count

### Query performance

* Query latency p95 within band - mean\_exec\_time aggregate from pg\_stat\_statements under threshold
* Query latency p99 tail not diverging from p95 (no long-tail lock contention)
* Slow-query rate (mean\_exec\_time > 100ms as share of total calls) under threshold
* Buffer cache hit rate (blks\_hit / blks\_hit + blks\_read from pg\_stat\_database) above 95% on OLTP

### Replication & lag

* Streaming replication lag in seconds (replay\_lag from pg\_stat\_replication) under threshold
* WAL bytes pending (pg\_wal\_lsn\_diff primary vs replay\_lsn) not growing sustained - standby keeping up
* Every configured standby has state=streaming - none in catchup, broken, or stopped
* At least one healthy standby with sub-second lag exists for a safe promotion path

### Storage & capacity

* Database disk usage percent under threshold with growth-rate runway before exhaustion
* Connection-pool saturation (active + idle-in-tx / max\_connections) under threshold
* Idle-in-transaction backend count bounded - stuck transactions hold locks and bloat the pool
* Autovacuum current on hot tables - oldest n\_dead\_tup table not unvacuumed past threshold (wraparound risk)

### Backups & durability

* Last successful base backup / WAL archive age within the recovery-point objective
* WAL archiving is active and not failing (archive\_command succeeding, no pile-up)
* Point-in-time recovery window covers the required retention horizon

### Cross-channel: revenue protection

* QPS spike with no concurrent ecom order spike (sibling = shopify / bigcommerce / adobe order rate flat) flagged as bot / scraper load
* Pool saturation > 90% during a commerce-sibling traffic burst (pool exhaustion = lost checkouts)
* Slow-query spike co-occurring with a sibling checkout-completion drop in the same 5-min window
* Inventory-table row drift vs sibling product\_inventory count (oversell risk when DB is source of truth)

## Severity thresholds

| Signal                  | Warn | Critical |
| ----------------------- | ---- | -------- |
| `connection_error_rate` | 1    | 5        |
| `query_p95_ms`          | 200  | 500      |
| `replication_lag_sec`   | 10   | 30       |
| `disk_usage_pct`        | 80   | 90       |
| `slow_query_count`      | 5    | 15       |

## Data sources

* `GET postgresql://{host}:{port}/{database}#pg_settings` - Instance config - version, max\_connections, shared\_buffers, server\_version
* `GET postgresql://{host}:{port}/{database}#pg_stat_activity` - Live backends, states, idle-in-transaction, application\_name for pool saturation
* `GET postgresql://{host}:{port}/{database}#pg_stat_statements` - Normalised query stats - mean\_exec\_time, calls, total\_exec\_time for slow-query + p95/p99
* `GET postgresql://{host}:{port}/{database}#pg_stat_replication` - Standby state, replay\_lag, WAL pending bytes for replication health
* `GET postgresql://{host}:{port}/{database}#pg_stat_database` - Cache hit rate, deadlocks, connection errors per database
* `GET postgresql://{host}:{port}/{database}#pg_stat_user_tables` - Dead tuples + last autovacuum per table for bloat / starvation checks
