Query Latency p99 (ms), PostgreSQL - Vortex IQ Help Centre

Card class: Hero • Category: Performance

At a glance

The 99th-percentile query execution time, in milliseconds, across all statements running on the monitored PostgreSQL instance. p99 is the tail: 99 out of every 100 queries finished faster than this number, and 1 in 100 finished slower. For a DBA or SRE this is the “worst realistic query a real user hits” line. p50 tells you the typical experience; p99 tells you what your slowest customers, your heaviest reports, and your most contended checkout writes actually feel. When p99 climbs while p50 stays flat, you have a tail problem (lock contention, a cold cache, a missing index on one hot path), not a system-wide slowdown.


Data source	`pg_stat_statements`, the per-statement execution-time extension. Vortex IQ computes the percentile from the distribution of `mean_exec_time` weighted by `calls`, sampled each interval, and on managed services (RDS, Aurora, Cloud SQL) cross-checks against Performance Insights / the provider’s latency metric where available.
Metric basis	Statement execution time inside the backend, not round-trip time. Network transit and client-side parsing are excluded; this is the time PostgreSQL itself spent executing.
Aggregation window	`RT/5m`: a real-time headline refreshed roughly every 60 seconds, with the percentile computed over a rolling 5-minute window so a single outlier does not whipsaw the value.
Unit	Milliseconds (ms).
What counts	Every statement captured by `pg_stat_statements` on the monitored database, including SELECT, INSERT, UPDATE, DELETE, and DDL, after PostgreSQL’s own query normalisation (literals replaced by `$1`, `$2`).
What does NOT count	(1) Time spent waiting for a connection from the pool (that is Connection Pool Saturation); (2) network latency between app and database; (3) statements below `pg_stat_statements.track` granularity if `track = top` and you care about nested calls; (4) queries that errored out before completing (those land in Query Error Rate %).
Time window	`RT/5m` (real-time headline, percentile over a rolling 5-minute window)
Alert trigger	`>500ms`. Sustained p99 above 500ms means the tail of your workload is breaching the latency budget that most OLTP front ends can absorb without users noticing.
Roles	owner, engineering, operations

Calculation

The card reads from pg_stat_statements, which records cumulative timing per normalised statement since the last pg_stat_statements_reset() (or instance start). Vortex IQ does not trust the lifetime cumulative figure for a real-time tail metric, because a statement that was slow once at 03:00 would pollute the headline all day. Instead the engine samples the view each interval, computes the delta in calls and total_exec_time since the previous sample, and reconstructs a per-statement mean for the window. The 99th percentile is then taken across the call-weighted distribution: statements that ran 50,000 times in the window count 50,000 times toward the percentile, a statement that ran twice counts twice. In plain terms:

For the rolling 5-minute window:
  per_statement_mean_ms = (delta total_exec_time) / (delta calls)
  Build a distribution where each statement contributes `delta calls` samples
    at value `per_statement_mean_ms`.
  p99 = the value at the 99th percentile of that call-weighted distribution.

Two engineering notes that affect the reading:

Mean-of-means is an approximation of the true tail. pg_stat_statements stores a mean per statement, not a full histogram, so the p99 is the 99th percentile of statement means, not the 99th percentile of every individual execution. On a workload dominated by a few statement shapes this is very close to the true p99; on a workload with one statement whose individual runs vary wildly, the true tail can be worse than reported. Where the managed-service provider exposes a real latency histogram (Aurora Performance Insights, Cloud SQL Query Insights), Vortex IQ prefers that source and notes it on the card.
Resets reset the baseline. If someone runs pg_stat_statements_reset() mid-window the delta goes briefly negative; the engine detects the counter rollback and skips that one sample rather than reporting a nonsense spike.

Worked example

A B2B SaaS platform runs its primary on a managed PostgreSQL 15 instance (db.r6g.2xlarge equivalent, max_connections = 400). The application is healthy at p50 but the on-call SRE gets a Nerve Centre alert at 14:20 on 11 May 26: p99 query latency has climbed from a steady 180ms to 640ms and held there for six minutes. The SRE pulls the snapshot:

Window	p50	p95	p99	Notes
13:00 to 14:00 (baseline)	9ms	70ms	180ms	normal afternoon load
14:14 to 14:19	11ms	240ms	640ms	p50 barely moved, tail blew out

The fact that p50 is flat while p99 quadrupled is the whole story. This is not a CPU-saturation event (that would lift the whole curve, p50 included). It is a tail event: a small subset of queries got very slow. The SRE drills into Top 10 Slowest Queries and finds a single UPDATE invoices SET status = $1 WHERE account_id = $2 statement that normally runs in 4ms now averaging 380ms, and accounting for the spike.

Root-cause framing:
  - p50 flat            -> the database as a whole is fine (CPU, memory, cache healthy)
  - p99 spiked          -> a narrow set of statements is slow
  - one UPDATE to blame -> lock contention, not a missing index

Investigation: the team had deployed a batch job at 14:13 that updated
invoices for every account in a long transaction, holding row locks.
Concurrent single-row UPDATEs from the web tier queued behind those locks.
That lock wait shows up as execution time, not pool wait, which is why p99
moved and Connection Pool Saturation did not.

The fix was to chunk the batch job into 500-row commits instead of one giant transaction. p99 dropped back to 190ms within two minutes of the long transaction committing. Three takeaways the team recorded:

Read p99 against p50, never alone. p99 of 640ms is alarming on its own, but the diagnostic value is the gap to p50. A wide p50-to-p99 gap means tail contention; a narrow gap that is simply high everywhere means the instance is undersized or starved.
Lock waits count as execution time. A query blocked on a row lock is still “executing” from pg_stat_statements’ point of view. That is why a contention event surfaces here and not on the pool-saturation card. Pair with Idle-in-Transaction Backends when p99 spikes, because a forgotten open transaction is the classic cause.
A 500ms p99 is a budget, not a verdict. For an interactive OLTP front end, 500ms at the tail is roughly where users start to feel sluggishness. For an analytics or reporting database, a 500ms p99 may be entirely fine. Tune the sensitivity threshold to your workload in the Sensitivity tab rather than treating the default as gospel.

Sibling cards to read alongside

Card	Why pair it with Query Latency p99	What the combination tells you
Query Latency p50 (ms)	The median, the typical-query view.	p50 flat plus p99 high equals a tail/contention problem; both rising equals a system-wide slowdown.
Query Latency p95 (ms)	The “most users” tail, one step in from p99.	If p95 is fine but p99 is bad, the problem is confined to a very small slice of traffic.
Slow-Query Rate %	The proportion of queries over 100ms.	High slow-query rate plus high p99 confirms a broad slowdown, not a single outlier.
Top 10 Slowest Queries	Names the actual statements behind the tail.	The first place to drill when p99 spikes: which statement shape is responsible.
Buffer Cache Hit Rate %	Cache misses force disk reads that lengthen the tail.	p99 up plus cache-hit down equals a cold cache or undersized `shared_buffers`.
Idle-in-Transaction Backends	Stuck transactions hold locks that inflate execution time.	p99 spike with idle-in-tx backends present equals lock contention.
Deadlocks (last 5m)	The extreme end of contention.	Deadlocks plus a p99 spike confirm a write-contention pattern.
Connection Pool Saturation %	Distinguishes execution time from wait-for-connection time.	p99 high but pool healthy means the slowness is genuinely inside query execution.

Reconciling against the source

Where to look in PostgreSQL’s own tooling:

pg_stat_statements is the authoritative source. Order by mean to see where the tail lives:
SELECT queryid, calls, mean_exec_time, max_exec_time,
       total_exec_time, query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 20;
pg_stat_activity shows what is running right now and what each backend is waiting on (wait_event_type, wait_event), which tells you whether the tail is lock waits or I/O. EXPLAIN (ANALYZE, BUFFERS) on the suspect statement gives the real plan and the actual time.

On managed services:

Amazon RDS / Aurora: Performance Insights exposes per-statement latency and wait-event breakdowns directly, and is the closest native equivalent to this card’s percentile. Google Cloud SQL: Query Insights provides per-query latency and plan sampling. Azure Database for PostgreSQL: Query Store and the Intelligent Performance blade hold the equivalent statement-level timings.

Why our number may legitimately differ from a raw pg_stat_statements read:

Reason	Direction	Why
Cumulative vs windowed	Raw view usually looks worse	The view stores lifetime cumulative means; Vortex IQ reports a rolling 5-minute delta, so a one-off slow run at 03:00 inflates the raw view but not the card.
Mean-of-means approximation	Card may read slightly lower than true p99	The view stores a mean per statement, not a full histogram; the call-weighted percentile is an approximation. Where a provider histogram exists, the card prefers it.
Statement normalisation	Neither is wrong	`pg_stat_statements` groups by normalised query; two queries that differ only in literals are one row. The card inherits that grouping.
Reset timing	Transient	A `pg_stat_statements_reset()` mid-window is skipped by the card but visibly zeroes the raw view.
`track = top` vs `all`	Card may miss nested calls	If `pg_stat_statements.track = top`, time spent in functions and triggers is attributed to the top-level statement, not the nested one.

Cross-connector reconciliation:

Card	Expected relationship	What causes divergence
`datadog.p99-latency`	If Datadog APM traces the same database, the two p99s should track within tens of ms.	APM measures from the application’s perspective (includes network and pool wait); this card measures pure execution time, so APM p99 is usually higher.
Slow Queries During Checkout Window (5m)	A p99 spike during a checkout window should co-occur.	If p99 spikes but checkout is unaffected, the slow statements are on a non-customer path (reporting, batch).

Known limitations / FAQs

The card shows a high p99 but every query I run by hand is fast. Why? Three usual causes. First, the tail is intermittent: the slow runs happen under concurrency you cannot reproduce with a single manual query, typically lock contention from another session. Second, your manual run hits a warm cache while the slow production runs hit cold pages (check Buffer Cache Hit Rate %). Third, the slow statement is a different shape than the one you tested; run the pg_stat_statements query above ordered by mean_exec_time to find the real culprit. Is p99 the right tail to watch, or should I use p95 or max? p99 is the standard SLO tail for interactive workloads: it captures genuine pain without being dominated by single freak outliers the way max_exec_time is. p95 is more forgiving and good for “most users”; max is too noisy to alert on. Watch p99 for alerting and keep p50, p95 visible for context. The trio together tells you the shape of the curve. Why does my Datadog or New Relic APM p99 differ from this card? APM measures latency from the application’s point of view: it includes time waiting for a connection from the pool and network transit to the database. This card measures only the time PostgreSQL spent executing the statement. APM p99 is therefore almost always higher. A large gap between the two points at pool or network problems rather than query problems. pg_stat_statements is not installed. Does this card work? No, this card requires the pg_stat_statements extension (it ships with PostgreSQL but must be added to shared_preload_libraries and created with CREATE EXTENSION pg_stat_statements). On managed services it is usually available as a parameter-group toggle. Without it, Vortex IQ falls back to the provider’s native latency metric (Performance Insights, Query Insights) where present, and otherwise the card is unavailable. The connector setup screen flags this during onboarding. Does p99 include time the query spent waiting for a lock? Yes. From pg_stat_statements’ perspective a statement blocked on a row or table lock is still executing, so lock-wait time is included in execution time and therefore in p99. This is intentional and useful: it means contention shows up here. Pair with Idle-in-Transaction Backends and Deadlocks (last 5m) to confirm contention as the cause. The default 500ms alert is too sensitive (or not sensitive enough) for my workload. Adjust it in the Sensitivity tab per profile. An interactive OLTP front end might want 250ms; a reporting or analytics database might be perfectly healthy at a 2,000ms p99. The 500ms default is a reasonable OLTP starting point, not a universal truth. Set the threshold against your own latency budget. Can a single very slow query move the p99, or does it take many? Because the percentile is call-weighted, one statement that runs rarely will barely move p99 even if each run is slow. To move p99 you need a statement shape that runs often enough to occupy the top 1% of the call-weighted distribution. That is by design: it stops a single nightly maintenance query from triggering false alarms while still catching a hot path that has regressed.

Tracked live in Vortex IQ Nerve Centre

Query Latency p99 (ms) is one of hundreds of KPI pulses Vortex IQ tracks across PostgreSQL and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.

​At a glance

​Calculation

​Worked example

​Sibling cards to read alongside

​Reconciling against the source

​Known limitations / FAQs

​Tracked live in Vortex IQ Nerve Centre