At a glance
The 99th-percentile query execution time, in milliseconds, across all statements running on the monitored PostgreSQL instance. p99 is the tail: 99 out of every 100 queries finished faster than this number, and 1 in 100 finished slower. For a DBA or SRE this is the “worst realistic query a real user hits” line. p50 tells you the typical experience; p99 tells you what your slowest customers, your heaviest reports, and your most contended checkout writes actually feel. When p99 climbs while p50 stays flat, you have a tail problem (lock contention, a cold cache, a missing index on one hot path), not a system-wide slowdown.
| Data source | pg_stat_statements, the per-statement execution-time extension. Vortex IQ computes the percentile from the distribution of mean_exec_time weighted by calls, sampled each interval, and on managed services (RDS, Aurora, Cloud SQL) cross-checks against Performance Insights / the provider’s latency metric where available. |
| Metric basis | Statement execution time inside the backend, not round-trip time. Network transit and client-side parsing are excluded; this is the time PostgreSQL itself spent executing. |
| Aggregation window | RT/5m: a real-time headline refreshed roughly every 60 seconds, with the percentile computed over a rolling 5-minute window so a single outlier does not whipsaw the value. |
| Unit | Milliseconds (ms). |
| What counts | Every statement captured by pg_stat_statements on the monitored database, including SELECT, INSERT, UPDATE, DELETE, and DDL, after PostgreSQL’s own query normalisation (literals replaced by $1, $2). |
| What does NOT count | (1) Time spent waiting for a connection from the pool (that is Connection Pool Saturation); (2) network latency between app and database; (3) statements below pg_stat_statements.track granularity if track = top and you care about nested calls; (4) queries that errored out before completing (those land in Query Error Rate %). |
| Time window | RT/5m (real-time headline, percentile over a rolling 5-minute window) |
| Alert trigger | >500ms. Sustained p99 above 500ms means the tail of your workload is breaching the latency budget that most OLTP front ends can absorb without users noticing. |
| Roles | owner, engineering, operations |
Calculation
The card reads frompg_stat_statements, which records cumulative timing per normalised statement since the last pg_stat_statements_reset() (or instance start). Vortex IQ does not trust the lifetime cumulative figure for a real-time tail metric, because a statement that was slow once at 03:00 would pollute the headline all day. Instead the engine samples the view each interval, computes the delta in calls and total_exec_time since the previous sample, and reconstructs a per-statement mean for the window. The 99th percentile is then taken across the call-weighted distribution: statements that ran 50,000 times in the window count 50,000 times toward the percentile, a statement that ran twice counts twice.
In plain terms:
- Mean-of-means is an approximation of the true tail.
pg_stat_statementsstores a mean per statement, not a full histogram, so the p99 is the 99th percentile of statement means, not the 99th percentile of every individual execution. On a workload dominated by a few statement shapes this is very close to the true p99; on a workload with one statement whose individual runs vary wildly, the true tail can be worse than reported. Where the managed-service provider exposes a real latency histogram (Aurora Performance Insights, Cloud SQL Query Insights), Vortex IQ prefers that source and notes it on the card. - Resets reset the baseline. If someone runs
pg_stat_statements_reset()mid-window the delta goes briefly negative; the engine detects the counter rollback and skips that one sample rather than reporting a nonsense spike.
Worked example
A B2B SaaS platform runs its primary on a managed PostgreSQL 15 instance (db.r6g.2xlarge equivalent,max_connections = 400). The application is healthy at p50 but the on-call SRE gets a Nerve Centre alert at 14:20 on 11 May 26: p99 query latency has climbed from a steady 180ms to 640ms and held there for six minutes.
The SRE pulls the snapshot:
| Window | p50 | p95 | p99 | Notes |
|---|---|---|---|---|
| 13:00 to 14:00 (baseline) | 9ms | 70ms | 180ms | normal afternoon load |
| 14:14 to 14:19 | 11ms | 240ms | 640ms | p50 barely moved, tail blew out |
UPDATE invoices SET status = $1 WHERE account_id = $2 statement that normally runs in 4ms now averaging 380ms, and accounting for the spike.
- Read p99 against p50, never alone. p99 of 640ms is alarming on its own, but the diagnostic value is the gap to p50. A wide p50-to-p99 gap means tail contention; a narrow gap that is simply high everywhere means the instance is undersized or starved.
- Lock waits count as execution time. A query blocked on a row lock is still “executing” from
pg_stat_statements’ point of view. That is why a contention event surfaces here and not on the pool-saturation card. Pair with Idle-in-Transaction Backends when p99 spikes, because a forgotten open transaction is the classic cause. - A 500ms p99 is a budget, not a verdict. For an interactive OLTP front end, 500ms at the tail is roughly where users start to feel sluggishness. For an analytics or reporting database, a 500ms p99 may be entirely fine. Tune the sensitivity threshold to your workload in the Sensitivity tab rather than treating the default as gospel.
Sibling cards to read alongside
| Card | Why pair it with Query Latency p99 | What the combination tells you |
|---|---|---|
| Query Latency p50 (ms) | The median, the typical-query view. | p50 flat plus p99 high equals a tail/contention problem; both rising equals a system-wide slowdown. |
| Query Latency p95 (ms) | The “most users” tail, one step in from p99. | If p95 is fine but p99 is bad, the problem is confined to a very small slice of traffic. |
| Slow-Query Rate % | The proportion of queries over 100ms. | High slow-query rate plus high p99 confirms a broad slowdown, not a single outlier. |
| Top 10 Slowest Queries | Names the actual statements behind the tail. | The first place to drill when p99 spikes: which statement shape is responsible. |
| Buffer Cache Hit Rate % | Cache misses force disk reads that lengthen the tail. | p99 up plus cache-hit down equals a cold cache or undersized shared_buffers. |
| Idle-in-Transaction Backends | Stuck transactions hold locks that inflate execution time. | p99 spike with idle-in-tx backends present equals lock contention. |
| Deadlocks (last 5m) | The extreme end of contention. | Deadlocks plus a p99 spike confirm a write-contention pattern. |
| Connection Pool Saturation % | Distinguishes execution time from wait-for-connection time. | p99 high but pool healthy means the slowness is genuinely inside query execution. |
Reconciling against the source
Where to look in PostgreSQL’s own tooling:On managed services:pg_stat_statementsis the authoritative source. Order by mean to see where the tail lives:pg_stat_activityshows what is running right now and what each backend is waiting on (wait_event_type,wait_event), which tells you whether the tail is lock waits or I/O.EXPLAIN (ANALYZE, BUFFERS)on the suspect statement gives the real plan and the actual time.
Amazon RDS / Aurora: Performance Insights exposes per-statement latency and wait-event breakdowns directly, and is the closest native equivalent to this card’s percentile. Google Cloud SQL: Query Insights provides per-query latency and plan sampling. Azure Database for PostgreSQL: Query Store and the Intelligent Performance blade hold the equivalent statement-level timings.Why our number may legitimately differ from a raw
pg_stat_statements read:
| Reason | Direction | Why |
|---|---|---|
| Cumulative vs windowed | Raw view usually looks worse | The view stores lifetime cumulative means; Vortex IQ reports a rolling 5-minute delta, so a one-off slow run at 03:00 inflates the raw view but not the card. |
| Mean-of-means approximation | Card may read slightly lower than true p99 | The view stores a mean per statement, not a full histogram; the call-weighted percentile is an approximation. Where a provider histogram exists, the card prefers it. |
| Statement normalisation | Neither is wrong | pg_stat_statements groups by normalised query; two queries that differ only in literals are one row. The card inherits that grouping. |
| Reset timing | Transient | A pg_stat_statements_reset() mid-window is skipped by the card but visibly zeroes the raw view. |
track = top vs all | Card may miss nested calls | If pg_stat_statements.track = top, time spent in functions and triggers is attributed to the top-level statement, not the nested one. |
| Card | Expected relationship | What causes divergence |
|---|---|---|
datadog.p99-latency | If Datadog APM traces the same database, the two p99s should track within tens of ms. | APM measures from the application’s perspective (includes network and pool wait); this card measures pure execution time, so APM p99 is usually higher. |
| Slow Queries During Checkout Window (5m) | A p99 spike during a checkout window should co-occur. | If p99 spikes but checkout is unaffected, the slow statements are on a non-customer path (reporting, batch). |
Known limitations / FAQs
The card shows a high p99 but every query I run by hand is fast. Why? Three usual causes. First, the tail is intermittent: the slow runs happen under concurrency you cannot reproduce with a single manual query, typically lock contention from another session. Second, your manual run hits a warm cache while the slow production runs hit cold pages (check Buffer Cache Hit Rate %). Third, the slow statement is a different shape than the one you tested; run thepg_stat_statements query above ordered by mean_exec_time to find the real culprit.
Is p99 the right tail to watch, or should I use p95 or max?
p99 is the standard SLO tail for interactive workloads: it captures genuine pain without being dominated by single freak outliers the way max_exec_time is. p95 is more forgiving and good for “most users”; max is too noisy to alert on. Watch p99 for alerting and keep p50, p95 visible for context. The trio together tells you the shape of the curve.
Why does my Datadog or New Relic APM p99 differ from this card?
APM measures latency from the application’s point of view: it includes time waiting for a connection from the pool and network transit to the database. This card measures only the time PostgreSQL spent executing the statement. APM p99 is therefore almost always higher. A large gap between the two points at pool or network problems rather than query problems.
pg_stat_statements is not installed. Does this card work?
No, this card requires the pg_stat_statements extension (it ships with PostgreSQL but must be added to shared_preload_libraries and created with CREATE EXTENSION pg_stat_statements). On managed services it is usually available as a parameter-group toggle. Without it, Vortex IQ falls back to the provider’s native latency metric (Performance Insights, Query Insights) where present, and otherwise the card is unavailable. The connector setup screen flags this during onboarding.
Does p99 include time the query spent waiting for a lock?
Yes. From pg_stat_statements’ perspective a statement blocked on a row or table lock is still executing, so lock-wait time is included in execution time and therefore in p99. This is intentional and useful: it means contention shows up here. Pair with Idle-in-Transaction Backends and Deadlocks (last 5m) to confirm contention as the cause.
The default 500ms alert is too sensitive (or not sensitive enough) for my workload.
Adjust it in the Sensitivity tab per profile. An interactive OLTP front end might want 250ms; a reporting or analytics database might be perfectly healthy at a 2,000ms p99. The 500ms default is a reasonable OLTP starting point, not a universal truth. Set the threshold against your own latency budget.
Can a single very slow query move the p99, or does it take many?
Because the percentile is call-weighted, one statement that runs rarely will barely move p99 even if each run is slow. To move p99 you need a statement shape that runs often enough to occupy the top 1% of the call-weighted distribution. That is by design: it stops a single nightly maintenance query from triggering false alarms while still catching a hot path that has regressed.