At a glance
The 95th-percentile query execution time: 95% of statements finished faster than this number, and the slowest 5% took longer. This is the honest measure of how the database feels to its users, because it captures the bad tail without being dragged around by a single pathological outlier the way a maximum is. The median (p50) tells you the typical experience; p95 tells you the experience of the unlucky one-in-twenty, which on a busy instance is thousands of requests an hour.
| What it tracks | The execution time, in milliseconds, below which 95% of statements completed in the window. It measures server-side query time, not network round-trip or client-side time. |
| Data source | Computed from per-statement timing in pg_stat_statements (mean_exec_time, min_exec_time, max_exec_time, stddev_exec_time weighted by calls), or from log-derived durations where log_min_duration_statement is enabled. The percentile is derived across the statement population in the window. |
| Time window | RT/5m (a real-time reading smoothed over a rolling five-minute window). |
| Alert trigger | > 200ms. For an interactive OLTP workload, a 95th-percentile above 200ms means a noticeable share of users are waiting. |
| Roles | owner, engineering, operations |
| Why p95 and not the average | An average hides the tail: a workload with a 5ms mean can still have a 2-second p95 if a slow subset exists, and those slow requests are exactly the ones that hurt. p95 is the standard latency SLO percentile for this reason. |
Calculation
PostgreSQL does not store a true latency histogram per statement, so the 95th percentile is estimated rather than read directly. Vortex IQ uses the best basis available:pg_stat_statements(preferred): the extension records, per normalised statement, thecalls,mean_exec_time,min_exec_time,max_exec_timeandstddev_exec_time. Across the statement population, weighting each statement’s distribution by itscalls, Vortex IQ estimates the call-weighted 95th percentile of execution time. Because the per-statement distribution is summarised (mean and stddev, not a full histogram), the p95 is a well-grounded estimate, not an exact order statistic.- Log-derived durations (more precise when available): with
log_min_duration_statementset to a low value, PostgreSQL logs the duration of every statement above the threshold. Where Vortex IQ has log access it computes the percentile from the actual logged durations, which is exact for the logged population.
Worked example
A platform team runs a PostgreSQL 16 primary behind a product catalogue and search API. The p95 SLO is 200ms. Normal weekday p95 sits around 70 to 110ms. Snapshot taken on 21 Apr 26 at 20:15 BST, during the evening peak.| Window (5m) | p50 | p95 | p99 |
|---|---|---|---|
| 19:55 to 20:00 | 9ms | 88ms | 240ms |
| 20:00 to 20:05 | 10ms | 96ms | 260ms |
| 20:10 to 20:15 | 11ms | 310ms | 1,180ms |
SELECT filtering products by a category_id that, after a recent data load, no longer hits an index, so it now sequentially scans a table that grew from 200k to 2.1m rows. Most page views never run it (hence p50 is unaffected), but the catalogue-filter views do, and those land squarely in the slow 5%.
CREATE INDEX CONCURRENTLY idx_products_category ON products (category_id);. After the index builds, the category filter drops back to single-digit milliseconds and the next window’s p95 returns to 94ms.
Two takeaways the team records:
- Read p50 and p95 together; the gap is the signal. A small p50-to-p95 gap means a uniform workload. A large and growing gap means a slow subset is forming, often a query that lost its index, a table that outgrew its plan, or a lock-contended hot row. The percentile that moves tells you whether the problem is broad or narrow.
- p95 measures execution, not waiting. If p95 looks fine but users complain of slowness, the wait is somewhere this card cannot see: the connection queue (check Connection Pool Saturation %), the network, or the application. A clean p95 with unhappy users points you away from the database engine and toward the path in front of it.
Sibling cards
| Card | Why pair it with Query Latency p95 | What the combination tells you |
|---|---|---|
| Query Latency p50 (ms) | The median, the typical-query view. | A widening p50-to-p95 gap means a slow subset is forming, not a broad slowdown. |
| Query Latency p99 (ms) | The extreme tail. | p99 far above p95 means a small number of very slow queries; p99 close to p95 means the whole tail shifted. |
| Slow-Query Rate % | The share of calls over 100ms. | A rising p95 with a rising slow-query share names the subset that is hurting the tail. |
| Top 10 Slowest Queries | The actual statements behind the tail. | Turns a p95 breach into a named query to fix, usually a missing or stale index. |
| Buffer Cache Hit Rate % | Whether the slow queries are reading from disk. | A falling cache hit rate alongside a rising p95 means cold data is forcing disk reads. |
| Connection Pool Saturation % | The wait this card cannot see. | Clean p95 but slow users means the delay is in the connection queue, not in execution. |
| Queries per Second (live) | The load context for the latency. | Rising p95 with rising QPS is load; rising p95 with flat QPS is a regression. |
| PostgreSQL Health Score | The composite that weights p95 latency. | A sustained p95 breach pulls the composite down through its latency input. |
Reconciling against the source
Where to look in PostgreSQL itself:The richest source isWhy our number may legitimately differ from the console:pg_stat_statements. RunSELECT query, calls, mean_exec_time, min_exec_time, max_exec_time, stddev_exec_time FROM pg_stat_statements ORDER BY mean_exec_time DESC;to see the per-statement timing that the percentile is built from. (Requires thepg_stat_statementsextension.) For exact durations, setlog_min_duration_statementto a low millisecond value and read the logged statement durations directly; this is the most precise basis for a percentile. Note thatpg_stat_statementscounters are cumulative since the lastpg_stat_statements_reset(); a long-running uptime means the figures reflect a long history, so reset to measure a clean window. On a managed service, the console exposes a comparable view: AWS RDS Performance Insights shows per-statement average latency and load, Google Cloud SQL Query Insights surfaces per-query latency, and Azure Database for PostgreSQL exposes Query Store with per-query timing percentiles.
| Reason | Direction | Why |
|---|---|---|
| Estimate vs exact | Either way | From pg_stat_statements, p95 is estimated from per-statement mean and stddev, not a true histogram; log-derived percentiles are exact and may differ slightly. |
| Execution vs total | Vortex IQ lower | This card is execution time only; a console metric that includes connection-acquisition or planning time reads higher. |
| Window vs cumulative | Either way | pg_stat_statements is cumulative since last reset; Vortex IQ differences a rolling window, so a hand read of the raw view (without differencing) reflects a much longer history. |
| Call weighting | Either way | Vortex IQ weights by calls; a console that lists per-query latency unweighted highlights rare slow queries the call-weighted p95 plays down. |
| Mean vs percentile | Console may read lower | Performance Insights often headlines average latency, which sits well below p95 on a tailed workload. |
Known limitations / FAQs
My p50 is fine but p95 breached. What does that pattern mean? A slow subset has formed. The typical query is still fast (so the median holds), but a specific slice, often a query that lost its index, a table that outgrew its plan, or a lock-contended hot row, has become slow and landed in the worst 5%. Pull Top 10 Slowest Queries and Slow-Query Rate % to name the subset. The narrowness of the problem (p50 unaffected) is itself the most useful clue. p95 looks healthy but users say the database is slow. Where is the time going? This card measures execution time only, not waiting. If queries execute fast but users wait, the delay is in front of the engine: the application’s connection queue (check Connection Pool Saturation %), the network round-trip, or the application code. A clean p95 with slow users is a strong signal that the database engine is not the bottleneck. Why is this an estimate and not an exact percentile?pg_stat_statements summarises each statement’s timing as a mean and standard deviation, not a full histogram, so a percentile across the statement population is necessarily estimated. For an exact percentile, enable log_min_duration_statement and let Vortex IQ compute from the logged durations; that population is exact for everything above the log threshold.
Should p99 always be higher than p95?
Yes, by definition: p99 captures a longer tail than p95, so it is always equal to or greater than p95. What matters is the gap. p99 close to p95 means the whole tail shifted together (a broad slowdown). p99 far above p95 means a small number of very slow queries dominate the extreme tail, often a single pathological statement or a periodic batch job.
Does an idle or low-traffic instance give a meaningful p95?
With very few queries in the window, a percentile is noisy: one slow maintenance query can swing it. On a busy instance the percentile is stable because it is computed over thousands of calls. For a quiet database, read p95 alongside Queries per Second (live) so you know how many samples back the number.
The console shows a much lower latency than this card. Why?
Managed-service consoles often headline average latency, which sits well below the 95th percentile on any workload with a slow tail. Compare like with like: look at the per-statement view or Query Store percentiles, not the average, and confirm whether the console figure includes planning and connection time that this execution-only card excludes.
Can I change the 200ms threshold?
Yes, per profile in the Sensitivity tab. 200ms suits an interactive OLTP API; a reporting or analytics workload where multi-second queries are normal would set a far higher threshold, and a latency-critical real-time path might tighten it to 50ms. Set it to your SLO, not the generic default.