At a glance
Percentage of Transaction events that ended in an error in the rolling window. The fastest leading-edge signal of a deploy gone wrong, an upstream dependency crashing, or a database hitting connection-pool exhaustion. The card a duty engineer should pin to a second monitor.
| Field | Detail |
|---|---|
| What it counts | errorCount / count(*) FROM Transaction x 100, expressed as a percentage. An error is any Transaction event with error IS true (set by the APM agent on uncaught exceptions, 5xx responses, and explicit noticeError() calls). |
| NerdGraph endpoint | NRQL via NerdGraph: SELECT percentage(count(*), WHERE error IS true) FROM Transaction WHERE appName IN (...) SINCE 5 MINUTES AGO. |
| Metric basis | Event-rate (not request-rate). Async background work, scheduled jobs, and message-consumer transactions each count the same as a foreground HTTP transaction. To restrict to user-facing traffic only, filter WHERE transactionType = 'Web'. |
| Aggregation window | 5-minute rolling for the live KPI; 1-hour rollup on the trend chart; 1-day rollup for the 7D vsP comparison. |
| Severity threshold | All errors counted equally by default. To prioritise P1-only paths, scope the NRQL to WHERE name LIKE 'Controller/Checkout/%' or your equivalent. |
| Browser vs APM scope | This card is APM-only (server-side). For client-side errors see JS Errors / Session which reads JavaScriptError events from Browser. APM error rate of 0.5% with Browser JS errors at 8% means the issue is client-side (a third-party script, a CDN edge, or a CORS rule). |
| Filtered hosts / services | appName IN (...) is set per-merchant during onboarding to the apps that matter (typically the storefront app, checkout app, and payment service). Background workers and admin tools are excluded by default. |
| Sample basis | NRQL on Transaction is sample-corrected for accounts in event sampling. The percentage stays accurate; the absolute error count may be lower than what server logs show. |
| Time zone | Account-configured timezone for chart axes; UTC for raw event timestamps. |
| Time window | T/7D vsP (today vs prior 7-day average for the same time-of-day) |
| Alert trigger | >2%, calibrated to “above the normal day-to-day noise floor of 0.5 to 1% on a healthy storefront”. Tune up to 3% for B2B integrations where partner-side timeouts inflate the baseline. |
| Sentiment key | error_rate |
| Roles | owner, engineering, operations |
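As a minimal sketch, the query in the NerdGraph endpoint row could be assembled like this in Python. The helper names (`build_error_rate_nrql`, `build_nerdgraph_payload`), the app names, and the account ID are illustrative assumptions, not Vortex IQ's implementation; the GraphQL shape follows NerdGraph's `actor.account.nrql` field. Nothing is sent over the network here.

```python
import json


def build_error_rate_nrql(app_names, web_only=False, window_minutes=5):
    """Return the card's NRQL string for the given apps (hypothetical helper)."""
    if not app_names:
        raise ValueError("at least one appName is required")
    for name in app_names:
        # Refuse names that would break out of the NRQL string literal.
        if "'" in name:
            raise ValueError(f"unsupported character in appName: {name!r}")
    quoted = ", ".join(f"'{name}'" for name in app_names)
    where = f"appName IN ({quoted})"
    if web_only:
        # Optional scope from the Metric basis row: user-facing traffic only.
        where += " AND transactionType = 'Web'"
    return (
        "SELECT percentage(count(*), WHERE error IS true) "
        f"FROM Transaction WHERE {where} SINCE {window_minutes} MINUTES AGO"
    )


def build_nerdgraph_payload(account_id, nrql):
    """Wrap the NRQL in a NerdGraph GraphQL request body (built, not sent)."""
    graphql = (
        "query($accountId: Int!, $nrql: Nrql!) { actor { account(id: $accountId) "
        "{ nrql(query: $nrql) { results } } } }"
    )
    return json.dumps(
        {"query": graphql, "variables": {"accountId": account_id, "nrql": nrql}}
    )


# Example with made-up app names and account ID.
nrql = build_error_rate_nrql(["storefront", "checkout"], web_only=True)
print(nrql)
print(build_nerdgraph_payload(1234567, nrql))
```

Passing the NRQL as a GraphQL variable rather than interpolating it into the query text avoids a second layer of quoting.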
Calculation
Calculated automatically from your New Relic data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
Worked example
A BigCommerce storefront on Cloud Run is showing this card going from 0.6% to 4.1% between 14:05 and 14:20 on 02 May 26. The duty engineer pulls this card up first.
| Time | NRQL result | Interpretation |
|---|---|---|
| 14:00 to 14:05 | 0.6% (60 errors / 9,800 txns) | Normal noise floor, no action |
| 14:05 to 14:10 | 1.4% (155 errors / 11,200 txns) | Above noise floor, watch for sustained breach |
| 14:10 to 14:15 | 2.8% (348 errors / 12,400 txns) | Alert fires (over 2% threshold) |
| 14:15 to 14:20 | 4.1% (525 errors / 12,800 txns) | Sustained, escalate |
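The percentages in the table can be reproduced from the raw counts. The sketch below recomputes each window and labels it against the 2% alert trigger; the 1% "watch" boundary is an assumption taken from the upper end of the stated noise floor, and the function names are illustrative.

```python
ALERT_THRESHOLD_PCT = 2.0  # alert trigger from the At a glance table
NOISE_FLOOR_PCT = 1.0      # assumed: upper end of the healthy noise floor

# (window, errors, transactions) copied from the worked example table.
WINDOWS = [
    ("14:00-14:05", 60, 9_800),
    ("14:05-14:10", 155, 11_200),
    ("14:10-14:15", 348, 12_400),
    ("14:15-14:20", 525, 12_800),
]


def error_rate_pct(errors, transactions):
    """Errors as a percentage of all transactions in the window."""
    if transactions == 0:
        return 0.0
    return 100.0 * errors / transactions


def classify(rate_pct):
    """Map a rate to the three states used in the Interpretation column."""
    if rate_pct > ALERT_THRESHOLD_PCT:
        return "alert"
    if rate_pct > NOISE_FLOOR_PCT:
        return "watch"
    return "normal"


for label, errors, txns in WINDOWS:
    rate = error_rate_pct(errors, txns)
    print(f"{label}  {rate:.1f}%  {classify(rate)}")
```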
525 x 0.12 x £85 = £5,355 of cart-add value lost per 5-minute window, or roughly £1,070/min of GMV at risk. If the rollback takes 25 minutes from alert to deploy, the total exposure is ~£26.8k for a single regression.
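The exposure arithmetic above can be written out step by step. The text does not name the 0.12 and £85 inputs; the sketch below assumes they are an impact factor (the share of errors that cost a cart-add) and an average value per affected cart, so treat the parameter names as hypothetical.

```python
def revenue_exposure(errors_per_window, impact_factor, value_gbp,
                     window_minutes=5, incident_minutes=25):
    """Return (loss per window, loss per minute, loss over the incident) in GBP.

    impact_factor and value_gbp are assumed meanings for the 0.12 and 85
    in the worked example; the source does not define them.
    """
    per_window = errors_per_window * impact_factor * value_gbp
    per_minute = per_window / window_minutes
    total = per_minute * incident_minutes
    return per_window, per_minute, total


per_window, per_minute, total = revenue_exposure(525, 0.12, 85)
print(f"per 5-minute window: GBP {per_window:,.0f}")
print(f"per minute:          GBP {per_minute:,.0f}")
print(f"25-minute incident:  GBP {total:,.0f}")
```

The 25-minute figure assumes the error volume stays at the 14:15 to 14:20 level for the whole incident, which is the same simplification the prose makes.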
The Apdex calibration interaction matters here: with t = 0.5s, errors that take 8s to surface (the 503 timeout) push customers into the frustrated bucket (4 x t = 2s, anything above is frustrated). So Apdex drops from 0.91 to 0.74 at the same time the error rate climbs. The composite Operational Health Score drops from 87 to 68, breaching its 70 alert threshold roughly 2 minutes before the error rate alone would.
If the card stayed at 0.6%, the conversation would be different: 0.6% of 12,800 is 77 errors in 5 minutes. Most are background-worker timeouts and 404 from bot crawlers; the customer-facing impact is negligible. Same number, different story, hence the rule “look at trend not absolute level”.
Sibling cards merchants should reference together
| Card | Why pair it with Error Rate |
|---|---|
| 5xx Response Rate | Subset view: only HTTP 5xx errors. When error rate spikes but 5xx rate doesn’t, the cause is application-level exceptions, not infrastructure. |
| Errors by Transaction | The drill-down. Tells you which endpoint is failing. Open this immediately when error rate breaches threshold. |
| Top Error Classes | Groups by exception class so you can see whether one root cause is driving most of the volume. |
| New Error Types (last 24h) | Detects errors that didn’t exist yesterday, the strongest deploy-regression signal. |
| Apdex Score | Companion latency-and-satisfaction view. Error rate up + Apdex down = customer-facing problem; error rate up + Apdex stable = background-only problem. |
| Datadog Error Rate | Cross-connector peer. The two should agree within 0.3% during normal periods; gaps over 1% point at probe coverage drift. |
| GA4 Conversion Rate | Customer-side outcome. Sustained 3%+ error rate on commerce paths typically drops conversion 15 to 30%. |
| Shopify Sales / Min | Revenue-side outcome. Watch this co-move with error rate during incidents to quantify revenue at risk. |
Reconciling against the vendor’s own dashboard
Where to look in New Relic:
- APM > Application > Error analytics for the same NRQL chart in New Relic’s native UI.
- Dashboards > pre-built “Error analytics” dashboard.
- Alerts & AI > Conditions to see the alert configuration that backs this card’s threshold.
appName filters.
Why our number may legitimately differ from New Relic’s own screens:
| Reason | Direction of divergence |
|---|---|
| Account timezone vs UTC. NR APM chart axes follow the account timezone; Vortex IQ NRQL runs in UTC. Boundary-period rollups can differ by 0.1 to 0.3% on the most recent hour slice. | Either direction near hour boundaries |
| NRQL retention windows. Full-resolution Transaction events are retained 8 days on standard plans, 13 months on Data Plus. Beyond the retention window, queries return aggregated data with slightly coarser percentage rounding. | Vortex IQ may show a 0.05 to 0.1% drift on >7-day windows |
| Ingest sampling. NR samples Transaction events on high-cardinality accounts; the percentage stays correct (sample-corrected) but the count differs from raw server logs. | Counts in Vortex IQ < server logs; rates match |
| NerdGraph rate limits. Default 3,000 NRQL points / minute / account. Heavy investigation can stale the card by 30 to 60s. | Stale, not wrong |
| Filter scope. Vortex IQ scopes appName to merchant-relevant apps; NR’s all-apps default chart includes background workers and admin tools, which can read at 5 to 10x the merchant-facing rate without it being a customer-facing problem. | Vortex IQ < NR raw |
noticeError() calls, DD counts unhandled exceptions plus configured 5xx).
Known limitations / merchant FAQs
New Relic and Datadog disagree on error rate by 1.2%, who’s right?
Probably both, on different scopes. The most common cause is one platform instrumenting a route the other doesn’t (a server-sent-events endpoint, a websocket handler, a framework error filter that swallows exceptions before NR sees them). Pick a 5-minute window where the gap is largest and run both queries with the same appName filter; if the gap persists, audit each agent’s coverage. A persistent 1%+ gap on otherwise identical scope is rare and worth a support ticket on whichever platform looks anomalous.
Why is my error rate 0% but my customers report errors?
Three usual causes: (a) the errors are client-side JavaScript, not server-side (see JS Errors / Session); (b) the application is catching exceptions and not calling noticeError(), so APM never sees them (audit the framework’s error middleware); (c) the failure is a 4xx (validation, auth) that the agent configuration treats as “expected” or ignored rather than as an error. If 4xx responses represent real customer pain, review the agent’s error_collector settings (expected_classes, expected_status_codes, ignore_status_codes) and remove the entries that hide them from the error rate.
My error rate jumps every Sunday at 02:00, is something broken?
Almost certainly a scheduled background job. Check whether your APM scope includes transactionType = 'Other' (NR’s classification for non-web work). If yes, weekly batch jobs (DB cleanup, report generation) can produce concentrated error volume that doesn’t affect customers. Restrict the card to WHERE transactionType = 'Web' to filter background work out.
Apdex math: how do errors interact with Apdex?
Errors automatically count as frustrated in Apdex calculation regardless of duration. So a 200ms error is treated identically to an 8s timeout in the Apdex bucket assignment. This is why a sudden error-rate spike often drops Apdex below 0.5 even when latency p95 looks unchanged: the frustrated bucket fills with fast-failing requests. NR’s Apdex doc covers the rule.
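The bucket rule can be illustrated with the standard Apdex formula, (satisfied + tolerating / 2) / total. This is a sketch of the rule as described here, not New Relic's code; the function names and the sample data are made up, and t = 0.5s is the threshold used in the worked example.

```python
def apdex_bucket(duration_s, is_error, t=0.5):
    """Assign one transaction to an Apdex bucket; errors are always frustrated."""
    if is_error:
        return "frustrated"
    if duration_s <= t:
        return "satisfied"
    if duration_s <= 4 * t:
        return "tolerating"
    return "frustrated"


def apdex_score(samples, t=0.5):
    """samples: iterable of (duration_s, is_error). Returns None if empty."""
    counts = {"satisfied": 0, "tolerating": 0, "frustrated": 0}
    for duration_s, is_error in samples:
        counts[apdex_bucket(duration_s, is_error, t)] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    return (counts["satisfied"] + counts["tolerating"] / 2) / total


# Same latency profile (all 200ms), but 10% of requests now fail fast.
healthy = [(0.2, False)] * 100
failing = [(0.2, False)] * 90 + [(0.2, True)] * 10
print(apdex_score(healthy), apdex_score(failing))
```

Latency percentiles are identical in both samples, yet the score falls because every fast failure lands in the frustrated bucket.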
NRQL retention is 8 days but my error trend chart shows 30 days, how?
NR rolls up Transaction data into hourly aggregates after the full-resolution retention window. The 30-day trend chart you see uses the rolled aggregates beyond day 8. Counts and rates remain accurate; what you lose is the ability to drill into a specific minute past day 8.
Sampling: am I missing real errors?
On accounts in event sampling, NR keeps a representative subset. Rates are sample-corrected so the percentage you see is accurate; the raw error count may be lower than what your server logs show. If you need raw counts (e.g., for billing reconciliation), pair this card with log-side counts from Error-level Log Rate, which is unaffected by event sampling.
Multi-account: I have a US and EU NR account, can I see one combined number?
Vortex IQ reads one NR account per integration. To see the combined number, connect both accounts as separate integrations and stack them in the Nerve Centre (the stack panels feature averages by transaction count, not arithmetic average, so the combined rate is correctly weighted).
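To see why weighting by transaction count matters, here is a small sketch comparing it with a plain arithmetic average. The account figures are invented for illustration and the function names are hypothetical; this is not the stack panels implementation.

```python
def combined_error_rate_pct(accounts):
    """accounts: iterable of (errors, transactions). Weighted by transaction count."""
    accounts = list(accounts)
    total_errors = sum(errors for errors, _ in accounts)
    total_txns = sum(txns for _, txns in accounts)
    if total_txns == 0:
        return 0.0
    return 100.0 * total_errors / total_txns


def naive_average_pct(accounts):
    """Arithmetic mean of per-account rates; over-weights small accounts."""
    rates = [100.0 * errors / txns for errors, txns in accounts if txns]
    return sum(rates) / len(rates) if rates else 0.0


# Hypothetical 5-minute window: a busy US account and a quieter EU account.
us = (120, 20_000)  # 0.6%
eu = (90, 3_000)    # 3.0%
print(f"weighted: {combined_error_rate_pct([us, eu]):.2f}%")
print(f"naive:    {naive_average_pct([us, eu]):.2f}%")
```

With these numbers the naive average reads roughly double the weighted rate, because the small EU account counts as much as the large US one.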
Ingest cost vs visibility tradeoff: is there a way to keep accuracy and reduce cost?
Yes. Drop the sample rate on non-checkout transactions to 25%, keep checkout at 100%, and keep all error events at 100% (errors should never be sampled; they’re rare enough that the cost saving is small). The error rate stays accurate (sample-corrected), checkout flow visibility stays full-fidelity, and ingest cost typically drops 40 to 60%.
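One way to picture "sample-corrected" is inverse-probability weighting: each kept event counts for 1 / sample_rate events. This is an assumption made for illustration, not a description of how New Relic computes it, and the traffic numbers are invented.

```python
def corrected_error_rate_pct(kept_events):
    """kept_events: iterable of (is_error, sample_rate) for events that survived sampling."""
    weighted_total = 0.0
    weighted_errors = 0.0
    for is_error, sample_rate in kept_events:
        weight = 1.0 / sample_rate  # a 25% sample stands in for 4 real events
        weighted_total += weight
        if is_error:
            weighted_errors += weight
    return 100.0 * weighted_errors / weighted_total if weighted_total else 0.0


def naive_error_rate_pct(kept_events):
    """Uncorrected rate over kept events only; biased when sampling is uneven."""
    kept = list(kept_events)
    if not kept:
        return 0.0
    return 100.0 * sum(1 for is_error, _ in kept if is_error) / len(kept)


# Hypothetical true traffic: 1,000 checkout txns (20 errors) and
# 4,000 non-checkout txns (40 errors), i.e. a true rate of 1.2%.
kept = (
    [(False, 1.0)] * 980     # checkout successes, kept at 100%
    + [(True, 1.0)] * 20     # checkout errors, kept at 100%
    + [(False, 0.25)] * 990  # non-checkout successes, 25% of 3,960 kept
    + [(True, 1.0)] * 40     # non-checkout errors, kept at 100%
)
print(f"corrected: {corrected_error_rate_pct(kept):.2f}%")
print(f"naive:     {naive_error_rate_pct(kept):.2f}%")
```

The corrected rate recovers the true 1.2%, while the naive rate over kept events overstates it because errors are retained at a higher rate than successes.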
Alert tuning playbook: my alert fires too often, what do I tune?
Three levers in order of usefulness: (a) increase the threshold from 2% to 3% if your baseline noise floor is >1.5%; (b) add a duration clause (“must stay above 2% for 5 minutes”) to suppress flutter; (c) scope the alert to user-facing transactions only (WHERE transactionType = 'Web'). Avoid disabling the alert entirely: an oversensitive alert is fixable, whereas an absent alert costs revenue.
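Lever (b) can be sketched as a consecutive-breach check. The example assumes one error-rate reading per minute and a hypothetical `sustained_breach` helper; real alert conditions are configured in New Relic, so this only illustrates the logic.

```python
def sustained_breach(rates_pct, threshold_pct=2.0, required_consecutive=5):
    """True if the rate stays above the threshold for N consecutive readings.

    rates_pct: one reading per minute, oldest first (an assumed cadence).
    """
    streak = 0
    for rate in rates_pct:
        # Any reading at or below the threshold resets the streak.
        streak = streak + 1 if rate > threshold_pct else 0
        if streak >= required_consecutive:
            return True
    return False


flutter = [2.5, 1.8, 2.6, 1.9, 2.4, 1.7]    # crosses 2% repeatedly, never holds
incident = [1.0, 2.1, 2.3, 2.8, 3.0, 4.1]   # five readings in a row above 2%
print(sustained_breach(flutter), sustained_breach(incident))
```

A plain threshold check would fire three times on the flutter series; the duration clause fires on neither of those crossings and still catches the sustained climb.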