Error Rate, New Relic

Metrics type: Key Metrics • Category: Monitoring

At a glance

Percentage of Transaction events that ended in an error in the rolling window. The fastest leading-edge signal of a deploy gone wrong, an upstream dependency crashing, or a database hitting connection-pool exhaustion. The card a duty engineer should pin to a second monitor.


What it counts	`errorCount / count(*) FROM Transaction x 100`, expressed as a percentage. An error is any `Transaction` event with `error IS true` (set by the APM agent on uncaught exceptions, 5xx responses, and explicit `noticeError()` calls).
NerdGraph endpoint	NRQL via NerdGraph: `SELECT percentage(count(*), WHERE error IS true) FROM Transaction WHERE appName IN (...) SINCE 5 MINUTES AGO`.
Metric basis	Event-rate (not request-rate). Async background work, scheduled jobs, and message-consumer transactions each count with the same weight as a foreground HTTP transaction. To restrict the card to user-facing traffic only, filter `WHERE transactionType = 'Web'`.
Aggregation window	5-minute rolling for the live KPI; 1-hour rollup on the trend chart; 1-day rollup for the 7D vsP comparison.
Severity threshold	All errors are counted equally by default. To prioritise P1 paths only, scope the NRQL to `WHERE name LIKE 'Controller/Checkout/%'` or your equivalent.
Browser vs APM scope	This card is APM-only (server-side). For client-side errors, see JS Errors / Session, which reads `JavaScriptError` events from Browser. An APM error rate of 0.5% alongside Browser JS errors at 8% points to a client-side issue (a third-party script, a CDN edge, or a CORS rule).
Filtered hosts / services	`appName IN (...)` is set per-merchant during onboarding to the apps that matter (typically the storefront app, checkout app, and payment service). Background workers and admin tools are excluded by default.
Sample basis	NRQL on Transaction is sample-corrected for accounts in event sampling. The percentage stays accurate; the absolute error count may be lower than what server logs show.
Time zone	Account-configured timezone for chart axes; UTC for raw event timestamps.
Time window	`T/7D vsP` (today vs prior 7-day average for the same time-of-day)
Alert trigger	`>2%`, calibrated to sit “above the normal day-to-day noise floor of 0.5 to 1% on a healthy storefront”. Tune up to 3% for B2B integrations where partner-side timeouts inflate the baseline.
Sentiment key	`error_rate`
Roles	owner, engineering, operations
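The NRQL in the table above can also be assembled programmatically. This is a minimal sketch that builds the card's query with the optional `transactionType = 'Web'` and `name LIKE` scopes described above, then wraps it in a NerdGraph request body. It assumes NerdGraph's `actor { account(id:) { nrql(query:) } }` query shape and the public `https://api.newrelic.com/graphql` endpoint; the function names, account ID, and app names are hypothetical placeholders, and the code only builds the payload rather than sending it.

```python
import json

# Public NerdGraph endpoint for US-region accounts (EU: https://api.eu.newrelic.com/graphql).
NERDGRAPH_URL = "https://api.newrelic.com/graphql"


def build_error_rate_nrql(app_names, web_only=False, name_like=None, since="5 MINUTES AGO"):
    """Build the card's error-rate NRQL with the optional scopes (hypothetical helper)."""
    for value in [*app_names, name_like or ""]:
        if "'" in value:
            # Keep the sketch simple: refuse values that would need NRQL quote escaping.
            raise ValueError(f"unsupported quote in {value!r}")
    apps = ", ".join(f"'{a}'" for a in app_names)
    clauses = [f"appName IN ({apps})"]
    if web_only:
        clauses.append("transactionType = 'Web'")  # user-facing transactions only
    if name_like:
        clauses.append(f"name LIKE '{name_like}'")  # e.g. P1 checkout controllers
    return (
        "SELECT percentage(count(*), WHERE error IS true) FROM Transaction "
        f"WHERE {' AND '.join(clauses)} SINCE {since}"
    )


def build_nerdgraph_payload(account_id, nrql):
    """Wrap the NRQL in a NerdGraph GraphQL request body (JSON string)."""
    query = (
        "query($accountId: Int!, $nrql: Nrql!) { actor { account(id: $accountId) "
        "{ nrql(query: $nrql) { results } } } }"
    )
    return json.dumps({"query": query, "variables": {"accountId": account_id, "nrql": nrql}})


# Hypothetical app names and account ID, for illustration only.
nrql = build_error_rate_nrql(["storefront", "checkout"], web_only=True)
payload = build_nerdgraph_payload(1234567, nrql)
print(nrql)
```

To run it for real you would POST the payload to the endpoint with a User key in the `API-Key` header; keeping the query builder separate from the transport makes the scoping logic easy to test on its own.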

Calculation

Calculated automatically from your New Relic data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
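The calculation reduces to a ratio plus a comparison against the baseline. Below is a minimal Python sketch of the percentage and of the `T/7D vsP` comparison, assuming the comparison is reported as a percentage-point difference against the simple mean of the same time-of-day slot on the prior seven days (the card may present it differently, for example as a relative change). The function names are illustrative, not part of any Vortex IQ or New Relic API.

```python
from statistics import mean


def error_rate_pct(error_count, total_count):
    """Percentage of Transaction events with error IS true in the window."""
    if total_count == 0:
        return 0.0  # assumption: an empty window reads as 0% rather than raising
    return 100.0 * error_count / total_count


def vs_prior_7d(today_pct, prior_7_days_pct):
    """Today's rate minus the mean of the same slot on the prior 7 days, in points."""
    if len(prior_7_days_pct) != 7:
        raise ValueError("expected exactly 7 prior-day readings")
    return today_pct - mean(prior_7_days_pct)


# 14:15 to 14:20 window from the worked example, against a steady 0.6% baseline.
now = error_rate_pct(525, 12_800)       # 4.1015625
delta = vs_prior_7d(now, [0.6] * 7)      # roughly 3.5 points above baseline
print(f"{now:.1f}% now, {delta:+.1f} pts vs prior 7D")
```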

Worked example

A BigCommerce storefront on Cloud Run sees this card climb from 0.6% to 4.1% between 14:05 and 14:20 on 02 May 26. The duty engineer pulls this card up first.

Time	NRQL result	Interpretation
14:00 to 14:05	0.6% (60 errors / 9,800 txns)	Normal noise floor, no action
14:05 to 14:10	1.4% (155 errors / 11,200 txns)	Above noise floor, watch for sustained breach
14:10 to 14:15	2.8% (348 errors / 12,400 txns)	Alert fires (over 2% threshold)
14:15 to 14:20	4.1% (525 errors / 12,800 txns)	Sustained, escalate
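The Interpretation column follows a simple rule set, which this sketch reproduces from the raw counts. It assumes the 1% upper end of the stated noise floor as the "watch" boundary, the card's 2% trigger, and "sustained" meaning a second consecutive 5-minute window above 2%; those boundaries are our reading of the table, not a documented Vortex IQ rule, and `interpret` is a hypothetical helper.

```python
NOISE_FLOOR_PCT = 1.0  # upper end of the 0.5 to 1% healthy noise floor
ALERT_PCT = 2.0        # the card's default alert trigger


def interpret(windows):
    """Label consecutive 5-minute (errors, transactions) windows."""
    labels = []
    breached_last = False
    for errors, txns in windows:
        rate = 100.0 * errors / txns
        if rate > ALERT_PCT:
            # A second breaching window in a row counts as sustained (assumption).
            label = "sustained, escalate" if breached_last else "alert fires"
        elif rate > NOISE_FLOOR_PCT:
            label = "above noise floor, watch"
        else:
            label = "normal noise floor"
        breached_last = rate > ALERT_PCT
        labels.append((round(rate, 1), label))
    return labels


for rate, label in interpret([(60, 9_800), (155, 11_200), (348, 12_400), (525, 12_800)]):
    print(f"{rate:.1f}%  {label}")
```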

What’s actually happening: at 14:08 a deploy went out to the catalogue service. The new build introduced a regression in the product-detail endpoint, which now returns 503 when a SKU has more than 12 variants. About 8% of products meet that condition, so roughly 8% of product-detail page loads now fail.

Translating that into conversion impact: with 12,800 transactions per 5 minutes and a 4.1% error rate, ~525 customers hit a broken page every 5 minutes. At an average product-detail-to-cart conversion of 12% and an AOV of £85, that’s 525 x 0.12 x £85 = £5,355 of cart-add value lost per 5-minute window, or roughly £1,070/min of GMV at risk. If the rollback takes 25 minutes from alert to deploy, the total exposure is ~£26.8k for a single regression.

The Apdex calibration interaction matters here: with T = 0.5s, errors that take 8s to surface (the 503 timeout) push customers into the frustrated bucket (anything above 4 x T = 2s is frustrated). So Apdex drops from 0.91 to 0.74 at the same time the error rate climbs. The composite Operational Health Score drops from 87 to 68, breaching its 70 alert threshold roughly 2 minutes before the error rate alone would.

If the card had stayed at 0.6%, the conversation would be different: 0.6% of 12,800 is 77 errors in 5 minutes. Most are background-worker timeouts and 404s from bot crawlers; the customer-facing impact is negligible. Same metric, different story, hence the rule “look at trend, not absolute level”.
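The revenue arithmetic and the Apdex interaction in the worked example can be reproduced in a few lines. This sketch assumes the error rate holds flat for the whole outage, rounds failed requests to whole customers as the example does, and uses the standard Apdex buckets (satisfied at or below T, tolerating up to 4T, frustrated above) with errored requests forced into frustrated. The function names and the sample request durations are hypothetical.

```python
def revenue_exposure(txns_per_5min, error_pct, conversion, aov, outage_minutes):
    """Cart-add value at risk, assuming the error rate holds steady for the outage."""
    failed = round(txns_per_5min * error_pct / 100)  # customers hitting a broken page per 5 min
    lost_per_window = failed * conversion * aov       # cart-add value lost per 5-minute window
    per_minute = lost_per_window / 5
    return failed, lost_per_window, per_minute, per_minute * outage_minutes


def apdex(durations_s, errored, t_s):
    """Apdex score where any errored request is frustrated regardless of duration."""
    if not durations_s:
        raise ValueError("no requests in window")
    satisfied = tolerating = 0
    for duration, is_error in zip(durations_s, errored):
        if is_error:
            continue                # errors always land in the frustrated bucket
        if duration <= t_s:
            satisfied += 1
        elif duration <= 4 * t_s:   # up to 4T is tolerating; above is frustrated
            tolerating += 1
    return (satisfied + tolerating / 2) / len(durations_s)


failed, per_window, per_min, total = revenue_exposure(12_800, 4.1, 0.12, 85, 25)
print(failed, round(per_window), round(per_min), round(total))
```

With the example's inputs this gives 525 failed customers, £5,355 per window, £1,071/min and £26,775 over a 25-minute rollback. Note that a fast 0.2s error scores exactly like an 8s timeout in `apdex`, which is why error spikes drag Apdex down even when latency looks normal.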

Sibling cards merchants should reference together

Card	Why pair it with Error Rate
5xx Response Rate	Subset view: only HTTP 5xx errors. When error rate spikes but 5xx rate doesn’t, the cause is application-level exceptions, not infrastructure.
Errors by Transaction	The drill-down. Tells you which endpoint is failing. Open this immediately when error rate breaches threshold.
Top Error Classes	Groups by exception class so you can see whether one root cause is driving most of the volume.
New Error Types (last 24h)	Detects errors that didn’t exist yesterday, the strongest deploy-regression signal.
Apdex Score	Companion latency-and-satisfaction view. Error rate up + Apdex down = customer-facing problem; error rate up + Apdex stable = background-only problem.
Datadog Error Rate	Cross-connector peer. The two should agree within 0.3% during normal periods; gaps over 1% point to instrumentation coverage drift.
GA4 Conversion Rate	Customer-side outcome. A sustained 3%+ error rate on commerce paths typically drops conversion by 15 to 30%.
Shopify Sales / Min	Revenue-side outcome. Watch this co-move with error rate during incidents to quantify revenue at risk.
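The pairing rules in the table above amount to a small triage decision. This sketch encodes them as a hypothetical helper: Apdex moving with error rate separates customer-facing from background-only problems, and a flat 5xx rate points to application-level exceptions. The inputs are simple booleans for illustration; deciding what counts as "up" or "down" is left to the reader's own thresholds.

```python
def triage(error_rate_up, apdex_down, five_xx_up):
    """Hypothetical reading of the sibling-card pairing rules."""
    if not error_rate_up:
        return {"read": "no error-rate incident", "open_next": []}
    # Error rate up + Apdex down = customer-facing; Apdex stable = background-only.
    scope = "customer-facing" if apdex_down else "background-only"
    # 5xx flat while error rate spikes points to application-level exceptions.
    cause = ("HTTP 5xx rising, infrastructure not ruled out" if five_xx_up
             else "application-level exceptions")
    return {
        "read": f"{scope}: {cause}",
        # Drill-downs the table suggests opening once the threshold is breached.
        "open_next": ["Errors by Transaction", "Top Error Classes", "New Error Types (last 24h)"],
    }


print(triage(error_rate_up=True, apdex_down=True, five_xx_up=False)["read"])
```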

Reconciling against the vendor’s own dashboard

Where to look in New Relic:

APM > Application > Error analytics for the same NRQL chart in New Relic’s native UI.
Dashboards > pre-built “Error analytics” dashboard.
Alerts & AI > Conditions to see the alert configuration that backs this card’s threshold.

The chart on NR’s APM > Errors page should closely match Vortex IQ when both use the same time window and appName filter. If they differ by more than 0.5% on the same window, check the timezone and appName filters first.

Why our number may legitimately differ from New Relic’s own screens:

Reason	Direction of divergence
Account timezone vs UTC. NR APM chart axes follow the account timezone; Vortex IQ NRQL runs in UTC. Boundary-period rollups can differ by 0.1 to 0.3% on the most recent hour slice.	Either direction near hour boundaries
NRQL retention windows. Full-resolution `Transaction` events are retained 8 days on standard plans, 13 months on Data Plus. Beyond the retention window, queries return aggregated data with slightly coarser percentage rounding.	Vortex IQ may show a 0.05 to 0.1% drift on >7-day windows
Ingest sampling. NR samples `Transaction` events on high-cardinality accounts; the percentage stays correct (sample-corrected) but the count differs from raw server logs.	Counts in Vortex IQ < server logs; rates match
NerdGraph rate limits. Default 3,000 NRQL points / minute / account. Heavy investigation can leave the card 30 to 60s stale.	Stale, not wrong
Filter scope. Vortex IQ scopes `appName` to merchant-relevant apps; NR’s all-apps default chart includes background workers and admin tools, which can read at 5 to 10x the merchant-facing rate without it being a customer-facing problem.	Vortex IQ < NR raw
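The Ingest sampling row rests on a general statistical idea: if each stored event is weighted by the inverse of its keep probability, the estimated rate stays unbiased even though far fewer events are stored. The sketch below is a generic inverse-probability-weighting illustration under that assumption, not a description of New Relic's internal sampling mechanism; the `keep_prob` field and the counts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SampledEvent:
    error: bool
    keep_prob: float  # hypothetical field: probability the event survived sampling


def corrected_error_rate(events):
    """Inverse-probability-weighted error rate: each kept event stands for 1/p events."""
    weighted_total = sum(1 / e.keep_prob for e in events)
    weighted_errors = sum(1 / e.keep_prob for e in events if e.error)
    return 100 * weighted_errors / weighted_total


def naive_error_rate(events):
    """Unweighted rate over the kept events only (biased when sampling rates differ)."""
    return 100 * sum(e.error for e in events) / len(events)


# 10,000 real transactions with 200 errors: errors kept at 100%, successes at 25%.
kept = [SampledEvent(True, 1.0)] * 200 + [SampledEvent(False, 0.25)] * 2_450
print(len(kept), round(corrected_error_rate(kept), 2), round(naive_error_rate(kept), 2))
```

The weighted rate recovers the true 2% from 2,650 stored events, while a naive ratio over the same events would read about 7.55%; the stored count stays well below the 10,000 a server log would show, matching the "rates match, counts differ" direction in the table.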

Cross-connector reconciliation: NR APM and Datadog APM instrument differently (NR APM agent vs DD trace agent). On the same Express.js or Rails app, the two error-rate numbers should agree within 0.3% during steady state. A 1%+ persistent gap usually means one platform is missing instrumentation on a sub-route (common with framework middleware order changes) or one has a different error-classification rule (NR counts noticeError() calls, DD counts unhandled exceptions plus configured 5xx).
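The reconciliation rule in the paragraph above can be written as a small check over paired 5-minute readings. This sketch treats the 0.3% and 1% figures as percentage-point gaps and "persistent" as every sampled window breaching 1 point; both readings, and the `reconcile` helper itself, are assumptions for illustration.

```python
def reconcile(window_pairs, agree_pp=0.3, audit_pp=1.0):
    """Classify NR vs Datadog error-rate gaps across (nr_pct, dd_pct) window pairs."""
    gaps = [abs(nr - dd) for nr, dd in window_pairs]
    if all(g <= agree_pp for g in gaps):
        return "steady-state agreement"
    if all(g >= audit_pp for g in gaps):
        # Persistent 1%+ gap: missing sub-route instrumentation or differing error rules.
        return "persistent gap: audit sub-route coverage and error classification"
    return "intermittent gap: re-run the widest-gap window with the same appName filter"


print(reconcile([(2.5, 1.3), (2.4, 1.2)]))
```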

Known limitations / merchant FAQs

New Relic and Datadog disagree on error rate by 1.2%. Who’s right?
Probably both, on different scopes. The most common cause is one platform instrumenting a route the other doesn’t (a server-sent-events endpoint, a websocket handler, or a framework error filter that swallows exceptions before NR sees them). Pick a 5-minute window where the gap is largest and run both queries with the same appName filter; if the gap persists, audit each agent’s coverage. A persistent 1%+ gap on otherwise identical scope is rare and worth a support ticket with whichever platform looks anomalous.

Why is my error rate 0% when my customers report errors?
There are three usual causes: (a) the errors are client-side JavaScript, not server-side (see JS Errors / Session); (b) the application catches exceptions without calling `noticeError()`, so APM never sees them (audit the framework’s error middleware); (c) the failure is a 4xx (validation, auth) that the agent’s error-collector configuration ignores or marks as “expected” rather than counting as an error. If those 4xx responses represent real customer pain, remove them from the agent’s ignored or expected lists (for example `error_collector.expected_classes`) so they count towards the error rate.

My error rate jumps every Sunday at 02:00. Is something broken?
Almost certainly a scheduled background job. Check whether your APM scope includes `transactionType = 'Other'` (NR’s classification for non-web work). If it does, weekly batch jobs (DB cleanup, report generation) can produce concentrated error volume that doesn’t affect customers. Restrict the card to `WHERE transactionType = 'Web'` to filter background work out.

Apdex math: how do errors interact with Apdex?
Errors automatically count as frustrated in the Apdex calculation, regardless of duration, so a 200ms error lands in the same bucket as an 8s timeout. This is why a sudden error-rate spike often drops Apdex below 0.5 even when latency p95 looks unchanged: the frustrated bucket fills with fast-failing requests. NR’s Apdex documentation covers the rule.

NRQL retention is 8 days, but my error trend chart shows 30 days. How?
NR rolls up Transaction data into hourly aggregates after the full-resolution retention window. The 30-day trend chart uses the rolled-up aggregates beyond day 8. Counts and rates remain accurate; what you lose is the ability to drill into a specific minute past day 8.

Sampling: am I missing real errors?
On accounts in event sampling, NR keeps a representative subset. Rates are sample-corrected, so the percentage you see is accurate; the raw error count may be lower than what your server logs show. If you need raw counts (e.g., for billing reconciliation), pair this card with log-side counts from Error-level Log Rate, which is unaffected by event sampling.

Multi-account: I have a US and an EU NR account. Can I see one combined number?
Vortex IQ reads one NR account per integration. To see the combined number, connect both accounts as separate integrations and stack them in the Nerve Centre. The stack panels feature weights each account’s rate by its transaction count rather than taking a simple arithmetic average, so the combined rate is correctly weighted.

Ingest cost vs visibility tradeoff: is there a way to keep accuracy and reduce cost?
Yes. Drop the sample rate on non-checkout transactions to 25%, keep checkout at 100%, and keep all error events at 100% (errors should never be sampled; they’re rare enough that the cost saving is small). The error rate stays accurate (sample-corrected), checkout-flow visibility stays full-fidelity, and ingest cost typically drops 40 to 60%.

Alert tuning playbook: my alert fires too often. What do I tune?
Three levers, in order of usefulness: (a) raise the threshold from 2% to 3% if your baseline noise floor is above 1.5%; (b) add a duration clause (“must stay above 2% for 5 minutes”) to suppress flutter; (c) scope the alert to user-facing transactions only (`WHERE transactionType = 'Web'`). Avoid disabling the alert entirely: an oversensitive alert is fixable, but an absent alert costs revenue.
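Two of the FAQ answers reduce to small calculations: the traffic-weighted combination used when stacking accounts, and the duration clause that suppresses flutter. The sketch below shows both, assuming one rate reading per minute, the default 2% threshold, and a 5-minute hold. Function names and the example accounts are hypothetical; in New Relic itself the duration clause is configured on the alert condition rather than in code like this.

```python
def combined_error_rate(accounts):
    """Combine (error_pct, txn_count) pairs, weighting each account by its traffic."""
    total_txns = sum(n for _, n in accounts)
    return sum(pct * n for pct, n in accounts) / total_txns


def should_alert(rates_by_minute, threshold_pct=2.0, hold_minutes=5):
    """Fire only once the rate has stayed above threshold for hold_minutes readings in a row."""
    run = 0
    for rate in rates_by_minute:
        run = run + 1 if rate > threshold_pct else 0  # any dip resets the streak
        if run >= hold_minutes:
            return True
    return False


# Hypothetical accounts: US at 1.0% on 90,000 txns, EU at 4.0% on 10,000 txns.
print(combined_error_rate([(1.0, 90_000), (4.0, 10_000)]))  # 1.3, not the 2.5 a plain mean gives
```

The weighting matters whenever account volumes differ: a small, noisy EU account should not pull the combined figure as hard as the busier US one.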

Tracked live in Vortex IQ Nerve Centre

Error Rate is one of hundreds of KPI pulses Vortex IQ tracks across New Relic and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.

