New Relic state x commerce-sibling baseline = $/hour at risk while the incident is open. The single most valuable card in this manifest.
At a glance
The single dollar figure that translates technical incident state into a number the COO can read. It multiplies the merchant’s current hourly revenue baseline (from the connected commerce sibling: Shopify, BigCommerce, or Adobe) by an impact factor derived from live New Relic operational state (Apdex degradation, error rate, active P1s).
| The formula | revenue_at_risk_per_hour = baseline_revenue_per_hour x impact_factor where impact_factor = clamp(0, 1, 0.5 x apdex_drop + 0.3 x error_rate_excess + 0.2 x p1_count_factor). The card is currency-tagged from the commerce sibling and updates every 30s. |
| NerdGraph endpoint | Two NRQL queries and one issues query, all via NerdGraph: (1) SELECT apdex(duration, t: 0.5) FROM Transaction SINCE 5 MINUTES AGO for current Apdex; (2) SELECT percentage(count(*), WHERE error IS true) FROM Transaction SINCE 5 MINUTES AGO for error rate; (3) actor.account.aiIssues.issues(filter: {states: [ACTIVATED]}) for active incident count. Plus the commerce sibling endpoint for the revenue baseline. |
| Metric basis | Live composite. apdex_drop = max(0, baseline_apdex - current_apdex), where baseline_apdex is the rolling 7-day average for the same time of day. error_rate_excess = max(0, current_error_rate - 1.0%), expressed in percentage points and normalised to the 0 to 1 scale by dividing by 10 (capped at 1). p1_count_factor = min(1, p1_count / 5). |
| Browser vs APM scope | APM only for the impact factor; Browser RUM is excluded because customer-side latency doesn’t map cleanly to revenue. The commerce-sibling baseline comes from the merchant’s actual sales/min, so the dollar baseline reflects real sales rather than an estimate. |
| Aggregation window | 5-minute rolling for the impact factor; 1-hour rolling for the baseline. The number rises as soon as an incident degrades operational state and falls back once state returns to baseline. |
| Severity threshold | All severities contribute via the impact factor. P1s carry the most direct weight: each active P1 adds 0.04 to the impact factor, and the P1 term saturates at 5 P1s (its full 0.2 weight). P2/P3 affect Apdex / error rate indirectly through their own conditions. |
| Sample basis | Apdex and error-rate inputs are sample-corrected on high-cardinality accounts. P1 count is unsampled. Baseline revenue is unsampled (commerce platform Order data). |
| Filtered hosts / services | APM scope follows the merchant’s appName IN (...) config. Baseline scope is the merchant’s primary commerce connector (Shopify, BigCommerce, or Adobe). |
| Time zone | UTC for live evaluation; account timezone for chart display. |
| Time window | RT (real time; refreshes every 30s) |
| Alert trigger | >$0. Any non-zero value triggers a notification because operational degradation that maps to revenue impact is always worth acknowledging. |
| Roles | owner, finance, operations |
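A minimal sketch of how the two NRQL inputs from the NerdGraph endpoint row could be sent to NerdGraph, assuming Python's standard library, the public NerdGraph endpoint (https://api.newrelic.com/graphql) with a user API key in the API-Key header, and a hypothetical account ID. It only builds the request. The active-issue query (aiIssues) is left out because its exact filter shape should be checked in the NerdGraph API explorer for your account.

```python
import json
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"
ACCOUNT_ID = 1234567  # hypothetical; use your own New Relic account ID

# The two NRQL inputs from the "NerdGraph endpoint" row.
APDEX_NRQL = "SELECT apdex(duration, t: 0.5) FROM Transaction SINCE 5 MINUTES AGO"
ERROR_RATE_NRQL = (
    "SELECT percentage(count(*), WHERE error IS true) "
    "FROM Transaction SINCE 5 MINUTES AGO"
)

# NerdGraph runs NRQL through the actor.account.nrql field.
NRQL_QUERY = """
query ($accountId: Int!, $nrql: Nrql!) {
  actor {
    account(id: $accountId) {
      nrql(query: $nrql) { results }
    }
  }
}
"""


def build_nrql_request(api_key: str, account_id: int, nrql: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that runs one NRQL query."""
    body = {"query": NRQL_QUERY, "variables": {"accountId": account_id, "nrql": nrql}}
    return urllib.request.Request(
        NERDGRAPH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "API-Key": api_key},
        method="POST",
    )


# Usage (sends a live request, so it is commented out here):
# with urllib.request.urlopen(build_nrql_request("NRAK-...", ACCOUNT_ID, APDEX_NRQL)) as resp:
#     print(json.load(resp)["data"]["actor"]["account"]["nrql"]["results"])
```

Passing the NRQL as a GraphQL variable rather than splicing it into the query string avoids quoting problems with the single-quoted literals NRQL often contains.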
Calculation
Calculated automatically from your New Relic data and the connected commerce sibling's sales data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
Worked example
A Shopify Plus merchant with NR APM on the storefront. Baseline revenue at this hour of day is £18,400/hour (averaged over the same Tuesday 11:00 to 12:00 hour across the last 28 days). Live state at 11:14 on 02 May 26:
| Input | Value |
|---|---|
| Current Apdex (5-min rolling) | 0.72 |
| Baseline Apdex (7D same hour) | 0.91 |
| Current error rate (5-min rolling) | 4.6% |
| Active P1 incidents | 1 |
- apdex_drop = 0.91 - 0.72 = 0.19. Apdex contribution: 0.5 x 0.19 = 0.095.
- error_rate_excess = 4.6% - 1.0% = 3.6 percentage points, normalised to the 0 to 1 scale as 3.6 / 10 = 0.36. Error-rate contribution: 0.3 x 0.36 = 0.108.
- p1_count_factor = 1 / 5 = 0.2. P1 contribution: 0.2 x 0.2 = 0.04.
- impact_factor = 0.095 + 0.108 + 0.04 = 0.243, so ~24% of baseline revenue is at risk: £18,400 x 0.243 ≈ £4,471/hour.
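The worked example can be reproduced with a short sketch of the formula from the At a glance table. The divide-by-10 normalisation of error_rate_excess is taken from the worked example and is an assumption about how the card scales that term; the class and function names are illustrative, not the card's internals.

```python
from dataclasses import dataclass


def clamp(lo: float, hi: float, x: float) -> float:
    """Limit x to the closed range [lo, hi]."""
    return max(lo, min(hi, x))


@dataclass
class LiveState:
    current_apdex: float
    baseline_apdex: float  # rolling 7-day average, same hour
    error_rate_pct: float  # e.g. 4.6 for 4.6%
    p1_count: int


def impact_factor(s: LiveState) -> float:
    apdex_drop = max(0.0, s.baseline_apdex - s.current_apdex)
    # Assumption (from the worked example): percentage points above the 1.0%
    # floor are divided by 10 and capped at 1.
    error_rate_excess = clamp(0.0, 1.0, max(0.0, s.error_rate_pct - 1.0) / 10)
    p1_count_factor = min(1.0, s.p1_count / 5)
    return clamp(0.0, 1.0, 0.5 * apdex_drop + 0.3 * error_rate_excess + 0.2 * p1_count_factor)


def revenue_at_risk_per_hour(baseline_per_hour: float, s: LiveState) -> float:
    return baseline_per_hour * impact_factor(s)


state = LiveState(current_apdex=0.72, baseline_apdex=0.91, error_rate_pct=4.6, p1_count=1)
print(round(impact_factor(state), 3))                      # 0.243
print(round(revenue_at_risk_per_hour(18_400, state), 2))   # 4471.2
```

The outer clamp matters only at the extremes: a full Apdex collapse, 10+ points of excess errors and 5+ P1s together give exactly 1.0, so the card never reports more than the whole baseline.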
The card treats operational state as back to baseline when: current_apdex >= baseline_apdex - 0.05 AND current_error_rate < 1.0% AND active_P1_count = 0.
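The three-part condition above can be written as a boolean predicate. A minimal sketch, with error rate in percent and an illustrative function name:

```python
def back_to_baseline(
    current_apdex: float,
    baseline_apdex: float,
    error_rate_pct: float,
    active_p1_count: int,
) -> bool:
    """True only when all three operational inputs are back within tolerance."""
    apdex_ok = current_apdex >= baseline_apdex - 0.05  # 0.05 Apdex tolerance
    errors_ok = error_rate_pct < 1.0                    # below the 1.0% floor
    no_p1s = active_p1_count == 0                       # no open P1 incidents
    return apdex_ok and errors_ok and no_p1s
```

With the worked-example baseline of 0.91, an Apdex of 0.88 with 0.6% errors and no P1s passes; 0.80 does not, because it is more than 0.05 below baseline.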
Sibling cards merchants should reference together
| Card | Why pair it with Revenue at Risk |
|---|---|
| Operational Health Score | Operational composite. The two cards co-move: score down = revenue at risk up. |
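| Active Incidents | One of the three impact-factor inputs. Each open P1 adds 0.04 to the impact factor, ~£736/hour at the worked-example baseline (£18,400/hour); the P1 term caps at ~£3,680/hour once 5 or more P1s are open. |
| Error Rate | The 30%-weight component. Contributes once error rate climbs above 1%, and can become the largest single contributor (as in the worked example, where it outweighs the Apdex term). |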
| Apdex Score | The 50%-weight component (largest weight). Apdex drops drive most of the visible movement. |
| Revenue Lost / Min (active incidents) | Cross-channel cousin. This card is potential risk; that one is hard-counted lost revenue. |
| Datadog Revenue at Risk | Cross-connector peer with the same composite shape. |
| Shopify Sales / Min | The baseline source. Watch sales/min co-move during incidents. |
| GA4 Conversion Rate | Customer-side outcome. Conversion drop usually lags the risk number by 5 to 10 minutes. |
Reconciling against the vendor’s own dashboard
Where to look in New Relic: New Relic does not surface a revenue-at-risk number; this is a Vortex IQ composite that joins NR operational state with the commerce-sibling baseline. The closest equivalent screens for the operational-state inputs are:
- APM > Service > Summary for Apdex and error rate.
- Alerts & AI > Issues & Activity for P1 count.
- Dashboards > the pre-built “Service overview” dashboard.

Why the numbers can diverge:
| Reason | Direction of divergence |
|---|---|
| Baseline revenue staleness. The baseline is a rolling 28-day same-hour-of-day average; if the store had unusually low traffic during the baseline period (a demand-shifting event, a campaign down-day), the baseline reads low. | Risk number understates impact |
| Account timezone vs UTC. Baseline scope follows the commerce platform’s timezone; operational-state queries run in UTC. Boundary-hour rollups can show 5 to 10% drift on the live number. | Either direction at hour boundaries |
| NRQL retention windows. Apdex / error-rate data older than 8 days is aggregated to hourly resolution; the live card uses 5-minute windows, so retention isn’t a concern. | None for live card |
| Ingest sampling. Apdex and error-rate inputs are sample-corrected on high-cardinality accounts; the impact factor stays accurate. | None |
| Conservative impact factor calibration. The factor is intentionally tuned to err on the high side (alert-context bias); real revenue impact often lands at 60 to 80% of the risk number. | Risk number > actual loss |
The Datadog connector computes the same composite (dd_revenue_at_risk) using DD’s APM, Monitors, and Synthetics inputs. With both connectors wired, the two risk numbers should agree within ~10% (the gap reflects probe-coverage differences and slightly different impact-factor weights). A 25%+ persistent gap indicates one platform is missing service coverage; audit which services each is instrumenting.
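The agreement bands above (within ~10% expected, a persistent 25%+ gap means a coverage audit) can be turned into a simple check over paired readings. This is a sketch under two assumptions not stated in the text: the gap is measured relative to the larger of the two numbers, and "persistent" means every reading in the supplied window. The function names and labels are hypothetical.

```python
def relative_gap(nr_risk: float, dd_risk: float) -> float:
    """Gap between the two risk numbers as a fraction of the larger one."""
    larger = max(nr_risk, dd_risk)
    return 0.0 if larger == 0 else abs(nr_risk - dd_risk) / larger


def classify_gap(readings: list[tuple[float, float]]) -> str:
    """Classify a window of paired (NR, DD) risk-per-hour readings."""
    if not readings:
        raise ValueError("need at least one paired reading")
    gaps = [relative_gap(nr, dd) for nr, dd in readings]
    if all(g >= 0.25 for g in gaps):
        return "audit-coverage"       # persistent 25%+ gap
    if max(gaps) <= 0.10:
        return "agree"                # within ~10%
    return "coverage-difference"      # normal during incidents
```

Requiring every reading in the window to exceed 25% keeps a single noisy pair from triggering an instrumentation audit.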
The card is reconciled forward (against actual revenue loss) every 30 minutes after an incident closes: Vortex IQ Mind pulls the actual sales/min trace during the incident window and compares it to the baseline-projected revenue. If the actual loss tracks within 25% of the predicted loss, the model is calibrated; if it drifts persistently, the impact-factor weights are tuned. This back-test runs continuously.
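The forward reconciliation described above amounts to comparing the measured loss against the predicted loss for one incident window. A minimal sketch, assuming the actual loss is the sum of per-minute (baseline-projected minus actual) sales and the 25% band is measured relative to the predicted loss; the names are illustrative.

```python
def backtest(
    projected_per_min: list[float],
    actual_per_min: list[float],
    predicted_loss: float,
    tolerance: float = 0.25,
) -> tuple[float, bool]:
    """Return (actual_loss, calibrated) for one closed incident window."""
    if len(projected_per_min) != len(actual_per_min):
        raise ValueError("traces must cover the same minutes")
    if predicted_loss <= 0:
        raise ValueError("predicted loss must be positive")
    # Revenue the baseline expected minus revenue that actually arrived.
    actual_loss = sum(p - a for p, a in zip(projected_per_min, actual_per_min))
    calibrated = abs(actual_loss - predicted_loss) / predicted_loss <= tolerance
    return actual_loss, calibrated
```

For a 10-minute incident where the baseline projected 100 per minute and the store took 80, the actual loss is 200; a prediction of 240 is within 25% and counts as calibrated, while a prediction of 400 would flag the weights for tuning.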
Known limitations / merchant FAQs
NR vs Datadog: should the two revenue-at-risk numbers match?
Within ~10%, yes. Both use the same baseline revenue source (the commerce sibling) and the same composite shape, but feed slightly different operational-state inputs (NR APM vs DD APM probes). A 10 to 25% gap during an incident is normal and reflects each platform’s coverage. A 25%+ persistent gap means one platform is missing instrumentation on a service that’s contributing to the impact factor.
Apdex math: how does Apdex translate to revenue?
The card uses apdex_drop = baseline_apdex - current_apdex as a 0 to 1 multiplier with 50% weight in the impact factor. So a 0.20 Apdex drop (e.g., 0.91 to 0.71) contributes 0.5 x 0.20 = 0.10 to the impact factor, or 10% of baseline revenue at risk. The 50% weight reflects that Apdex is the strongest single predictor of conversion drop; SOASTA / Akamai 2017 data shows roughly a 7% conversion-rate drop per 100ms of additional p95 latency, and Apdex is a satisfaction-weighted view of latency.
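To make "satisfaction-weighted view of latency" concrete: with threshold T, a transaction is satisfied at or below T, tolerating up to 4T and frustrated above 4T, and Apdex = (satisfied + tolerating / 2) / total. New Relic's APM Apdex also counts errored transactions as frustrated, and the sketch follows that convention. It uses the same t: 0.5 threshold as the card's NRQL query; the sample data and function names are illustrative.

```python
def apdex(samples: list[tuple[float, bool]], t: float = 0.5) -> float:
    """Apdex score for (duration_seconds, errored) samples at threshold t."""
    if not samples:
        raise ValueError("no samples")
    satisfied = tolerating = 0
    for duration, errored in samples:
        if errored:
            continue  # errored transactions count as frustrated
        if duration <= t:
            satisfied += 1
        elif duration <= 4 * t:
            tolerating += 1
        # slower than 4t: frustrated, adds nothing to the score
    return (satisfied + tolerating / 2) / len(samples)


def apdex_contribution(baseline_apdex: float, current_apdex: float) -> float:
    """The 50%-weighted Apdex term of the impact factor."""
    return 0.5 * max(0.0, baseline_apdex - current_apdex)
```

Eight requests at 0.2s and two at 1.0s score (8 + 2 / 2) / 10 = 0.9: slow-but-tolerable requests cost half a point each, which is why a latency regression moves Apdex, and the risk number, before anything actually errors.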
NRQL retention: is this card affected by retention?
The live card reads 5-minute windows, well inside any retention window. The 28-day baseline is computed from rolled hourly aggregates and is not affected by raw-event retention. So the card works on standard NR plans (8-day raw retention) as well as on Data Plus (13-month).
NR and Datadog disagree by 30%: who’s right?
Probably both, on different scopes. The two most common causes: (a) coverage difference: NR has the checkout service instrumented while DD has only the storefront, so NR sees more of the incident; (b) sampling difference: one platform’s high-cardinality sampling is dropping events the other keeps. Audit instrumentation parity if a single-number reading across both matters.
Sampling: does sampling break the calculation?
No. Apdex and error-rate inputs are sample-corrected on high-cardinality accounts. P1 count is unsampled. Baseline revenue is unsampled (commerce platform Order data, not event-stream data). The whole composite stays accurate even on heavily-sampled NR accounts.
Multi-account: my US and EU revenue baselines are different; can the card handle both?
Yes. Connect each commerce sibling separately and pair each with the corresponding NR account integration. The Nerve Centre stack panel renders one risk number per regional pair. Combining into a single global number is also supported (sum of regional risk numbers), but most CFOs prefer the regional split for incident triage.
Ingest cost vs visibility tradeoff: can I reduce NR ingest without breaking this card?
Yes. Drop the sample rate on non-checkout transactions to 25%, keep checkout at 100%, and keep all error events at 100%. The Apdex / error-rate inputs stay sample-corrected, the P1 count is unsampled, and baseline revenue is unaffected. The card stays accurate and ingest cost typically drops 40 to 60%.
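Why uneven sample rates need not bias the error-rate input: if each stratum's sampled events are weighted by 1 / sample_rate (inverse-probability weighting, a standard technique assumed here for illustration, not a description of New Relic's or the card's internal correction), the estimate matches the unsampled error rate. The traffic numbers are made up; the rates mirror the setup above (non-checkout at 25%, checkout at 100%, errors at 100%).

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    ok_sampled: int      # successful transactions kept after sampling
    ok_rate: float       # sample rate applied to successful transactions
    errors_sampled: int  # errored transactions kept after sampling
    error_rate: float    # sample rate applied to errors (1.0 = keep all)


def corrected_error_pct(strata: list[Stratum]) -> float:
    """Error rate (%) with each sampled event weighted by 1 / sample rate."""
    est_ok = sum(s.ok_sampled / s.ok_rate for s in strata)
    est_err = sum(s.errors_sampled / s.error_rate for s in strata)
    return 100 * est_err / (est_ok + est_err)


def naive_error_pct(strata: list[Stratum]) -> float:
    """Error rate (%) on the raw sampled counts, for comparison."""
    ok = sum(s.ok_sampled for s in strata)
    err = sum(s.errors_sampled for s in strata)
    return 100 * err / (ok + err)


# Hypothetical 5-minute window: true traffic is 1,000 ok + 20 errors on
# non-checkout pages and 500 ok + 10 errors on checkout.
window = [
    Stratum(ok_sampled=250, ok_rate=0.25, errors_sampled=20, error_rate=1.0),
    Stratum(ok_sampled=500, ok_rate=1.0, errors_sampled=10, error_rate=1.0),
]
print(round(corrected_error_pct(window), 2))  # 1.96, same as unsampled 30 / 1530
print(round(naive_error_pct(window), 2))      # 3.85, inflated by dropped ok events
```

Without correction, keeping every error while discarding most successes would roughly double the apparent error rate here and inflate the error-rate term of the impact factor.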
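Alert tuning: the default > $0 trigger is too noisy; how do I tune it?
Two options: (a) raise the threshold (e.g. to “> $100/hour”) if you want to ignore residual customer-experience drag; (b) add a duration clause (“must be above $0 risk for 5 minutes”). A brief blip up to around $50/hour and back is rarely worth a notification.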
The number jumps to £20k/hour for 30 seconds then drops back to £0: was that real?
Almost certainly an Applied Intelligence grouping artifact: when an issue family briefly contains 5+ incidents (before AI groups them into one), the P1 count factor can spike. AI typically resolves the grouping within 60 to 90s and the number normalises. If the spike persists past 2 minutes, it’s a real escalation worth reading. Tune by adding a “must stay above £X for Y minutes” clause to the alert.
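The “must stay above £X for Y minutes” clause can be sketched as a debounce over the card's 30-second readings: the alert fires only once the risk number has stayed above the threshold continuously for the hold period, so a single 30-second grouping spike never fires. The function and parameter names are hypothetical.

```python
def should_alert(
    readings: list[tuple[float, float]],
    threshold: float,
    hold_seconds: float,
) -> bool:
    """readings: (timestamp_seconds, risk_per_hour) pairs in ascending time order."""
    breach_start = None  # timestamp of the first reading in the current breach
    for ts, risk in readings:
        if risk > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= hold_seconds:
                return True
        else:
            breach_start = None  # any dip below the threshold resets the clock
    return False
```

With a 120-second hold, a lone £20k reading followed by £0 never fires, while five consecutive 30-second readings above the threshold (0s to 120s) do.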