Live $/min loss while incidents are open. Stops being academic and starts being the COO’s number.
At a glance
The live, per-minute estimate of revenue being lost while a Datadog incident is open. Where Revenue at Risk shows the hourly rate, this card shows the per-minute ticker, which is what the COO and finance team want to read during a live incident. Every minute the displayed value persists adds that amount to the incident's running cost.
| Field | Detail |
|---|---|
| The formula | revenue_lost_per_min = active_severity_factor × commerce_sibling.revenue_per_min(90D_avg) × estimated_traffic_loss_pct. Same components as Revenue at Risk but expressed at minute resolution rather than hourly. |
| API endpoints touched | Datadog Incidents (/api/v2/incidents?filter[state]=active); commerce-sibling KPI endpoint for 90-day revenue/min. |
| Severity factor | SEV-1 = 1.00; SEV-2 = 0.50; SEV-3 = 0.25; multiple incidents stack additively. |
| Estimated traffic loss percentage | SEV-1 = 35%, SEV-2 = 15%, SEV-3 = 5%. Tunable in Settings → Datadog → Revenue-at-Risk Calibration. |
| Aggregation window | Real-time, refreshed every 60 seconds while incident is open. The displayed number is the current per-minute rate, not a cumulative total. |
| Severity threshold | All severities; SEV-3 is the smallest contributor but stacks with higher severities when multiple incidents are open. |
| Alert pre-filtering | Test incidents ([TEST] titled, or tagged incident_type:test) excluded. |
| Log Management gating | Not used. The card consumes incident state and commerce-sibling baseline; both are independent of Logs. |
| Commerce-sibling required | This card needs a commerce platform connected. Without one, the card displays “Connect a commerce platform to enable this card”. |
| Why per-minute and not per-hour | The live ticker creates urgency: “$23/min leaking” reads as more pressing than “$1,380/hour at risk”, even though they are the same number. During a live incident, the per-minute value is the heartbeat that keeps the response sharp. Pair with Revenue at Risk for the hourly view used in executive comms. |
| Time zone | UTC for cross-connector arithmetic; baseline revenue/min uses 90-day rolling average over the same hour-of-week. |
| Time window | RT (real-time, refreshed every 60 seconds). Display window is “while incident is open”. |
| Alert trigger | > $0. The card surfaces any non-zero value as a notification (zero means no incident is open). |
| Roles | owner, finance, operations |
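The formula, default calibration values, and test-incident pre-filter from the table can be sketched in Python as follows. This is a minimal illustration, not the card's actual implementation: the `Incident` class is a hypothetical, simplified shape (not the Datadog API payload), and "stack additively" is assumed here to mean that each open incident contributes its own severity factor × baseline × traffic-loss term.

```python
from dataclasses import dataclass, field

# Default calibration values quoted in the table above (merchant-tunable).
SEVERITY_FACTOR = {"SEV-1": 1.00, "SEV-2": 0.50, "SEV-3": 0.25}
TRAFFIC_LOSS_PCT = {"SEV-1": 0.35, "SEV-2": 0.15, "SEV-3": 0.05}


@dataclass
class Incident:
    # Hypothetical, simplified shape used only for this sketch.
    title: str
    severity: str
    tags: list[str] = field(default_factory=list)


def is_test_incident(incident: Incident) -> bool:
    """Pre-filter: [TEST] in the title, or an incident_type:test tag.

    Assumption: "[TEST] titled" is treated as "title contains [TEST]".
    """
    return "[TEST]" in incident.title or "incident_type:test" in incident.tags


def revenue_lost_per_min(incidents: list[Incident], baseline_per_min: float) -> float:
    """Sum severity_factor x baseline x traffic_loss over non-test incidents.

    Assumption: additive stacking means one term per open incident.
    """
    return sum(
        SEVERITY_FACTOR[i.severity] * baseline_per_min * TRAFFIC_LOSS_PCT[i.severity]
        for i in incidents
        if not is_test_incident(i)
    )
```

With a single open SEV-1 and a baseline of 160 per minute, the function evaluates 1.00 × 160 × 0.35, which is 56 per minute. An empty incident list returns zero, consistent with the alert trigger row.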
Calculation
Calculated automatically from your Datadog data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
Worked example
A UK fashion brand on Shopify with Datadog APM. Baseline revenue at 14:00 GMT (peak): £160/min. A SEV-1 checkout outage opened at 14:23 GMT. With the default SEV-1 values (severity factor 1.00, estimated traffic loss 35%), the card reads £160 × 1.00 × 0.35 = £56/min.
- Live cost framing for the response team. “We have a SEV-1” is engineering jargon; “We are losing £56 per minute right now” is finance language. Both teams now share a number. The COO can walk into the engineering Slack channel and ask “are we still losing £56/min?” instead of asking technical questions; the on-call has a clear KPI for “incident is over”.
- The cumulative cost is computed automatically. After 25 minutes the cumulative number reads £1,400. After 90 minutes it reads £5,040. After 4 hours it would be £13,440. The cumulative grows linearly until the incident closes; the per-minute rate is constant unless severity changes (e.g. SEV-1 downgraded to SEV-2 mid-investigation).
- The per-minute frame discourages “let’s wait and see if it self-resolves” thinking. Without this card, the team may be tempted to spend 20 minutes investigating before deciding whether to rollback. With “£56/min leaking” displayed live, the team is more likely to rollback immediately and investigate later. The mental model shifts from “diagnose first” to “stop the bleeding first”.
- Per-minute and per-hour are the same number expressed differently. Use per-minute for live dashboards during an incident; use per-hour for executive briefings, status-page banners, and post-incident summaries. The per-minute figure is the live heartbeat; the per-hour figure is the executive frame.
- The card encourages “stop the bleeding first” decisions. Engineering teams trained on “diagnose first, then fix” can be slow to rollback; the live cost ticker counters this with a clear, ongoing financial argument for immediate action.
- The cumulative tally during a long incident is sobering. A 4-hour SEV-1 at £56/min is £13,440. Many merchants find that one bad incident per quarter costs more than the entire engineering tooling budget for the year. This is the card that justifies investments in deploy safety, automated rollback, and synthetic monitoring.
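The cumulative figures in the bullets above follow from multiplying the per-minute rate by elapsed minutes. A minimal Python sketch of that arithmetic is below. The segment list is a hypothetical representation that also covers a mid-incident severity change, and the £12/min SEV-2 rate is derived from the default factors (0.50 × £160 × 0.15), not a value stated in the example.

```python
def cumulative_loss(segments: list[tuple[float, float]]) -> float:
    """Sum rate x duration over (rate_per_min, minutes) segments.

    Each segment is a stretch of the incident during which the
    per-minute rate stayed constant (i.e. severity did not change).
    """
    return sum(rate * minutes for rate, minutes in segments)


# Constant SEV-1 at 56/min, as in the worked example.
after_25_min = cumulative_loss([(56, 25)])      # 56 x 25
after_90_min = cumulative_loss([(56, 90)])      # 56 x 90
after_4_hours = cumulative_loss([(56, 240)])    # 56 x 240

# Hypothetical downgrade: SEV-1 for 25 minutes, then SEV-2 (12/min) for 65.
with_downgrade = cumulative_loss([(56, 25), (12, 65)])
```

The first three calls reproduce the £1,400, £5,040, and £13,440 figures quoted above. The downgrade case shows why the cumulative total stops growing linearly once the severity changes.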
Sibling cards merchants should reference together
| Card | Why pair it with Revenue Lost / Min | What the combination tells you |
|---|---|---|
| Revenue at Risk (live) | The hourly version of the same number. | Use this card for live dashboards; use Revenue at Risk for executive comms. |
| Active Incidents | The state input for the formula. | Active Incidents supplies the severity factor that drives this card. |
| Operational Health Score | The composite engineering view. | Composite below 70 plus this card non-zero equals real, measurable, costly incident. |
| Conversion Drop During Incidents | The post-incident measured-loss peer. | Compare live-estimated vs measured to recalibrate the formula. |
| Cart Abandonment During 5xx Spikes | Mechanism: how the revenue gets lost during incidents. | High abandonment plus high per-minute loss equals “incident is converting visitors to bouncers”. |
| Checkout Service Health × Sales | The latency-vs-orders dual-axis. | Confirms the live observation that orders/min dropped during the latency window. |
| Shopify / BC / Adobe Total Revenue | The baseline-input source. | Use this to validate the 90-day baseline the formula uses. |
| GA4 Sessions | The traffic-loss validation source. | If GA4 sessions did not actually drop during the incident, the traffic-loss percentage is over-stated. |
Reconciling against the vendor’s own dashboard
Where to look in Datadog: Datadog does NOT compute or display Revenue Lost / Min; this card is a Vortex IQ-only synthesis. The component inputs come from:
- Incidents for the active-incident severity (the formula’s state input).
- Service Catalog for the service the incident affects.
The commerce-sibling baseline is fetched from the connected Shopify, BigCommerce, or Adobe Commerce platform via that platform’s Order API.
Why our number may legitimately differ from a hand-computed estimate:
| Reason | Direction | Why |
|---|---|---|
| Time zone alignment | Either | The baseline uses the same hour of week in UTC; if you compute by hand using a different timezone alignment, the number shifts. |
| API rate limits | Brief gaps | Both Datadog Incidents API and the commerce-sibling Order API are rate-limited; cached values may be 1-2 minutes stale. |
| Log indexing latency | Not applicable | This card does not consume logs. |
| Severity factor calibration | Either | Default factors (1.00, 0.50, 0.25) are merchant-tunable. |
| Commerce-sibling sync lag | Vortex IQ baseline lower for “today” | The 90-day rolling average excludes the most recent 5-15 minutes of orders that have not yet been acknowledged via webhook. |
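The time zone row is the easiest one to trip over when computing by hand. The sketch below shows one way to derive an hour-of-week bucket in UTC so that a hand-computed baseline lines up with the card's. The Monday-is-zero numbering is an assumption made for illustration; the source does not state how the card indexes hours of the week.

```python
from datetime import datetime, timedelta, timezone


def hour_of_week_utc(ts: datetime) -> int:
    """Return a bucket in 0..167, where 0 is Monday 00:00-00:59 UTC.

    Assumption: weeks start on Monday (Python's weekday() convention).
    """
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    utc = ts.astimezone(timezone.utc)
    return utc.weekday() * 24 + utc.hour


# Same wall-clock time, different zones: Wednesday 3 July 2024, 14:23.
in_utc = datetime(2024, 7, 3, 14, 23, tzinfo=timezone.utc)
in_utc_plus_1 = datetime(2024, 7, 3, 14, 23, tzinfo=timezone(timedelta(hours=1)))

bucket_utc = hour_of_week_utc(in_utc)            # 62
bucket_plus_1 = hour_of_week_utc(in_utc_plus_1)  # 61
```

The two timestamps land in adjacent buckets, so a hand calculation that reads 14:23 as local time in a UTC+1 zone would pull the baseline from the previous hour of the week.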
| Card | Expected relationship | What causes the divergence |
|---|---|---|
| shopify.total_revenue / bigcommerce.total_revenue / adobe_commerce.total_revenue | The baseline source. The hourly baseline equals the commerce-sibling 90-day average for the current hour-of-week, and per-minute equals that divided by 60. | A divergence indicates the commerce-sibling API is returning incomplete data; usually a webhook backlog. |
| google_analytics.ga_sessions | Independent traffic-loss validator. | If GA4 sessions did not drop during a SEV-1, the 35% traffic-loss assumption is over-stated and the displayed value is too high. |
| stripe.stripe_total_revenue | Cross-validates the commerce-sibling baseline. | A 5-15% gap is normal (refunds, currency); larger gap means one side is mis-syncing. |
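The three checks in this table can be run by hand with a few lines of Python. This is a rough sketch under stated assumptions: the function names are hypothetical, the session and revenue figures used when calling them are up to you, and the gap is measured relative to the commerce-sibling figure, which the source does not specify.

```python
def per_minute_from_hourly(hourly_baseline: float) -> float:
    """Per-minute baseline is the hourly hour-of-week average divided by 60."""
    return hourly_baseline / 60


def observed_traffic_loss(baseline_sessions: float, incident_sessions: float) -> float:
    """Fraction of sessions lost during the incident, floored at zero."""
    if baseline_sessions <= 0:
        raise ValueError("baseline_sessions must be positive")
    return max(0.0, 1.0 - incident_sessions / baseline_sessions)


def revenue_gap(commerce_revenue: float, stripe_revenue: float) -> float:
    """Relative gap between the two sources.

    Assumption: measured against the commerce-sibling figure.
    """
    if commerce_revenue <= 0:
        raise ValueError("commerce_revenue must be positive")
    return abs(commerce_revenue - stripe_revenue) / commerce_revenue


def gap_is_normal(gap: float, upper: float = 0.15) -> bool:
    """The table treats gaps up to roughly 15% as normal."""
    return gap <= upper
```

For example, an hourly baseline of 9,600 gives 160 per minute. If `observed_traffic_loss` comes out well below the 35% assumed for a SEV-1, the displayed value is likely too high and the traffic-loss percentage is a candidate for recalibration.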