Card class: Hero
Category: Monitoring
Multi-window burn-rate alerting: anything above 14.4× spends 2% of the monthly budget every hour and would exhaust it in about two days.

At a glance

The rate at which the merchant is consuming their monthly error budget, expressed as a multiple of the sustainable rate. A burn rate of 1× means errors are happening at exactly the rate the SLO permits over the month. At 14.4×, 2% of the month's error budget is spent every hour, so the whole budget would be gone in about 50 hours; consuming it in a single day would take 30×. For a merchant, this is "are we running through our acceptable-error allowance faster than we should?" Above 14.4× is an emergency: it predicts SLO breach within about two days, and within hours if much of the budget is already spent.
API endpoint: Datadog SLO API. GET /api/v1/slo/{slo_id}/history for the time-series; GET /api/v1/slo for the SLO definitions. Burn rate is computed by the engine from the time-series.
Metric basis: Error budget consumption per hour divided by the steady-state consumption rate the SLO permits. Steady-state for a 99.9% SLO over 30 days is 0.1% errors / 720 hours ≈ 0.00014% / hour. A 1-hour window whose errors equal 0.001% of the month's requests is therefore about 7× burn (0.001 / 0.000139 ≈ 7.2).
Aggregation window: 1-hour rolling window for the displayed value; multi-window alerting (1h + 5m, 6h + 30m) is configured server-side in Datadog.
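Severity threshold: P1 = above 14.4× (spends 2% of the 30-day budget per hour; a full budget lasts about 50 hours); P2 = above 6× (slow-burn alert trigger: 5% of the budget in 6 hours); P3 = above 3× (worth investigating). The 14.4× number comes from the Google SRE workbook: paging when 2% of a 30-day (720-hour) budget is spent in one hour gives 0.02 × 720 / 1 = 14.4×.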
Alert pre-filtering: Synthetic test traffic and health-check endpoints excluded from SLO numerator/denominator at the SLO definition layer (configure in Datadog SLO query).
Log Management gating: Not used. Burn rate is computed from APM and metric data underlying the SLO; the card returns valid values regardless of Logs status.
Why "burn rate" instead of "errors per hour": Burn rate normalises by your specific SLO target. A 99.9% SLO and a 99.99% SLO can both have the same raw error count yet very different burn rates because the budgets are 10× different. Burn rate makes the alert threshold portable across SLOs.
Multi-window alerting: Datadog's recommended pattern is to alert on (1h burn > 14.4× AND 5min burn > 14.4×) for fast-burn pages, (6h burn > 6× AND 30min burn > 6×) for slow-burn pages. Vortex IQ surfaces the 1h burn here; the multi-window logic is in Datadog.
Filtered hosts / services: The headline displays the highest burn-rate SLO across the account. Per-SLO breakdown lives on the table view.
Time zone: Account timezone for chart axes; UTC for cross-connector windowing.
Time window: 1H (rolling 1-hour burn rate)
Alert trigger: > 14.4× (fast burn): 2% of the monthly budget per hour, exhausting a full budget in about 50 hours; pages on-call.
Roles: owner, engineering
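The thresholds and the multi-window rule above can be sketched in a few lines. This is a minimal illustration, assuming the four window burn rates are already computed; `threshold` and `page_type` are hypothetical names, and the real multi-window evaluation runs in Datadog monitors rather than in this card. The 2%-in-1-hour and 5%-in-6-hours budget fractions are the Google SRE workbook's suggested starting points.

```python
from typing import Optional

PERIOD_HOURS = 30 * 24  # 720-hour SLO period


def threshold(budget_fraction: float, window_hours: float) -> float:
    """Burn rate that spends `budget_fraction` of the period's budget within `window_hours`."""
    return budget_fraction * PERIOD_HOURS / window_hours


FAST_BURN = threshold(0.02, 1)  # 2% of budget in 1 hour  -> 14.4x
SLOW_BURN = threshold(0.05, 6)  # 5% of budget in 6 hours -> 6x


def page_type(burn_1h: float, burn_5m: float, burn_6h: float, burn_30m: float) -> Optional[str]:
    """Classify a page using long/short window pairs; None means no page (hypothetical helper)."""
    # Long window: the burn is large enough to matter.
    # Short window: it is still happening now, so the alert clears soon after recovery.
    if burn_1h > FAST_BURN and burn_5m > FAST_BURN:
        return "fast-burn"
    if burn_6h > SLOW_BURN and burn_30m > SLOW_BURN:
        return "slow-burn"
    return None


print(round(FAST_BURN, 1), round(SLOW_BURN, 1))  # 14.4 6.0
print(page_type(17.2, 20.0, 9.0, 12.0))          # fast-burn
print(page_type(17.2, 2.0, 3.0, 1.5))            # None: the spike has already stopped
```

Requiring both windows is the design point: the long window alone would keep paging for an hour after a rollback, and the short window alone would page on brief blips.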

Calculation

Calculated automatically from your Datadog data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
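For readers who want the arithmetic behind the number, here is a minimal sketch of the standard burn-rate definition: the error ratio observed in a window divided by the error ratio the SLO allows. The function name and the request counts are illustrative assumptions; Vortex IQ's engine code is not shown here.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Burn rate = observed error ratio in the window / allowed error ratio (1 - target)."""
    if total_events == 0:
        return 0.0  # no traffic in the window, so no budget consumed
    observed_ratio = bad_events / total_events
    allowed_ratio = 1.0 - slo_target
    return observed_ratio / allowed_ratio


# Hypothetical hour: 72 failed checkouts out of 10,000 against a 99.9% SLO.
# 0.72% observed vs 0.1% allowed, i.e. roughly 7.2x burn.
print(round(burn_rate(72, 10_000, 0.999), 1))  # 7.2
```

This reproduces the 7× example under Metric basis: with traffic spread evenly across the month, an hour whose errors equal 0.001% of the month's requests has a 0.72% error ratio within that hour, which is 7.2× a 0.1% allowance.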

Worked example

A US apparel brand on BigCommerce with two Datadog SLOs:
  • SLO-CHK Checkout availability: 99.9% over 30 days. Budget: 43.2 minutes of unavailability per month.
  • SLO-SRCH Search latency below 800 ms p95: 99.5% over 30 days. Budget: 3.6 hours of breach per month.
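The budget figures in the two bullets follow from multiplying the allowed failure fraction by the length of the period. A short sketch, assuming a 30-day period and a time-based view of the budget; `budget_minutes` is a hypothetical helper.

```python
PERIOD_MINUTES = 30 * 24 * 60  # 43,200 minutes in a 30-day period


def budget_minutes(slo_target: float) -> float:
    """Allowed minutes of unavailability (or breach) per 30-day period."""
    return (1.0 - slo_target) * PERIOD_MINUTES


print(f"SLO-CHK:  {budget_minutes(0.999):.1f} minutes")     # 43.2 minutes
print(f"SLO-SRCH: {budget_minutes(0.995) / 60:.1f} hours")  # 3.6 hours
```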
Snapshot taken on 28 Apr 26 at 15:00 EST.
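| SLO | 30-day target | Current 30-day | Budget remaining | 1h burn rate | What it means |
|---|---|---|---|---|---|
| SLO-CHK | 99.9% | 99.967% | 67% remaining | 17.2× | Will breach in about 28 hours at this rate |
| SLO-SRCH | 99.5% | 99.56% | 12% remaining | 3.1× | Slow bleed: below paging thresholds, but the thin remaining budget lasts about 28 hours |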
The headline displays 17.2× with a P1 alert because the highest-burn-rate SLO is in fast-burn territory. Three observations the merchant should read:
  1. The checkout SLO is in fast-burn. A 17.2× burn rate over the next 24 hours would consume 24/720 × 17.2 = 57% of the entire monthly budget. The SLO has 67% remaining now; at 17.2× burn for 24 hours, only 10% would remain by tomorrow, and that would last about four more hours. Engineering must act now or breach in roughly 28 hours.
  2. The search SLO is in slow-bleed. A 3.1× burn rate is below both paging thresholds: a deploy degraded latency slightly two days ago. It is not harmless, because only 12% of the budget remains, and at 3.1× that lasts about 28 hours (0.12 × 720 / 3.1). Raise it at P3 severity rather than emergency-paging someone, and have the team ship a planned fix during normal hours.
  3. The two SLOs have very different action paths. Fast-burn (checkout) demands immediate rollback; slow-bleed (search) demands a planned investigation. Conflating them means one of the two gets the wrong response; reading them separately gives each the response it needs.
Calculation example for SLO-CHK:
  - 30-day target: 99.9%
  - Allowed error rate: 0.1%
  - Allowed errors per hour at steady-state: 0.1% / 720 hours = 0.000139%
  - Observed errors in last hour: 0.00239% (17.2× the steady-state)
  - At this rate, monthly budget consumed in: 720 / 17.2 = 42 hours
  - Already consumed 33% of budget in 8 days (steady-state would be 27%)
  - Time until breach if the rate continues: 0.67 × 720 / 17.2 ≈ 28 hours
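The SLO-CHK arithmetic can be reproduced in a few lines. This sketch takes the snapshot's figures as inputs and assumes the burn rate stays constant; `hours_to_exhaust` is an illustrative helper, not part of any Datadog or Vortex IQ API.

```python
PERIOD_HOURS = 30 * 24  # 720-hour SLO period


def hours_to_exhaust(burn_rate: float, budget_remaining: float = 1.0) -> float:
    """Hours until `budget_remaining` (a fraction of the full budget) reaches zero at a constant burn rate."""
    # At burn rate b, each hour consumes b / 720 of the full period's budget.
    return budget_remaining * PERIOD_HOURS / burn_rate


allowed_pct = 0.1                                   # 99.9% target -> 0.1% may fail
steady_pct_per_hour = allowed_pct / PERIOD_HOURS    # budget share per hour at 1x
observed_pct_last_hour = 17.2 * steady_pct_per_hour

print(f"steady state: {steady_pct_per_hour:.6f}% per hour")  # 0.000139% per hour
print(f"last hour:    {observed_pct_last_hour:.5f}%")         # 0.00239%
print(f"next 24 h:    {24 / PERIOD_HOURS * 17.2:.0%}")        # 57%
print(f"full budget:  {hours_to_exhaust(17.2):.0f} h")        # 42 h
print(f"67% left:     {hours_to_exhaust(17.2, 0.67):.0f} h")  # 28 h
```

The same helper gives 50 hours at 14.4× and 9 hours at 80× for a full budget.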
Three takeaways merchants should remember:
  1. The 14.4× threshold is calibrated, not arbitrary. It comes from the Google SRE workbook: at 14.4×, 2% of a 30-day budget is spent in a single hour, which is significant enough to justify a page, and a full budget would be gone in about 50 hours. Below 14.4× = "you have time to plan a fix"; above = "fix now or accept the breach".
  2. Burn rate makes alert thresholds portable. A 99.9% SLO and a 99.99% SLO can have the same raw error count but very different burn rates because the budgets are 10x different. Engineering teams running multiple SLOs at different targets benefit from this normalisation.
  3. Fast-burn vs slow-bleed need different responses. Fast-burn = “rollback now”; slow-bleed = “schedule a fix this sprint”. Mistaking one for the other produces either over-paging (treating slow-bleed as fast-burn) or under-paging (treating fast-burn as slow-bleed). Multi-window alerting is the standard practice that distinguishes them.

Sibling cards merchants should reference together

| Card | Why pair it with SLO Burn Rate | What the combination tells you |
|---|---|---|
| Error Budget Remaining | The accumulated counterpart of burn rate. | Burn rate plus budget remaining tells you days until breach. |
| SLO Compliance (current period) | The SLO's current period state. | Compliance dropping plus high burn rate equals "actively breaching"; compliance OK plus high burn rate equals "will breach if rate continues". |
| Days Until SLO Breach (forecast) | The forecast: at this rate, when does the SLO breach? | Forecast under 7 days plus high burn rate equals page; forecast 30+ days plus low burn rate equals safe. |
| Error Rate | The driver of burn-rate spikes for error-based SLOs. | Error rate spike plus burn rate spike equals "the cause is server-side errors". |
| p95 Response Time | The driver for latency-based SLOs. | Latency spike plus burn rate spike on a latency SLO equals "the cause is latency degradation". |
| Operational Health Score | The composite that includes SLO compliance as a 25%-weight component. | Composite drop plus high burn rate equals "the SLO degradation is dragging the composite". |
| Active Incidents | A high burn rate with no open incident is a gap to act on. | High burn rate plus zero incidents equals "engineering has not yet declared this real". |
| Shopify / BC / Adobe Total Revenue | The merchant-impact peer. | Sustained high burn on a customer-path SLO typically corresponds to a revenue dip. |

Reconciling against the vendor’s own dashboard

Where to look in Datadog:
  • SLO List for the master list with per-SLO burn rate and budget remaining.
  • SLO Detail (any SLO) for the time-series of compliance and burn rate.
  • Monitors → SLO Alert Templates for the multi-window burn-rate alerts.
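To check the card against raw data rather than the UI, you can request the same SLO history endpoint listed under At a glance. The sketch below uses only the Python standard library and builds the request without sending it; the helper name, the placeholder keys and SLO ID, and the one-hour lookback are illustrative assumptions. Datadog authenticates API calls with the `DD-API-KEY` and `DD-APPLICATION-KEY` headers, and the history endpoint takes `from_ts` / `to_ts` as Unix seconds.

```python
import time
import urllib.parse
import urllib.request


def slo_history_request(
    slo_id: str,
    api_key: str,
    app_key: str,
    hours: int = 1,
    site: str = "datadoghq.com",  # e.g. "datadoghq.eu" for EU-hosted accounts
) -> urllib.request.Request:
    """Build (but do not send) a GET for an SLO's history over the last `hours` (hypothetical helper)."""
    to_ts = int(time.time())
    from_ts = to_ts - hours * 3600
    query = urllib.parse.urlencode({"from_ts": from_ts, "to_ts": to_ts})
    url = f"https://api.{site}/api/v1/slo/{urllib.parse.quote(slo_id, safe='')}/history?{query}"
    headers = {
        "DD-API-KEY": api_key,
        "DD-APPLICATION-KEY": app_key,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


req = slo_history_request("abc123", "<api-key>", "<app-key>")  # placeholder ID and keys
print(req.full_url)
# urllib.request.urlopen(req) would send it; mind the SLO API rate limits.
```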
Why our number may legitimately differ from Datadog’s UI:
| Reason | Direction | Why |
|---|---|---|
| Time zone | Period-boundary effects | SLOs are defined over rolling 30-day windows in account timezone; Vortex IQ uses UTC for cross-connector arithmetic. |
| API rate limits | Brief gaps | The SLO API is rate-limited; cached values may be 2-5 minutes stale. |
| Log indexing latency | Affects log-based SLOs only | If your SLO query is log-based and Logs is gated, the SLO will read stale data. APM/metric-based SLOs are unaffected. |
| SLO calculation lag | 5-15 minutes | Datadog computes SLO compliance on a 5-minute schedule; sub-15-minute movements may trail. |
| Highest-burn aggregation | Either | Vortex IQ surfaces the highest burn rate across all SLOs as the headline; Datadog UI shows per-SLO views. The numbers match per-SLO; the headline is by design "the worst SLO right now". |
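The highest-burn aggregation amounts to a maximum over per-SLO readings. A minimal sketch, assuming the per-SLO 1h burn rates are already available; the `SloReading` type and `headline` function are hypothetical and only illustrate that the headline is the single worst SLO, never a sum or average.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SloReading:
    name: str
    burn_rate_1h: float  # rolling 1-hour burn rate


def headline(readings: List[SloReading]) -> Optional[SloReading]:
    """Pick the single worst SLO by 1h burn rate; burn rates are never summed or averaged."""
    if not readings:
        return None  # nothing to show: no SLOs configured
    return max(readings, key=lambda r: r.burn_rate_1h)


snapshot = [
    SloReading("SLO-CHK", burn_rate_1h=17.2),
    SloReading("SLO-SRCH", burn_rate_1h=3.1),
]
worst = headline(snapshot)
print(f"{worst.name}: {worst.burn_rate_1h}x")  # SLO-CHK: 17.2x
```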
Cross-connector reconciliation:
| Card | Expected relationship | What causes the divergence |
|---|---|---|
| Datadog APM error rate / latency | The driver: SLOs are computed from APM metrics, so burn-rate spikes follow APM spikes by 5-15 minutes. | A burn-rate spike without an APM spike means the SLO is consuming budget from a different source (log-based SLO, synthetic-based SLO). |
| shopify.total_revenue / bigcommerce.total_revenue / adobe_commerce.total_revenue | Sustained high burn on customer-facing SLOs typically corresponds to a revenue dip. | High burn on internal-service SLOs (worker, batch, admin) does not correspond to a revenue dip and is correctly excluded from merchant-impact alerting if you tag those SLOs customer_facing:false. |
| Stripe / PayPal Payment Health | When a PSP outage drives 5xx errors, the Datadog burn rate rises and the payment health score drops. | Both independent peers confirming means a high-confidence real incident; only one moving means investigate that side. |

Known limitations / merchant FAQs

I am a non-engineering owner. Why does this card matter to me?
Because SLOs are the engineering team's commitment to your business. A 99.9% checkout-availability SLO is the team saying "we promise checkout will work 99.9% of the time, and if it does not, we will treat that as a failure". Burn rate is the live signal that the team is at risk of breaking that promise. When burn rate is high, the team must either fix the underlying problem or break the promise; either outcome has implications for the business. The headline of this card is the merchant-readable answer to "are we at risk of breaking our reliability commitments?"

What is an SLO?
A Service Level Objective: a target like "99.9% of checkout requests succeed" or "p95 search latency below 800 ms", measured over a window (typically 30 days). The SLO defines the line; the error budget is what is left between observed performance and the line; the burn rate is how fast you are consuming the error budget.

Why is 14.4× the magic number?
At a 14.4× burn rate, you spend 2% of a 30-day error budget every hour. The math: a 30-day period is 720 hours, so at 1× each hour consumes 1/720 of the budget; spending 2% in one hour means burning at 0.02 × 720 = 14.4× the steady-state rate. Sustained, that exhausts the whole budget in 720 / 14.4 = 50 hours, just over two days (using it all in 24 hours would take 30×). This is the Google SRE-recommended threshold for fast-burn alerting.

What is "fast-burn" vs "slow-bleed"?
Fast-burn is a high burn rate over a short window (1 hour at 14.4×); slow-bleed is a moderate burn rate over a longer window (6 hours at 6×). Fast-burn requires immediate action (rollback, escalate); slow-bleed requires planned action (schedule a fix this sprint). Multi-window alerting (1h + 5min for fast, 6h + 30min for slow) is the standard practice.

My burn rate is 80×. Should I be panicking?
Yes, but tactically. A burn rate of 80× means you will consume the entire 30-day budget in 720 / 80 = 9 hours, and sooner if part of it is already spent. If the rate continues, you will breach the SLO before your team's lunch break. The right response: (1) identify the underlying cause via Error Rate or p95 Response Time; (2) decide between rollback, hotfix, or accept-the-breach; (3) if rolling back, do it within 30 minutes; if accepting the breach, schedule the fix and update the post-mortem.

Datadog Incident Management is not enabled. Does this card still work?
Yes. SLO Burn Rate uses the Datadog SLO API, which is independent of Incident Management. SLOs are available on the Pro tier and above; if you are on the free tier, SLOs cannot be created and the card displays "No SLOs configured".

My Logs API returns 400 No valid indexes. Does this affect burn rate?
Only for log-based SLOs. APM-based and metric-based SLOs are unaffected. If your team has defined a log-based SLO (count of error logs / total logs), that SLO's burn rate becomes unreliable when Logs is gated; APM and metric SLOs continue to function normally.

Should I have one SLO or many?
The Google SRE recommendation: one customer-facing SLO per critical journey (checkout availability, search latency, login success), each tied to revenue impact. Avoid SLOs on internal services unless they directly affect customer experience. Most merchants are well-served by 3-5 SLOs total.

Why does the headline always show only the highest burn-rate SLO, not the sum?
Because burn rates do not sum meaningfully across SLOs. A 17× burn on checkout and a 3× burn on search are different problems with different priorities; averaging or summing them obscures both. The headline is "the worst SLO right now", and the per-SLO breakdown lives below in SLO Compliance.

My team uses error budget policies. How does Vortex IQ surface those?
Error budget policies (e.g. "no risky deploys when budget is below 50%") are configured at the engineering process layer, not in metrics. Vortex IQ shows the burn rate and budget remaining; whether your team's deploy-freeze policy kicks in is enforced by your own deploy tooling. Pair this card with Error Budget Remaining to drive policy decisions.

Tracked live in Vortex IQ Nerve Centre

SLO Burn Rate (1h) is one of hundreds of KPI pulses Vortex IQ tracks across Datadog and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.