Latency on the checkout service overlaid with order volume: when checkout slows, sales follow.
At a glance
A dual-axis chart overlaying checkout-service p95 latency from Datadog with orders-per-minute from the connected commerce sibling (Shopify, BigCommerce, Adobe Commerce). For a merchant, this is the visual answer to “does my checkout service speed actually affect sales?” Spoiler: yes, and this card lets you see the moment latency rises and orders fall.
| Field | Detail |
|---|---|
| API endpoints touched | Datadog Metrics API for p95:trace.servlet.request{service:checkout} (or your configured checkout service name); commerce-sibling order-rate KPI for orders/min. |
| Metric basis | Two time-series, time-aligned to the same 1-minute buckets in UTC: (1) p95 latency in milliseconds for the checkout service, (2) orders per minute from the commerce platform. |
| Aggregation window | 1-minute rollup at source; the card displays a 24-hour window with deploy markers overlaid. |
| Severity threshold | P1 = checkout p95 above 3,000 ms during peak hours (10:00-22:00 in account timezone) sustained for 10 minutes; P2 = above 1,500 ms sustained; P3 = above 800 ms during peak only. |
| Alert pre-filtering | (1) Synthetic test traffic excluded; (2) Health-check endpoints excluded; (3) Non-checkout-service spans excluded by service: tag (engine reads connector configuration for the merchant’s checkout service name). |
| Log Management gating | Not used. The card consumes APM and commerce-sibling KPI; both are independent of Logs. |
| Commerce-sibling required | This card needs a commerce platform connected. Without one, the card displays “Connect a commerce platform to enable this card”. |
| Why dual-axis matters | Latency and orders use different units (ms vs count) but they are causally linked. A single chart with two y-axes is more revealing than two separate charts because it makes the temporal correlation visible at a glance. The eye spots “latency went up, orders went down 10 minutes later” instantly. |
| What “checkout service” means | The Datadog APM service tagged as the merchant’s checkout. Default: any service with service:checkout, service:checkout-service, or service:checkout-api. Configurable in Settings → Datadog → Service mapping. |
| Time zone | Account timezone for chart axes (so peak-hours line up with merchant business hours); UTC for the cross-connector arithmetic. |
| Time window | 24H (rolling 24 hours) |
| Alert trigger | p95 > 3 s during peak hours, sustained for 10 minutes, pages on-call. |
| Roles | owner, engineering, finance |
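The severity rules in the table can be sketched as a small classifier over the most recent 1-minute p95 buckets. This is an illustrative sketch, not the engine's implementation: the function names are hypothetical, and it assumes that P2 and P3 reuse the same 10-minute sustain window as P1 (the table only states the window explicitly for P1) and that P2 applies at any hour.

```python
from typing import Optional, Sequence

SUSTAIN_MINUTES = 10  # "sustained for 10 minutes" (stated for P1; assumed for P2/P3)
P1_MS, P2_MS, P3_MS = 3000, 1500, 800


def sustained_above(p95_ms: Sequence[float], threshold_ms: float,
                    minutes: int = SUSTAIN_MINUTES) -> bool:
    """True if each of the last `minutes` 1-minute buckets exceeds the threshold."""
    if len(p95_ms) < minutes:
        return False
    return all(v > threshold_ms for v in p95_ms[-minutes:])


def classify(p95_ms: Sequence[float], in_peak_hours: bool) -> Optional[str]:
    """Return 'P1', 'P2', 'P3', or None for the latest window of p95 readings."""
    if in_peak_hours and sustained_above(p95_ms, P1_MS):
        return "P1"
    if sustained_above(p95_ms, P2_MS):  # assumption: P2 is not limited to peak hours
        return "P2"
    if in_peak_hours and sustained_above(p95_ms, P3_MS):
        return "P3"
    return None
```

For example, ten consecutive readings of 3,400 ms classify as P1 during peak hours and as P2 outside them under these assumptions.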
Calculation
Calculated automatically from your Datadog and connected commerce-platform data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
Worked example
A US specialty foods brand on BigCommerce running Datadog APM on the checkout service. 24-hour view captured on 23 Apr 26.

| Hour (EST) | Checkout p95 (ms) | Orders/min | Notes |
|---|---|---|---|
| 09:00 | 720 ms | 8 | Morning baseline |
| 11:00 | 680 ms | 11 | Healthy |
| 13:00 | 740 ms | 14 | Lunch peak |
| 14:00 (deploy) | 1,820 ms | 14 | New payment-retry logic shipped; latency creeping |
| 14:30 | 2,800 ms | 12 | Latency above threshold |
| 15:00 | 3,400 ms | 9 | Crossed P1 threshold; orders dropping |
| 15:30 | 3,650 ms | 7 | Sustained slowness; clear order impact |
| 16:00 | 3,520 ms | 6 | Cumulative loss accelerating |
| 16:15 (rollback) | 920 ms | 8 | Rollback initiated |
| 16:45 | 700 ms | 12 | Recovered |
- The lag between latency rising and orders falling is real and measurable. Shoppers do not abandon instantly; they retry, refresh, switch tabs. The lag is typically 5-15 minutes for desktop and 2-8 minutes for mobile. Action: if you see latency rising on this card, the orders/min curve will follow. Do not wait for orders to drop to confirm; latency is the leading indicator.
- The dual-axis visualisation makes the causal link unmistakable. Two separate charts of latency and orders would require the engineering team to mentally overlay them; the dual-axis chart shows the inflection points side-by-side. The first time a merchant sees this card during a real incident, they typically have a “click” moment of “I see the relationship now”.
- Latency above 3,000 ms during peak hours costs more than during off-peak. This brand had latency briefly hit 1,500 ms at 22:00 EST a few weeks earlier with no detectable revenue impact (low traffic, slack capacity). The same latency at 13:00 (peak) cost real money. The card’s peak-hours alert threshold reflects this asymmetry.
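The readings in the worked-example table can be summarised with a few lines of arithmetic. The sketch below copies the table values and reports when latency first crossed the P2 and P1 thresholds, when orders first fell below the 13:00 reading, and how deep the trough was. Using the 13:00 row as the baseline is an assumption made for illustration, and `summarise` is a hypothetical helper name.

```python
# (time, checkout p95 in ms, orders/min) copied from the worked-example table.
ROWS = [
    ("09:00", 720, 8),
    ("11:00", 680, 11),
    ("13:00", 740, 14),
    ("14:00", 1820, 14),  # deploy
    ("14:30", 2800, 12),
    ("15:00", 3400, 9),
    ("15:30", 3650, 7),
    ("16:00", 3520, 6),
    ("16:15", 920, 8),    # rollback
    ("16:45", 700, 12),
]
P2_MS, P1_MS = 1500, 3000


def summarise(rows, baseline_time="13:00"):
    """Summarise the latency rise and the order drop relative to a baseline row."""
    baseline = next(o for t, _, o in rows if t == baseline_time)
    trough_time, _, trough_orders = min(rows, key=lambda r: r[2])
    return {
        "baseline_orders": baseline,
        "first_above_p2": next(t for t, ms, _ in rows if ms > P2_MS),
        "first_above_p1": next(t for t, ms, _ in rows if ms > P1_MS),
        # First elevated-latency row where orders sit below the baseline.
        "first_order_drop": next(
            t for t, ms, o in rows if ms > P2_MS and o < baseline
        ),
        "trough_time": trough_time,
        "trough_orders": trough_orders,
        "drop_pct": round(100 * (baseline - trough_orders) / baseline, 1),
    }


summary = summarise(ROWS)
```

On this data latency is already above 1,500 ms in the 14:00 row while orders still read 14/min, and the first lower order reading appears in the 14:30 row. The table is sampled at 15 to 30 minute intervals, so it shows the ordering of the two events but cannot resolve the lag to the minute; the card's 1-minute buckets are needed for that.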
Sibling cards merchants should reference together
| Card | Why pair it with Checkout Service Health × Sales | What the combination tells you |
|---|---|---|
| p95 Response Time | The all-services latency view. | Checkout slow + all-services latency flat equals checkout-specific issue; both up equals shared-resource issue. |
| Errors by Endpoint | The complementary breakdown by endpoint. | Identifies which checkout endpoint is the slowest contributor. |
| Database Query Latency p95 | Most common cause of checkout latency regression. | Checkout slow + DB slow equals shared cause; checkout slow + DB OK equals app-side or upstream API issue. |
| Revenue at Risk (live) | The financial reframing while the latency event is open. | Translates “checkout p95 above 3s” into “£X,XXX/hour leaking”. |
| Conversion Drop During Incidents | The post-incident measured loss peer. | Confirms the live observation that orders/min dropped during the latency window. |
| Cart Abandonment During 5xx Spikes | Mechanism: how slowness becomes lost orders. | Latency-driven abandonment (this card) vs error-driven abandonment (5xx card) are different patterns. |
| Critical-Path Tests Status | Synthetic-test view of the same service. | Synthetic green plus this card red equals “real shoppers slower than synthetic shoppers”; usually third-party scripts. |
| Shopify / BC / Adobe Total Revenue | The aggregate downstream impact. | Sustained checkout slowness equals aggregate revenue dip. |
Reconciling against the vendor’s own dashboard
Where to look in Datadog:
- APM → Service: checkout for the latency time-series of just the checkout service.
- APM → Service Map filtered to checkout for upstream/downstream visibility.
- Dashboards → Custom dashboard combining checkout latency with imported commerce data (Shopify/BC/Adobe webhooks landing in Datadog).

The orders/min side of this card comes from the connected commerce platform’s KPI; open that platform’s order analytics for the same window.

Why our values may legitimately differ from a hand-aligned chart:
| Reason | Direction | Why |
|---|---|---|
| Time zone alignment | Either | The card aligns both axes to UTC for arithmetic; if you compare manually using different timezones, the chart looks shifted. |
| API rate limits | Brief gaps | Both Datadog Metrics API and commerce-sibling Order API are rate-limited; cached values may be 1-2 minutes stale. |
| Log indexing latency | Not applicable | Neither axis uses Logs. |
| Span sampling | APM-side adjustment | If your APM uses head-based sampling at <100%, the latency percentile is computed from sampled spans; the rate is unbiased but absolute counts differ from raw. |
| Order webhook lag | Commerce-sibling lag | Commerce orders are reported via webhook; the most recent 5-15 minutes may be incomplete. |
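When reconciling by hand, the time zone row is the one most often tripped over. A minimal sketch of aligning two series to shared 1-minute UTC buckets follows; the function names and the dictionary-based series shape are assumptions for illustration, not the card's internal format.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Tuple


def utc_minute_bucket(ts: datetime) -> datetime:
    """Convert a timezone-aware timestamp to UTC and truncate it to the minute."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone.utc).replace(second=0, microsecond=0)


def align(latency: Dict[datetime, float],
          orders: Dict[datetime, float]) -> Dict[datetime, Tuple[float, float]]:
    """Join two {timestamp: value} series on the UTC minute buckets they share."""
    lat = {utc_minute_bucket(t): v for t, v in latency.items()}
    ords = {utc_minute_bucket(t): v for t, v in orders.items()}
    return {b: (lat[b], ords[b]) for b in sorted(lat.keys() & ords.keys())}


# Hypothetical readings: latency stamped in UTC, orders stamped at a fixed
# UTC-5 offset (used here only to illustrate the conversion).
utc_minus_5 = timezone(timedelta(hours=-5))
latency = {datetime(2026, 4, 23, 20, 0, 12, tzinfo=timezone.utc): 3400}
orders = {datetime(2026, 4, 23, 15, 0, 45, tzinfo=utc_minus_5): 9}
aligned = align(latency, orders)
```

Buckets present in only one series are dropped by this sketch, which mirrors the "brief gaps" and webhook-lag rows above: a missing minute on either side leaves no point to compare.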
| Card | Expected relationship | What causes the divergence |
|---|---|---|
| shopify.total_revenue / bigcommerce.total_revenue / adobe_commerce.total_revenue | The orders/min source. The card’s right-axis line is computed from this commerce-sibling KPI. | A divergence indicates a webhook backlog or a different revenue-source mix (B2B portal vs storefront vs POS, where the storefront is the only one going through the checkout service). |
| google_analytics.ga_sessions | An independent traffic peer. RUM client-side vs APM server-side discrepancy is healthy and expected. | If GA4 sessions are stable but orders/min drop, the lost orders are conversion-side; if GA4 sessions also drop, the issue began upstream of the checkout service. |
| stripe.stripe_payment_health_score | Payment-PSP cascade peer. | Checkout latency up plus payment-health down equals payment-processor outage cascading. |
Known limitations / merchant FAQs
Why is this card cross-channel?
Because it joins data from two different connectors: Datadog (latency from APM) and the commerce platform (orders/min). The “cross” in cross-channel refers to the type of data, not multiple checkout services. This is one of the most powerful patterns Vortex IQ enables: the cause-and-effect link between technical performance and business outcomes is invisible inside any single tool but obvious when their data is overlaid.
My checkout service is named differently, e.g. payment-api or commerce-checkout. Does the card still work?
Yes, but you need to configure the service name. Open Vortex IQ Settings → Datadog → Service mapping and set the merchant’s checkout service name. By default the card matches service:checkout, service:checkout-service, or service:checkout-api; once you change the mapping, the card reads from the service you specify.
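To check a renamed service by hand, you can build the same metric query the At a glance table lists and run it against Datadog yourself. The sketch below only constructs the query string and a request URL; it assumes Datadog's v1 timeseries query endpoint (`/api/v1/query` with `from`, `to`, and `query` parameters) and the default `datadoghq.com` site. The helper names are hypothetical, and how Vortex IQ issues the request internally is not documented here.

```python
from urllib.parse import urlencode


def checkout_p95_query(service_name: str = "checkout") -> str:
    """Build the p95 latency query for the configured checkout service."""
    return f"p95:trace.servlet.request{{service:{service_name}}}"


def metrics_query_url(service_name: str, start_epoch: int, end_epoch: int,
                      site: str = "datadoghq.com") -> str:
    """Build a timeseries query URL (assumed endpoint: /api/v1/query).

    Authentication headers are deliberately left out of this sketch.
    """
    params = urlencode({
        "from": start_epoch,
        "to": end_epoch,
        "query": checkout_p95_query(service_name),
    })
    return f"https://api.{site}/api/v1/query?{params}"


url = metrics_query_url("payment-api", 1776970800, 1777057200)
```

If the query returns no points for your service name, the span metric may differ from `trace.servlet.request` for your instrumentation; check the metric name shown on the service's APM page.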
My commerce platform is not connected. What does the card show?
The card displays “Connect a commerce platform to enable this card”. It requires both Datadog (for latency) and a commerce sibling (for orders/min) to function; without orders/min the dual-axis chart is incomplete, so the chart itself is not rendered.
The latency line is up but orders are stable. What does that mean?
Three possible interpretations: (1) The traffic mix has shifted to lower-conversion sources (paid social brings high traffic but low conversion), so orders/min is artificially low from the upstream side; (2) The latency rise is on a non-conversion-blocking endpoint (e.g. saved-address autocomplete) where shoppers can still complete checkout despite slowness; (3) Shoppers are tolerating the slowness, possibly due to high purchase intent or unique products. Combine with Conversion Drop During Incidents to confirm.
The orders line is up but latency is also high. Is the card wrong?
Two scenarios: (1) Promotional traffic spike where unusually high purchase intent overrides slowness; (2) Cached responses make some checkouts feel fast even when the back-end is slow. The card is right; the relationship between latency and orders is not perfectly linear. Use longer time windows to see the underlying trend.
Why peak hours and not 24/7 alerting?
Because off-peak slowness during overnight or low-traffic hours has minimal revenue impact. A 3,000 ms p95 at 03:00 EST when 2 orders/min are happening costs almost nothing; the same latency at 14:00 EST when 14 orders/min are happening costs significant revenue. Peak-hours-only alerting reduces false-page noise during low-impact windows.
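The peak-hours gate can be sketched as a conversion from a UTC timestamp into the account timezone followed by a time-of-day check. This is an illustration only: `in_peak_hours` is a hypothetical helper, and treating 22:00 as the first off-peak minute (an exclusive upper bound) is an assumption, since the 10:00-22:00 window does not say which side the boundary falls on.

```python
from datetime import datetime, time, timedelta, timezone, tzinfo

PEAK_START = time(10, 0)  # 10:00 in account timezone
PEAK_END = time(22, 0)    # 22:00 in account timezone (assumed exclusive)


def in_peak_hours(ts: datetime, account_tz: tzinfo) -> bool:
    """True if the timezone-aware timestamp falls inside peak hours locally."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    local = ts.astimezone(account_tz).time()
    return PEAK_START <= local < PEAK_END


# Fixed UTC-5 offset used for illustration; a real account timezone would
# normally be a named zone so that daylight saving time is handled.
account_tz = timezone(timedelta(hours=-5))
overnight = datetime(2026, 4, 23, 8, 0, tzinfo=timezone.utc)   # 03:00 local
afternoon = datetime(2026, 4, 23, 19, 0, tzinfo=timezone.utc)  # 14:00 local
```

With these inputs the 03:00 reading is off-peak and the 14:00 reading is in peak, matching the two cases contrasted in the answer above.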
Datadog says checkout is healthy but orders are dropping.
The classic Datadog blind spot. Three causes: (1) The bottleneck is in a third-party widget the APM does not instrument (payment iframe, fraud check, address-validation API); (2) The bottleneck is browser-side (slow JS execution); (3) The bottleneck is upstream (CDN or payment-PSP). Open Page Load p95 (RUM) and Critical-Path Tests Status to see customer-side measurements.
My Logs API returns 400 No valid indexes. Does this card still work?
Yes. This card uses APM (Metrics API) and the commerce-sibling Order KPI; both are independent of Logs.
Why is the card showing 24h and not 7-day?
Because the relationship between latency and orders is most legible at high-resolution windows (1-minute buckets over 24 hours). A 7-day window would compress the temporal pattern too much; the inflection points where latency rises and orders fall would blur. For longer-term trends, use the platform-specific cards individually.
Can I customise the alert threshold per merchant?
Yes. The default 3,000 ms p95 during peak hours is calibrated for typical merchants. Some brands tolerate higher latency (luxury, custom-furniture, B2B); others need lower (high-velocity discount, fast-fashion). Adjust in Settings → Datadog → Alert Thresholds → Checkout Service Health.