Throughput (req/s), Datadog

Metrics type: Key Metrics • Category: Monitoring

At a glance

The number of requests per second your storefront services are handling. For a merchant, this is “how busy is my site right now?” A sudden drop typically signals one of three things: traffic is gone (acquisition channel down), the site is broken (and shoppers cannot make requests), or measurement is broken (and the requests are happening but not being counted).


API endpoint	Datadog Metrics API, `GET /api/v1/query` with `sum:trace.servlet.request.hits{*}.as_rate()` (or the runtime equivalent).
Metric basis	APM span-count rate. Each request that enters an instrumented service counts as one hit.
Aggregation window	1-minute rollup; the card displays the rolling 5-minute average compared against the same hour of week 7 days earlier.
Severity threshold	P1 = drop above 50% WoW (likely outage or measurement break); P2 = drop above 30% WoW (alert trigger); P3 = drop above 15% WoW (worth investigating).
Alert pre-filtering	Synthetic test traffic and health-check endpoints are excluded by default to prevent test cadence from masking real traffic dips.
Log Management gating	Not used. Throughput is APM-derived; the card returns valid values regardless of Logs status.
Filtered hosts / services	All instrumented services. For per-service breakdown see Throughput by Service.
Time zone	Account timezone for “same hour of week” comparison; UTC for cross-connector windowing. The day-of-week comparison is critical: comparing 03:00 Sunday against 03:00 Wednesday produces meaningless deltas.
Why “same hour of week” rather than yesterday	Ecommerce traffic is highly cyclical: Tuesday afternoon does not look like Saturday afternoon, and Saturday afternoon does not look like Sunday morning. The same-hour-of-week comparison is the noise-reducing baseline that lets a 30% drop alert mean what it sounds like it means.
Time window	`T/7D vsP` (today vs the same hour of week, 7 days prior)
Alert trigger	`drop >30% WoW` sustained for 5 minutes; pages on-call.
Sentiment key	`throughput`
Roles	owner, engineering, operations
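
As a rough illustration of the API endpoint row above, the sketch below builds the request URL for the documented query over a rolling 5-minute window. The helper names `build_query_url` and `auth_headers` and the default site `api.datadoghq.com` are assumptions for illustration; they are not part of the card's implementation, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Query string copied from the "API endpoint" row; swap in the runtime
# equivalent if your services are not servlet-based.
THROUGHPUT_QUERY = "sum:trace.servlet.request.hits{*}.as_rate()"


def build_query_url(start_epoch: int, end_epoch: int,
                    site: str = "api.datadoghq.com") -> str:
    """Return the GET /api/v1/query URL for a window given in epoch seconds."""
    params = urlencode({
        "from": start_epoch,
        "to": end_epoch,
        "query": THROUGHPUT_QUERY,
    })
    return f"https://{site}/api/v1/query?{params}"


def auth_headers(api_key: str, app_key: str) -> dict:
    """Headers the Datadog API expects for authenticated metric queries."""
    return {"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key}


# Example: a 5-minute window ending at an arbitrary epoch second.
end = 1_700_000_000
url = build_query_url(end - 300, end)
```

The URL and headers can then be passed to whichever HTTP client you already use.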

Calculation

Calculated automatically from your Datadog data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
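
The week-over-week comparison and severity bands described in At a glance can be sketched as follows. This is a minimal illustration, not the engine's actual code: the function names are hypothetical, and "sustained for 5 minutes" is assumed here to mean five consecutive 1-minute samples above the threshold.

```python
from typing import Optional, Sequence

# Drop thresholds (percent, week over week) from the "Severity threshold" row.
SEVERITY_THRESHOLDS = (("P1", 50.0), ("P2", 30.0), ("P3", 15.0))


def wow_drop_pct(current: float, baseline: float) -> float:
    """Percentage drop of current vs baseline; a negative value means growth."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - current) / baseline * 100.0


def severity(drop_pct: float) -> Optional[str]:
    """Return the highest severity band the drop exceeds, or None."""
    for label, threshold in SEVERITY_THRESHOLDS:
        if drop_pct > threshold:
            return label
    return None


def should_page(drops_per_minute: Sequence[float],
                threshold: float = 30.0,
                sustain_minutes: int = 5) -> bool:
    """True when the most recent sustain_minutes samples all exceed threshold.

    Assumption: one sample per 1-minute rollup.
    """
    recent = list(drops_per_minute)[-sustain_minutes:]
    return len(recent) == sustain_minutes and all(d > threshold for d in recent)
```

For example, `severity(wow_drop_pct(88, 145))` falls in the P2 band, since 88 req/s against a 145 req/s baseline is a drop of about 39%.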

Worked example

A US specialty foods brand on BigCommerce running paid Google Search and Meta ads as the primary acquisition channel. Throughput baselined at 145 req/s on weekday afternoons. On 18 Apr 26 throughput collapsed from that baseline to 88 req/s without warning.

Time (UTC)	Throughput (req/s)	WoW delta	What was happening
14:00 (typical Tuesday)	145	baseline	Normal traffic
14:00 (incident Tuesday)	88	-39%	Alert fired
14:15	91	-37%	Investigation begins
14:30	86	-41%	Confirmed not a Datadog issue
14:45	87	-40%	Cause identified
15:30	144	-1%	Recovered

Three hypotheses were checked in order. The first two were ruled out before the third turned out to be the cause:

Site outage? No, the synthetic checkout test was passing, Critical-Path Tests Status was green, and APM error rate was at baseline (0.4%). The site was up and shoppers who reached it were having a normal experience.
Datadog measurement break? No, Reporting Hosts was steady, no agents were marked stale, and APM span ingestion was on schedule. The numbers were real.
Acquisition channel down? Yes. Google Ads had paused the merchant’s main ad group at 13:55 because their billing card had expired. No payment, no ads, no traffic. The drop in throughput was an accurate reflection of the drop in paid-traffic clicks. The fix was a 5-minute card update in Google Ads.

Revenue impact:
  - 1.5 hours (90 minutes) below baseline at roughly -40% throughput
  - Baseline conversion rate: 1.7%
  - Baseline AOV: $52
  - Lost requests ≈ (145 − 88) × 60 × 90 ≈ 308,000. These are requests, not sessions or orders, so the conversion rate cannot be applied to this figure directly.
  - Lost orders ≈ baseline orders/min × 90 min × 0.40
  - At a baseline of 5 orders/min: 5 × 90 × 0.40 = 180 lost orders
  - At AOV $52: lost revenue ≈ 180 × $52 = $9,360
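
The final estimate in the list above can be reproduced with a few lines of arithmetic. The function name `lost_revenue` is illustrative only, and the inputs are the worked-example figures (5 orders/min baseline, 90 minutes, a 40% drop, $52 AOV).

```python
def lost_revenue(baseline_orders_per_min: float,
                 duration_min: float,
                 drop_fraction: float,
                 aov: float) -> tuple:
    """Estimate lost orders and lost revenue for a sustained throughput drop.

    Assumes orders fall in proportion to throughput for the whole window.
    """
    lost_orders = baseline_orders_per_min * duration_min * drop_fraction
    return lost_orders, lost_orders * aov


orders, revenue = lost_revenue(5, 90, 0.40, 52)
print(f"lost orders ~ {orders:.0f}, lost revenue ~ ${revenue:,.0f}")
```

The proportionality assumption is the weak point: if the lost traffic converts better or worse than average, the real figure moves accordingly.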

Three takeaways merchants should remember:

A throughput drop is not always a site problem. Roughly half the time it is an acquisition-channel problem: paused ad spend, expired payment method on Google Ads / Meta, an organic-search ranking drop, an email send-list issue. The site is fine; the inbound funnel is throttled. Pair throughput with the relevant acquisition card before assuming a technical incident.
The “same hour of week” baseline is critical. This brand’s Sunday afternoon throughput is only 60 req/s, well below the 145 req/s Tuesday baseline. Comparing Sunday to Tuesday would falsely flag a 60% drop as catastrophic when it is just normal weekly cycling. WoW comparison removes this noise.
Drops below the 30% threshold without obvious cause are usually one of three: ad-spend pause, CDN configuration change (some traffic is now bypassing your APM-instrumented services), or a measurement-side regression. Investigate in that order; the first two cover 80% of cases.
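
To make the “same hour of week” baseline concrete, here is a small sketch that maps a timestamp to its hour-of-week slot and finds the comparison point 7 days earlier. The function names and the example dates are assumptions chosen for illustration; the example uses UTC, and a local account timezone would additionally need daylight-saving handling.

```python
from datetime import datetime, timedelta, timezone


def hour_of_week(ts: datetime) -> int:
    """Slot index from 0 (Monday 00:00) to 167 (Sunday 23:00)."""
    return ts.weekday() * 24 + ts.hour


def baseline_timestamp(ts: datetime) -> datetime:
    """The comparison point: the same hour of week, 7 days earlier."""
    return ts - timedelta(days=7)


# Illustrative dates only: 2 Jan 2024 is a Tuesday, 7 Jan 2024 is a Sunday.
tuesday = datetime(2024, 1, 2, 14, 0, tzinfo=timezone.utc)
sunday = datetime(2024, 1, 7, 14, 0, tzinfo=timezone.utc)

same_slot = hour_of_week(tuesday) == hour_of_week(baseline_timestamp(tuesday))
cross_day = hour_of_week(tuesday) == hour_of_week(sunday)
```

Here `same_slot` is true and `cross_day` is false, which is why a Sunday reading of 60 req/s is never compared against the 145 req/s Tuesday baseline.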

Sibling cards merchants should reference together

Card	Why pair it with Throughput	What the combination tells you
Throughput by Service	The per-service breakdown. When the headline drops, this card identifies which service is starved.	All services dropping together equals upstream/CDN issue; one service dropping equals routing or instrumentation issue.
Error Rate	The classic pairing: throughput down + errors up equals capacity-limited; throughput down + errors flat equals demand-limited (acquisition issue).	Direction of the relationship reveals the cause.
p95 Response Time	Throughput and latency trade off in capacity events; in demand events latency stays flat.	Throughput down + latency up equals overload; throughput down + latency steady equals demand drop.
Active Incidents	When throughput drops without an incident already open, you may need to declare one.	An open incident contextualises the drop; absence of an incident means investigate first.
Critical-Path Tests Status	Synthetic tests run at constant cadence; if real throughput drops but synthetics are green, the site is up but the funnel is starved.	Synthetic green plus real-traffic drop equals demand-side problem (acquisition, marketing, organic).
Google Ads Spend / Click	The most common cause of throughput drops at small-mid merchants.	Sudden ad-spend pause reflects in throughput within 15-30 minutes.
GA4 Sessions	The browser-side peer. GA4 sessions and Datadog throughput should move together.	If GA4 sessions are steady but Datadog throughput dropped, the gap is on the server-routing side.
Shopify / BC / Adobe Total Revenue	The downstream impact metric.	Throughput drop without revenue drop equals bot/cache traffic was the lost portion (less concerning); with revenue drop equals genuine shopper loss.
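
The Error Rate, p95 Response Time, and Critical-Path Tests rows above amount to a small decision rule, sketched below. The function name, the boolean inputs, and the "inconclusive" fallback are assumptions for illustration; the table does not say what to conclude when none of its listed combinations applies.

```python
def classify_drop(errors_up: bool, latency_up: bool, synthetics_green: bool) -> str:
    """Rough first-pass reading of a throughput drop using sibling cards.

    - errors up or latency up  -> capacity-side (overload)
    - errors flat, latency steady, synthetics green -> demand-side
      (acquisition, marketing, organic)
    - anything else -> inconclusive (fallback added for this sketch)
    """
    if errors_up or latency_up:
        return "capacity-side"
    if synthetics_green:
        return "demand-side"
    return "inconclusive"


# The worked example: error rate at baseline, synthetic checkout passing.
# Latency was not reported in the example; it is assumed steady here.
verdict = classify_drop(errors_up=False, latency_up=False, synthetics_green=True)
```

A "demand-side" verdict is the cue to open the acquisition cards before starting a technical investigation.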

Reconciling against the vendor’s own dashboard

Where to look in Datadog:

APM → Service List for per-service hits-per-second. Dashboards → APM Overview for the time-series view. APM → Service Map to see how request volume flows across services.

Why our number may legitimately differ from Datadog’s UI:

Reason	Direction	Why
Time zone	Boundary days off	Datadog UI uses account timezone; Vortex IQ uses UTC for cross-connector windowing. The “same hour of week” comparison aligns to UTC by default.
API rate limits	Brief gaps	The Metrics query API is rate-limited; on burst minutes a polled value may use cached prior data.
Log indexing latency	Not applicable	Throughput is APM-derived.
Span sampling	Either direction	Head-based sampling at <100% reduces the absolute count proportionally. The displayed number is multiplied back up using your configured sample rate; if the sample rate setting is wrong, the numbers diverge.
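
The span-sampling row can be illustrated with a short sketch. It assumes the scale-up is a simple division of the sampled count by the configured sample rate; the function name is hypothetical and the engine's actual correction may differ.

```python
def scaled_count(sampled_hits: float, sample_rate: float) -> float:
    """Estimate the true hit count from a head-sampled count.

    sample_rate is the configured fraction of requests kept (0 < rate <= 1).
    """
    if not 0 < sample_rate <= 1:
        raise ValueError("sample_rate must be in (0, 1]")
    return sampled_hits / sample_rate


# 50 sampled hits at a true 50% sample rate represent about 100 requests.
correct = scaled_count(50, 0.5)
# If the configured rate is wrongly set to 25%, the estimate doubles.
misconfigured = scaled_count(50, 0.25)
```

A mismatch like this shifts every reading by the same factor, so it shows up as a constant ratio against Datadog's UI rather than as a sudden drop.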

Cross-connector reconciliation:

Card	Expected relationship	What causes the divergence
`google_analytics.ga_sessions`	GA4 sessions per minute should track Datadog throughput within a 1.5x-2x ratio (multiple requests per session).	A widening gap (GA4 stable, Datadog dropping) means traffic is hitting the CDN cache and bypassing instrumented services; the inverse means GA4 tag-fire is broken.
`shopify.total_revenue` / `bigcommerce.total_revenue`	Revenue/min lags throughput by 5-15 minutes during normal cycles.	Throughput drop without revenue drop typically means bot/crawler traffic was the lost portion.
Datadog logs	Subset relationship: log volume scales roughly with throughput at constant verbosity.	If throughput is stable but log volume jumps, you have a runaway-logging regression (often debug logs accidentally left enabled).

Known limitations / merchant FAQs

My throughput dropped but the site looks fine. What is happening?
The most common causes, in order: (1) Paid ads paused (billing card expired, daily budget hit, ad account flagged); (2) Organic-search ranking dropped; (3) An email send did not go out on schedule; (4) A CDN configuration change is now caching pages that previously hit your origin; (5) A measurement-side regression (Datadog agent down, instrumentation removed in a deploy). Check acquisition channels first; technical investigation is item 5 of 5, not item 1.

Why does the alert use “Week-over-Week” rather than “vs yesterday”?
Ecommerce traffic is highly cyclical by day-of-week. Tuesday afternoon is not Saturday afternoon. The same-hour-of-week comparison removes that periodicity so a real 30% drop alert means “this is unusual versus the same time last week” rather than “this is unusual versus the typical Sunday morning”. WoW is the right baseline for retail.

My throughput is up 50% but my revenue is flat. Should I be worried?
Not necessarily, but investigate. Three benign causes: (1) Bot traffic spike (crawler indexing your sitemap aggressively); (2) A successful but mistargeted ad campaign drove low-intent traffic; (3) Cache miss rate dropped on the CDN, increasing origin throughput without changing real shopper count. One concerning cause: a paid scraper or competitive-intel bot is hammering your product pages. Check Top Slow Endpoints and look for endpoints with abnormally high request counts but low conversion contribution.

Datadog says throughput is fine but Google Analytics shows sessions dropping.
GA4 measures browser-side sessions; Datadog measures server-side requests. A widening gap (Datadog stable, GA4 dropping) almost always means a tag-fire or consent-banner regression on the browser side, not a real traffic loss. Open GA4 Property Health. If Property Health is amber, the GA4 numbers are unreliable and Datadog throughput is the source of truth.

My Logs API returns 400 No valid indexes. Does this card still work?
Yes. Throughput is APM-derived. Log Management gating only affects log-volume cards. The Vortex IQ engine logs the gating event once at INFO level and continues serving APM-derived cards normally.

What is the difference between Datadog throughput and a load balancer’s request count?
Load balancer counts every request that hits the LB; Datadog APM counts every request that reaches an instrumented service. Differences: (1) LB sees CDN-bypass requests Datadog does not (cached responses), (2) LB sees blocked-at-WAF requests Datadog does not (Cloudflare WAF or AWS WAF blocks before APM), (3) Datadog sees worker/job spans the LB does not. For “is shopper traffic arriving”, LB is the truth; for “is shopper traffic being processed by my application”, Datadog is.

My throughput chart shows a flat line at zero from 02:00 to 06:00. Is the site down?
Almost certainly not. Most ecommerce sites genuinely have near-zero traffic in the small hours of the local timezone, especially regional brands. Datadog is doing its job; the lack of traffic is real. The “low-confidence” flag on the card prevents alert spam during these windows. If your brand has international traffic and you genuinely expect non-zero overnight throughput, raise this in [Settings → Alerts → Per-card thresholds].

Throughput spiked 5x for 30 seconds and then went back to normal. What was that?
Three usual causes: (1) A bot wave hit a rate-limit and bounced (you will see a 503 spike in 5xx Response Rate at the same moment), (2) A cache invalidation triggered a thundering-herd on the origin, (3) A scheduled cron or deploy hook fired and made many internal requests. None of these is a customer-facing event.

My multi-region store has different throughput per region. How is it shown?
The headline is the global sum across all regions. For per-region breakdown, use Uptime by Region (synthetic-test view) or filter the Datadog query by @datacenter: tag. Vortex IQ’s per-region throughput stacked panel is on the roadmap.

Tracked live in Vortex IQ Nerve Centre

Throughput (req/s) is one of hundreds of KPI pulses Vortex IQ tracks across Datadog and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
