Active Incidents, Datadog

Metrics type: Key Metrics • Category: Monitoring

At a glance

The count of Datadog incidents currently in an active state, broken down by severity. For a merchant, this is “what does my engineering team think is broken right now?” An incident is a coordinated response: someone has acknowledged that something is wrong and is working on it. Open SEV-1 = revenue is leaking; open SEV-2 = revenue is at risk; open SEV-3 = annoyance, not crisis.


API endpoint	Datadog Incidents API, `GET /api/v2/incidents?filter[state]=active`. Returns the full list with severity, title, opened-at, and currently-assigned commander.
Metric basis	Incident-state-machine count, NOT alert/monitor count. An incident is created when a human (or PagerDuty integration) declares one; many alerts may exist without any incident.
Aggregation window	Real-time, refreshed every 60 seconds.
Severity threshold	All severities counted; the headline displays the highest-severity active incident. SEV-1 takes precedence over SEV-2, etc.
Alert pre-filtering	“Test” incidents (any incident with title containing `[TEST]` or tagged `incident_type:test`) are excluded by default. Without this, drill exercises and on-call rotation handovers create noise.
Log Management gating	Not used. Incidents are pulled from the Incident Management API; the card returns valid values regardless of Logs status.
What counts as an “incident”	Any record in Datadog Incident Management with `state IN (active, stable)`. `stable` means “the bleeding has stopped but the post-incident review is open”; both states are surfaced as active for merchant context.
What does NOT count	(1) Resolved incidents, even resolved within the last hour; (2) “Test” incidents from drills; (3) Incidents in archived/deleted state; (4) Monitors in ALERT state that have not been escalated to an incident; (5) PagerDuty pages that did not create a Datadog incident.
Filtered hosts / services	All services in the Datadog account. Multi-team accounts may want to filter to their tag (`team:checkout`); the engine reads the connector’s configured tag scope if set.
Time zone	Account timezone for chart axes; UTC for cross-connector windowing.
Time window	`RT` (real-time, refreshed every 60 seconds)
Alert trigger	`> 0 SEV-1/2`: any open SEV-1 or SEV-2 incident pages the merchant’s on-call.
Roles	owner, engineering, operations

Calculation

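Calculated automatically from your Datadog data: the card counts incidents in `active` or `stable` state, excludes test incidents, and groups the result by severity, with the highest open severity driving the headline. See the At a glance summary above for the full definitions and the worked example below for a typical reading.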
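
The counting rules listed in the At a glance table can be sketched roughly as follows. This is an illustration, not the Vortex IQ engine: the record shape (`id`, `severity`, `state`, `title`, `tags`), the function names, and the two extra sample rows (a resolved incident and a `[TEST]` drill) are assumptions made for the example, as is marking the warehouse-sync incident `stable`.

```python
from collections import Counter

# States surfaced as "active" on the card, per the At a glance table.
COUNTED_STATES = {"active", "stable"}
# Highest severity first; only SEV-1 to SEV-3 appear in this document.
SEVERITY_ORDER = ["SEV-1", "SEV-2", "SEV-3"]


def is_test(incident):
    """Drill/test incidents are excluded by default."""
    tags = incident.get("tags", [])
    return "[TEST]" in incident["title"] or "incident_type:test" in tags


def summarise(incidents):
    """Count open, non-test incidents and pick the headline severity."""
    counted = [
        i for i in incidents
        if i["state"] in COUNTED_STATES and not is_test(i)
    ]
    by_severity = Counter(i["severity"] for i in counted)
    headline = next((s for s in SEVERITY_ORDER if by_severity[s] > 0), None)
    return {
        "total": len(counted),
        "by_severity": dict(by_severity),
        "headline_severity": headline,
        # Alert trigger: any open SEV-1 or SEV-2 pages the on-call.
        "pages_on_call": by_severity["SEV-1"] + by_severity["SEV-2"] > 0,
    }


# Hypothetical records, loosely modelled on the worked example.
sample = [
    {"id": "INC-2841", "severity": "SEV-1", "state": "active",
     "title": "Checkout 5xx spike"},
    {"id": "INC-2840", "severity": "SEV-2", "state": "active",
     "title": "Search latency degraded"},
    {"id": "INC-2839", "severity": "SEV-3", "state": "stable",
     "title": "Warehouse-sync delayed"},
    {"id": "INC-2838", "severity": "SEV-2", "state": "resolved",
     "title": "Resolved earlier today"},
    {"id": "INC-2837", "severity": "SEV-1", "state": "active",
     "title": "[TEST] Failover drill", "tags": ["incident_type:test"]},
]

summary = summarise(sample)
print(summary)
```

With these sample rows the resolved incident and the drill are dropped, leaving three counted incidents with SEV-1 as the headline severity.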

Worked example

A UK furniture brand on Adobe Commerce with Datadog monitoring web, checkout, payment, and the warehouse-sync worker. Snapshot taken on 24 Apr 26 at 11:30 GMT.

Incident ID	Severity	Title	Opened	Assigned commander
INC-2841	SEV-1	Checkout 5xx spike, payment-service connection pool exhausted	11:08	[engineering on-call]
INC-2840	SEV-2	Search latency degraded, p95 above 4s on /catalog/search	10:42	[search team lead]
INC-2839	SEV-3	Warehouse-sync delayed, 90-minute backlog on inventory updates	09:15	[operations]

The Vortex IQ dashboard headline displays 3 active incidents, with the SEV-1 outlined in red. The owner sees three things at a glance:

Checkout is actively broken (SEV-1). This is the revenue-impacting event. Apdex on the checkout service has dropped from 0.94 to 0.71. Revenue at Risk reads £4,200/hour while this incident is open. The owner’s job is not to debug; it is to decide whether to (a) pause paid-media spend until checkout is healthy, (b) post a status-page banner so customers know the issue is acknowledged, and (c) warn the support team to expect higher volume.
Search is degraded (SEV-2). Shoppers can still browse from the homepage and category pages but cannot effectively search. Conversion will dip if it persists; pair with Conversion Drop During Incidents to quantify. SEV-2 typically resolves within 1-3 hours; the owner does not need to act unless it crosses 3 hours.
Warehouse-sync is delayed (SEV-3). This does not affect storefront experience but does mean inventory shown in the storefront may be stale by 90 minutes. Risk: overselling on low-stock items. Rare but real.

Cost framing for the SEV-1 (the only one that immediately costs money):
  - Open since 11:08; current time 11:30. Duration so far: 22 minutes.
  - Revenue/min during incident: £55 (vs baseline £125)
  - Lost revenue/min: £70
  - Cumulative loss so far: 22 × £70 = £1,540
  - If incident runs another 60 minutes: additional £4,200
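
The same arithmetic as a small sketch, using only the figures from the cost framing above. The function name and its parameters are illustrative, and the calendar date is taken from the snapshot in the worked example.

```python
from datetime import datetime


def cost_so_far(opened_at, now, baseline_per_min, actual_per_min):
    """Return (minutes open, lost revenue per minute, cumulative loss)."""
    minutes_open = (now - opened_at).total_seconds() / 60
    lost_per_min = baseline_per_min - actual_per_min
    return minutes_open, lost_per_min, minutes_open * lost_per_min


opened = datetime(2026, 4, 24, 11, 8)    # INC-2841 opened at 11:08
now = datetime(2026, 4, 24, 11, 30)      # snapshot time

minutes, lost_per_min, cumulative = cost_so_far(opened, now, 125, 55)
projected_extra = 60 * lost_per_min      # if it runs another 60 minutes

print(f"Open for {minutes:.0f} min, losing £{lost_per_min}/min")
print(f"Cumulative loss: £{cumulative:,.0f}")          # £1,540
print(f"Next 60 minutes: £{projected_extra:,.0f}")     # £4,200
```

The £4,200 projection is the same figure the Revenue at Risk card shows as an hourly rate: £70 lost per minute over 60 minutes.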

Three takeaways merchants should remember:

The headline number is meaningful only with severity context. “3 active incidents” sounds the same whether they are 3 SEV-3s or 3 SEV-1s, but the financial impact is 100x different. Always read the severity breakdown, not just the count.
Active incidents != open alerts. A merchant can have 50 active alerts (monitors in ALERT state) without a single active incident, and vice versa. Alerts trip on metrics; incidents are declared by humans. Both matter; this card is the human-confirmed view.
The age of the incident matters as much as the severity. A SEV-1 open for 5 minutes is normal; the team is responding. A SEV-1 open for 90 minutes is a stuck investigation, often because the wrong root cause is being debugged. Pair with Mean Time To Resolve for trending; if SEV-1 ages routinely exceed 60 minutes, the incident-response process needs investment.
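
One way to turn the age guidance into a check is sketched below. It is a hedged illustration: the function name and the 45-minute typical MTTR are hypothetical, the "stuck" rule follows the Mean Time To Resolve pairing described on this page (age exceeds typical MTTR), and "escalate" is one reading of the FAQ advice to escalate when the incident is 2x past typical duration.

```python
def response_status(age_minutes, typical_mttr_minutes):
    """Classify an open incident by its age relative to typical MTTR."""
    if age_minutes >= 2 * typical_mttr_minutes:
        return "escalate"          # at least twice the typical duration
    if age_minutes > typical_mttr_minutes:
        return "stuck"             # past typical MTTR, response needs help
    return "within typical range"


TYPICAL_SEV1_MTTR = 45  # minutes; hypothetical value for illustration

print(response_status(22, TYPICAL_SEV1_MTTR))
print(response_status(60, TYPICAL_SEV1_MTTR))
print(response_status(90, TYPICAL_SEV1_MTTR))
```

In practice the typical MTTR would come from the Mean Time To Resolve card for the matching severity rather than a constant.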

Sibling cards merchants should reference together

Card	Why pair it with Active Incidents	What the combination tells you
Alerts Summary	The monitor-level view. Many alerts; few or no incidents.	Many alerts but no incidents means noisy monitors that need tuning. Many alerts plus an incident means a real, coordinated event.
Operational Health Score	The composite that takes incident severity as a 20%-weight component.	One open SEV-1 alone drops the composite below 80.
Revenue at Risk (live)	The financial reframing of an active incident.	Translates “1 SEV-1 open” into “£X,XXX/hour leaking until resolved”.
Mean Time To Resolve	Trended view: how long do incidents typically take to close?	If the active incident’s age exceeds your typical MTTR, the response is stuck and needs help.
Mean Time To Acknowledge	Sub-metric: was the alert acknowledged quickly?	High MTTA combined with active incidents equals on-call rotation problems (paging not reaching the right people).
Top Alerting Services	Pattern view across the last week.	Same service in both lists equals a chronic problem deserving an investment.
Critical-Path Tests Status	Independent confirmation: does the synthetic agree with the human?	Synthetic green plus open SEV-1 = the incident is on a non-customer-facing path; synthetic red plus no incident = customer-facing problem nobody has noticed yet.
Shopify / BC / Adobe Total Revenue	The downstream impact during incident windows.	Always pair an active SEV-1 with revenue/min to size the cost of the outage.

Reconciling against the vendor’s own dashboard

Where to look in Datadog:

Incidents: the master incident list, with state filters.
Incident Settings: confirm severity definitions and routing rules.
Monitors → Triggered Monitors: the related alerts feeding incident creation.

Why our number may legitimately differ from Datadog’s UI:

Reason	Direction	Why
Time zone	“Opened-at” timestamps shift	Datadog UI displays in account timezone; Vortex IQ stores UTC and renders in the merchant’s display timezone (set in Vortex IQ profile).
API rate limits	Brief gaps	The Incidents API is rate-limited at 300 req/h on free tier; during request bursts a polled value may be served from cached data.
Log indexing latency	Not applicable	Incidents are not log-derived.
Monitor state cache	Not applicable	This card reads incident state directly, not monitor state.
“Test” filtering	Vortex IQ count lower	Drill / test incidents are excluded by default in Vortex IQ; Datadog UI shows them unless filtered.

Cross-connector reconciliation:

Card	Expected relationship	What causes the divergence
`google_analytics.ga_property_health`	Independent measurement-side health peer.	Active SEV-1 plus GA4 Property Health amber equals “site is broken AND analytics is broken simultaneously”, which is rare but happens during deploy regressions.
`shopify.total_revenue` / `bigcommerce.total_revenue` / `adobe_commerce.total_revenue`	Active SEV-1 typically corresponds to a 10-30% revenue dip while open.	Revenue dip without an open incident equals “the engineering team has not noticed yet”, a high-priority surface.
PagerDuty incidents	Should be 1:1 with Datadog incidents if the integration is configured correctly.	A gap means the PagerDuty-Datadog integration is broken; pages are reaching humans but not creating Datadog records, which makes post-incident reviews harder.

Known limitations / merchant FAQs

My team uses PagerDuty for paging but the Active Incidents card shows zero. Why?
PagerDuty pages do not automatically create Datadog incidents. If your incident-management workflow lives entirely in PagerDuty (or Opsgenie, or Slack-only), the Datadog Incidents API will return zero. To populate this card, configure the PagerDuty-Datadog integration to create a Datadog incident for every SEV-1/2 PagerDuty page. Alternatively, switch to Datadog’s native incident management. The card reflects what is in the Datadog Incidents API, not the broader human reality.

What is the difference between an alert and an incident?
An alert is a single threshold breach: a metric crossed a number for a period. An incident is a coordinated response that a human (or PagerDuty automation) has declared. Many alerts can fire without any incident being declared (the team triages and dismisses); rarely, a real incident exists without alerts (a customer email surfaces a problem the monitors missed). Active Incidents counts only the human-declared events.

My SEV-1 has been open for 90 minutes. What should I do?
Three things, in order: (1) Confirm someone is actively investigating; an unassigned SEV-1 is the worst case (paged but ignored). (2) Check Mean Time To Resolve for your typical SEV-1 duration; if you are 2x past it, escalate. (3) Decide on the merchant-side mitigations: pause paid-media spend, post a status-page banner, queue a customer-comms email. Those decisions are independent of fixing the technical cause and reduce financial damage.

Why are SEV-3 incidents counted at all? They are not customer-facing.
SEV-3 incidents typically signal slow-bleed problems: warehouse-sync delays, batch-job overruns, internal-API rate limits. They do not cost shoppers today but, if ignored, become tomorrow’s SEV-1. Counting them keeps the merchant aware that engineering attention is divided.

My Datadog account does not have Incident Management enabled. Does this card still work?
Datadog Incident Management is included in the Pro tier and above. Free-tier accounts cannot create incidents in Datadog; the card will return zero unless you upgrade. If you use a separate incident-management tool (PagerDuty, Opsgenie, FireHydrant), point that tool’s webhook at Datadog Incidents API to populate the card, or use the Alerts Summary card instead.

Datadog says zero incidents but customers are reporting outages.
The classic blind spot. Three causes: (1) Monitors are alerting but no human has declared an incident yet (gap between alert and incident creation, often 5-15 minutes); (2) The customer-facing problem is in a code path Datadog is not instrumenting (third-party widget, payment iframe, browser-only); (3) Synthetic tests are passing because they only test the happy-path checkout, not the edge case the customer hit. Run Critical-Path Tests Status and JS Errors / Session for the customer-side view.

Does the card include incidents from non-production environments?
By default, no: the engine filters by `env:production` (or the connector’s configured production tag). If your team declares incidents in staging for drill purposes, those are excluded. To include staging incidents, change the connector’s environment scope in Settings → Datadog → Environment filter.

My multi-team Datadog account has incidents from teams I do not own. Are those counted?
By default, all incidents in the account. To filter by team, set the connector’s tag scope to `team:your_team_name`. The headline number then reflects only incidents tagged with your team. Most merchants want the unfiltered view because a SEV-1 from any team can affect shopper experience.

RUM and Synthetic incidents look different from APM incidents. Are they counted?
Yes. Datadog Incident Management is product-agnostic; an incident declared from a RUM alert, Synthetic alert, APM alert, or manually counts the same. The category breakdown is available in the Datadog UI but not yet in the Vortex IQ headline; planned for a future release.

Tracked live in Vortex IQ Nerve Centre

Active Incidents is one of hundreds of KPI pulses Vortex IQ tracks across Datadog and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
