At a glance
The count of Datadog incidents currently in active state, broken down by severity. For a merchant, this is “what does my engineering team think is broken right now?” An incident is a coordinated response: someone has acknowledged that something is wrong and is working on it. Open SEV-1 = revenue is leaking; open SEV-2 = revenue is at risk; open SEV-3 = annoyance, not crisis.
| API endpoint | Datadog Incidents API, GET /api/v2/incidents?filter[state]=active. Returns the full list with severity, title, opened-at, and currently-assigned commander. |
| Metric basis | Incident-state-machine count, NOT alert/monitor count. An incident is created when a human (or PagerDuty integration) declares one; many alerts may exist without any incident. |
| Aggregation window | Real-time, refreshed every 60 seconds. |
| Severity threshold | All severities counted; the headline displays the highest-severity active incident. SEV-1 takes precedence over SEV-2, etc. |
| Alert pre-filtering | “Test” incidents (any incident with title containing [TEST] or tagged incident_type:test) are excluded by default. Without this, drill exercises and on-call rotation handovers create noise. |
| Log Management gating | Not used. Incidents are pulled from the Incident Management API; the card returns valid values regardless of Logs status. |
| What counts as an “incident” | Any record in Datadog Incident Management with state IN (active, stable). stable means “the bleeding has stopped but the post-incident review is open”; both states are surfaced as active for merchant context. |
| What does NOT count | (1) Resolved incidents, even resolved within the last hour; (2) “Test” incidents from drills; (3) Incidents in archived/deleted state; (4) Monitors in ALERT state that have not been escalated to an incident; (5) PagerDuty pages that did not create a Datadog incident. |
| Filtered hosts / services | All services in the Datadog account. Multi-team accounts may want to filter to their tag (team:checkout); the engine reads the connector’s configured tag scope if set. |
| Time zone | Account timezone for chart axes; UTC for cross-connector windowing. |
| Time window | RT (real-time, refreshed every 60 seconds) |
| Alert trigger | > 0 SEV-1/2: any open SEV-1 or SEV-2 incident pages the merchant on-call. |
| Roles | owner, engineering, operations |
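The counting rules in the table above can be sketched in a few lines. This is a minimal illustration, not the engine's actual implementation: the record shape (`title`, `severity`, `state`, `tags`) is a simplified, hypothetical stand-in for the API payload, and the severity list assumes only the three levels named on this page.

```python
from collections import Counter

# Assumption: only the three severities this page names, highest first.
SEVERITY_ORDER = ["SEV-1", "SEV-2", "SEV-3"]
# Both states are surfaced as "active" on the card.
COUNTED_STATES = {"active", "stable"}


def is_test_incident(incident):
    """Drill/test incidents: [TEST] in the title or an incident_type:test tag."""
    return "[TEST]" in incident["title"] or "incident_type:test" in incident["tags"]


def summarize(incidents):
    """Count non-test incidents in a counted state, broken down by severity."""
    counted = [
        i for i in incidents
        if i["state"] in COUNTED_STATES and not is_test_incident(i)
    ]
    by_severity = Counter(i["severity"] for i in counted)
    # Headline is the highest severity present; None when nothing is open.
    headline = next((s for s in SEVERITY_ORDER if by_severity[s] > 0), None)
    return {
        "total": len(counted),
        "by_severity": dict(by_severity),
        "headline": headline,
        # Any open SEV-1 or SEV-2 pages the merchant on-call.
        "pages_on_call": by_severity["SEV-1"] + by_severity["SEV-2"] > 0,
    }


# Illustrative records in a simplified, hypothetical shape (not the API payload).
sample = [
    {"title": "Checkout 5xx spike", "severity": "SEV-1", "state": "active", "tags": []},
    {"title": "Search latency degraded", "severity": "SEV-2", "state": "stable", "tags": []},
    {"title": "[TEST] Failover drill", "severity": "SEV-1", "state": "active", "tags": []},
    {"title": "Rotation handover", "severity": "SEV-3", "state": "active", "tags": ["incident_type:test"]},
    {"title": "CDN misconfiguration", "severity": "SEV-2", "state": "resolved", "tags": []},
]
summary = summarize(sample)
print(summary)
```

In this sample only the first two records are counted: the drill and the handover are dropped by the test filter, and the resolved incident is dropped by the state filter.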
Calculation
Calculated automatically from your Datadog data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
Worked example
A UK furniture brand on Adobe Commerce with Datadog monitoring web, checkout, payment, and the warehouse-sync worker. Snapshot taken on 24 Apr 26 at 11:30 GMT.

| Incident ID | Severity | Title | Opened | Assigned commander |
|---|---|---|---|---|
| INC-2841 | SEV-1 | Checkout 5xx spike, payment-service connection pool exhausted | 11:08 | [engineering on-call] |
| INC-2840 | SEV-2 | Search latency degraded, p95 above 4s on /catalog/search | 10:42 | [search team lead] |
| INC-2839 | SEV-3 | Warehouse-sync delayed, 90-minute backlog on inventory updates | 09:15 | [operations] |
- Checkout is actively broken (SEV-1). This is the revenue-impacting event. Apdex on the checkout service has dropped from 0.94 to 0.71. Revenue at Risk reads £4,200/hour while this incident is open. The owner’s job is not to debug; it is to decide whether to (a) pause paid-media spend until checkout is healthy, (b) post a status-page banner so customers know the issue is acknowledged, (c) email the support inbox to expect higher volume.
- Search is degraded (SEV-2). Shoppers can still browse from the homepage and category pages but cannot effectively search. Conversion will dip if it persists; pair with Conversion Drop During Incidents to quantify. SEV-2 typically resolves within 1-3 hours; the owner does not need to act unless it crosses 3 hours.
- Warehouse-sync is delayed (SEV-3). This does not affect storefront experience but does mean inventory shown in the storefront may be stale by 90 minutes. Risk: overselling on low-stock items. Rare but real.
- The headline number is meaningful only with severity context. “3 active incidents” sounds the same whether they are 3 SEV-3s or 3 SEV-1s, but the financial impact is 100x different. Always read the severity breakdown, not just the count.
- Active incidents != open alerts. A merchant can have 50 active alerts (monitors in ALERT state) without a single active incident, and vice versa. Alerts trip on metrics; incidents are declared by humans. Both matter; this card is the human-confirmed view.
- The age of the incident matters as much as the severity. A SEV-1 open for 5 minutes is normal; the team is responding. A SEV-1 open for 90 minutes is a stuck investigation, often because the wrong root cause is being debugged. Pair with Mean Time To Resolve for trending; if SEV-1 ages routinely exceed 60 minutes, the incident-response process needs investment.
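The age check described in the last point can be sketched as follows. This is a hedged illustration: the function names are hypothetical, the 60-minute typical MTTR is a placeholder for illustration only, and the "past typical MTTR means stuck, past twice that means escalate" rule is borrowed from the Mean Time To Resolve pairing and the FAQ on this page.

```python
from datetime import datetime, timedelta, timezone


def age_of(opened_at, now):
    """Time an incident has been open at the moment of the snapshot."""
    return now - opened_at


def response_status(age, typical_mttr):
    """Compare an incident's age with the team's typical time to resolve."""
    if age > 2 * typical_mttr:
        return "escalate"      # more than twice the usual duration
    if age > typical_mttr:
        return "stuck"         # past the usual duration
    return "responding"        # still within the usual duration


# Snapshot and SEV-1 opened-at time from the worked example (GMT, i.e. UTC+0).
snapshot = datetime(2026, 4, 24, 11, 30, tzinfo=timezone.utc)
sev1_opened = datetime(2026, 4, 24, 11, 8, tzinfo=timezone.utc)

# Assumption: a 60-minute typical SEV-1 MTTR, used here only for illustration.
typical_sev1_mttr = timedelta(minutes=60)

age = age_of(sev1_opened, snapshot)
print(age, response_status(age, typical_sev1_mttr))
print(response_status(timedelta(minutes=90), typical_sev1_mttr))
```

At the snapshot the SEV-1 is 22 minutes old, so under these assumed thresholds it still reads as a normal response; the same incident at 90 minutes would read as stuck.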
Sibling cards merchants should reference together
| Card | Why pair it with Active Incidents | What the combination tells you |
|---|---|---|
| Alerts Summary | The monitor-level view. Many alerts; few or no incidents. | Many alerts but no incidents equals noisy monitors needing tuning. Many alerts plus an incident equals real coordinated event. |
| Operational Health Score | The composite that takes incident severity as a 20%-weight component. | One open SEV-1 alone drops the composite below 80. |
| Revenue at Risk (live) | The financial reframing of an active incident. | Translates “1 SEV-1 open” into “£X,XXX/hour leaking until resolved”. |
| Mean Time To Resolve | Trended view: how long do incidents typically take to close? | If the active incident’s age exceeds your typical MTTR, the response is stuck and needs help. |
| Mean Time To Acknowledge | Sub-metric: was the alert acknowledged quickly? | High MTTA combined with active incidents equals on-call rotation problems (paging not reaching the right people). |
| Top Alerting Services | Pattern view across the last week. | Same service in both lists equals a chronic problem deserving an investment. |
| Critical-Path Tests Status | Independent confirmation: does the synthetic agree with the human? | Synthetic green plus open SEV-1 = the incident is on a non-customer-facing path; synthetic red plus no incident = customer-facing problem nobody has noticed yet. |
| Shopify / BC / Adobe Total Revenue | The downstream impact during incident windows. | Always pair an active SEV-1 with revenue/min to size the cost of the outage. |
Reconciling against the vendor’s own dashboard
Where to look in Datadog:
- Incidents for the master incident list with state filters.
- Incident Settings to confirm severity definitions and routing rules.
- Monitors → Triggered Monitors for the related alerts feeding incident creation.
Why our number may legitimately differ from Datadog’s UI:

| Reason | Direction | Why |
|---|---|---|
| Time zone | “Opened-at” timestamps shift | Datadog UI displays in account timezone; Vortex IQ stores UTC and renders in the merchant’s display timezone (set in Vortex IQ profile). |
| API rate limits | Brief gaps | The Incidents API is rate-limited at 300 req/h on free tier; on burst minutes a polled value may use cached data. |
| Log indexing latency | Not applicable | Incidents are not log-derived. |
| Monitor state cache | Not applicable | This card reads incident state directly, not monitor state. |
| “Test” filtering | Vortex IQ count lower | Drill / test incidents are excluded by default in Vortex IQ; Datadog UI shows them unless filtered. |
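The time zone row is the one most likely to confuse a side-by-side comparison, so here is a small sketch of why the same incident shows two different “opened-at” values. The offsets are hypothetical, and fixed UTC offsets stand in for real zone rules to keep the example dependency-free; named zones would be needed to handle daylight saving correctly.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC offsets stand in for real zone rules to keep the sketch
# dependency-free; named zones (e.g. via zoneinfo) would handle DST.
ACCOUNT_TZ = timezone(timedelta(hours=1))    # hypothetical Datadog account timezone
DISPLAY_TZ = timezone(timedelta(hours=-4))   # hypothetical merchant display timezone


def render(opened_at_utc, tz):
    """Render a stored UTC timestamp as wall-clock time in the given zone."""
    return opened_at_utc.astimezone(tz).strftime("%Y-%m-%d %H:%M")


# One stored instant, rendered two ways.
opened_at_utc = datetime(2026, 4, 24, 10, 8, tzinfo=timezone.utc)
datadog_ui = render(opened_at_utc, ACCOUNT_TZ)
card = render(opened_at_utc, DISPLAY_TZ)
print(datadog_ui, "vs", card)
```

Both strings describe the same instant; only the wall-clock rendering differs, so the incident count itself is unaffected.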

| Card | Expected relationship | What causes the divergence |
|---|---|---|
| google_analytics.ga_property_health | Independent measurement-side health peer. | Active SEV-1 plus GA4 Property Health amber equals “site is broken AND analytics is broken simultaneously”, which is rare but happens during deploy regressions. |
| shopify.total_revenue / bigcommerce.total_revenue / adobe_commerce.total_revenue | Active SEV-1 typically corresponds to a 10-30% revenue dip while open. | Revenue dip without an open incident equals “the engineering team has not noticed yet”, a high-priority surface. |
| PagerDuty incidents | Should be 1:1 with Datadog incidents if the integration is configured correctly. | A gap means the PagerDuty-Datadog integration is broken; pages are reaching humans but not creating Datadog records, which makes post-incident reviews harder. |
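The PagerDuty row amounts to a set comparison. The sketch below assumes each Datadog incident record carries a hypothetical `pagerduty_id` field linking it to the page that created it; that field name, the record shape, and the IDs are all illustrative, not part of either vendor's API.

```python
def pages_missing_datadog_record(pagerduty_ids, datadog_incidents):
    """Return PagerDuty incident IDs that no Datadog incident links back to."""
    linked = {i["pagerduty_id"] for i in datadog_incidents if i.get("pagerduty_id")}
    return sorted(set(pagerduty_ids) - linked)


# Illustrative IDs only; "pagerduty_id" is a hypothetical linking field.
pagerduty_ids = ["PD-101", "PD-102", "PD-103"]
datadog_incidents = [
    {"id": "INC-1", "pagerduty_id": "PD-101"},
    {"id": "INC-2", "pagerduty_id": "PD-103"},
    {"id": "INC-3", "pagerduty_id": None},  # declared manually, no page
]
missing = pages_missing_datadog_record(pagerduty_ids, datadog_incidents)
print(missing)
```

A non-empty result is the gap the table describes: a page reached a human but left no Datadog record behind.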
Known limitations / merchant FAQs
My team uses PagerDuty for paging but the Active Incidents card shows zero. Why?
PagerDuty pages do not automatically create Datadog incidents. If your incident-management workflow lives entirely in PagerDuty (or Opsgenie, or Slack-only), the Datadog Incidents API will return zero. To populate this card, configure the PagerDuty-Datadog integration to create a Datadog incident for every SEV-1/2 PagerDuty page. Alternatively, switch to Datadog’s native incident management. The card reflects what is in the Datadog Incidents API, not the broader human reality.
What is the difference between an alert and an incident?
An alert is a single threshold breach: a metric crossed a number for a period. An incident is a coordinated response that a human (or PagerDuty automation) has declared. Many alerts can fire without any incident being declared (the team triages and dismisses); rarely, a real incident exists without alerts (a customer email surfaces a problem the monitors missed). Active Incidents counts only the human-declared events.
My SEV-1 has been open for 90 minutes. What should I do?
Three things, in order: (1) Confirm someone is actively investigating; an unassigned SEV-1 is the worst case (paged but ignored). (2) Check Mean Time To Resolve for your typical SEV-1 duration; if you are 2x past it, escalate. (3) Decide on the merchant-side mitigations: pause paid-media spend, post a status-page banner, queue a customer-comms email. Those decisions are independent of fixing the technical cause and reduce financial damage.
Why are SEV-3 incidents counted at all? They are not customer-facing.
SEV-3 incidents typically signal slow-bleed problems: warehouse-sync delays, batch-job overruns, internal-API rate limits. They do not cost shoppers today but, if ignored, become tomorrow’s SEV-1. Counting them keeps the merchant aware that engineering attention is divided.
My Datadog account does not have Incident Management enabled. Does this card still work?
Datadog Incident Management is included in the Pro tier and above. Free-tier accounts cannot create incidents in Datadog; the card will return zero unless you upgrade. If you use a separate incident-management tool (PagerDuty, Opsgenie, FireHydrant), point that tool’s webhook at the Datadog Incidents API to populate the card, or use the Alerts Summary card instead.
Datadog says zero incidents but customers are reporting outages.
The classic blind spot. Three causes: (1) Monitors are alerting but no human has declared an incident yet (gap between alert and incident creation, often 5-15 minutes); (2) The customer-facing problem is in a code path Datadog is not instrumenting (third-party widget, payment iframe, browser-only); (3) Synthetic tests are passing because they only test the happy-path checkout, not the edge case the customer hit. Run Critical-Path Tests Status and JS Errors / Session for the customer-side view.
Does the card include incidents from non-production environments?
By default, no: the engine filters by env:production (or the connector’s configured production tag). If your team declares incidents in staging for drill purposes, those are excluded. To include staging incidents, change the connector’s environment scope in Settings → Datadog → Environment filter.
My multi-team Datadog account has incidents from teams I do not own. Are those counted?
Yes. By default, the card counts all incidents in the account. To filter by team, set the connector’s tag scope to team:your_team_name. The headline number then reflects only incidents tagged with your team. Most merchants want the unfiltered view because a SEV-1 from any team can affect shopper experience.
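As a rough sketch of how the environment and team scopes combine, assuming incidents carry a flat list of tag strings (a simplified, hypothetical record shape, with `in_scope` as an illustrative helper name):

```python
def in_scope(incident, tag_scope=None, env_tag="env:production"):
    """Keep production incidents, optionally narrowed to one team tag."""
    tags = set(incident["tags"])
    if env_tag not in tags:
        return False
    # With no tag scope configured, every production incident is counted.
    return tag_scope is None or tag_scope in tags


# Illustrative records only.
incidents = [
    {"id": "A", "tags": ["env:production", "team:checkout"]},
    {"id": "B", "tags": ["env:production", "team:search"]},
    {"id": "C", "tags": ["env:staging", "team:checkout"]},
]
unfiltered = [i["id"] for i in incidents if in_scope(i)]
checkout_only = [i["id"] for i in incidents if in_scope(i, tag_scope="team:checkout")]
print(unfiltered, checkout_only)
```

The staging incident is excluded in both views; setting a team scope then narrows the production set further.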
RUM and Synthetic incidents look different from APM incidents. Are they counted?
Yes. Datadog Incident Management is product-agnostic; an incident declared from a RUM alert, Synthetic alert, APM alert, or manually counts the same. The category breakdown is available in the Datadog UI but not yet in the Vortex IQ headline; planned for a future release.