Alerts Summary, Datadog

Metrics type: Key Metrics • Category: Monitoring

At a glance

The live count of Datadog monitors currently in Alert or Warn state, broken down by priority. For a merchant, this is “how many things are pinging engineering right now?” More than five active alerts is unusual; it typically means either a real cascade event or noisy monitors that need tuning. The card surfaces alert volume, NOT whether anything has been declared an incident.


API endpoint	Datadog Monitors API, `GET /api/v1/monitor?with_downtimes=false&group_states=alert,warn`. Returns the full monitor list with state, priority, and last-triggered timestamp.
Metric basis	Monitor state machine: counts monitors whose current state is `Alert` or `Warn`. Excludes `OK`, `No Data` (counted on its own card), and `Skipped`.
Aggregation window	Real-time, refreshed every 60 seconds.
Severity threshold	Datadog’s monitor priority field (P1, P2, P3, P4, P5). The card displays the breakdown but the headline summarises by priority bucket: “1 P1, 3 P2” reads more clearly than “4”.
Alert pre-filtering	(1) Monitors tagged `muted:true` or in scheduled downtime are excluded; (2) Synthetic monitors that test Datadog’s own infrastructure (`@user_agent:Datadog/Synthetic`) are excluded; (3) Monitors created in the last 60 minutes are flagged “new monitor, not yet stable” and shown in a separate bucket so a freshly-misconfigured threshold does not pollute the headline.
Log Management gating	Some monitors are log-based (`type:log alert`); if Log Management is disabled, those monitors persist as `No Data` rather than `Alert`. The Logs API gating returns 400 No valid indexes for log queries; the engine logs once at INFO and skips log-based monitor evaluation. APM, infrastructure, and synthetic monitors continue to function.
Filtered hosts / services	All monitors in the connected Datadog account. To scope to a team, set the connector’s tag scope to `team:your_team`.
Time zone	Datadog account timezone for “last triggered” timestamps; UTC for cross-connector aggregation.
What this card is NOT	This is not the same as Active Incidents. Alerts can fire without an incident being declared, and incidents can exist without active alerts (a customer email surfaced the issue). The two cards together give the full picture.
Time window	`RT` (real-time, refreshed every 60 seconds)
Alert trigger	`> 5 active`. More than 5 simultaneous alerts is the threshold for “something is wrong or your monitors are too noisy”.
Roles	owner, engineering, operations
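
For readers who want to reproduce the poll by hand, the request in the API endpoint row can be built roughly as follows. This is a minimal sketch, not the Vortex IQ connector's own code: the function name `build_monitor_request`, the US1 base URL, and the way keys are passed in are assumptions. The `DD-API-KEY` and `DD-APPLICATION-KEY` headers are Datadog's standard API authentication headers.

```python
import urllib.parse
import urllib.request

# Assumption: the account lives on Datadog's US1 site; other sites use other hosts.
DATADOG_BASE_URL = "https://api.datadoghq.com"


def build_monitor_request(api_key, app_key, base_url=DATADOG_BASE_URL):
    """Build (but do not send) the GET request for monitors in Alert or Warn."""
    query = urllib.parse.urlencode(
        {"with_downtimes": "false", "group_states": "alert,warn"},
        safe=",",  # keep the comma in "alert,warn" unescaped
    )
    return urllib.request.Request(
        f"{base_url}/api/v1/monitor?{query}",
        headers={"DD-API-KEY": api_key, "DD-APPLICATION-KEY": app_key},
        method="GET",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the response body is a JSON array of monitor objects.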

Calculation

Calculated automatically from your Datadog data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
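
As a rough illustration of the counting rules listed under At a glance, the sketch below tallies active monitors by priority from a list of monitor objects. It is a simplified sketch under stated assumptions, not the engine's actual implementation: the field names (`overall_state`, `priority`, `tags`, `created`, `id`) follow the shape of Datadog monitor objects, while `summarise_alerts` and `in_downtime_ids` are hypothetical names, and the synthetic-monitor exclusion is left out for brevity.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

ACTIVE_STATES = {"Alert", "Warn"}           # OK, No Data and Skipped are not counted
NEW_MONITOR_WINDOW = timedelta(minutes=60)  # "new monitor, not yet stable" bucket


def summarise_alerts(monitors, now, in_downtime_ids=frozenset()):
    """Return (counts_by_priority, new_monitor_count) for active monitors."""
    counts = Counter()
    new_monitors = 0
    for monitor in monitors:
        if monitor.get("overall_state") not in ACTIVE_STATES:
            continue
        # Pre-filter 1: muted monitors and monitors in scheduled downtime.
        if "muted:true" in monitor.get("tags", []):
            continue
        if monitor.get("id") in in_downtime_ids:
            continue
        # Pre-filter 3: monitors created in the last 60 minutes go to their own bucket.
        created = datetime.fromisoformat(monitor["created"])
        if now - created < NEW_MONITOR_WINDOW:
            new_monitors += 1
            continue
        priority = monitor.get("priority")
        counts[f"P{priority}" if priority else "unprioritised"] += 1
    return dict(counts), new_monitors
```

`now` should be a timezone-aware `datetime`, for example `datetime.now(timezone.utc)`, so that it can be compared with the timestamps in `created`.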

Worked example

Consider a US apparel brand on Shopify with 47 Datadog monitors covering web, checkout, payment, search, the recommendations service, and infrastructure. The snapshot was taken on 25 Apr 2026 at 09:15 ET.

Priority	Count	Top monitor titles
P1	1	Checkout p95 above 3s
P2	3	Web error rate above 1.5%, DB connection pool above 85%, recommendations service Apdex below 0.85
P3	2	Container restart on cart-worker, log-volume up 35%
P4	1	Disk space above 75% on db-replica-2
Total	7	(above the 5-alert threshold)
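
A priority table like the one above can be turned into a headline string and checked against the `> 5 active` trigger with a few lines of code. The following is an illustrative sketch only; `headline`, `over_threshold`, and `ALERT_THRESHOLD` are hypothetical names rather than part of any Vortex IQ or Datadog API.

```python
ALERT_THRESHOLD = 5  # the card's trigger is "> 5 active"


def headline(counts):
    """Format priority counts, e.g. {"P1": 1, "P2": 3}, as a headline string."""
    total = sum(counts.values())
    noun = "alert" if total == 1 else "alerts"
    if total == 0:
        return "0 active alerts"
    # "P1" .. "P5" sort correctly as plain strings.
    parts = ", ".join(f"{n} {p}" for p, n in sorted(counts.items()) if n > 0)
    return f"{total} active {noun} ({parts})"


def over_threshold(counts):
    """True when the total is strictly above the 5-alert trigger."""
    return sum(counts.values()) > ALERT_THRESHOLD
```

With the counts from the table, `headline({"P1": 1, "P2": 3, "P3": 2, "P4": 1})` returns `"7 active alerts (1 P1, 3 P2, 2 P3, 1 P4)"` and `over_threshold` returns `True`.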

The Vortex IQ dashboard headline reads “7 active alerts (1 P1, 3 P2, 2 P3, 1 P4)” with the P1 visually emphasised. Three things the merchant should read from this:

The P1 is the only one that costs money right now. Checkout p95 above 3s means shoppers experiencing the slow tail are abandoning. Pair with p95 Response Time to see whether the alert reflects a sustained regression or a brief spike.
The three P2s cluster around the same root cause. Web error rate, DB pool, and recommendations Apdex all spiking simultaneously usually means one upstream dependency is degraded (the DB pool exhaustion is causing the other two via shared connections). One fix may resolve all three. This is the “alert correlation” pattern that tells engineering “do not chase three problems, find the one that is causing the rest”.
The two P3s and one P4 are noise during this incident. They are background monitors firing for unrelated reasons; engineering should ignore them while the P1+P2 cluster is being resolved. A common mistake is to triage every alert; the right response is “P1 first, then the cluster of P2s; ignore P3+P4 until the dust settles”.

What this tells the engineering on-call:
  - "1 P1, 3 P2 clustered, 3 background" reads as: "real cascade event, single root cause likely"
  - vs. "4 P1, 4 P2 spread across services" reads as: "multiple unrelated incidents, full-team mobilisation"
  - vs. "0 P1, 0 P2, 12 P3+P4" reads as: "monitors are too noisy, scheduled tuning needed"

Three takeaways merchants should remember:

Alert count alone is meaningless without priority breakdown. Twelve P5 monitors firing is a quiet day; one P1 is a crisis. Always read the priority distribution, not the total.
The 5-alert threshold catches “monitor noise creep”. Healthy Datadog accounts sit at 0-2 active alerts during normal operation. If you are routinely above 5, your monitors are over-sensitive: their thresholds are tight enough to fire on transient noise. Tune monitor thresholds quarterly using Recently Flapped Monitors as the guide.
Active alerts are NOT the same as active incidents. An alert is a metric breach; an incident is a human-declared coordinated response. Many alerts plus zero incidents equals “monitors are firing but engineering has not decided this is a real event yet”. The gap is normal during the first 5-15 minutes of a regression; if it persists past 30 minutes, the monitors are noisy and should be tuned.
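
The timing rule in the third takeaway can be written down as a small decision function. This is a sketch of the reading described above, not a rule the product is known to implement; the function name and return labels are hypothetical, and the 15 to 30 minute band, which the text does not classify, is labelled "undetermined" as an assumption.

```python
def classify_alert_incident_gap(active_alerts, active_incidents, minutes_since_first_alert):
    """Interpret 'many alerts, zero incidents' using the 15 and 30 minute marks."""
    if active_alerts == 0 or active_incidents > 0:
        return "no gap"
    if minutes_since_first_alert <= 15:
        # Engineering is still deciding whether this is a real event.
        return "normal triage window"
    if minutes_since_first_alert > 30:
        return "noisy monitors, tune them"
    # Assumption: the source does not classify the 15-30 minute band.
    return "undetermined"
```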

Sibling cards merchants should reference together

Card	Why pair it with Alerts Summary	What the combination tells you
Active Incidents	The human-declared peer.	Many alerts plus zero incidents equals “engineering has not decided this is real yet”; many alerts plus active incidents equals coordinated response in progress.
Currently Triggered Monitors	The detail view of the same data.	Alerts Summary is the count; Currently Triggered Monitors is the table with monitor names.
Recently Flapped Monitors (24h)	Identifies monitors that fire and recover repeatedly: typically threshold tuning needed.	High flap count plus high active-alert count equals noisy monitors; low flap count plus stable alerts equals real degradation.
Sustained Threshold Breaches	The “stuck alerts” view: monitors in Alert state for over 30 minutes.	A sustained breach is more concerning than a brief spike; pair to differentiate.
Monitors Without Notification Channel	The silent-failure view: alerts firing but no human is paged.	An active P1 alert that is also in the no-notification list equals “nobody knows it’s broken”. Highest-leverage fix on the dashboard.
Monitor Coverage by Service	The blind-spot view: services without alert coverage.	Zero active alerts on a service is good only if the service has alert coverage; zero alerts plus zero coverage equals “we cannot see this service at all”.
Operational Health Score	The composite view that takes alert volume into context.	Score above 80 with 7 active alerts equals noisy-monitor problem; score below 70 with 7 active alerts equals real cascade.
Top Alerting Services	Pattern view across the last week.	The same service repeatedly in the top equals a chronic problem worth investing in; varied services across days equals normal noise.
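
The Operational Health Score pairing in the table can be read mechanically. The sketch below encodes only the two cut-offs the table states (above 80 and below 70) and treats everything else as indeterminate; the function name and labels are hypothetical, and using the 5-alert trigger as the lower bound is an assumption.

```python
def read_alerts_with_health_score(score, active_alerts, threshold=5):
    """Pair alert volume with the Operational Health Score."""
    if active_alerts <= threshold:
        return "below alert threshold"
    if score > 80:
        return "noisy monitors"  # high score despite many alerts
    if score < 70:
        return "real cascade"    # low score confirms the alerts
    return "indeterminate"       # 70-80 is not classified by the table
```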

Reconciling against the vendor’s own dashboard

Where to look in Datadog:

Monitors → Manage Monitors for the master list with state filters.
Monitors → Triggered Monitors filtered by `status:Alert OR status:Warn`.
Monitors → Notifications to confirm which alerts are routing where.

Why our number may legitimately differ from Datadog’s UI:

Reason	Direction	Why
Time zone	Last-triggered timestamps shift	Datadog UI displays in account timezone; Vortex IQ stores UTC.
API rate limits	Brief gaps	The Monitors API is rate-limited; on burst minutes a polled value may use cached prior data.
Log Management gating	Log-based monitor count lower	Logs API gating returns `400 No valid indexes` when Log Management is disabled; log-based monitors persist as `No Data` rather than `Alert`.
Monitor state cache	Up to 60 seconds	Monitor state refreshes once per minute; freshly triggered alerts may take up to 60 seconds to appear.
Mute / downtime exclusion	Vortex IQ count lower	Muted monitors and monitors in scheduled downtime are excluded from the Vortex IQ count by default; Datadog UI shows them with a mute icon.

Cross-connector reconciliation:

Card	Expected relationship	What causes the divergence
Datadog Active Incidents	Alerts can fire without incidents being declared.	A persistent gap (many active alerts, zero incidents) means engineering is treating the alerts as noise. Tune the monitors or declare an incident.
`google_analytics.ga_property_health`	Independent measurement-side health peer.	Active alerts on Datadog plus GA4 Property Health red equals “site is broken AND analytics is broken simultaneously”.
PagerDuty active incidents	Should be 1:1 with Datadog `priority:1` alerts if the integration is configured.	A gap means the PagerDuty-Datadog integration is mis-configured; pages are reaching humans but Datadog is not aware.

Known limitations / merchant FAQs

My alert count is always above 5. Are my monitors broken?
Probably yes, in the sense that they need tuning. Healthy Datadog accounts sit at 0-2 active alerts during steady-state operation. If you are routinely above 5, two causes: (1) Thresholds are too tight (firing on transient noise that resolves itself); (2) You have stale monitors for services that no longer exist. Run Recently Flapped Monitors and tune the top 5 by flap count.

What is the difference between an active alert and an active incident?
An alert (or “monitor in Alert state”) is a metric breach: a number crossed a threshold. An incident is a coordinated response declared by a human (or PagerDuty automation). Many alerts can fire without becoming incidents (the team triages and dismisses); rarely, an incident exists without active alerts (a customer email surfaced the problem). Both cards together give the full picture.

My team uses PagerDuty for paging. Why does this card matter?
Because PagerDuty pages are downstream of Datadog alerts. If PagerDuty is paging your team but the Vortex IQ Alerts Summary shows zero, your PagerDuty integration is either bypassing Datadog (paging from a different source) or filtering at a different layer. The card is the source-of-truth count of monitors actively alerting in Datadog regardless of downstream routing.

Why are P5 alerts even shown? They are not actionable.
Two reasons: (1) Some teams use P5 for “info-level” monitors that should never page but should be visible; (2) An accumulation of P5 alerts can indicate a slow-bleed problem (gradual capacity exhaustion, growing log volume, etc) that is worth investigating proactively. The headline emphasises P1+P2 but the breakdown shows all priorities for context.

My Logs API returns 400 No valid indexes. Are log-based alerts counted?
No. When Log Management is disabled, log-based alerts (Datadog `type:log alert`) cannot evaluate and remain in `No Data` state. They are excluded from the active-alerts count. The Vortex IQ engine logs the gating event once at INFO level and continues serving APM, infrastructure, and synthetic alerts normally. To count log-based alerts, enable Log Management on the Datadog Pro tier.

Datadog says zero active alerts but customers are complaining about errors.
The classic blind spot. Three causes: (1) The relevant monitor does not exist (no alert coverage for the failing service: see Monitor Coverage by Service); (2) The monitor exists but its threshold is too generous (real degradation but not above the bar): tune the threshold; (3) The customer-facing problem is in a code path Datadog is not instrumenting (third-party widget, payment iframe, browser-only). Add Datadog RUM and synthetic browser tests to catch the third case.

Why does the alert count fluctuate by 1-2 every minute?
Some monitors are “flap-prone”: their underlying metric oscillates around the threshold. Each minute a few flap into Alert and then back to OK. This is normal noise. If a specific monitor is responsible for repeated flapping, it appears in Recently Flapped Monitors (24h); tune that monitor’s threshold or evaluation window to stabilise it.

Can I exclude muted alerts from the count?
Yes, that is the default. Monitors tagged `muted:true` or in scheduled downtime are excluded automatically. To include them (for an audit), use the unfiltered query in the Datadog Monitor UI.

My multi-team Datadog account has 200+ monitors from teams I do not own. Are those counted?
By default, all monitors in the account. To filter by team, set the connector’s tag scope to `team:your_team_name`. The headline number then reflects only monitors tagged with your team. Most merchants want unfiltered because alerts from any team can affect shopper experience.

RUM and Synthetic alerts look different. Are they counted?
Yes. The Monitors API is product-agnostic; alerts from RUM, Synthetic, APM, infrastructure, and log-based monitors all count the same. The category breakdown is available in the Datadog UI but not yet in the Vortex IQ headline; planned for a future release.

Tracked live in Vortex IQ Nerve Centre

Alerts Summary is one of hundreds of KPI pulses Vortex IQ tracks across Datadog and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.
