Card class: Hero
Category: Monitoring

At a glance

The live count of Datadog monitors currently in Alert or Warn state, broken down by priority. For a merchant, this is “how many things are pinging engineering right now?” More than five active alerts is unusual; it typically means either a real cascade event or noisy monitors that need tuning. The card surfaces alert volume, NOT whether anything has been declared an incident.
API endpoint: Datadog Monitors API, GET /api/v1/monitor?with_downtimes=false&group_states=alert,warn. Returns the full monitor list with state, priority, and last-triggered timestamp.
Metric basis: Monitor state machine. Counts monitors whose current state is Alert or Warn. Excludes OK, No Data (counted on its own card), and Skipped.
Aggregation window: Real-time, refreshed every 60 seconds.
Severity threshold: Datadog’s monitor priority field (P1, P2, P3, P4, P5). The card displays the full breakdown, and the headline summarises by priority bucket: “1 P1, 3 P2” reads more clearly than “4”.
Alert pre-filtering: (1) Monitors tagged muted:true or in scheduled downtime are excluded; (2) Synthetic monitors that test Datadog’s own infrastructure (@user_agent:Datadog/Synthetic) are excluded; (3) Monitors created in the last 60 minutes are flagged “new monitor, not yet stable” and shown in a separate bucket so a freshly-misconfigured threshold does not pollute the headline.
Log Management gating: Some monitors are log-based (type:log alert). If Log Management is disabled, those monitors persist as No Data rather than Alert: the Logs API returns 400 No valid indexes for log queries, the engine logs the event once at INFO, and log-based monitor evaluation is skipped. APM, infrastructure, and synthetic monitors continue to function.
Filtered hosts / services: All monitors in the connected Datadog account. To scope to a team, set the connector’s tag scope to team:your_team.
Time zone: Datadog account timezone for “last triggered” timestamps; UTC for cross-connector aggregation.
What this card is NOT: This is not the same as Active Incidents. Alerts can fire without an incident being declared, and incidents can exist without active alerts (a customer email surfaced the issue). The two cards together give the full picture.
Time window: RT (real-time, refreshed every 60 seconds)
Alert trigger: > 5 active. More than 5 simultaneous alerts is the threshold for “something is wrong or your monitors are too noisy”.
Roles: owner, engineering, operations
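
The state, mute, team-scope, and new-monitor rules above can be sketched as a small classifier. This is a minimal illustration under stated assumptions, not the Vortex IQ implementation: each monitor is assumed to be a dict shaped like an entry in the Datadog Monitors API response (`overall_state`, `tags`, `created`), while the helper name `classify_monitor`, the `in_downtime` flag, and the bucket labels are hypothetical. The synthetic-monitor exclusion is left out of the sketch.

```python
from datetime import datetime, timedelta, timezone

ACTIVE_STATES = {"Alert", "Warn"}
NEW_MONITOR_WINDOW = timedelta(minutes=60)


def classify_monitor(monitor, now, team_tag=None):
    """Return 'counted', 'new', or 'excluded' for one monitor dict."""
    tags = set(monitor.get("tags", []))
    # Only Alert and Warn are active; OK, No Data, and Skipped are not counted.
    if monitor.get("overall_state") not in ACTIVE_STATES:
        return "excluded"
    # Optional team scope, e.g. team_tag="team:checkout" (hypothetical tag).
    if team_tag is not None and team_tag not in tags:
        return "excluded"
    # Muted monitors and monitors in scheduled downtime are dropped.
    # `in_downtime` is a hypothetical flag, not a Datadog field.
    if "muted:true" in tags or monitor.get("in_downtime", False):
        return "excluded"
    # Monitors younger than 60 minutes go to a separate "new" bucket.
    created = datetime.fromisoformat(monitor["created"])
    if now - created < NEW_MONITOR_WINDOW:
        return "new"
    return "counted"
```

Only monitors classified as `counted` would feed the headline; `new` monitors would be shown in their own bucket. `now` must be a timezone-aware datetime so it can be compared with the parsed `created` value.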

Calculation

Calculated automatically from your Datadog data. See the At a glance summary above for what the metric tracks and the worked example below for a typical reading.
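
As a rough sketch of the calculation, the headline and the alert trigger can be derived from the priorities of the monitors that survive pre-filtering. The function name `summarise` and the constant `ALERT_THRESHOLD` are illustrative, and the sketch assumes every counted monitor has a priority from 1 to 5.

```python
from collections import Counter

ALERT_THRESHOLD = 5  # the card triggers when the active count exceeds 5


def summarise(priorities):
    """Build the headline and trigger flag from a list of priority numbers."""
    counts = Counter(priorities)
    total = sum(counts.values())
    if total == 0:
        return "0 active alerts", False
    # Breakdown is ordered P1 first, e.g. "1 P1, 3 P2".
    breakdown = ", ".join(f"{counts[p]} P{p}" for p in sorted(counts))
    return f"{total} active alerts ({breakdown})", total > ALERT_THRESHOLD
```

The flag is strictly "more than 5", so exactly five active alerts would not trigger the card.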

Worked example

A US apparel brand on Shopify with 47 Datadog monitors covering web, checkout, payment, search, the recommendations service, and infrastructure. Snapshot taken on 25 Apr 26 at 09:15 EDT.
| Priority | Count | Top monitor titles |
| --- | --- | --- |
| P1 | 1 | Checkout p95 above 3s |
| P2 | 3 | Web error rate above 1.5%, DB connection pool above 85%, recommendations service Apdex below 0.85 |
| P3 | 2 | Container restart on cart-worker, log-volume up 35% |
| P4 | 1 | Disk space above 75% on db-replica-2 |
| Total | 7 | (above the 5-alert threshold) |
The Vortex IQ dashboard headline reads “7 active alerts (1 P1, 3 P2, 2 P3, 1 P4)” with the P1 visually emphasised. Three things the merchant should read from this:
  1. The P1 is the only one that costs money right now. Checkout p95 above 3s means shoppers experiencing the slow tail are abandoning. Pair with p95 Response Time to see whether the alert reflects a sustained regression or a brief spike.
  2. The three P2s cluster around the same root cause. Web error rate, DB pool, and recommendations Apdex all spiking simultaneously usually means one upstream dependency is degraded (here, the near-exhausted DB connection pool is the likely cause of the other two, since both rely on shared connections). One fix may resolve all three. This is the “alert correlation” pattern that tells engineering “do not chase three problems, find the one that is causing the rest”.
  3. The two P3s and one P4 are noise during this incident. They are background monitors firing for unrelated reasons; engineering should ignore them while the P1+P2 cluster is being resolved. A common mistake is to triage every alert; the right response is “P1 first, then the cluster of P2s; ignore P3+P4 until the dust settles”.
What this tells the engineering on-call:
  - "1 P1, 3 P2 clustered, 3 background" reads as: "real cascade event, single root cause likely"
  - vs. "4 P1, 4 P2 spread across services" reads as: "multiple unrelated incidents, full-team mobilisation"
  - vs. "0 P1, 0 P2, 12 P3+P4" reads as: "monitors are too noisy, scheduled tuning needed"
Three takeaways merchants should remember:
  1. Alert count alone is meaningless without priority breakdown. Twelve P5 monitors firing is a quiet day; one P1 is a crisis. Always read the priority distribution, not the total.
  2. The 5-alert threshold catches “monitor noise creep”. Healthy Datadog accounts sit at 0-2 active alerts during normal operation. If you are routinely above 5, your monitors are too sensitive (firing on transient noise). Tune monitor thresholds quarterly using Recently Flapped Monitors as the guide.
  3. Active alerts are NOT the same as active incidents. An alert is a metric breach; an incident is a human-declared coordinated response. Many alerts plus zero incidents equals “monitors are firing but engineering has not decided this is a real event yet”. The gap is normal during the first 5-15 minutes of a regression; if it persists past 30 minutes, the monitors are noisy and should be tuned.
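
The alert-versus-incident reading in the last takeaway can be expressed as a small decision function. This is an illustrative sketch only: the function name, its inputs, and the returned labels are assumptions, and the 30-minute cut-off is taken from the text above.

```python
def read_alert_incident_gap(active_alerts, active_incidents, minutes_since_first_alert):
    """Interpret the gap between active alerts and declared incidents."""
    if active_alerts == 0:
        # Incidents can exist without alerts (e.g. surfaced by a customer email).
        return "incident without active alerts" if active_incidents else "quiet"
    if active_incidents > 0:
        return "coordinated response in progress"
    if minutes_since_first_alert <= 30:
        # The gap is normal while engineering is still triaging.
        return "triage window: not yet declared"
    return "persistent gap: tune monitors or declare an incident"
```

For instance, 7 active alerts, zero incidents, and 45 minutes since the first alert would land in the last branch.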

Sibling cards merchants should reference together

| Card | Why pair it with Alerts Summary | What the combination tells you |
| --- | --- | --- |
| Active Incidents | The human-declared peer. | Many alerts plus zero incidents equals “engineering has not decided this is real yet”; many alerts plus active incidents equals coordinated response in progress. |
| Currently Triggered Monitors | The detail view of the same data. | Alerts Summary is the count; Currently Triggered Monitors is the table with monitor names. |
| Recently Flapped Monitors (24h) | Identifies monitors that fire and recover repeatedly: typically threshold tuning needed. | High flap count plus high active-alert count equals noisy monitors; low flap count plus stable alerts equals real degradation. |
| Sustained Threshold Breaches | The “stuck alerts” view: monitors in Alert state for over 30 minutes. | A sustained breach is more concerning than a brief spike; pair to differentiate. |
| Monitors Without Notification Channel | The silent-failure view: alerts firing but no human is paged. | An active P1 alert that is also in the no-notification list equals “nobody knows it’s broken”. Highest-leverage fix on the dashboard. |
| Monitor Coverage by Service | The blind-spot view: services without alert coverage. | Zero active alerts on a service is good only if the service has alert coverage; zero alerts plus zero coverage equals “we cannot see this service at all”. |
| Operational Health Score | The composite view that takes alert volume into context. | Score above 80 with 7 active alerts equals noisy-monitor problem; score below 70 with 7 active alerts equals real cascade. |
| Top Alerting Services | Pattern view across the last week. | The same service repeatedly in the top equals a chronic problem worth investing in; varied services across days equals normal noise. |

Reconciling against the vendor’s own dashboard

Where to look in Datadog:
  - Monitors → Manage Monitors for the master list with state filters.
  - Monitors → Triggered Monitors filtered by status:Alert OR status:Warn.
  - Monitors → Notifications to confirm which alerts are routing where.
Why our number may legitimately differ from Datadog’s UI:
| Reason | Direction | Why |
| --- | --- | --- |
| Time zone | Last-triggered timestamps shift | Datadog UI displays in account timezone; Vortex IQ stores UTC. |
| API rate limits | Brief gaps | The Monitors API is rate-limited; on burst minutes a polled value may use cached prior data. |
| Log indexing latency | Log-based monitor count lower | Logs API gating returns 400 No valid indexes when Log Management is disabled; log-based monitors persist as No Data rather than Alert. |
| Monitor state cache | Up to 60 seconds | Monitor state refreshes once per minute; freshly triggered alerts may take up to 60 seconds to appear. |
| Mute / downtime exclusion | Vortex IQ count lower | Muted monitors and monitors in scheduled downtime are excluded from the Vortex IQ count by default; Datadog UI shows them with a mute icon. |
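
To reconcile the time-zone row by hand, a stored UTC “last triggered” value can be converted to the account time zone before comparing it with the Datadog UI. The sketch below is a minimal example; the function name `to_account_time` is hypothetical, and the fixed UTC-4 offset is used purely for illustration.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def to_account_time(last_triggered_utc: datetime, account_tz: tzinfo) -> datetime:
    """Convert a UTC 'last triggered' timestamp to the account time zone."""
    if last_triggered_utc.tzinfo is None:
        # Assumption: naive timestamps are already in UTC.
        last_triggered_utc = last_triggered_utc.replace(tzinfo=timezone.utc)
    return last_triggered_utc.astimezone(account_tz)


# Fixed UTC-4 offset for illustration (US Eastern during daylight time).
eastern_daylight = timezone(timedelta(hours=-4))
stored = datetime(2026, 4, 25, 13, 15, tzinfo=timezone.utc)
print(to_account_time(stored, eastern_daylight).isoformat())
# prints 2026-04-25T09:15:00-04:00
```

In practice a `zoneinfo.ZoneInfo` instance for the account zone would handle daylight-saving changes; the fixed offset just keeps the sketch dependency-free.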
Cross-connector reconciliation:
| Card | Expected relationship | What causes the divergence |
| --- | --- | --- |
| Datadog Active Incidents | Alerts can fire without incidents being declared. | A persistent gap (many active alerts, zero incidents) means engineering is treating the alerts as noise. Tune the monitors or declare an incident. |
| google_analytics.ga_property_health | Independent measurement-side health peer. | Active alerts on Datadog plus GA4 Property Health red equals “site is broken AND analytics is broken simultaneously”. |
| PagerDuty active incidents | Should be 1:1 with Datadog priority:1 alerts if the integration is configured. | A gap means the PagerDuty-Datadog integration is mis-configured; pages are reaching humans but Datadog is not aware. |

Known limitations / merchant FAQs

My alert count is always above 5. Are my monitors broken?
Probably yes, in the sense that they need tuning. Healthy Datadog accounts sit at 0-2 active alerts during steady-state operation. If you are routinely above 5, two causes: (1) Thresholds are too tight (firing on transient noise that resolves itself); (2) You have stale monitors for services that no longer exist. Run Recently Flapped Monitors and tune the top 5 by flap count.

What is the difference between an active alert and an active incident?
An alert (or “monitor in Alert state”) is a metric breach: a number crossed a threshold. An incident is a coordinated response declared by a human (or PagerDuty automation). Many alerts can fire without becoming incidents (the team triages and dismisses); rarely, an incident exists without active alerts (a customer email surfaced the problem). Both cards together give the full picture.

My team uses PagerDuty for paging. Why does this card matter?
Because PagerDuty pages are downstream of Datadog alerts. If PagerDuty is paging your team but the Vortex IQ Alerts Summary shows zero, your PagerDuty integration is either bypassing Datadog (paging from a different source) or filtering at a different layer. The card is the source-of-truth count of monitors actively alerting in Datadog regardless of downstream routing.

Why are P5 alerts even shown? They are not actionable.
Two reasons: (1) Some teams use P5 for “info-level” monitors that should never page but should be visible; (2) An accumulation of P5 alerts can indicate a slow-bleed problem (gradual capacity exhaustion, growing log volume, etc.) that is worth investigating proactively. The headline emphasises P1+P2 but the breakdown shows all priorities for context.

My Logs API returns 400 No valid indexes. Are log-based alerts counted?
No. When Log Management is disabled, log-based alerts (Datadog type:log alert) cannot evaluate and remain in No Data state. They are excluded from the active-alerts count. The Vortex IQ engine logs the gating event once at INFO level and continues serving APM, infrastructure, and synthetic alerts normally. To count log-based alerts, enable Log Management on the Datadog Pro tier.

Datadog says zero active alerts but customers are complaining about errors.
The classic blind spot. Three causes: (1) The relevant monitor does not exist (no alert coverage for the failing service: see Monitor Coverage by Service); (2) The monitor exists but its threshold is too generous (real degradation but not above the bar): tune the threshold; (3) The customer-facing problem is in a code path Datadog is not instrumenting (third-party widget, payment iframe, browser-only). Add Datadog RUM and synthetic browser tests to catch the third case.

Why does the alert count fluctuate by 1-2 every minute?
Some monitors are “flap-prone”: their underlying metric oscillates around the threshold. Each minute a few flap into Alert and then back to OK. This is normal noise. If a specific monitor is responsible for repeated flapping, it appears in Recently Flapped Monitors (24h); tune that monitor’s threshold or evaluation window to stabilise it.

Can I exclude muted alerts from the count?
Yes, that is the default. Monitors tagged muted:true or in scheduled downtime are excluded automatically. To include them (for an audit), use the unfiltered query in the Datadog Monitor UI.

My multi-team Datadog account has 200+ monitors from teams I do not own. Are those counted?
By default, all monitors in the account. To filter by team, set the connector’s tag scope to team:your_team_name. The headline number then reflects only monitors tagged with your team. Most merchants want unfiltered because alerts from any team can affect shopper experience.

RUM and Synthetic alerts look different. Are they counted?
Yes. The Monitors API is product-agnostic; alerts from RUM, Synthetic, APM, infrastructure, and log-based monitors all count the same. The category breakdown is available in the Datadog UI but not yet in the Vortex IQ headline; planned for a future release.

Tracked live in Vortex IQ Nerve Centre

Alerts Summary is one of hundreds of KPI pulses Vortex IQ tracks across Datadog and 70+ other ecommerce connectors. Nerve Centre runs the detection layer; Vortex Mind investigates the cause when something moves; Ask Viq lets you interrogate any number in plain English. Start for free or book a demo to see this metric running on your own data.