Sentry state means nothing to a merchant unless it’s joined to revenue. This audit answers four questions: (1) is the application healthy right now (error rate, Apdex, p95/p99 latency, throughput), (2) are the alert rules that should be watching it actually wired up (projects with no metric alerts, alerts with no owner, stale ignored issues), (3) is the on-call response fast enough (MTTA / MTTR vs the SLA window, incident backlog), and (4) when something IS broken, how much money is on fire per minute while a checkout-adjacent project’s error rate is spiking?

What this audit checks

Authentication & access

  • Auth token valid (GET on /organizations/{organization}/)
  • Token carries org:read, project:read, event:read scopes
  • Organization slug resolves and is not pending deletion
  • Host correct (sentry.io SaaS vs self-hosted)

Alert coverage (the blind-spot test)

  • Projects with no metric-alert rule (error rate / latency unmonitored)
  • Alert rules with no action / notification target (fires silently)
  • Active projects with zero events in 24h (lost instrumentation)
  • Unresolved issues ignored / muted but still recurring after 7d
  • Disabled projects still receiving events (misrouted DSN)
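
The first two blind-spot checks can be sketched as a comparison of the project inventory against the alert-rule list. This is a minimal illustration, not the audit's actual implementation: the field names (`slug`, `name`, `projects`, `triggers`, `actions`) are assumptions about the payload shape and should be verified against the real API responses.

```python
def find_coverage_gaps(projects, alert_rules):
    """Return (slugs of projects with no alert rule, names of rules with no action).

    Assumed shapes (hypothetical, check against real responses):
      projects:    [{"slug": "checkout"}, ...]
      alert_rules: [{"name": "...", "projects": ["checkout"],
                     "triggers": [{"actions": [...]}]}, ...]
    """
    covered = set()
    silent_rules = []
    for rule in alert_rules:
        covered.update(rule.get("projects", []))
        # Collect every action across every trigger; none means the rule fires silently.
        actions = [a for t in rule.get("triggers", []) for a in t.get("actions", [])]
        if not actions:
            silent_rules.append(rule["name"])
    unmonitored = sorted(p["slug"] for p in projects if p["slug"] not in covered)
    return unmonitored, silent_rules
```

Note that a project covered only by a silent rule is still reported as covered here, so both lists need to be read together.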

Reliability & performance

  • Error rate above 2% sustained (release or dependency regression)
  • Apdex below 0.85 (users feeling the slowness)
  • p95 latency above 1500ms / p99 above 3000ms
  • Throughput dropped > 30% WoW (capacity / outage signal)
  • Crash-free session rate below SLA target (release health)
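
For readers unfamiliar with the metrics above, here is a small sketch of how Apdex and a latency percentile are conventionally computed from raw durations. The audit most likely reads these as aggregates from the API rather than computing them locally; the 300 ms satisfaction threshold and the nearest-rank percentile method are assumptions chosen for illustration.

```python
import math

def apdex(durations_ms, threshold_ms=300):
    """Standard Apdex: (satisfied + tolerating / 2) / total.

    satisfied:  duration <= T
    tolerating: T < duration <= 4T
    threshold_ms=300 is an assumed default, not a value from this audit.
    """
    if not durations_ms:
        return None
    satisfied = sum(1 for d in durations_ms if d <= threshold_ms)
    tolerating = sum(1 for d in durations_ms if threshold_ms < d <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(durations_ms)

def percentile(values, pct):
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]
```

With durations of 100, 200, 500 and 2000 ms, two requests are satisfied, one is tolerating and one is frustrated, giving an Apdex of 0.625, well under the 0.85 line above.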

Incident response & SLA

  • Mean time to acknowledge above 30 min (page fatigue)
  • Mean time to resolve above the SLA window (60 min default)
  • Open incidents older than the SLA breach window
  • Top alerting services concentration (one project = most noise)
  • Incident re-open rate (resolved too early, reopened within 24h)
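
The MTTA and MTTR checks reduce to averaging two time deltas per incident. The sketch below assumes each incident is a dict with ISO 8601 `started`, `acknowledged` and `resolved` timestamps; those field names are hypothetical and would need mapping from whatever the incidents endpoint actually returns.

```python
from datetime import datetime
from statistics import mean

def _minutes_between(start_iso, end_iso):
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end_iso) - datetime.fromisoformat(start_iso)
    return delta.total_seconds() / 60

def response_times(incidents):
    """Return (mtta_minutes, mttr_minutes); None when there is no data.

    Incidents that were never acknowledged or resolved are skipped for the
    corresponding mean (assumed field names: started, acknowledged, resolved).
    """
    ack = [_minutes_between(i["started"], i["acknowledged"])
           for i in incidents if i.get("acknowledged")]
    res = [_minutes_between(i["started"], i["resolved"])
           for i in incidents if i.get("resolved")]
    return (mean(ack) if ack else None, mean(res) if res else None)
```

Skipping unresolved incidents keeps the mean well defined, but it also flatters MTTR while a long incident is still open, which is why the open-incident age check is listed separately.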

Error quality & noise

  • Top error types by occurrence count (fix-this-first ranking)
  • New error types in last 24h not seen in prior 7d (regression)
  • Fatal-level issue volume above baseline
  • Single error class exceeding 1000 events (runaway)

Cross-channel: revenue-at-risk (the killer area)

  • Open incident on a project that maps to a commerce-checkout service = compute $/min lost (commerce.revenue_per_min × open_incident_minutes × estimated_traffic_loss_pct)
  • Error-rate spike on checkout-adjacent project during peak hours
  • 5xx / error spike during a campaign push (sibling = google_ads / amazon_ads / klaviyo) - paying for traffic that can’t convert
  • Conversion drop during incident windows (vs 90D baseline)
  • Cart abandonment spike correlated with elevated error rate
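
The $/min calculation in the first bullet can be written out directly. This sketch assumes `estimated_traffic_loss_pct` is expressed on a 0-100 scale; if the real field is a 0-1 fraction, the division by 100 should be dropped.

```python
def revenue_at_risk(revenue_per_min, open_incident_minutes, estimated_traffic_loss_pct):
    """Revenue lost so far during an open incident on a checkout-mapped project.

    Implements: revenue_per_min x open_incident_minutes x estimated_traffic_loss_pct
    Assumption: estimated_traffic_loss_pct is a percentage (0-100), not a fraction.
    """
    return revenue_per_min * open_incident_minutes * (estimated_traffic_loss_pct / 100)
```

For example, a store earning 120 per minute with an incident open for 45 minutes and an estimated 40% traffic loss has 120 × 45 × 0.40 = 2160 at risk.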

Severity thresholds

Signal                          Warn    Critical
error_rate_pct                  1       2
apdex                           0.9     0.85
p95_latency_ms                  1000    1500
p99_latency_ms                  2000    3000
throughput_change_pct_wow       -15     -30
crash_free_rate_pct             99.9    99.5
mtta_minutes                    15      30
mttr_minutes                    30      60
incidents_open_count            1       3
services_degraded_count         1       2
services_down_count             0       1
projects_no_alert_rule_count    1       3
projects_no_events_24h_count    1       3
top_error_class_max_count       100     1000
new_error_types_24h_count       1       5
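
One way to apply these thresholds is a small classifier that infers the direction of each signal from its warn/critical pair: when critical is larger than warn, higher values are worse; otherwise lower values are worse. The sketch covers only a subset of the signals, and the strict comparison ("above" / "below") is an assumption taken from the wording of the checks, not a documented rule.

```python
# (warn, critical) pairs copied from the threshold table; subset for illustration.
THRESHOLDS = {
    "error_rate_pct": (1, 2),
    "apdex": (0.9, 0.85),
    "p95_latency_ms": (1000, 1500),
    "throughput_change_pct_wow": (-15, -30),
}

def classify(signal, value):
    """Return "critical", "warn" or "ok" for a signal value."""
    warn, critical = THRESHOLDS[signal]
    higher_is_worse = critical > warn

    def breached(limit):
        # Strict comparison: a value exactly on the limit is not a breach (assumption).
        return value > limit if higher_is_worse else value < limit

    if breached(critical):
        return "critical"
    if breached(warn):
        return "warn"
    return "ok"
```

Under these assumptions an error rate of 2.5% classifies as critical, an Apdex of 0.88 as warn, and a week-over-week throughput change of -10% as ok.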

Data sources

  • GET {host}/api/0/organizations/{organization}/ - Auth + org sanity
  • GET {host}/api/0/organizations/{organization}/projects/ - Project inventory + status + health
  • GET {host}/api/0/organizations/{organization}/events/ - Apdex / latency / error-rate aggregates
  • GET {host}/api/0/organizations/{organization}/events-stats/ - Time-series throughput + error-rate trend
  • GET {host}/api/0/organizations/{organization}/issues/ - Top error types + new/fatal issue detection
  • GET {host}/api/0/organizations/{organization}/incidents/ - Incident inventory, MTTA/MTTR, revenue-at-risk join
  • GET {host}/api/0/organizations/{organization}/alert-rules/ - Metric-alert coverage + firing/acknowledged state
  • GET {host}/api/0/organizations/{organization}/sessions/ - Release health / crash-free rate for service health + SLA
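
All of the endpoints above share the same prefix, so a single helper can build each request. The sketch below only constructs the request object and does not send it; it assumes bearer-token authentication, and the host, organization slug and token shown in the usage comment are placeholders.

```python
import urllib.request

def build_request(host, organization, resource="", token=""):
    """Build a GET request for {host}/api/0/organizations/{organization}/[{resource}/].

    resource is one of the path segments listed above, e.g. "projects",
    "events-stats" or "alert-rules"; an empty resource targets the org itself.
    """
    path = f"/api/0/organizations/{organization}/"
    if resource:
        path += f"{resource}/"
    return urllib.request.Request(
        host.rstrip("/") + path,
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )

# Usage (placeholder values):
#   req = build_request("https://sentry.io", "acme", "projects", token="<auth token>")
#   with urllib.request.urlopen(req) as resp:
#       body = resp.read()
```

For self-hosted installations, only the `host` argument changes; the path layout is assumed to stay the same.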