Alerts & Monitoring

Catch problems before your users do

FastRouter watches latency, errors, usage, and spend across every project, key, and model, then alerts the right people the moment a metric crosses the line you set.


No credit card required · Free to start

Example alert · Critical, firing · p50 at 3,180 ms

New alert

Metric: Response Time (p50) (Performance · milliseconds)
Scope: Project: Production · API Key: All keys · Model: All models
Alert type: Static value (or Percentage change)
Warning: Above 1,500 ms
Critical: Above 3,000 ms
Evaluate every: 15 minutes

Why FastRouter Alerts

Monitoring that understands AI workloads

Track the metrics that actually move reliability and cost, set thresholds that fit each workload, and route every alert to the people who can act on it.

Metrics across the whole stack

Watch response time, time to first token, throughput, error rate, token usage, and spend, all without bolting on a separate monitoring tool.

Warning and Critical severities

Pair an early-warning threshold with a critical one so teams see trouble building long before it turns into an incident.

Static or percentage-change thresholds

Alert on fixed limits you already know, or on sudden swings against a historical baseline to catch anomalies you don't.

How alerts work

From signal to notification, automatically

Every alert runs the same simple loop on a schedule you choose, so the right people hear about the right problems without anyone watching dashboards.

Step 1 · Evaluate

Metric sampled on a schedule

Response Time · Every 15 min

FastRouter checks the selected metric at your chosen interval, from every 5 minutes to daily.
Each alert is scoped to specific projects, API keys, and models.
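
To make the scoping idea concrete, here is a minimal Python sketch of how a scope could decide whether a request counts toward an alert. The `AlertScope` class, its field names, and the use of `None` to stand for "All" are illustrative assumptions, not FastRouter's actual data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class AlertScope:
    # Hypothetical model: None stands for the "All" option,
    # otherwise the field holds a specific selection.
    projects: Optional[FrozenSet[str]] = None
    api_keys: Optional[FrozenSet[str]] = None
    models: Optional[FrozenSet[str]] = None

    def matches(self, project: str, api_key: str, model: str) -> bool:
        """Return True when a request falls inside every scope dimension."""
        def in_dimension(selection: Optional[FrozenSet[str]], value: str) -> bool:
            return selection is None or value in selection

        return (
            in_dimension(self.projects, project)
            and in_dimension(self.api_keys, api_key)
            and in_dimension(self.models, model)
        )


# Project = Production, API Key = All keys, Model = All models
production_only = AlertScope(projects=frozenset({"Production"}))

print(production_only.matches("Production", "key-123", "model-a"))  # True
print(production_only.matches("Staging", "key-123", "model-a"))     # False
```

Treating "All" as the absence of a filter keeps the three dimensions independent, so one key, one model, or a whole project can be expressed with the same structure.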

Step 2 · Compare

Threshold breached

Warning · Critical

Values are compared against a static limit or a percentage-change baseline.
Warning flags trouble early; Critical signals it needs attention now.
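
The comparison step can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `classify`, the strict inequalities, and the optional Warning threshold are guesses for the sake of the example, since the page does not say whether a value exactly equal to a threshold counts as a breach.

```python
from typing import Optional


def classify(value: float, critical: float,
             warning: Optional[float] = None,
             direction: str = "above") -> str:
    """Return "critical", "warning", or "ok" for one evaluated value.

    Assumption: a threshold is breached only when the value is strictly
    above (or strictly below) it.
    """
    if direction not in ("above", "below"):
        raise ValueError("direction must be 'above' or 'below'")

    def breached(threshold: float) -> bool:
        return value > threshold if direction == "above" else value < threshold

    # Check the more severe threshold first so it wins when both are breached.
    if breached(critical):
        return "critical"
    if warning is not None and breached(warning):
        return "warning"
    return "ok"


# Response Time (p50): Warning above 1,500 ms, Critical above 3,000 ms
print(classify(3180, critical=3000, warning=1500))  # critical
print(classify(1800, critical=3000, warning=1500))  # warning
print(classify(900, critical=3000, warning=1500))   # ok
```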

Step 3 · Notify

The right people are notified

Email · Owners · Members · Webhooks

Notifications go to email, organization owners, project members, or your own webhooks.
The alert flips OK ↔ Firing automatically and resolves on recovery.

Metrics

Monitor the signals that actually matter

Pick from performance, reliability, and cost metrics, each scoped to the projects, API keys, and models you care about, so every alert watches exactly the slice of traffic you choose.

Performance

Response Time (p50), Time to First Token (p50), and Throughput in requests per second.

Reliability

Error Count and Error Rate surface failing requests and degraded providers fast.

Usage & cost

Token Consumption, Daily and Monthly Spend, and Total Requests. Cumulative metrics reset on a fixed UTC schedule.
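
As a rough illustration of the UTC reset schedule, the Python sketch below computes when the current daily and monthly windows began for a given moment. The function names are hypothetical, and the code assumes the resets happen at 00:00 UTC each day and at 00:00 UTC on the 1st of each month.

```python
from datetime import datetime, timedelta, timezone


def daily_window_start(now: datetime) -> datetime:
    """Start of the current daily window: the most recent 00:00 UTC."""
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    now_utc = now.astimezone(timezone.utc)
    return now_utc.replace(hour=0, minute=0, second=0, microsecond=0)


def monthly_window_start(now: datetime) -> datetime:
    """Start of the current monthly window: 00:00 UTC on the 1st."""
    return daily_window_start(now).replace(day=1)


# 8 pm on March 31 in a UTC-8 timezone is already April 1 in UTC,
# so both windows have rolled over.
utc_minus_8 = timezone(timedelta(hours=-8))
local_evening = datetime(2025, 3, 31, 20, 0, tzinfo=utc_minus_8)

print(daily_window_start(local_evening).isoformat())    # 2025-04-01T00:00:00+00:00
print(monthly_window_start(local_evening).isoformat())  # 2025-04-01T00:00:00+00:00
```

The example shows why a team west of UTC can see Daily Spend reset during its local afternoon or evening rather than at local midnight.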

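Metrics catalog (9 metrics)

Category	Metric	Unit / reset
Performance	Response Time	p50, ms
Performance	Time to First Token	p50, ms
Performance	Throughput	req/s
Reliability	Error Count	count
Reliability	Error Rate	%
Usage & Cost	Token Consumption	tokens
Usage & Cost	Daily Spend	$ · resets 00:00 UTC
Usage & Cost	Monthly Spend	$ · resets on the 1st, UTC
Usage & Cost	Total Requests	count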

Alert types

Static limits or percentage-change anomalies

Use a static value when you know the line you can't cross, or percentage change to flag sudden movement against a historical baseline. Each alert pairs a Warning threshold with a required Critical one.

Static value

Fire when a metric goes Above or Below a fixed number. Ideal for known SLAs and budgets.

Percentage change

Compare to the previous period or 1 hour, day, week, or month ago using ((current − previous) / previous) × 100.
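
The formula is easy to check with a short Python sketch. The function name is illustrative, and returning `None` when the previous value is zero is an assumption made here to avoid dividing by zero; the page does not say how a zero baseline is handled.

```python
from typing import Optional


def percentage_change(current: float, previous: float) -> Optional[float]:
    """Apply ((current - previous) / previous) * 100.

    Assumption: with a zero baseline the change is undefined, so this
    sketch returns None instead of raising ZeroDivisionError.
    """
    if previous == 0:
        return None
    return (current - previous) / previous * 100


# Error rate rose from 2% to 3% versus the comparison period.
print(percentage_change(3.0, 2.0))       # 50.0

# Request volume fell from 1,000 to 500.
print(percentage_change(500.0, 1000.0))  # -50.0

# No baseline traffic to compare against.
print(percentage_change(120.0, 0.0))     # None
```

A rise gives a positive result and a drop gives a negative one, which is why an Above condition suits spikes and a Below condition suits drops.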

Two thresholds

Set Warning and Critical levels; a good rule of thumb is Warning at 50-70% of Critical.
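
The rule of thumb is plain arithmetic, sketched below in Python. The helper name is hypothetical, and the calculation assumes an Above alert on a positive metric, where a lower number means an earlier warning.

```python
from typing import Tuple


def suggested_warning_range(critical: float) -> Tuple[float, float]:
    """Return the 50% and 70% points of a Critical threshold."""
    return (critical * 50 / 100, critical * 70 / 100)


# Critical above 3,000 ms suggests a Warning between 1,500 and 2,100 ms.
print(suggested_warning_range(3000))  # (1500.0, 2100.0)
```

Treat the range as a starting point: a workload with a tight budget may warrant a Warning closer to the Critical line.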

Example chart: Response Time (p50) · Production · last 60 min, static value alert with Warning above 1,500 ms and Critical above 3,000 ms.

Notifications

Reach the right people, then quiet down

When a threshold is breached, the alert fires and notifies your team; when the metric recovers, it resolves on its own. No manual cleanup, no stale pages.

Flexible delivery

Notify by email, organization owners, or project members. Or POST to your own webhooks to route alerts into Slack, PagerDuty, or internal tools.
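
For teams wiring up a webhook, a receiving endpoint might look like the Python sketch below, which uses only the standard library. The payload fields (`severity`, `metric`, `value`, `state`) are invented for illustration, because this page does not document the webhook body; check the actual payload before relying on any field name.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize(payload: dict) -> str:
    """Build a one-line message from a HYPOTHETICAL alert payload."""
    return "[{severity}] {metric} is {value} ({state})".format(
        severity=str(payload.get("severity", "unknown")).upper(),
        metric=payload.get("metric", "unknown metric"),
        value=payload.get("value", "n/a"),
        state=payload.get("state", "unknown"),
    )


class AlertWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
        except json.JSONDecodeError:
            payload = None
        if not isinstance(payload, dict):
            # Reject bodies that are not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # Forward to chat, paging, or internal tooling here.
        print(summarize(payload))
        self.send_response(204)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver locally until interrupted."""
    HTTPServer(("127.0.0.1", port), AlertWebhookHandler).serve_forever()
```

Calling `serve()` starts the receiver on localhost. Keeping `summarize` separate from the HTTP handler makes the message formatting easy to test without opening a socket.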

OK ↔ Firing states

Alerts move between OK and Firing automatically as conditions change across evaluations.
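
The state behavior can be modeled as a small function, sketched here in Python. It folds in the rule from the FAQ that an evaluation with no data leaves the state unchanged. The names are illustrative, and notifying only on a state change is an assumption; the page says notifications go out when an alert fires or resolves but does not say whether a still-firing alert notifies again.

```python
from typing import List, Optional, Tuple


def next_state(current_state: str, breached: Optional[bool]) -> str:
    """Return the alert state after one evaluation.

    breached is True or False when data was available, and None when the
    evaluation window had no data points.
    """
    if breached is None:
        return current_state  # no data: keep whatever state we had
    return "Firing" if breached else "OK"


def run(evaluations: List[Optional[bool]],
        state: str = "OK") -> List[Tuple[str, bool]]:
    """Replay evaluations, returning (state, notify) for each one."""
    history = []
    for breached in evaluations:
        new_state = next_state(state, breached)
        # Assumption: notify only when the state actually changes.
        history.append((new_state, new_state != state))
        state = new_state
    return history


print(run([False, True, None, True, False]))
# [('OK', False), ('Firing', True), ('Firing', False), ('Firing', False), ('OK', True)]
```

In the replay, the quiet window in the middle neither resolves the alert nor fires it again, and the final recovery produces the resolve notification.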

Pause without losing config

Temporarily pause an alert to keep its full setup while it stops evaluating, then resume in one click.

Example notification · Critical · just now

Response Time (p50) is 3,180 ms
Above 3,000 ms · Production · GPT-5.5
Sent to: Organization owners · Project members · Webhooks
Alert state: OK · Firing

Static vs percentage change

Pick the right alert type for each metric

Static value alerts catch the limits you already know; percentage-change alerts catch the anomalies you don't. Most teams use both, side by side.

Static value alerts compared with percentage change alerts
How they compare	Static Value (fixed threshold)	Percentage Change (vs baseline)
Fixed numeric threshold	Yes	No
Compares to a historical baseline	No	Yes
Condition direction	Above / Below	Above / Below
Requires a comparison period	No	Yes
Works without historical data	Yes	No
Best for known limits & budgets	Yes	No
Best for anomaly & spike detection	No	Yes
Catches sudden drops or surges	No	Yes
Typical example	Daily Spend Above $500	Error Rate +50% vs 1 day ago

Percentage change uses ((current − previous) / previous) × 100 against your chosen comparison period: Previous period, 1 hour, 1 day, 1 week, or 1 month ago.

Built for AI operations

Alerts for the incidents that actually happen

From latency regressions to runaway spend, these are the alerts teams running models in production set up on day one.

High response time alert

Watch Production p50 response time with a static value alert (Warning above 1,500 ms, Critical above 3,000 ms), evaluated every 15 minutes.

Error rate spike detection

Use percentage change versus 1 day ago to fire when error rate jumps 50% (Warning) or 100% (Critical), checked every 5 minutes.

Daily spend limit

Catch runaway usage early with a static Daily Spend alert (Warning at $400, Critical at $500), evaluated hourly.

Traffic drop detection

Spot outages fast when Total Requests drops by 30% (Warning) or 50% (Critical) versus 1 hour ago, checked every 5 minutes.
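
The four alerts above can be written down as plain data, which makes them easier to review side by side. The Python sketch below is illustrative only: `AlertDefinition` and its field names are hypothetical rather than FastRouter's configuration schema. It also assumes that the traffic-drop thresholds are negative percentage changes (−30 and −50) evaluated with a Below condition.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertDefinition:
    # Hypothetical field names, chosen for this example.
    name: str
    metric: str
    alert_type: str        # "static" or "percentage_change"
    direction: str         # "above" or "below"
    warning: float
    critical: float
    unit: str
    interval_minutes: int
    compare_to: str = ""   # only used by percentage-change alerts

    def is_consistent(self) -> bool:
        """Critical must be at least as severe as Warning."""
        if self.direction == "above":
            return self.critical >= self.warning
        return self.critical <= self.warning


EXAMPLES = [
    AlertDefinition("High response time", "Response Time (p50)",
                    "static", "above", 1500, 3000, "ms", 15),
    AlertDefinition("Error rate spike", "Error Rate",
                    "percentage_change", "above", 50, 100, "%", 5, "1 day ago"),
    AlertDefinition("Daily spend limit", "Daily Spend",
                    "static", "above", 400, 500, "$", 60),
    # Assumption: a drop is a negative change, so Below -30 / Below -50.
    AlertDefinition("Traffic drop", "Total Requests",
                    "percentage_change", "below", -30, -50, "%", 5, "1 hour ago"),
]

for alert in EXAMPLES:
    print(alert.name, alert.is_consistent())
```

The consistency check catches a common mistake with Below alerts, where the Critical value has to be the lower number rather than the higher one.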

FAQ

Answers for teams running models in production

What happens if there's no data in the evaluation window?

Nothing changes. If a metric has no data points during an evaluation interval, the alert keeps its current state: it won't switch to OK or Firing until there's data to evaluate again. This avoids false alarms during quiet periods.

Can I create more than one alert for the same metric?

Yes. You can run multiple alerts on the same metric with different scopes, thresholds, or intervals (for example, a strict alert on Production and a looser one on staging), so each environment is monitored on its own terms.

When do cumulative metrics reset?

Daily Spend resets every day at midnight UTC and Monthly Spend resets on the 1st of the month at 00:00 UTC. Cumulative counters follow the same UTC schedule, and percentage-change comparisons are calculated against those same windows.

What timezone are alerts evaluated in?

Everything runs in UTC. Evaluation intervals, daily and monthly resets, and comparison periods are all calculated in UTC, so alerts behave consistently no matter where your team is based.

How are alert notifications delivered?

When an alert fires or resolves, FastRouter notifies by email and can route to organization owners or project members. You can also send notifications to your own webhooks, so alerts flow straight into Slack, PagerDuty, incident tooling, or any internal service that accepts an HTTP request.

Can I pause an alert without deleting it?

Yes. Paused alerts keep their full configuration but stop evaluating and stop sending notifications. Resume whenever you're ready and the alert picks back up, which is handy during planned maintenance, migrations, or noisy deploys.

What can I scope an alert to?

Each alert can be scoped to Projects, API Keys, and Models, with each set to All or to a specific selection, so you can monitor a single key, one model, or an entire project with the same alerting engine.

Know before your users do

Set up your first alert in minutes: pick a metric, choose a threshold, and let FastRouter watch your AI traffic around the clock.