Fallback Models

Keep AI apps online when models fail

List a primary model and ordered fallbacks in a single request. If a model is rate-limited, down, or returns an error, FastRouter automatically retries the next candidate-so a single failure never drops the request.

No credit card required · Free to start

Fallback list
One requestmodel + models[]
1

openai/gpt-4o

Rate limited

on failure, try next
2

openai/o1

Provider unavailable

on failure, try next
3

google/gemini-1.5-pro

Returned a completion

Billed for google/gemini-1.5-pro
Why Fallback Models

Reliability you configure in one array

Turn a single point of failure into an ordered list of backups. No client retry loops, no custom failover code-just a models array on the request you already send.

One request, an ordered list

Set your primary in the model field and one or more backups in the models array. FastRouter tries them top to bottom until one succeeds.

Automatic retry on failure

When a model is rate-limited, unavailable, or returns an error, FastRouter moves to the next candidate for you-no client-side retry logic to maintain.

Resilience without surprises

You are billed only for the model that actually answers, and the response reports which one ran-so failover never hides your real usage.

How it works

From request to response, even when a model fails

FastRouter sends your request to the primary model first. On failure it walks the fallback list in order until a model returns a successful response-or every candidate has been tried.

POST /api/v1/chat/completions
One request
"model": "openai/gpt-4o","models": ["openai/o1", "google/gemini-1.5-pro"]

Attempt 1

openai/gpt-4o

Rate limited
on failure

Attempt 2

openai/o1

Unavailable
on failure

Attempt 3

google/gemini-1.5-pro

Success

200 OK · request succeeded

"model": "google/gemini-1.5-pro"

Billed for the model that ran

One change, every request gets a safety net

Add a models array next to your existing model field and the same call now survives rate limits, downtime, and moderation. The final model is returned in the response, and you are billed only for the model that actually answered.

Fallback list

Define a primary model and ordered fallbacks

Send your usual chat completions request, then add a models array of backups. The primary in model is tried first, and the array is your ordered safety net.

Primary plus ordered fallbacks

The model field sets the primary; the models array lists one or more fallbacks, tried in the order you write them.

Tried in strict order

FastRouter iterates the list from top to bottom until a model returns a successful response or every candidate fails.

As many candidates as you need

List backups across different models and providers so a single point of failure becomes a chain of reliable alternatives.

Fallback list

JSON
{
"model": "openai/gpt-4o",
"models": [
"openai/o1",
"google/gemini-1.5-pro"
],
"stream": true
}

Routing order

Tried top to bottom
  • 1openai/gpt-4oPrimary
  • 2openai/o1Fallback
  • 3google/gemini-1.5-proFallback
Automatic failover

Retries on the failures that actually matter

Fallback kicks in whenever the current model is unavailable or returns an error-so transient problems become a retry instead of a dropped request.

Rate limits & capacity

Hit a rate limit at peak traffic and the request rolls over to the next model instead of failing.

Downtime & unavailability

If a model or provider is down or unreachable, FastRouter keeps the request moving through your list.

Moderation & other errors

Moderation blocks and other errors trigger the next candidate rather than returning an error to your app.

Failover triggers

Automatic

openai/gpt-4o

Primary attempt failed

Rate limited

Falls back when a model

  • Hits a rate limit
  • Is down or unavailable
  • Is blocked by moderation
  • Returns another error
FastRouter retries the next model in models until one responds.
Results & billing

Always know which model answered

Failover never leaves you guessing. The response tells you exactly which model ran, and billing follows whichever model actually processed the request.

Final model in the response

The model field of the response body reports the model that ultimately produced the completion.

Billed for what ran

Billing is based on the model that actually processes the request-failed candidates are never charged.

Errors only when all fail

If the primary and every fallback fail, FastRouter returns the final error-so you know the whole list was exhausted.

Response body
200 OK
{
"model": "google/gemini-1.5-pro",// final model used
...
}
Billed for google/gemini-1.5-pro

openai/gpt-4o · openai/o1 not charged

With vs without fallback

One array between you and a dropped request

A single model has nowhere to go when it fails. A fallback list keeps the very same request alive across rate limits, downtime, and moderation.

Comparison of a single-model request versus a fallback model list
BehaviorSingle modelmodel onlyFallback listmodel + models
When a model fails
Retries the next candidateNot includedIncluded
Survives rate limitsNot includedIncluded
Survives downtime & unavailabilityNot includedIncluded
Recovers from moderation blocksNot includedIncluded
Results & cost
Reports the model that answeredIncludedIncluded
Billed only for the model that ranIncludedIncluded
Tries every candidate before erroringNot includedIncluded

Add resilience by passing a models array alongside model-no other change to your request.

Use cases

Built to keep requests succeeding

Wherever an outage or a limit would normally drop a request, an ordered fallback list keeps it moving.

Keep user-facing apps online

Roll over to a backup when your primary model is rate-limited or down, so a single failure never reaches your users.

Span multiple providers

List candidates across different providers so one provider's incident cannot take your whole app offline.

Absorb traffic spikes

When burst traffic trips a model's rate limit, requests continue on the next candidate automatically.

Order by preference

Put your first-choice model up front and alternates behind it-you are billed only for the one that answers.

FAQ

Fallback model questions, answered

Pass your primary model in the model field and one or more fallbacks in the models array on the same chat completions request. FastRouter tries the primary first, then each model in the array in order until one returns a successful response.

FastRouter falls back when the current model is unavailable or returns an error-for example due to rate limits, downtime, or moderation. Instead of failing the request, it moves on to the next candidate in your list.

Strictly in the order you list them. FastRouter attempts the model value first, then walks the models array from top to bottom until a model succeeds or every candidate has failed.

Billing is based on the model that actually processes the request. If your primary fails and a fallback answers, you are charged for the fallback that ran-not the candidates that failed.

The final model used is returned in the model field of the response body, so you can always see which candidate produced the response.

If the primary and all fallbacks fail, FastRouter returns the final error to you. Listing additional candidates-ideally across different providers-reduces the chance of reaching that point.

Add fallbacks before your next outage

Pass a models array on your existing chat completions request and let FastRouter retry the next candidate whenever one fails.