Fallback Models

Keep AI apps online when models fail

List a primary model and ordered fallbacks in a single request. If a model is rate-limited, down, or returns an error, FastRouter automatically retries the next candidate-so a single failure never drops the request.

Get started for free Book a demo

No credit card required · Free to start

Recovered
automatically

Fallback list

One requestmodel + models[]

openai/gpt-4o

Rate limited

on failure, try next

openai/o1

Provider unavailable

on failure, try next

google/gemini-1.5-pro

Returned a completion

Billed for google/gemini-1.5-pro

Why Fallback Models

Reliability you configure in one array

Turn a single point of failure into an ordered list of backups. No client retry loops, no custom failover code-just a models array on the request you already send.

One request, an ordered list

Set your primary in the model field and one or more backups in the models array. FastRouter tries them top to bottom until one succeeds.

Automatic retry on failure

When a model is rate-limited, unavailable, or returns an error, FastRouter moves to the next candidate for you-no client-side retry logic to maintain.

Resilience without surprises

You are billed only for the model that actually answers, and the response reports which one ran-so failover never hides your real usage.

How it works

From request to response, even when a model fails

FastRouter sends your request to the primary model first. On failure it walks the fallback list in order until a model returns a successful response-or every candidate has been tried.

POST /api/v1/chat/completions

One request

"model": "openai/gpt-4o","models": ["openai/o1", "google/gemini-1.5-pro"]

Attempt 1

openai/gpt-4o

Rate limited

on failure

Attempt 2

openai/o1

Unavailable

on failure

Attempt 3

google/gemini-1.5-pro

Success

200 OK · request succeeded

"model": "google/gemini-1.5-pro"

Billed for the model that ran

One change, every request gets a safety net

Add a models array next to your existing model field and the same call now survives rate limits, downtime, and moderation. The final model is returned in the response, and you are billed only for the model that actually answered.

Fallback list

Define a primary model and ordered fallbacks

Send your usual chat completions request, then add a models array of backups. The primary in model is tried first, and the array is your ordered safety net.

Primary plus ordered fallbacks

The model field sets the primary; the models array lists one or more fallbacks, tried in the order you write them.

Tried in strict order

FastRouter iterates the list from top to bottom until a model returns a successful response or every candidate fails.

As many candidates as you need

List backups across different models and providers so a single point of failure becomes a chain of reliable alternatives.

Fallback list

JSON

{

"model": "openai/gpt-4o",

"models": [

"openai/o1",

"google/gemini-1.5-pro"

"stream": true

}

Routing order

Tried top to bottom

1openai/gpt-4oPrimary
2openai/o1Fallback
3google/gemini-1.5-proFallback

Automatic failover

Retries on the failures that actually matter

Fallback kicks in whenever the current model is unavailable or returns an error-so transient problems become a retry instead of a dropped request.

Rate limits & capacity

Hit a rate limit at peak traffic and the request rolls over to the next model instead of failing.

Downtime & unavailability

If a model or provider is down or unreachable, FastRouter keeps the request moving through your list.

Moderation & other errors

Moderation blocks and other errors trigger the next candidate rather than returning an error to your app.

Failover triggers

Automatic

openai/gpt-4o

Primary attempt failed

Rate limited

Falls back when a model

Hits a rate limit
Is down or unavailable
Is blocked by moderation
Returns another error

FastRouter retries the next model in models until one responds.

Results & billing

Always know which model answered

Failover never leaves you guessing. The response tells you exactly which model ran, and billing follows whichever model actually processed the request.

Final model in the response

The model field of the response body reports the model that ultimately produced the completion.

Billed for what ran

Billing is based on the model that actually processes the request-failed candidates are never charged.

Errors only when all fail

If the primary and every fallback fail, FastRouter returns the final error-so you know the whole list was exhausted.

Response body

200 OK

{

"model": "google/gemini-1.5-pro",// final model used

...

}

Billed for google/gemini-1.5-pro

openai/gpt-4o · openai/o1 not charged

With vs without fallback

One array between you and a dropped request

A single model has nowhere to go when it fails. A fallback list keeps the very same request alive across rate limits, downtime, and moderation.

Comparison of a single-model request versus a fallback model list
Behavior	Single modelmodel only	Fallback listmodel + models
When a model fails
Retries the next candidate	Not included	Included
Survives rate limits	Not included	Included
Survives downtime & unavailability	Not included	Included
Recovers from moderation blocks	Not included	Included
Results & cost
Reports the model that answered	Included	Included
Billed only for the model that ran	Included	Included
Tries every candidate before erroring	Not included	Included

Add resilience by passing a models array alongside model-no other change to your request.

Use cases

Built to keep requests succeeding

Wherever an outage or a limit would normally drop a request, an ordered fallback list keeps it moving.

Keep user-facing apps online

Roll over to a backup when your primary model is rate-limited or down, so a single failure never reaches your users.

Span multiple providers

List candidates across different providers so one provider's incident cannot take your whole app offline.

Absorb traffic spikes

When burst traffic trips a model's rate limit, requests continue on the next candidate automatically.

Order by preference

Put your first-choice model up front and alternates behind it-you are billed only for the one that answers.

FAQ

Fallback model questions, answered

Pass your primary model in the model field and one or more fallbacks in the models array on the same chat completions request. FastRouter tries the primary first, then each model in the array in order until one returns a successful response.

FastRouter falls back when the current model is unavailable or returns an error-for example due to rate limits, downtime, or moderation. Instead of failing the request, it moves on to the next candidate in your list.

Strictly in the order you list them. FastRouter attempts the model value first, then walks the models array from top to bottom until a model succeeds or every candidate has failed.

Billing is based on the model that actually processes the request. If your primary fails and a fallback answers, you are charged for the fallback that ran-not the candidates that failed.

The final model used is returned in the model field of the response body, so you can always see which candidate produced the response.

If the primary and all fallbacks fail, FastRouter returns the final error to you. Listing additional candidates-ideally across different providers-reduces the chance of reaching that point.

Add fallbacks before your next outage

Pass a models array on your existing chat completions request and let FastRouter retry the next candidate whenever one fails.

Get started for free Talk to us

Keep AI apps online when models fail

Reliability you configure in one array

One request, an ordered list

Automatic retry on failure

Resilience without surprises

From request to response, even when a model fails

One change, every request gets a safety net

Define a primary model and ordered fallbacks

Primary plus ordered fallbacks

Tried in strict order

As many candidates as you need

Retries on the failures that actually matter

Rate limits & capacity

Downtime & unavailability

Moderation & other errors

Always know which model answered

Final model in the response

Billed for what ran

Errors only when all fail

One array between you and a dropped request

Built to keep requests succeeding

Keep user-facing apps online

Span multiple providers

Absorb traffic spikes

Order by preference

Fallback model questions, answered

How do I configure fallback models?

What triggers a fallback to the next model?

In what order are fallback models tried?

Which model am I billed for when a fallback runs?

How do I know which model answered?

What happens if every model in the list fails?

Add fallbacks before your next outage