Flex Pricing

Lower-cost inference for workloads that can wait

Append :flex to any supported model ID and FastRouter routes the request to the provider's discounted Flex tier, which offers significantly lower token costs in exchange for variable throughput. Your key, endpoint, and payload stay exactly the same.


Inference pricing

openai/gpt-5.4-nano, per 1M tokens

Standard (default), low latency
  • Input: $0.20
  • Output: $1.25
Flex (:flex), variable throughput
  • Input: $0.10
  • Output: $0.63
Why Flex Pricing

Meaningful savings, one suffix away

Flex trades a little latency for a lot of savings on requests that don't need to be instant, without changing how you call the API.

Activate with one suffix

Append :flex to any supported model ID and FastRouter routes the request to the provider's Flex tier. Your API key, endpoint, and request body stay exactly the same.

Pay less per token

Flex sends eligible requests to the provider's discounted tier, cutting token costs substantially: around 50% on some models, with savings that vary per model and provider.

Trade latency you don't need

Flex relaxes throughput guarantees in exchange for lower cost, so async and background jobs run cheaper while latency-sensitive traffic stays on the standard tier.

How it works

From :flex suffix to a cheaper response

Add the suffix. FastRouter detects it and selects the provider's Flex tier, and you get back the same response shape at a lower token cost.

Request

model …:flex

OpenAI-compatible
  • Append :flex to a supported model ID.
  • Key, endpoint, and payload stay unchanged.

FastRouter

Detects :flex

Tier routing · provider.only
  • Selects the provider's Flex tier for the call.
  • Honors a provider pin so it isn't rerouted.

Provider

Flex tier

OpenAI · Google
  • Runs on the provider's discounted capacity.
  • Throughput varies with provider load.

Response

Same shape, less cost

Lower token cost
  • Identical response format to standard calls.
  • Billed at the Flex token rate.

The only change

Before (Standard)

openai/gpt-5.4-nano

+ :flex suffix

After (Flex)

openai/gpt-5.4-nano:flex

A routing flag, not a migration

Flex lives entirely in the model field. There is no separate endpoint, SDK, or account to set up. Flip a request to Flex by editing one string, and flip it back just as easily.

  • Add :flex to a supported model ID; the API key, endpoint, and payload are unchanged.
  • Pin the provider with provider.only so the call lands on the intended Flex tier.
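
Because the switch lives in a single string, flipping a request between tiers can be sketched as two small helpers. This is a minimal illustration, not FastRouter code: the names to_flex and to_standard are hypothetical, and the only behaviour taken from the text above is that the suffix is the literal string :flex appended to the model ID.

```python
FLEX_SUFFIX = ":flex"


def to_flex(model_id: str) -> str:
    """Return the model ID with the :flex suffix, adding it at most once."""
    if model_id.endswith(FLEX_SUFFIX):
        return model_id
    return model_id + FLEX_SUFFIX


def to_standard(model_id: str) -> str:
    """Return the model ID with any trailing :flex suffix removed."""
    if model_id.endswith(FLEX_SUFFIX):
        return model_id[: -len(FLEX_SUFFIX)]
    return model_id


print(to_flex("openai/gpt-5.4-nano"))           # openai/gpt-5.4-nano:flex
print(to_standard("openai/gpt-5.4-nano:flex"))  # openai/gpt-5.4-nano
```

Making to_flex idempotent means a model ID that already carries the suffix is not suffixed twice if it passes through the helper again.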
Drop-in activation

Turn on savings without touching your stack

Flex is a routing suffix, not a new API. Keep your OpenAI-compatible client, the same endpoint, and the same payload, and just add :flex to the model ID.

One-line model change

openai/gpt-5.4-nano becomes openai/gpt-5.4-nano:flex. That single edit is the only change required.

Same OpenAI-compatible API

Your base URL, API key, headers, and message format stay identical across every SDK and language.

Optional provider pin

Add provider.only to keep a request on a specific provider's Flex tier so it is never rerouted away.

POST /api/v1/chat/completions

{
  "model": "openai/gpt-5.4-nano:flex",
  "provider": { "only": ["openai"] },
  "messages": [{ "role": "user", "content": "Summarise…" }]
}

Routed to OpenAI Flex tier · ~50% lower cost
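
The sample request can be assembled in Python roughly as follows. Treat it as a sketch under stated assumptions: the host in BASE_URL is a placeholder rather than a documented FastRouter address, the Bearer authorization header follows the usual OpenAI-compatible convention and is not confirmed by this page, and build_flex_request is a hypothetical helper name. Only the path, the :flex suffix, and the provider.only field come from the sample.

```python
import json
import urllib.request

# Assumption: placeholder host. Replace with your real FastRouter base URL.
BASE_URL = "https://fastrouter.example"
CHAT_PATH = "/api/v1/chat/completions"  # path taken from the sample request


def build_flex_request(api_key, model_id, prompt, provider=None):
    """Build (but do not send) a chat completion request on the Flex tier."""
    body = {
        "model": model_id + ":flex",  # the suffix is the only Flex-specific change
        "messages": [{"role": "user", "content": prompt}],
    }
    if provider is not None:
        # Optional pin so the call stays on one provider's Flex tier.
        body["provider"] = {"only": [provider]}
    return urllib.request.Request(
        BASE_URL + CHAT_PATH,
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Assumption: OpenAI-style bearer token authentication.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_flex_request(
    "YOUR_API_KEY", "openai/gpt-5.4-nano", "Summarise this report.", provider="openai"
)
# Sending is left out so the sketch runs offline:
# urllib.request.urlopen(req)
```

Dropping the provider argument produces the same request without the pin, and removing the suffix line produces an ordinary standard-tier call.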
Lower token cost

Meaningful savings on eligible requests

When you route through the Flex tier, you pay the provider's discounted token rate. The savings can be substantial, with the exact reduction depending on the model and provider.

Discounted input and output

Both prompt and completion tokens are billed at the lower Flex rate, with no separate plan and no commitment.

Up to ~50% off on some models

For GPT-5.4 Nano on OpenAI, input drops from $0.20 to $0.10 and output from $1.25 to $0.63 per 1M tokens.

Usage-based, per model

Check the model catalog for per-model Flex pricing across every supported provider before you switch.

Cost per 1M tokens (gpt-5.4-nano)
  • Input tokens: Standard $0.20, Flex $0.10
  • Output tokens: Standard $1.25, Flex $0.63

~50% off per token

Example: GPT-5.4 Nano on OpenAI. Savings vary by model.
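
To see what these rates mean for a whole job, the arithmetic can be worked through in a few lines. The rates are the GPT-5.4 Nano figures quoted above; the job size of 10M input and 2M output tokens is an invented example, and job_cost is a hypothetical helper name.

```python
# Per-1M-token rates for GPT-5.4 Nano on OpenAI, as quoted on this page.
RATES_PER_1M = {
    "standard": {"input": 0.20, "output": 1.25},
    "flex": {"input": 0.10, "output": 0.63},
}


def job_cost(tier, input_tokens, output_tokens):
    """Return the dollar cost of a job at the given tier's token rates."""
    rates = RATES_PER_1M[tier]
    return (input_tokens * rates["input"] + output_tokens * rates["output"]) / 1_000_000


# Illustrative job size (an assumption, not a figure from this page).
standard = job_cost("standard", 10_000_000, 2_000_000)
flex = job_cost("flex", 10_000_000, 2_000_000)
saving_pct = 100 * (standard - flex) / standard

print(f"standard ${standard:.2f}, flex ${flex:.2f}, saving {saving_pct:.1f}%")
# standard $4.50, flex $2.26, saving 49.8%
```

The saving lands just under 50% because the Flex output rate ($0.63) is slightly more than half of the standard rate ($1.25).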

Right tier, right request

Send the work that can wait to Flex

Flex shines on throughput-tolerant workloads. Keep interactive, streaming, and SLA-bound traffic on the standard tier, and move everything async to Flex.

Best for background work

Extraction, classification, batch summarisation, evals, and scheduled jobs all fit the Flex tier well.

Higher tail latency under load

Flex requests can slow down during peak provider demand, so avoid them for real-time, user-facing responses.

Mix tiers per request

Decide the tier on a per-call basis: route the same model through Flex or standard depending on the workload.

Route by workload

Send to Flex
  • Data extraction & classification
  • Batch document summarisation
  • Eval & dataset generation
  • Scheduled background jobs
Keep on Standard
  • Real-time chat & interactive UIs
  • Streaming responses to users
  • Latency-sensitive agent loops
  • Voice & realtime applications
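
One way to apply the split above in code is to choose the model ID per call from a workload label. The labels and the function name model_for are hypothetical, chosen to mirror the two lists. Anything not explicitly marked as Flex-friendly stays on the standard tier, which is a conservative default assumed here rather than something this page prescribes.

```python
# Hypothetical workload labels mirroring the "Send to Flex" list.
FLEX_WORKLOADS = {
    "extraction",
    "classification",
    "batch_summarisation",
    "eval",
    "dataset_generation",
    "scheduled_job",
}


def model_for(workload: str, base_model: str) -> str:
    """Pick the Flex or standard model ID for a single call."""
    if workload in FLEX_WORKLOADS:
        return base_model + ":flex"
    # Chat, streaming, agent loops, voice, and unknown workloads stay standard.
    return base_model


print(model_for("batch_summarisation", "openai/gpt-5.4-nano"))  # openai/gpt-5.4-nano:flex
print(model_for("chat", "openai/gpt-5.4-nano"))                 # openai/gpt-5.4-nano
```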
Standard vs Flex

Pick the tier that fits each request

Both tiers share the same API, key, and payload. The difference is the trade-off you make: Flex swaps guaranteed throughput for a lower token cost.

Comparison of FastRouter Standard and Flex inference tiers

Characteristic                              Standard (default tier)   Flex (append :flex)
Cost & savings
  Discounted token pricing                  Not included              Included
  Up to ~50% lower cost on eligible models  Not included              Included
Latency & throughput
  Guaranteed low latency                    Included                  Not included
  Consistent throughput at peak load        Included                  Not included (variable)
  Recommended for streaming to users        Included                  Not included
Setup
  Activation                                Default                   :flex suffix
  Same key, endpoint & payload              Included                  Included
  Provider pinning with provider.only       Included                  Included
Best fit
  Batch, eval & background jobs             Not included              Included
  Real-time, user-facing traffic            Included                  Not included

"Included" marks the tier that best fits each characteristic. Flex is available on select OpenAI and Google models; look for the Flex tab in the model catalog.

Use cases

Built for async and at-scale workloads

Anywhere throughput matters more than instant latency, Flex turns the same request into a cheaper one.

Data extraction & classification

Run extraction and classification pipelines on Flex, where per-request latency matters less than total cost.

Batch document summarisation

Summarise large document sets in the background and pay significantly less per token across the whole job.

Eval & dataset generation

Generate evaluation runs and fine-tuning datasets at scale without paying the standard-tier rate on every call.

Scheduled background jobs

Move cron-driven and async preprocessing to Flex to cut cost on non-urgent, latency-tolerant workloads.

FAQ

Flex Pricing questions, answered

Flex is a tiered pricing mode offered by OpenAI and Google on select models. When you append :flex to a model ID, FastRouter routes the request to the provider's Flex tier, which significantly reduces token costs in exchange for variable throughput and potentially higher latency under peak load.

Append :flex to the model ID, for example openai/gpt-5.4-nano:flex or google/gemini-3.1-pro-preview:flex. Your API key, endpoint, headers, and request body stay exactly the same, so the suffix is the only change required.

Flex is currently available on select OpenAI and Google models (Gemini API on Vertex AI and AI Studio). Check the model catalog for a Flex tab under Provider Details; if it is present, Flex pricing is available for that model.

Savings vary by model. As a documented example, GPT-5.4 Nano on OpenAI drops from $0.20 to $0.10 per 1M input tokens and from $1.25 to $0.63 per 1M output tokens, roughly 50% off per token. Check the model catalog for per-model Flex pricing across providers.

Avoid Flex for real-time, user-facing, or streaming responses. Flex requests can experience higher tail latencies during peak provider load, so interactive, latency-sensitive, and SLA-bound workloads should stay on the standard tier.

You can pin a provider on Flex requests. Set provider.only to openai (or googleaistudio / googlevertexai) so the request goes to the intended provider's Flex tier and isn't rerouted elsewhere.

Cut the cost of every request that can wait

Append :flex to a supported model and route batch, eval, and background workloads to the provider's discounted tier, with no new SDK and no code rewrite.