How billing works

You pay per token. Three rates apply to each request: input for the tokens you send, cached for input tokens served from a prefix cache, and output for the tokens the model generates. Every figure in the table below is in USD per 1M tokens.
Caching is automatic: Valar matches shared prompt prefixes for you and charges the lower cached rate on the tokens that hit. Passing prompt_cache_key as a routing hint can raise your hit rate, but it is optional.
The rate you pay also depends on the completion window you request. Faster scheduling carries a higher rate, so the on-demand Now window costs more than Standard. The table lists the windows available for each model; coverage varies, and you can choose a different window for each request. Completion Windows explains the trade-offs.
Window coverage differs by model, and we keep adding models and widening window support. If a model or window you want isn’t shown, get in touch.
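
As a rough illustration of how the three rates combine, the sketch below prices a single request. It assumes the cached token count is the subset of input tokens that hit the prefix cache and that those tokens are billed at the cached rate instead of the input rate; the function name and the token counts are hypothetical, and the rates are the DeepSeek V4 Pro Standard figures from the rate table.

```python
from decimal import Decimal

TOKENS_PER_UNIT = Decimal(1_000_000)  # rates are quoted per 1M tokens


def request_cost(input_tokens, cached_tokens, output_tokens, rates):
    """Return the USD cost of one request.

    Assumption: cached_tokens is the part of input_tokens served from the
    prefix cache, billed at the cached rate instead of the input rate.
    """
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * rates["input"]
        + cached_tokens * rates["cached"]
        + output_tokens * rates["output"]
    )
    return total / TOKENS_PER_UNIT


# DeepSeek V4 Pro, Standard window (USD per 1M tokens, from the rate table).
deepseek_v4_pro_standard = {
    "input": Decimal("0.90"),
    "cached": Decimal("0.15"),
    "output": Decimal("2.30"),
}

# Hypothetical request: 12,000 input tokens (8,000 cached), 1,500 output tokens.
cost = request_cost(12_000, 8_000, 1_500, deepseek_v4_pro_standard)
print(f"${cost:.5f}")  # $0.00825
```

Decimal is used here only to keep the sub-cent arithmetic exact; the invoice itself may round differently.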

Rate table

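All rates are in USD per 1M tokens.

| Model | Model ID | Window | Input | Cached | Output |
|---|---|---|---|---|---|
| DeepSeek V4 Pro | deepseek-ai/DeepSeek-V4-Pro | Standard | 0.90 | 0.15 | 2.30 |
| DeepSeek V4 Pro | deepseek-ai/DeepSeek-V4-Pro | Now | 1.60 | 0.15 | 4.10 |
| DeepSeek V4 Flash | deepseek-ai/DeepSeek-V4-Flash | Standard | 0.135 | 0.015 | 0.21 |
| DeepSeek V4 Flash | deepseek-ai/DeepSeek-V4-Flash | Now | 0.243 | 0.027 | 0.378 |
| Kimi-K2.6 | moonshotai/Kimi-K2.6 | Standard | 0.45 | 0.20 | 0.30 |
| Kimi-K2.6 | moonshotai/Kimi-K2.6 | Now | 0.90 | 0.20 | 3.60 |
| GLM-5.1 | zai-org/GLM-5.1-FP8 | Standard | 0.15 | 0.03 | 0.60 |
| GLM-5.1 | zai-org/GLM-5.1-FP8 | Now | 1.30 | 0.26 | 4.40 |
| gpt-oss-120b | openai/gpt-oss-120b | Standard | 0.04 | 0.02 | 0.30 |
| gpt-oss-120b | openai/gpt-oss-120b | Now | 0.06 | 0.03 | 0.40 |
| Qwen3.5-397B-A17B | Qwen/Qwen3.5-397B-A17B | Standard | 0.25 | 0.05 | 0.75 |
| Qwen3.5-397B-A17B | Qwen/Qwen3.5-397B-A17B | Now | 0.45 | 0.09 | 1.35 |
| Gemma 4 31B IT | google/gemma-4-31B-it | Standard | 0.18 | 0.10 | 0.30 |
| Gemma 4 31B IT | google/gemma-4-31B-it | Now | 0.36 | 0.20 | 0.60 |
| MiniMax M2.7 | MiniMaxAI/MiniMax-M2.7 | Standard | 0.15 | 0.03 | 0.60 |
| MiniMax M2.7 | MiniMaxAI/MiniMax-M2.7 | Now | 0.27 | 0.054 | 1.08 |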
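
To get a feel for what the Now window adds to a bill, the following sketch prices the same workload at both windows for two of the models listed. The workload figures and the helper name are hypothetical; the rates are copied from the rate table, and cached tokens are assumed to be billed at the cached rate in place of the input rate.

```python
from decimal import Decimal

# (input, cached, output) in USD per 1M tokens, copied from the rate table.
RATES = {
    ("MiniMaxAI/MiniMax-M2.7", "Standard"): ("0.15", "0.03", "0.60"),
    ("MiniMaxAI/MiniMax-M2.7", "Now"): ("0.27", "0.054", "1.08"),
    ("openai/gpt-oss-120b", "Standard"): ("0.04", "0.02", "0.30"),
    ("openai/gpt-oss-120b", "Now"): ("0.06", "0.03", "0.40"),
}

# Hypothetical workload: uncached input, cached input, and output tokens.
WORKLOAD = (800_000, 200_000, 100_000)


def workload_cost(model, window, workload=WORKLOAD):
    """USD cost of the workload at the given model and window rates."""
    rates = [Decimal(r) for r in RATES[(model, window)]]
    return sum(n * r for n, r in zip(workload, rates)) / Decimal(1_000_000)


for model in ("MiniMaxAI/MiniMax-M2.7", "openai/gpt-oss-120b"):
    standard = workload_cost(model, "Standard")
    now = workload_cost(model, "Now")
    ratio = now / standard
    print(f"{model}: Standard ${standard:.4f}, Now ${now:.4f}, ratio {ratio:.2f}x")
```

For MiniMax M2.7 all three Now rates are 1.8 times the Standard rates, so the ratio is 1.8 for any token mix. For models where the rates scale unevenly, the ratio depends on how much of the workload is input, cached input, or output.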
For each model’s capabilities, such as image input and reasoning support, see Models.