Serverless Pricing


Summary: Per-token serverless pricing for text, vision, and embedding models, including Priority and Fast serving paths



Overview#

Serverless inference is priced per token. To learn how the Standard, Priority, and Fast serving paths work and how to select one, see Serverless Serving Paths.

Every text or vision request is billed across three dimensions:

Input tokens — what you send to the model.
Cached input tokens — input tokens served from prompt cache, priced lower.
Output tokens — what the model generates.

Embeddings are billed only on input tokens.

How pricing works#

Prices below are per 1 million tokens in US dollars.
Batch inference is billed at 50% of serverless pricing on both input and output. See Batch inference.
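
The billing rule above can be sketched as a short calculation. This is a minimal sketch, not official billing logic. The function name `request_cost` and its parameters are hypothetical. It assumes that cached input tokens are billed at the cached rate instead of the input rate (the two counts do not overlap), and that the 50% batch rate also applies to cached input; the text above only states it for input and output.

```python
TOKENS_PER_PRICE_UNIT = 1_000_000  # prices are quoted per 1M tokens
BATCH_MULTIPLIER = 0.5             # batch inference: 50% of serverless pricing


def request_cost(input_price, cached_price, output_price,
                 uncached_input_tokens, cached_input_tokens, output_tokens,
                 batch=False):
    """Estimate the cost in USD of one text or vision request.

    Prices are USD per 1M tokens. Assumption: cached and uncached input
    token counts are disjoint, and the batch multiplier covers all three
    dimensions.
    """
    cost = (
        uncached_input_tokens * input_price
        + cached_input_tokens * cached_price
        + output_tokens * output_price
    ) / TOKENS_PER_PRICE_UNIT
    return cost * BATCH_MULTIPLIER if batch else cost


# Example using the Kimi K2.5 Standard rates listed on this page:
# $0.60 input / $0.10 cached input / $3.00 output per 1M tokens.
cost = request_cost(0.60, 0.10, 3.00,
                    uncached_input_tokens=10_000,
                    cached_input_tokens=40_000,
                    output_tokens=2_000)
print(f"${cost:.4f}")  # prints $0.0160
```

Here the request costs 0.006 + 0.004 + 0.006 = 0.016 USD; passing `batch=True` would halve that under the stated assumption.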

Text and vision models#

Per-model pricing for headline models. Fast variants appear as adjacent rows. In each Standard or Priority cell, prices are input / cached input / output (USD per 1M tokens), in that order.

Model	Standard	Priority
Kimi K2.6	$0.95 / $0.16 / $4.00	$1.50 / $0.22 / $6.00
Kimi K2.6 Fast	$2.00 / $0.30 / $8.00	—
Kimi K2.5	$0.60 / $0.10 / $3.00	—
DeepSeek V4 Pro	$1.74 / $0.145 / $3.48	$2.61 / $0.218 / $5.22
DeepSeek V4 Flash	$0.14 / $0.028 / $0.28	—
GLM 5.1	$1.40 / $0.26 / $4.40	$2.10 / $0.39 / $6.60
GLM 5.1 Fast	$2.80 / $0.52 / $8.80	—
Qwen 3.6 Plus	$0.50 / $0.10 / $3.00	—
MiniMax 2.7	$0.30 / $0.06 / $1.20	$0.45 / $0.09 / $1.80
MiniMax 2.5	$0.30 / $0.03 / $1.20	—
OpenAI GPT OSS 120B	$0.15 / $0.015 / $0.60	$0.18 / $0.018 / $0.72
OpenAI GPT OSS 20B	$0.07 / $0.035 / $0.30	—

— in the Priority column means Priority is not available for that model. This pricing table is the source of truth for Priority availability.
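
One way to encode the table in client code is a lookup that treats a missing Priority price as "not available". The sketch below is illustrative only: the `PRICING` dictionary, the `rates_for` helper, and the lowercase path keys are hypothetical, the keys are display names copied from the table rather than API model IDs, and only a few rows are included.

```python
# (input, cached input, output) in USD per 1M tokens.
# None marks a dash in the Priority column: Priority is not available.
PRICING = {
    "Kimi K2.6": {
        "standard": (0.95, 0.16, 4.00),
        "priority": (1.50, 0.22, 6.00),
    },
    "Kimi K2.6 Fast": {
        "standard": (2.00, 0.30, 8.00),
        "priority": None,
    },
    "OpenAI GPT OSS 120B": {
        "standard": (0.15, 0.015, 0.60),
        "priority": (0.18, 0.018, 0.72),
    },
    "OpenAI GPT OSS 20B": {
        "standard": (0.07, 0.035, 0.30),
        "priority": None,
    },
}


def rates_for(model, path="standard"):
    """Return (input, cached input, output) rates, or raise if unavailable."""
    rates = PRICING[model][path]
    if rates is None:
        raise ValueError(f"{path} pricing is not available for {model}")
    return rates


input_price, cached_price, output_price = rates_for("Kimi K2.6", "priority")
print(input_price, cached_price, output_price)  # prints 1.5 0.22 6.0
```

Raising on a `None` entry, rather than silently falling back to Standard rates, keeps a cost estimate from understating a request that cannot actually be served on Priority.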

Other base models — by size and architecture#

For any text or vision model not listed individually, pricing is set by parameter count and architecture. These size-based prices apply uniformly to input and output (no separate cached-input rate):

Model size and architecture	$ / 1M tokens (input and output)
Less than 4B parameters	$0.10
4B – 16B parameters	$0.20
More than 16B parameters	$0.90
MoE up to 56B parameters (e.g. Mixtral 8x7B)	$0.50
MoE 56.1B – 176B parameters (e.g. DBRX, Mixtral 8x22B)	$1.20
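
The size-based tiers above can be expressed as a small function. This is a sketch under stated assumptions: the name `size_based_price` is hypothetical, the 4B and 16B boundaries are treated as belonging to the middle tier, any MoE size above 56B falls into the second MoE tier, and MoE models above 176B raise an error because the table lists no price for them.

```python
def size_based_price(params_billions, moe=False):
    """Return USD per 1M tokens (same rate for input and output).

    params_billions: total parameter count in billions.
    moe: True for mixture-of-experts architectures.
    """
    if moe:
        if params_billions <= 56:
            return 0.50
        if params_billions <= 176:
            return 1.20
        # The table above lists no MoE tier beyond 176B parameters.
        raise ValueError("no size-based MoE price listed above 176B parameters")
    if params_billions < 4:
        return 0.10
    if params_billions <= 16:  # assumption: 4B and 16B are inclusive here
        return 0.20
    return 0.90


print(size_based_price(8))              # prints 0.2
print(size_based_price(132, moe=True))  # prints 1.2
```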

Embeddings#

Embeddings are billed per 1M input tokens.

Base model (parameter count or name)	$ / 1M input tokens
up to 150M	$0.008
150M – 350M	$0.016
Qwen3 8B	$0.10
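
Because embeddings have a single billing dimension, the estimate reduces to one multiplication. The helper names below are hypothetical, and the sketch assumes that a model of exactly 150M parameters falls in the lower tier; the table does not say which side of the boundary it belongs to. Qwen3 8B is priced by name in the table, so it is not covered by the size lookup.

```python
def embedding_price_by_size(params_millions):
    """Return USD per 1M input tokens for a size-priced embedding model."""
    if params_millions <= 150:  # assumption: 150M is inclusive here
        return 0.008
    if params_millions <= 350:
        return 0.016
    # Larger models are priced by name in the table (e.g. Qwen3 8B).
    raise ValueError("no size-based embedding price listed above 350M parameters")


def embedding_cost(input_tokens, price_per_million):
    """Return the estimated cost in USD; embeddings bill input tokens only."""
    return input_tokens * price_per_million / 1_000_000


# 5M input tokens through a 110M-parameter embedding model.
cost = embedding_cost(5_000_000, embedding_price_by_size(110))
print(f"${cost:.3f}")  # prints $0.040
```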

Notes#

For account-level controls (spend tiers, monthly budget, on-demand GPU quotas), see Account quotas.

Link last verified June 7, 2026.

Source: Fireworks AI Docs
