Serverless Serving Paths
Standard, Priority, and Fast serving paths on Fireworks Serverless
Fireworks Serverless offers three serving paths:
- Standard is the default serving path. No `service_tier` parameter is needed.
- Priority tier is for workloads that require higher reliability during peak traffic.
- Fast is for workloads that require higher speeds.
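
The three paths differ only in what the request carries: Standard sends no `service_tier`, Priority adds `"service_tier": "priority"`, and Fast uses a Fast model ID. A minimal shell sketch of that mapping follows. The `build_body` function and its `standard`/`priority`/`fast` argument values are hypothetical names introduced for illustration. The model IDs are reused from the examples on this page; whether a given model supports a given path should be checked on the Serverless pricing page.

```shell
# Sketch only: print a chat completions request body for one serving path.
# build_body and its argument values are hypothetical, not part of the Fireworks API.
build_body() {
  messages='[{"role": "user", "content": "Hello"}]'
  case "$1" in
    standard)
      # Standard: default path, no service_tier field.
      printf '{"model": "%s", "messages": %s}\n' \
        "accounts/fireworks/models/kimi-k2p5" "$messages"
      ;;
    priority)
      # Priority: same request plus service_tier set to "priority".
      printf '{"model": "%s", "service_tier": "priority", "messages": %s}\n' \
        "accounts/fireworks/models/kimi-k2p5" "$messages"
      ;;
    fast)
      # Fast: selected by model ID, no service_tier field.
      printf '{"model": "%s", "messages": %s}\n' \
        "accounts/fireworks/routers/kimi-k2p6-fast" "$messages"
      ;;
    *)
      echo "unknown serving path: $1" >&2
      return 1
      ;;
  esac
}
```

The output can be passed to curl with `-d "$(build_body priority)"` in place of an inline JSON body.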
Priority tier
Priority tier is for workloads that require higher reliability during peak traffic periods, at a higher price point. Priority tier requests are prioritized above Standard traffic and are less likely to be load shed (rejected with a 503 "server overloaded" response).
To use Priority tier, set `service_tier` to `"priority"`. The parameter is supported on the OpenAI-compatible chat completions API and on the Anthropic-compatible messages API:
```shell
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/models/kimi-k2p5",
    "service_tier": "priority",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Priority tier is available on select models. Models and pricing are listed on the Serverless pricing page.
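
Because load shedding surfaces as an HTTP 503, a client can tell it apart from other failures by status code. The sketch below shows one way to do that with curl: it captures the status code and retries with exponential backoff only on 503. The function names (`is_load_shed`, `post_with_retry`), the attempt count, the delays, and the `response.json` output file are all assumptions for illustration, not Fireworks recommendations.

```shell
# Sketch only: retry a chat completions request when the server sheds load.
# Names, attempt count, and delays are illustrative assumptions.

# Succeeds (exit 0) only when the HTTP status is 503.
is_load_shed() {
  [ "$1" = "503" ]
}

# Usage: post_with_retry '<json body>'
# Writes the response body to response.json and prints the final HTTP status.
post_with_retry() {
  body="$1"
  attempt=1
  delay=1
  max_attempts=3
  while :; do
    # -w '%{http_code}' prints only the status code; -o saves the body.
    status=$(curl -s -o response.json -w '%{http_code}' \
      https://api.fireworks.ai/inference/v1/chat/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $FIREWORKS_API_KEY" \
      -d "$body")
    # Stop on any non-503 status, or once the attempts are used up.
    if ! is_load_shed "$status" || [ "$attempt" -ge "$max_attempts" ]; then
      break
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
  echo "$status"
}
```

With these values the function makes at most three attempts, sleeping 1 and then 2 seconds between them.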
Fast
Fast is a high-speed serving path for interactive applications that need fast responses, at a higher price point. Fast variants aim for 100+ tokens per second of generation throughput. A Fast variant is not a different model, and model quality remains the same.
Fast is available for select models. To use Fast, change the model ID as listed below.
| Model | Model ID |
|---|---|
| Kimi K2.6 Fast | accounts/fireworks/routers/kimi-k2p6-fast |
| GLM 5.1 Fast | accounts/fireworks/routers/glm-5p1-fast |
```shell
curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $FIREWORKS_API_KEY" \
  -d '{
    "model": "accounts/fireworks/routers/kimi-k2p6-fast",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```

Pricing is listed on the Serverless pricing page.
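
To compare a response against the 100+ tokens per second target, divide the number of generated tokens by the generation time. The sketch below does that arithmetic with awk. `tokens_per_second` is a hypothetical helper and the inputs are made-up numbers; in practice the token count would come from the response's usage data (assuming it is reported) and the time from your own measurement. Timing a whole request also includes network latency and time to first token, so it understates generation throughput.

```shell
# Sketch only: generated tokens divided by elapsed seconds.
# tokens_per_second is a hypothetical helper; the inputs below are made-up numbers.
tokens_per_second() {
  awk -v tokens="$1" -v seconds="$2" 'BEGIN { printf "%.1f\n", tokens / seconds }'
}

# Example: 450 generated tokens in 3.6 seconds.
tokens_per_second 450 3.6   # prints 125.0
```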
Related
- Serverless overview
- Serverless quickstart
- Text models
- Anthropic compatibility: `service_tier` is supported on both OpenAI-compatible chat completions and the Anthropic `messages` API.