Reliability and Error Handling
Recommended patterns for timeouts, retries, and error handling when building production applications on the Fireworks API.
Building reliable applications requires handling network conditions, transient errors, and long-running requests.
Timeout configuration#
Set timeouts based on your workload type:
| Workload | Recommended client timeout |
|---|---|
| Interactive / chat | 30–60 seconds |
| Agentic (tool calls, multi-step) | 5–30 minutes |
| Large model inference (long context) | 10–30 minutes |
| Batch job submission | 60 seconds (results are async) |
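As a rough illustration, the table can be encoded as a small lookup that returns a `(connect, read)` pair in seconds. This is only a sketch: the workload keys and the `timeout_for` helper are hypothetical names, the read values are the upper bounds from the table, and the 10-second connect default mirrors the examples in this guide.

```python
# Read timeouts in seconds, taken from the upper bound of each table row.
# The keys are illustrative labels, not values recognized by the API.
READ_TIMEOUT_SECONDS = {
    "interactive": 60,
    "agentic": 30 * 60,
    "long_context": 30 * 60,
    "batch_submit": 60,
}


def timeout_for(workload, connect=10.0):
    """Return a (connect, read) timeout tuple in seconds for a workload label."""
    try:
        read = READ_TIMEOUT_SECONDS[workload]
    except KeyError:
        raise ValueError(f"unknown workload: {workload!r}") from None
    return (float(connect), float(read))
```

For example, `timeout_for("agentic")` yields `(10.0, 1800.0)`, which has the same shape as the `(connect, read)` tuple accepted by the `timeout` argument of `requests`.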
Python SDK#
```python
from openai import OpenAI
import httpx

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<your-api-key>",
    timeout=httpx.Timeout(
        connect=10.0,
        read=1800.0,  # 30 min for long generations
        write=30.0,
        pool=10.0,
    ),
)
```

Raw HTTP#
```python
import requests

response = requests.post(
    "https://api.fireworks.ai/inference/v1/chat/completions",
    headers={"Authorization": "Bearer <your-api-key>"},
    json={"model": "...", "messages": [...]},
    timeout=(10, 1800),  # (connect, read) in seconds
)
```

Retry logic#
Which errors are retryable#
| Status | Meaning | Retry? |
|---|---|---|
| 429 | Rate limit | ✅ Yes, with backoff |
| 500 | Internal server error | ✅ Yes, transient |
| 502 | Bad gateway | ✅ Yes, transient |
| 503 | Service unavailable | ✅ Yes, with backoff |
| 504 | Gateway timeout | ✅ Yes, transient |
| 400 | Bad request | ❌ No, fix the request |
| 401 | Unauthorized | ❌ No, check API key |
| 404 | Not found | ❌ No, check model/deployment ID |
| 422 | Unprocessable entity | ❌ No, fix the request body |
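The table reduces to a simple membership check. The sketch below assumes that any status code not listed as retryable should be surfaced to the caller; `is_retryable` is an illustrative helper name, not part of any SDK.

```python
# Status codes the table marks as safe to retry.
RETRYABLE_STATUS_CODES = frozenset({429, 500, 502, 503, 504})


def is_retryable(status_code: int) -> bool:
    """Return True if the table above marks this HTTP status as retryable."""
    return status_code in RETRYABLE_STATUS_CODES
```

A retry loop can call `is_retryable(response.status_code)` before sleeping, so that 4xx errors such as 400, 401, 404, and 422 fail fast and are not retried.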
Exponential backoff with jitter#
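```python
import random
import time

from openai import OpenAI, RateLimitError, APIStatusError


def call_with_retry(client, max_retries=5, base_delay=1.0, **kwargs):
    """Call chat.completions.create, retrying 429, 500, 502, 503, and 504.

    max_retries is the total number of attempts, including the first one.
    """
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            # 429: re-raise once the last attempt has failed.
            if attempt == max_retries - 1:
                raise
            # Exponential backoff plus up to 1 second of random jitter.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
        except APIStatusError as e:
            # Retry only transient server errors; other statuses propagate.
            if e.status_code in (500, 502, 503, 504):
                if attempt == max_retries - 1:
                    raise
                delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
                time.sleep(delay)
            else:
                raise
```

OpenAI SDK built-in retry#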
```python
client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="<your-api-key>",
    max_retries=3,
)
```

Handling 429 rate limits#
On serverless: Limits scale automatically with sustained usage. For immediate capacity, contact support or switch to a dedicated deployment.
On dedicated deployments: Increase concurrency by raising replica counts (for example with firectl deployment update and autoscaling settings). See Autoscaling.
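For the 429s that still occur while capacity catches up, the wait between attempts follows the exponential backoff formula from the retry section: `base_delay * 2 ** attempt` plus up to one second of jitter. The sketch below isolates that calculation and adds a ceiling, `max_delay`. The ceiling and the `backoff_delay` name are assumptions of this example and do not appear in the retry function above; without a cap, the delay keeps doubling as the attempt count grows.

```python
import random


def backoff_delay(attempt, base_delay=1.0, max_delay=60.0, rng=random):
    """Seconds to sleep before the next try, for a 0-based attempt index.

    The exponential term is capped at max_delay (an assumed ceiling);
    up to 1 second of random jitter is then added on top.
    """
    exponential = base_delay * (2 ** attempt)
    return min(exponential, max_delay) + rng.uniform(0, 1)
```

With the defaults, attempts 0 through 3 wait roughly 1, 2, 4, and 8 seconds (each plus jitter), and any attempt from 6 onward waits between 60 and 61 seconds.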
Long-running training jobs#
For RL / RFT trainer jobs, use reconnect_and_wait on the job manager to recover from preemption or transient failures. See Trainer job manager for parameters and examples.
To preserve optimizer state across interruptions, set dcp_save_interval in your training config. See RFT parameters reference.
The analytics dashboard vs. client-side failures#
The Fireworks analytics and usage views count server-acknowledged requests. They do not capture connection errors that occur before a request reaches the server. Those errors appear as failures on the client but may show as zero or reduced traffic in the console.
If your client shows failures but the dashboard looks clean, the issue is likely client-side: timeout before connection, DNS resolution failure, or network path problems.
Use Exporting metrics for per-deployment Prometheus metrics that reflect what Fireworks infrastructure observed for dedicated deployments.
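To narrow down which client-side cause is involved, a small connectivity probe can help. The following is a minimal sketch using only the Python standard library; `classify_client_failure` and `probe` are hypothetical helper names, and the probe only checks DNS resolution and the TCP connection, not TLS or the HTTP request itself.

```python
import socket


def classify_client_failure(exc: BaseException) -> str:
    """Map a low-level exception to one of the client-side causes above."""
    if isinstance(exc, socket.gaierror):
        return "dns"  # hostname could not be resolved
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timeout"  # no connection within the allowed time
    if isinstance(exc, OSError):
        return "network"  # refused, reset, unreachable, and similar
    return "other"


def probe(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Try to open a TCP connection; return 'ok' or a failure category."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "ok"
    except OSError as exc:
        return classify_client_failure(exc)
```

Calling `probe("api.fireworks.ai")` from the machine that reports failures gives a first indication of whether requests are likely to reach the server at all.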