# Batch API
Process large-scale async workloads
Process large volumes of requests asynchronously at 50% lower cost. Batch API is ideal for:
- Production-scale inference workloads
- Large-scale testing and benchmarking
- Training smaller models with larger ones (distillation guide)
Batch jobs automatically use prompt caching for an additional 50% cost savings on cached tokens. To maximize cache hits, place static content (such as a shared system prompt) first in your prompts.
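As a rough illustration of placing static content first, the sketch below builds request lines that all begin with the same system prompt, so only the trailing user message varies between requests. The helper names `json_escape` and `make_request` and the sample prompt text are hypothetical, and the escaping covers only backslashes and double quotes, so treat this as a sketch rather than a general JSON encoder.

```bash
# Escape backslashes and double quotes for embedding in a JSON string.
# Assumes single-line input without control characters.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Print one JSONL request line: the shared system prompt comes first,
# the per-request user message comes last.
make_request() {
  local id="$1" system="$2" user="$3"
  printf '{"custom_id": "%s", "body": {"messages": [{"role": "system", "content": "%s"}, {"role": "user", "content": "%s"}]}}\n' \
    "$id" "$(json_escape "$system")" "$(json_escape "$user")"
}

# Hypothetical shared prefix and questions, for illustration only.
SYSTEM_PROMPT="You are a support assistant. Answer in one sentence."
i=0
for question in "How do I reset my password?" "How do I change my email?"; do
  i=$((i + 1))
  make_request "request-$i" "$SYSTEM_PROMPT" "$question"
done
```

Redirect the output to a file (for example `> batch_input_data.jsonl`) to use it as a batch input.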
## Model compatibility
Not all models support the Batch API. Before submitting a batch job, verify your target model is batch-compatible:
`firectl get model accounts/fireworks/models/<model-id>`
Look for `batch_inference_supported: true` in the model metadata. If the field is absent or false, the model is not available for batch inference.
If a model does not support batch inference, submitting a job may not produce an immediate error; the job can remain in a pending state and never be scheduled. Always verify compatibility before submitting.
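The compatibility check can be scripted so that a job is only submitted when the flag is present. The sketch below reads model metadata on standard input and assumes the field is printed on its own line as `batch_inference_supported: true`; the function name `supports_batch` is hypothetical, and the pattern may need adjusting to match the actual `firectl` output format.

```bash
# Succeeds only if the metadata on stdin contains the batch flag set to true.
# Assumption: the field appears as "batch_inference_supported: true" on its own line.
supports_batch() {
  grep -Eq '^[[:space:]]*batch_inference_supported:[[:space:]]*true'
}

# Usage (requires firectl and network access):
#   firectl get model accounts/fireworks/models/<model-id> | supports_batch \
#     && echo "batch-compatible" \
#     || echo "not batch-compatible"
```

An absent field and a `false` value both make the function fail, which matches the rule stated above.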
If your batch job is not scheduling:
- Confirm the model supports batch inference (see above).
- Validate your JSONL input — each line must be a complete, valid JSON object matching the request schema.
- Check that your account has sufficient quota for batch jobs.
- If the job has been pending for more than 30 minutes, contact support with your job ID.
## Getting Started
<AccordionGroup>
<Accordion title="1. Prepare Your Dataset">
Requirements:
- File format: JSONL (each line is a valid JSON object)
- Size limit: Under 1GB
- Required fields: `custom_id` (unique) and `body` (request parameters)
Example dataset:
```json
{"custom_id": "request-1", "body": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}
{"custom_id": "request-2", "body": {"messages": [{"role": "user", "content": "Explain quantum computing"}], "temperature": 0.7}}
{"custom_id": "request-3", "body": {"messages": [{"role": "user", "content": "Tell me a joke"}]}}
```
Save as `batch_input_data.jsonl` locally.
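Before uploading, it can help to check the file against the requirements listed above. The following is a heuristic sketch using only standard shell tools: it pattern-matches each line rather than parsing JSON, reads "1GB" as 10^9 bytes (an assumption), and the function name `check_batch_input` is hypothetical.

```bash
# Heuristic pre-upload checks for a batch input file.
# Pattern matching only; this is not a full JSON validator.
check_batch_input() {
  local file="$1" size bad dupes
  local max_bytes=1000000000   # assumption: "1GB" read as 10^9 bytes

  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }

  # Size limit: the file must be under 1GB.
  size=$(($(wc -c < "$file")))
  if [ "$size" -ge "$max_bytes" ]; then
    echo "input is $size bytes; it must be under 1GB" >&2
    return 1
  fi

  # Every line should be a single {...} object containing both required keys.
  bad=$(awk '!(/^[ \t]*[{].*[}][ \t]*$/ && /"custom_id"[ \t]*:/ && /"body"[ \t]*:/) { printf "%s ", NR }' "$file")
  if [ -n "$bad" ]; then
    echo "lines that do not look like valid requests: $bad" >&2
    return 1
  fi

  # custom_id values must be unique across the file.
  dupes=$(grep -o '"custom_id"[[:space:]]*:[[:space:]]*"[^"]*"' "$file" \
    | sed 's/^"custom_id"[[:space:]]*:[[:space:]]*//' | sort | uniq -d)
  if [ -n "$dupes" ]; then
    echo "duplicate custom_id values: $dupes" >&2
    return 1
  fi

  echo "ok: $(grep -c . "$file") requests"
}

# Usage:
#   check_batch_input ./batch_input_data.jsonl
```

A file that passes these checks can still be rejected during the job's validation step, since the script does not verify the request schema inside `body`.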
</Accordion>
<Accordion title="2. Upload Your Dataset">
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="UI"></span>
Navigate to the dataset tab, click `Create Dataset`, and follow the wizard.
<img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/fine-tuning/dataset.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=406fa721650d41553f3adc5e4d372a68" alt="Dataset Upload" width="2972" height="2060" data-path="images/fine-tuning/dataset.png" />
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="firectl"></span>
```bash
firectl dataset create batch-input-dataset ./batch_input_data.jsonl
```
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="HTTP API"></span>
This takes two separate HTTP requests: one to create the dataset entry and one to upload the file. Full reference: [Create dataset](/api-reference/create-dataset).
```bash
# Create Dataset Entry
curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets" \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "datasetId": "batch-input-dataset",
    "dataset": { "userUploaded": {} }
  }'

# Upload JSONL file
curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets/batch-input-dataset:upload" \
  -H "Authorization: Bearer ${API_KEY}" \
  -F "file=@./batch_input_data.jsonl"
```
<span class="tab-end"></span>
<span class="tab-group-end"></span>
</Accordion>
<Accordion title="3. Create a Batch Job">
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="UI"></span>
Navigate to the Batch Inference tab and click "Create Batch Inference Job". Select your input dataset:
<img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Dataset_Select.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=d0578af8a6a76f48a2b82cec1d165189" alt="BIJ Dataset Select" width="3840" height="1982" data-path="images/batch-inference/BIJ_Dataset_Select.png" />
Choose your model:
<img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Model_Select.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=c92290674cfef02e9f916e740141b998" alt="BIJ Model Select" width="3840" height="1970" data-path="images/batch-inference/BIJ_Model_Select.png" />
Configure optional settings:
<img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Optional_Settings.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=eac5e45bbbebecebf77da16faa1b6faa" alt="BIJ Optional Settings" width="3840" height="1976" data-path="images/batch-inference/BIJ_Optional_Settings.png" />
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="firectl"></span>
```bash
firectl batch-inference-job create \
  --model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --input-dataset-id batch-input-dataset
```
With additional parameters:
```bash
firectl batch-inference-job create \
  --job-id my-batch-job \
  --model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --input-dataset-id batch-input-dataset \
  --output-dataset-id batch-output-dataset \
  --max-tokens 1024 \
  --temperature 0.7 \
  --top-p 0.9
```
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="HTTP API"></span>
```bash
curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs?batchInferenceJobId=my-batch-job" \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "inputDatasetId": "accounts/'${ACCOUNT_ID}'/datasets/batch-input-dataset",
    "outputDatasetId": "accounts/'${ACCOUNT_ID}'/datasets/batch-output-dataset",
    "inferenceParameters": {
      "maxTokens": 1024,
      "temperature": 0.7,
      "topP": 0.9
    }
  }'
```
<span class="tab-end"></span>
<span class="tab-group-end"></span>
</Accordion>
<Accordion title="4. Monitor Your Job">
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="UI"></span>
View all your batch inference jobs in the dashboard:
<img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_List.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=217e1564cf44569f21261a43c9f089d1" alt="BIJ List" width="3840" height="1986" data-path="images/batch-inference/BIJ_List.png" />
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="firectl"></span>
```bash
# Get job status
firectl batch-inference-job get my-batch-job
# List all batch jobs
firectl batch-inference-job list
```
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="HTTP API"></span>
```bash
# Get specific job
curl -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs/my-batch-job" \
  -H "Authorization: Bearer ${API_KEY}"

# List all jobs
curl -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs" \
  -H "Authorization: Bearer ${API_KEY}"
```
<span class="tab-end"></span>
<span class="tab-group-end"></span>
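For unattended runs, the status check can be wrapped in a polling loop that stops once the job reaches a state it cannot leave. The sketch below treats COMPLETED, FAILED, and EXPIRED (from the job states listed in the Reference section) as terminal. The function names are hypothetical, the `.state` field name in `get_state` is an assumption about the API response, and states are matched by suffix in case the API prefixes them.

```bash
# Succeeds when the given state is terminal.
# Matches by suffix in case state names carry a prefix (an assumption).
is_terminal_state() {
  case "$1" in
    *COMPLETED|*FAILED|*EXPIRED) return 0 ;;
    *) return 1 ;;
  esac
}

# Poll until a terminal state is reached, then print it.
# $1: seconds between polls; remaining arguments: a command that prints the current state.
wait_for_job() {
  local interval="$1" state
  shift
  while :; do
    state=$("$@") || return 1
    if is_terminal_state "$state"; then
      printf '%s\n' "$state"
      return 0
    fi
    sleep "$interval"
  done
}

# Hypothetical state fetcher built on the HTTP call shown above.
# Assumption: the job resource exposes its state in a top-level "state" field.
get_state() {
  curl -s "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs/$1" \
    -H "Authorization: Bearer ${API_KEY}" | jq -r '.state'
}

# Usage (requires curl, jq, and network access):
#   wait_for_job 60 get_state my-batch-job
```

Passing the fetcher as a command keeps the loop independent of how the state is obtained, so a `firectl`-based fetcher could be swapped in without changing `wait_for_job`.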
</Accordion>
<Accordion title="5. Download Results">
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="UI"></span>
Navigate to the output dataset and download the results:
<img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Dataset_Download.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=32382ad5bfe1d2ce70f28c0ad73633f5" alt="BIJ Dataset Download" width="3840" height="1976" data-path="images/batch-inference/BIJ_Dataset_Download.png" />
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="firectl"></span>
```bash
firectl dataset download batch-output-dataset
```
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="HTTP API"></span>
```bash
# Get download endpoint and save response
curl -s -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets/batch-output-dataset:getDownloadEndpoint" \
  -H "Authorization: Bearer ${API_KEY}" \
  -d '{}' > download.json

# Extract and download all files
jq -r '.filenameToSignedUrls | to_entries[] | "\(.key) \(.value)"' download.json | \
  while read -r object_path signed_url; do
    fname=$(basename "$object_path")
    echo "Downloading → $fname"
    curl -L -o "$fname" "$signed_url"
  done
```
<span class="tab-end"></span>
<span class="tab-group-end"></span>
<span class="callout-start" data-callout-type="tip"></span>
The output dataset contains two files: a **results file** (successful responses in JSONL format) and an **error file** (failed requests with debugging info).
<span class="callout-end"></span>
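To see which requests did not produce a successful response, the input file can be compared against the downloaded results file. The sketch below assumes each result line carries the same `custom_id` field as its request, which is not confirmed by this page; the function names and the file names in the usage line are hypothetical placeholders.

```bash
# Print the sorted, de-duplicated custom_id values found in a JSONL file.
extract_ids() {
  grep -o '"custom_id"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" \
    | sed 's/^"custom_id"[[:space:]]*:[[:space:]]*//' | tr -d '"' | sort -u
}

# Print custom_id values present in the input file ($1) but absent from the results file ($2).
missing_ids() {
  local in_ids out_ids
  in_ids=$(mktemp)
  out_ids=$(mktemp)
  extract_ids "$1" > "$in_ids"
  extract_ids "$2" > "$out_ids"
  comm -23 "$in_ids" "$out_ids"   # lines unique to the first file
  rm -f "$in_ids" "$out_ids"
}

# Usage (file names are placeholders):
#   missing_ids ./batch_input_data.jsonl ./results.jsonl
```

The printed IDs should correspond to entries in the error file or to requests that were never processed.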
</Accordion>
</AccordionGroup>
## Reference
<AccordionGroup>
<Accordion title="Job states">
Batch jobs progress through several states:
| State | Description |
| -------------- | ----------------------------------------------------- |
| **VALIDATING** | Dataset is being validated for format requirements |
| **PENDING** | Job is queued and waiting for resources |
| **RUNNING** | Actively processing requests |
| **COMPLETED** | All requests successfully processed |
| **FAILED** | Unrecoverable error occurred (check status message) |
| **EXPIRED** | Exceeded 24-hour limit (completed requests are saved) |
</Accordion>
<Accordion title="Supported models">
* **Base Models** – Models in the [Model Library](https://fireworks.ai/models) that report `batch_inference_supported: true` in their metadata
* **Custom Models** – Your uploaded or fine-tuned models
*Note: Newly added models may have a delay before being supported. See [Quantization](/models/quantization) for precision info.*
</Accordion>
<Accordion title="Limits and constraints">
* **Per-request limits:** Same as [Chat Completion API limits](/api-reference/post-chatcompletions)
* **Input dataset:** Max 1GB
* **Output dataset:** Max 8GB (job may expire early if reached)
* **Job timeout:** 24 hours maximum
</Accordion>
<Accordion title="Handling expired jobs">
Jobs expire after 24 hours. Completed rows are billed and saved to the output dataset.
**Resume processing:**
```bash
firectl batch-inference-job create \
  --continue-from original-job-id \
  --model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-dataset-id new-output-dataset
```
This processes only unfinished/failed requests from the original job.
**Download complete lineage:**
```bash
firectl dataset download output-dataset-id --download-lineage
```
Downloads all datasets in the continuation chain.
</Accordion>
<Accordion title="Best practices">
* **Validate thoroughly:** Check dataset format before uploading
* **Descriptive IDs:** Use meaningful `custom_id` values for tracking
* **Optimize tokens:** Set reasonable `max_tokens` limits
* **Monitor progress:** Track long-running jobs regularly
* **Cache optimization:** Place static content first in prompts
</Accordion>
</AccordionGroup>
## Next Steps
<span class="card-group-start" data-cols="3"></span>
<span class="card-start" data-card-title="Prompt Caching" data-card-icon="bolt" data-card-href="/guides/prompt-caching"></span>
Maximize cost savings with automatic prompt caching
<span class="card-end"></span>
<span class="card-start" data-card-title="Fine-Tuning" data-card-icon="sparkles" data-card-href="/fine-tuning/finetuning-intro"></span>
Create custom models for your batch workloads
<span class="card-end"></span>
<span class="card-start" data-card-title="API Reference" data-card-icon="code" data-card-href="/api-reference/create-batch-inference-job"></span>
Full API documentation for Batch API
<span class="card-end"></span>
<span class="card-group-end"></span>