Batch API ↗

Summary: Process large-scale async workloads

Original Documentation

Process large volumes of requests asynchronously at 50% lower cost. Batch API is ideal for:

* Production-scale inference workloads
* Large-scale testing and benchmarking
* Training smaller models with larger ones (distillation guide)

Batch jobs automatically use prompt caching for an additional 50% cost savings on cached tokens. Maximize cache hits by placing static content first in your prompts.
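As a rough worked example of how the two discounts could combine, the sketch below estimates the cost of a batch job. It assumes the cached-token discount applies on top of the batch discount, which the text above does not state. The function name, its parameters, and any prices passed in are hypothetical placeholders, not Fireworks pricing.

```python
def estimate_batch_cost(
    prompt_tokens: int,
    cached_prompt_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    batch_discount: float = 0.5,   # "50% lower cost" for batch jobs
    cache_discount: float = 0.5,   # "additional 50%" on cached tokens
) -> float:
    """Estimate the cost of a batch job.

    Assumption: the cache discount multiplies with the batch discount.
    """
    if not 0 <= cached_prompt_tokens <= prompt_tokens:
        raise ValueError("cached_prompt_tokens must be between 0 and prompt_tokens")
    uncached_tokens = prompt_tokens - cached_prompt_tokens
    # Cached prompt tokens are charged at a reduced rate; output tokens are never cached.
    input_cost = (
        uncached_tokens + cached_prompt_tokens * (1.0 - cache_discount)
    ) * input_price_per_million
    output_cost = output_tokens * output_price_per_million
    return (1.0 - batch_discount) * (input_cost + output_cost) / 1_000_000
```

Under these assumptions a cached prompt token costs a quarter of the standard input price, so the share of each prompt that is static and placed first has a direct effect on the total.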

## Model compatibility

Not all models support the Batch API. Before submitting a batch job, verify your target model is batch-compatible:

    firectl get model accounts/fireworks/models/<model-id>

Look for `batch_inference_supported: true` in the model metadata. If the field is absent or `false`, the model is not available for batch inference.

If a model does not support batch inference, submitting a job may not produce an immediate error. Instead, the job can remain in a pending state and never be scheduled. Always verify compatibility before submitting.
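The compatibility check can be automated. The sketch below assumes the model metadata is available as plain text with one `key: value` pair per line. That format is an assumption about the command output, and `is_batch_compatible` is a hypothetical helper name.

```python
import re


def is_batch_compatible(model_metadata: str) -> bool:
    """Return True only if the text contains `batch_inference_supported: true`.

    Assumption: metadata is printed as `key: value` lines.
    An absent field is treated the same as `false`.
    """
    match = re.search(
        r"^\s*batch_inference_supported\s*:\s*(\S+)\s*$",
        model_metadata,
        flags=re.MULTILINE,
    )
    if match is None:
        return False
    return match.group(1).lower() == "true"
```

The text passed in could be the captured output of the `firectl get model` command shown above. If the metadata is printed in a different shape, such as JSON, the pattern would need adjusting.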

If your batch job is not scheduling:

* Confirm the model supports batch inference (see above).
* Validate your JSONL input: each line must be a complete, valid JSON object matching the request schema.
* Check that your account has sufficient quota for batch jobs.
* If the job has been pending for more than 30 minutes, contact support with your job ID.

## Getting Started

<AccordionGroup>
  <Accordion title="1. Prepare Your Dataset">

Datasets must be in JSONL format (one JSON object per line).

Requirements:

* **File format:** JSONL (each line is a valid JSON object)
* **Size limit:** Under 1GB
* **Required fields:** `custom_id` (unique) and `body` (request parameters)

Example dataset:

```json
{"custom_id": "request-1", "body": {"messages": [{"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "What is the capital of France?"}], "max_tokens": 100}}
{"custom_id": "request-2", "body": {"messages": [{"role": "user", "content": "Explain quantum computing"}], "temperature": 0.7}}
{"custom_id": "request-3", "body": {"messages": [{"role": "user", "content": "Tell me a joke"}]}}
```

Save as `batch_input_data.jsonl` locally.
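Before uploading, the requirements above can be checked locally. This is a minimal sketch, not an official validator. It treats `custom_id` as a non-empty string and reads "1GB" as 10**9 bytes, both of which are assumptions, and the function names are hypothetical.

```python
import json
import os

MAX_BYTES = 1_000_000_000  # assumption: "1GB" interpreted as 10**9 bytes


def validate_batch_lines(lines):
    """Return (line_number, problem) tuples; an empty list means no problems found."""
    problems = []
    seen_ids = set()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            problems.append((number, "blank line"))
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            problems.append((number, "line is not a JSON object"))
            continue
        custom_id = record.get("custom_id")
        if not isinstance(custom_id, str) or not custom_id:
            problems.append((number, "missing or empty custom_id"))
        elif custom_id in seen_ids:
            problems.append((number, f"duplicate custom_id: {custom_id}"))
        else:
            seen_ids.add(custom_id)
        if not isinstance(record.get("body"), dict):
            problems.append((number, "missing body object"))
    return problems


def validate_batch_file(path):
    """Check the size limit, then validate every line of the file at `path`."""
    problems = []
    if os.path.getsize(path) >= MAX_BYTES:
        problems.append((0, "file is not under the size limit"))
    with open(path, encoding="utf-8") as handle:
        problems.extend(validate_batch_lines(handle))
    return problems
```

Calling `validate_batch_file("batch_input_data.jsonl")` returns an empty list when nothing is flagged. The check covers only the structural requirements listed here, not whether each `body` is a valid request.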
  </Accordion>

  <Accordion title="2. Upload Your Dataset">
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="UI"></span>
    Navigate to the dataset tab, click `Create Dataset`, and follow the wizard.

    <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/fine-tuning/dataset.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=406fa721650d41553f3adc5e4d372a68" alt="Dataset Upload" width="2972" height="2060" data-path="images/fine-tuning/dataset.png" />
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    firectl dataset create batch-input-dataset ./batch_input_data.jsonl
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="HTTP API"></span>
    This takes two separate HTTP requests: one to create the dataset entry and one to upload the file. See the full reference: [Create dataset](/api-reference/create-dataset).

    ```bash
    # Create Dataset Entry
    curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets" \
      -H "Authorization: Bearer ${API_KEY}" \
      -H "Content-Type: application/json" \
      -d '{
        "datasetId": "batch-input-dataset",
        "dataset": { "userUploaded": {} }
      }'

    # Upload JSONL file
    curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets/batch-input-dataset:upload" \
      -H "Authorization: Bearer ${API_KEY}" \
      -F "file=@./batch_input_data.jsonl"
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>
  </Accordion>

  <Accordion title="3. Create a Batch Job">
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="UI"></span>
    Navigate to the Batch Inference tab and click "Create Batch Inference Job". Select your input dataset:

    <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Dataset_Select.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=d0578af8a6a76f48a2b82cec1d165189" alt="BIJ Dataset Select" width="3840" height="1982" data-path="images/batch-inference/BIJ_Dataset_Select.png" />

    Choose your model:

    <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Model_Select.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=c92290674cfef02e9f916e740141b998" alt="BIJ Model Select" width="3840" height="1970" data-path="images/batch-inference/BIJ_Model_Select.png" />

    Configure optional settings:

    <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Optional_Settings.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=eac5e45bbbebecebf77da16faa1b6faa" alt="BIJ Optional Settings" width="3840" height="1976" data-path="images/batch-inference/BIJ_Optional_Settings.png" />
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    firectl batch-inference-job create \
      --model accounts/fireworks/models/llama-v3p1-8b-instruct \
      --input-dataset-id batch-input-dataset
    ```

    With additional parameters:

    ```bash
    firectl batch-inference-job create \
      --job-id my-batch-job \
      --model accounts/fireworks/models/llama-v3p1-8b-instruct \
      --input-dataset-id batch-input-dataset \
      --output-dataset-id batch-output-dataset \
      --max-tokens 1024 \
      --temperature 0.7 \
      --top-p 0.9
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="HTTP API"></span>
    ```bash
    # ${ACCOUNT_ID} is spliced into the single-quoted JSON body; it is
    # double-quoted at each splice so the shell expands it without word splitting.
    curl -X POST "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs?batchInferenceJobId=my-batch-job" \
      -H "Authorization: Bearer ${API_KEY}" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "accounts/fireworks/models/llama-v3p1-8b-instruct",
        "inputDatasetId": "accounts/'"${ACCOUNT_ID}"'/datasets/batch-input-dataset",
        "outputDatasetId": "accounts/'"${ACCOUNT_ID}"'/datasets/batch-output-dataset",
        "inferenceParameters": {
          "maxTokens": 1024,
          "temperature": 0.7,
          "topP": 0.9
        }
      }'
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>
  </Accordion>

  <Accordion title="4. Monitor Your Job">
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="UI"></span>
    View all your batch inference jobs in the dashboard:

    <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_List.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=217e1564cf44569f21261a43c9f089d1" alt="BIJ List" width="3840" height="1986" data-path="images/batch-inference/BIJ_List.png" />
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    # Get job status
    firectl batch-inference-job get my-batch-job

    # List all batch jobs
    firectl batch-inference-job list
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="HTTP API"></span>
    ```bash
    # Get specific job
    curl -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs/my-batch-job" \
      -H "Authorization: Bearer ${API_KEY}"

    # List all jobs
    curl -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/batchInferenceJobs" \
      -H "Authorization: Bearer ${API_KEY}"
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>
  </Accordion>

  <Accordion title="5. Download Results">
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="UI"></span>
    Navigate to the output dataset and download the results:

    <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/batch-inference/BIJ_Dataset_Download.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=32382ad5bfe1d2ce70f28c0ad73633f5" alt="BIJ Dataset Download" width="3840" height="1976" data-path="images/batch-inference/BIJ_Dataset_Download.png" />
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    firectl dataset download batch-output-dataset
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="HTTP API"></span>
    ```bash
    # Get download endpoint and save response
    curl -s -X GET "https://api.fireworks.ai/v1/accounts/${ACCOUNT_ID}/datasets/batch-output-dataset:getDownloadEndpoint" \
      -H "Authorization: Bearer ${API_KEY}" \
      -d '{}' > download.json

    # Extract and download all files
    jq -r '.filenameToSignedUrls | to_entries[] | "\(.key) \(.value)"' download.json | \
    while read -r object_path signed_url; do
        fname=$(basename "$object_path")
        echo "Downloading → $fname"
        curl -L -o "$fname" "$signed_url"
    done
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

<span class="callout-start" data-callout-type="tip"></span>
  The output dataset contains two files: a **results file** (successful responses in JSONL format) and an **error file** (failed requests with debugging info).
<span class="callout-end"></span>
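After downloading, it can help to confirm that every submitted request ended up in one of the two files. The sketch below assumes both files are JSONL and that each row echoes the request's `custom_id`. The output schema is not described here, so treat both points as assumptions. `collect_ids` and `reconcile` are hypothetical helper names.

```python
import json


def collect_ids(jsonl_lines):
    """Collect the custom_id of every non-blank JSONL line."""
    ids = set()
    for line in jsonl_lines:
        if line.strip():
            ids.add(json.loads(line)["custom_id"])
    return ids


def reconcile(input_lines, result_lines, error_lines):
    """Sort submitted custom_ids into succeeded, failed, and unaccounted."""
    submitted = collect_ids(input_lines)
    succeeded = collect_ids(result_lines)   # assumption: rows carry custom_id
    failed = collect_ids(error_lines)       # assumption: rows carry custom_id
    return {
        "succeeded": sorted(submitted & succeeded),
        "failed": sorted(submitted & failed),
        "unaccounted": sorted(submitted - succeeded - failed),
    }
```

Each argument can be an open file handle or a list of strings. A non-empty `unaccounted` list points at requests that were never processed, for example because the job expired.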
  </Accordion>
</AccordionGroup>

## Reference

<AccordionGroup>
  <Accordion title="Job states">
Batch jobs progress through several states:

| State          | Description                                           |
| -------------- | ----------------------------------------------------- |
| **VALIDATING** | Dataset is being validated for format requirements    |
| **PENDING**    | Job is queued and waiting for resources               |
| **RUNNING**    | Actively processing requests                          |
| **COMPLETED**  | All requests successfully processed                   |
| **FAILED**     | Unrecoverable error occurred (check status message)   |
| **EXPIRED**    | Exceeded 24-hour limit (completed requests are saved) |
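
The states in the table can drive a simple polling loop. In this sketch `fetch_state` is a hypothetical callable that you supply. It should return the current state as one of the names in the table, for example by wrapping one of the status requests shown under "Monitor Your Job". How the state is spelled in the actual API response is not specified here, so the names below are assumptions taken from the table.

```python
import time

# States from the table after which a job no longer changes.
TERMINAL_STATES = {"COMPLETED", "FAILED", "EXPIRED"}


def wait_for_job(fetch_state, poll_seconds=60.0, max_polls=None, sleep=time.sleep):
    """Poll until the job reaches a terminal state and return the last state seen.

    If max_polls is set and is reached first, the non-terminal state is returned.
    """
    polls = 0
    while True:
        state = fetch_state()
        polls += 1
        if state in TERMINAL_STATES:
            return state
        if max_polls is not None and polls >= max_polls:
            return state
        sleep(poll_seconds)
```

Passing `sleep` as a parameter keeps the loop testable without waiting. `max_polls` gives a way to stop early, for instance to act on the 30-minute pending guidance in the troubleshooting list.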
  </Accordion>

  <Accordion title="Supported models">
* **Base Models** – Models in the [Model Library](https://fireworks.ai/models) that report `batch_inference_supported: true` (see Model compatibility)
* **Custom Models** – Your uploaded or fine-tuned models

*Note: Newly added models may have a delay before being supported. See [Quantization](/models/quantization) for precision info.*
  </Accordion>

  <Accordion title="Limits and constraints">
* **Per-request limits:** Same as [Chat Completion API limits](/api-reference/post-chatcompletions)
* **Input dataset:** Max 1GB
* **Output dataset:** Max 8GB (job may expire early if reached)
* **Job timeout:** 24 hours maximum
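
When an input file would exceed the dataset limit, one possible workaround (not described in the source) is to split it into several smaller datasets and run one job per chunk. This sketch groups lines so that each chunk stays under a byte budget. The default reads "1GB" as 10**9 bytes, which is an assumption, and `split_jsonl` is a hypothetical helper.

```python
def split_jsonl(lines, max_bytes=1_000_000_000):
    """Group JSONL lines into chunks whose UTF-8 size stays under max_bytes."""
    chunks = []
    current = []
    current_size = 0
    for line in lines:
        line = line.rstrip("\n")
        size = len(line.encode("utf-8")) + 1  # +1 for the newline written back out
        if size >= max_bytes:
            raise ValueError("a single line does not fit in one chunk")
        # Start a new chunk when adding this line would reach the budget.
        if current and current_size + size >= max_bytes:
            chunks.append(current)
            current, current_size = [], 0
        current.append(line)
        current_size += size
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can be written out with one line per record and uploaded as its own dataset. The function holds all lines in memory, so a file near the limit may need a streaming variant instead.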
  </Accordion>

  <Accordion title="Handling expired jobs">
Jobs expire after 24 hours. Completed rows are billed and saved to the output dataset.

**Resume processing:**

```bash
firectl batch-inference-job create \
  --continue-from original-job-id \
  --model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-dataset-id new-output-dataset
```

This processes only unfinished/failed requests from the original job.

**Download complete lineage:**

```bash
firectl dataset download output-dataset-id --download-lineage
```

Downloads all datasets in the continuation chain.
  </Accordion>

  <Accordion title="Best practices">
* **Validate thoroughly:** Check dataset format before uploading
* **Descriptive IDs:** Use meaningful `custom_id` values for tracking
* **Optimize tokens:** Set reasonable `max_tokens` limits
* **Monitor progress:** Track long-running jobs regularly
* **Cache optimization:** Place static content first in prompts
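
Several of these practices can be applied when the dataset is generated. The sketch below builds request lines in the `custom_id` and `body` shape shown in the example dataset. The shared instructions go first in every prompt and only the final user message varies. `build_batch_lines` and its parameters are hypothetical, and the `request-N` ID scheme simply mirrors the example dataset.

```python
import json


def build_batch_lines(static_instructions, questions, max_tokens=100):
    """Build JSONL request lines that share an identical leading system message."""
    lines = []
    for index, question in enumerate(questions, start=1):
        record = {
            "custom_id": f"request-{index}",
            "body": {
                "messages": [
                    # Static content first, so every request starts the same way.
                    {"role": "system", "content": static_instructions},
                    # Variable content last.
                    {"role": "user", "content": question},
                ],
                "max_tokens": max_tokens,
            },
        }
        lines.append(json.dumps(record, ensure_ascii=False))
    return lines
```

Writing the result with `"\n".join(lines)` produces a file in the expected JSONL format. Whether a given prefix is actually served from cache is decided by the service, so this ordering only makes cache hits possible.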
  </Accordion>
</AccordionGroup>

## Next Steps

<span class="card-group-start" data-cols="3"></span>
  <span class="card-start" data-card-title="Prompt Caching" data-card-icon="bolt" data-card-href="/guides/prompt-caching"></span>
Maximize cost savings with automatic prompt caching
  <span class="card-end"></span>

  <span class="card-start" data-card-title="Fine-Tuning" data-card-icon="sparkles" data-card-href="/fine-tuning/finetuning-intro"></span>
Create custom models for your batch workloads
  <span class="card-end"></span>

  <span class="card-start" data-card-title="API Reference" data-card-icon="code" data-card-href="/api-reference/create-batch-inference-job"></span>
Full API documentation for Batch API
  <span class="card-end"></span>
<span class="card-group-end"></span>

Link last verified June 7, 2026.

Source: Fireworks AI Docs
