Direct Preference Optimization

Direct Preference Optimization (DPO) fine-tunes models by training them on pairs of preferred and non-preferred responses to the same prompt. This teaches the model to generate more desirable outputs while reducing unwanted behaviors.

Use DPO when:

- Aligning model outputs with brand voice, tone, or style guidelines
- Reducing hallucinations or incorrect reasoning patterns
- Improving response quality where there’s no single “correct” answer
- Teaching models to follow specific formatting or structural preferences
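
For intuition about what training on preference pairs optimizes, the sketch below computes the standard DPO loss from the literature for a single pair. It is illustrative only: this page does not state the exact objective or hyperparameters Fireworks uses, and the function name `dpoPairLoss`, its parameter names, and the default `beta` are assumptions made for the example.

```javascript
// Numerically stable softplus: log(1 + exp(x)).
function softplus(x) {
  return Math.max(x, 0) + Math.log1p(Math.exp(-Math.abs(x)));
}

// Standard DPO loss for one preference pair (illustrative; not Fireworks internals).
// Each argument is the summed log-probability of a full response given the prompt:
//   policyPreferred / policyNonPreferred: under the model being trained
//   refPreferred / refNonPreferred: under the frozen reference (base) model
// beta (hypothetical default) limits how far the policy drifts from the reference.
function dpoPairLoss({ policyPreferred, policyNonPreferred, refPreferred, refNonPreferred, beta = 0.1 }) {
  const preferredGain = policyPreferred - refPreferred;
  const nonPreferredGain = policyNonPreferred - refNonPreferred;
  const margin = beta * (preferredGain - nonPreferredGain);
  // -log(sigmoid(margin)) is the same as softplus(-margin).
  return softplus(-margin);
}

// Before training, the policy equals the reference, so the margin is 0.
const initial = dpoPairLoss({
  policyPreferred: -12, policyNonPreferred: -9, refPreferred: -12, refNonPreferred: -9,
});
// Raising the preferred response and lowering the non-preferred one reduces the loss.
const trained = dpoPairLoss({
  policyPreferred: -8, policyNonPreferred: -15, refPreferred: -12, refNonPreferred: -9,
});
console.log(initial.toFixed(4), trained.toFixed(4)); // 0.6931 0.3133
```

The loss falls as the model assigns relatively more probability to the preferred response than the reference model does, which is the behavior the preference pairs are meant to encourage.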

## Fine-tuning with DPO

Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example.

Dataset requirements:

- Minimum examples: 3
- Maximum examples: 3 million per dataset
- File format: JSONL (each line is a valid JSON object)
- Dataset schema: each training example must include the following fields:
  - An `input` field containing a `messages` array, where each message is an object with two fields:
    - `role`: one of `system`, `user`, or `assistant`
    - `content`: a string containing the message content
  - A `preferred_output` field containing an assistant message with an ideal response
  - A `non_preferred_output` field containing an assistant message with a suboptimal response

Here’s one training example from a conversation dataset, pretty-printed for readability (in the JSONL file, each example must occupy a single line):

    ```json
    {
      "input": {
        "messages": [
          {
            "role": "user",
            "content": "What is Einstein famous for?"
          }
        ],
        "tools": []
      },
      "preferred_output": [
        {
          "role": "assistant",
          "content": "Einstein is renowned for his theory of relativity, especially the equation E=mc²."
        }
      ],
      "non_preferred_output": [
        {
          "role": "assistant",
          "content": "He was a famous scientist."
        }
      ]
    }
    ```

<span class="callout-start" data-callout-type="warning"></span>
  Only one-turn conversations are currently supported for each example: the preferred and non-preferred outputs must each be the last assistant message.
<span class="callout-end"></span>
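
The schema above can be checked before upload with a small validator. This is a minimal sketch, not an official Fireworks tool: the function names are hypothetical, and it assumes, based on the example shown, that `input.messages` is non-empty and that `preferred_output` and `non_preferred_output` are arrays holding exactly one assistant message.

```javascript
const ROLES = new Set(["system", "user", "assistant"]);

// True if msg is an object with an allowed role and string content.
function isMessage(msg) {
  return msg !== null && typeof msg === "object" &&
    ROLES.has(msg.role) && typeof msg.content === "string";
}

// Returns a list of problems found in one parsed training example (empty if none).
function validateDpoExample(example) {
  const problems = [];
  const messages = example?.input?.messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    problems.push("input.messages must be a non-empty array");
  } else if (!messages.every(isMessage)) {
    problems.push("each input message needs a valid role and string content");
  }
  for (const field of ["preferred_output", "non_preferred_output"]) {
    const output = example?.[field];
    // Assumption: one assistant message wrapped in an array, as in the example.
    const ok = Array.isArray(output) && output.length === 1 &&
      isMessage(output[0]) && output[0].role === "assistant";
    if (!ok) problems.push(`${field} must hold exactly one assistant message`);
  }
  return problems;
}

const sample = {
  input: { messages: [{ role: "user", content: "What is Einstein famous for?" }], tools: [] },
  preferred_output: [{ role: "assistant", content: "Einstein is renowned for his theory of relativity." }],
  non_preferred_output: [{ role: "assistant", content: "He was a famous scientist." }],
};
console.log(validateDpoExample(sample)); // []
```

Returning a list of problems rather than throwing makes it easy to report every issue in a large file in one pass.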

Save this dataset locally as a JSONL file, for example `einstein_dpo.jsonl`.
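
Since each line of the file must be one complete JSON object, pretty-printed examples need to be flattened before saving. The following is a minimal sketch, assuming the examples are already in memory as objects. The function name `toJsonl` and the placeholder examples are hypothetical; the limits are the ones listed in the requirements above.

```javascript
const MIN_EXAMPLES = 3;
const MAX_EXAMPLES = 3_000_000;

// Serializes an array of training examples to JSONL text:
// one compact JSON object per line.
function toJsonl(examples) {
  if (examples.length < MIN_EXAMPLES || examples.length > MAX_EXAMPLES) {
    throw new RangeError(
      `expected ${MIN_EXAMPLES} to ${MAX_EXAMPLES} examples, got ${examples.length}`
    );
  }
  // JSON.stringify without an indent argument never emits raw newlines
  // (newlines inside strings are escaped), so each example stays on one line.
  return examples.map((example) => JSON.stringify(example)).join("\n") + "\n";
}

// Three tiny placeholder examples (contents are made up for illustration).
const examples = ["A", "B", "C"].map((topic) => ({
  input: { messages: [{ role: "user", content: `Tell me about ${topic}.\nBe brief.` }] },
  preferred_output: [{ role: "assistant", content: `${topic} in one clear sentence.` }],
  non_preferred_output: [{ role: "assistant", content: "I don't know." }],
}));

const jsonl = toJsonl(examples);
console.log(jsonl.trimEnd().split("\n").length); // 3
```

In Node.js, the resulting string could then be written to disk with `fs.writeFileSync("einstein_dpo.jsonl", jsonl)`.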
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Create and upload the dataset"></span>
There are several ways to upload the dataset to the Fireworks platform for fine-tuning: `firectl`, the `Restful API`, the `builder SDK`, or the `UI`.

<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="UI"></span>
    * Navigate to the dataset tab, click `Create Dataset`, and follow the wizard.

              <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/fine-tuning/dataset.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=406fa721650d41553f3adc5e4d372a68" alt="Dataset Pn" width="2972" height="2060" data-path="images/fine-tuning/dataset.png" />
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="firectl"></span>
    * Upload dataset using `firectl`

    ```bash
    firectl dataset create <dataset-id> /path/to/file.jsonl
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Restful API"></span>
    Make two separate HTTP requests: one to create the dataset entry and one to upload the dataset file. See the [Create dataset](/api-reference/create-dataset) reference for full details. Note that the client must provide the `exampleCount` parameter.

    ```jsx
    // Create the dataset entry.
    // BASE_URL and HEADERS_WITH_CONTENT_TYPE are assumed to be defined elsewhere
    // (the API base URL, and auth headers plus a JSON Content-Type).
    const createDatasetPayload = {
      datasetId: "trader-poe-sample-data",
      dataset: { userUploaded: {} }
      // Additional params such as exampleCount
    };
    const urlCreateDataset = `${BASE_URL}/datasets`;
    const response = await fetch(urlCreateDataset, {
      method: "POST",
      headers: HEADERS_WITH_CONTENT_TYPE,
      body: JSON.stringify(createDatasetPayload)
    });
    ```

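    ```jsx
    // Upload the JSONL file.
    // DATASET_ID should match the datasetId used when creating the entry;
    // localFileInput is assumed to be an <input type="file"> element.
    const urlUpload = `${BASE_URL}/datasets/${DATASET_ID}:upload`;
    const files = new FormData();
    files.append("file", localFileInput.files[0]);

    // HEADERS omits Content-Type so that fetch can set the multipart boundary itself.
    const uploadResponse = await fetch(urlUpload, {
      method: "POST",
      headers: HEADERS,
      body: files
    });
    ```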
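
Because the client must supply `exampleCount`, it can help to derive it from the file contents rather than hard-code it. The sketch below counts the non-empty lines of a JSONL string and fails on any line that is not valid JSON. The function name `countJsonlExamples` is hypothetical, and how the count is attached to the request is left to the Create dataset reference.

```javascript
// Counts training examples in JSONL text, ignoring blank lines.
// Throws if a non-blank line is not valid JSON.
function countJsonlExamples(jsonlText) {
  let count = 0;
  jsonlText.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    try {
      JSON.parse(line);
    } catch (err) {
      throw new SyntaxError(`line ${index + 1} is not valid JSON: ${err.message}`);
    }
    count += 1;
  });
  return count;
}

const text = '{"id": 1}\n{"id": 2}\n\n{"id": 3}\n';
console.log(countJsonlExamples(text)); // 3
```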
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

All of the above approaches should work, but the `UI` is better suited to smaller datasets (under 500 MB), while `firectl` might work better for larger ones.

Ensure the dataset ID conforms to the [resource id restrictions](/getting-started/concepts#resource-names-and-ids).
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Create a DPO Job"></span>
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    firectl dpoj create \
      --base-model accounts/account-id/models/base-model-id \
      --dataset accounts/my-account-id/datasets/my-dataset-id \
      --output-model new-model-id
    ```

    For our example, we might run the following command:

    ```bash
    firectl dpoj create \
      --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
      --dataset accounts/pyroworks/datasets/einstein-dpo \
      --output-model einstein-dpo-model
    ```

    to fine-tune a [Llama 3.1 8b Instruct](https://fireworks.ai/models/fireworks/llama-v3p1-8b-instruct) model with our Einstein dataset.
  <span class="tab-end"></span>
<span class="tab-group-end"></span>
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Monitor the DPO Job"></span>
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    firectl dpoj get dpo-job-id
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

Once the job is complete, the `STATE` will be set to `JOB_STATE_COMPLETED`, and the fine-tuned model can be deployed.
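
Checking the job by hand works, but a script can also poll until the state changes. The sketch below is generic: `getState` is a hypothetical caller-supplied async function that returns the job's current `STATE` string (for example by wrapping `firectl dpoj get` or an API call), and the interval and attempt limit are arbitrary defaults. Only `JOB_STATE_COMPLETED` comes from this page; the `NOT_DONE_YET` value in the demo is a made-up placeholder, and handling of failure states is left to the caller.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Polls getState() until it reports JOB_STATE_COMPLETED or attempts run out.
// Resolves with the number of checks that were needed.
async function waitForCompletion(getState, { intervalMs = 30_000, maxAttempts = 120 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const state = await getState();
    if (state === "JOB_STATE_COMPLETED") return attempt;
    if (attempt < maxAttempts) await sleep(intervalMs);
  }
  throw new Error(`job did not complete after ${maxAttempts} checks`);
}

// Demo with a fake state source that completes on the third check.
let calls = 0;
const fakeGetState = async () => (++calls < 3 ? "NOT_DONE_YET" : "JOB_STATE_COMPLETED");
const demo = waitForCompletion(fakeGetState, { intervalMs: 0 });
demo.then((attempts) => console.log(`completed after ${attempts} checks`));
```

Passing the state source in as a function keeps the polling logic independent of how the state is actually fetched.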
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Deploy the DPO fine-tuned model"></span>
Once training completes, you can create a deployment to interact with the fine-tuned model. Refer to [deploying a fine-tuned model](/fine-tuning/fine-tuning-models#deploying-a-fine-tuned-model) for more details.
  <span class="step-end"></span>
<span class="steps-end"></span>

## Next Steps

Explore other fine-tuning methods to improve model output for different use cases.

<span class="card-group-start" data-cols="3"></span>
  <span class="card-start" data-card-title="Supervised Fine Tuning - Text" data-card-icon="message" data-card-href="/fine-tuning/fine-tuning-models"></span>
Train models on input-output examples to improve task-specific performance.
  <span class="card-end"></span>

  <span class="card-start" data-card-title="Reinforcement Fine Tuning" data-card-icon="brain" data-card-href="/fine-tuning/reinforcement-fine-tuning-models"></span>
Optimize models using AI feedback for complex reasoning and decision-making.
  <span class="card-end"></span>

  <span class="card-start" data-card-title="Supervised Fine Tuning - Vision" data-card-icon="eye" data-card-href="/fine-tuning/fine-tuning-vlm"></span>
Fine-tune vision-language models to understand both images and text.
  <span class="card-end"></span>
<span class="card-group-end"></span>

Link last verified June 7, 2026.

Source: Fireworks AI Docs
