Direct Preference Optimization

Direct Preference Optimization (DPO) fine-tunes models by training them on pairs of preferred and non-preferred responses to the same prompt. This teaches the model to generate more desirable outputs while reducing unwanted behaviors.
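
To make the idea concrete, here is a minimal sketch of the standard DPO objective for a single preference pair. The source does not say which loss variant or `beta` value the platform uses, so the function name `dpoLoss`, its parameter names, and the default `beta = 0.1` are illustrative assumptions. The inputs are the summed log-probabilities of each response under the model being trained (the policy) and under a frozen reference copy of the base model.

```javascript
// Numerically stable -log(sigmoid(z)), also known as softplus(-z).
function negLogSigmoid(z) {
  return z >= 0 ? Math.log1p(Math.exp(-z)) : -z + Math.log1p(Math.exp(z));
}

// Standard DPO loss for one (preferred, non-preferred) pair.
// All four inputs are log-probabilities; beta is an assumed hyperparameter.
function dpoLoss(logps, beta = 0.1) {
  const preferredGain = logps.policyPreferred - logps.refPreferred;
  const nonPreferredGain = logps.policyNonPreferred - logps.refNonPreferred;
  return negLogSigmoid(beta * (preferredGain - nonPreferredGain));
}

// At the start of training the policy equals the reference, so the margin is 0
// and the loss is ln 2.
const initialLoss = dpoLoss({
  policyPreferred: -12, policyNonPreferred: -9,
  refPreferred: -12, refNonPreferred: -9,
});

// Once the policy favors the preferred response more than the reference does,
// the loss falls below ln 2.
const laterLoss = dpoLoss({
  policyPreferred: -10, policyNonPreferred: -11,
  refPreferred: -12, refNonPreferred: -9,
});
```

The loss only decreases when the policy raises the preferred response relative to the non-preferred one, which is why each training example needs both outputs for the same prompt.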

Use DPO when:

  • Aligning model outputs with brand voice, tone, or style guidelines
  • Reducing hallucinations or incorrect reasoning patterns
  • Improving response quality where there’s no single “correct” answer
  • Teaching models to follow specific formatting or structural preferences

## Fine-tuning with DPO

Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example.

Minimum Requirements:

  • Minimum examples needed: 3
  • Maximum examples: Up to 3 million examples per dataset
  • File format: JSONL (each line is a valid JSON object)
  • Dataset Schema: Each training sample must include the following fields:
    • An input field containing a messages array, where each message is an object with two fields:
      • role: one of system, user, or assistant
      • content: a string representing the message content
    • A preferred_output field containing an assistant message with an ideal response
    • A non_preferred_output field containing an assistant message with a suboptimal response

Here’s an example conversation dataset (one training example):

```json
{
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "What is Einstein famous for?"
      }
    ],
    "tools": []
  },
  "preferred_output": [
    {
      "role": "assistant",
      "content": "Einstein is renowned for his theory of relativity, especially the equation E=mc²."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "content": "He was a famous scientist."
    }
  ]
}
```

<span class="callout-start" data-callout-type="warning"></span>
  Only single-turn conversations are currently supported: in each example, the preferred and non-preferred outputs must each be the final assistant message.
<span class="callout-end"></span>
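
Before uploading, it can help to check each line against the schema and the single-turn rule described above. The following is a minimal sketch, not an official validator: the function names `isMessage` and `validateDpoLine` are hypothetical, and the check that each output is a list holding exactly one assistant message is an assumption inferred from the example above.

```javascript
const ALLOWED_INPUT_ROLES = new Set(["system", "user", "assistant"]);

// A message is an object with a string role and string content.
function isMessage(msg) {
  return msg !== null && typeof msg === "object" &&
    typeof msg.role === "string" && typeof msg.content === "string";
}

// Returns a list of problems for one JSONL line (an empty list means it looks valid).
function validateDpoLine(line) {
  let example;
  try {
    example = JSON.parse(line);
  } catch (err) {
    return ["line is not valid JSON"];
  }
  if (example === null || typeof example !== "object") {
    return ["line is not a JSON object"];
  }
  const problems = [];

  const messages = example.input?.messages;
  if (!Array.isArray(messages) || messages.length === 0) {
    problems.push("input.messages must be a non-empty array");
  } else {
    messages.forEach((msg, i) => {
      if (!isMessage(msg) || !ALLOWED_INPUT_ROLES.has(msg.role)) {
        problems.push(`input.messages[${i}] needs a valid role and string content`);
      }
    });
  }

  for (const field of ["preferred_output", "non_preferred_output"]) {
    const output = example[field];
    // Assumption: each output is a list with exactly one assistant message.
    const ok = Array.isArray(output) && output.length === 1 &&
      isMessage(output[0]) && output[0].role === "assistant";
    if (!ok) problems.push(`${field} must be a list with one assistant message`);
  }
  return problems;
}

const good = JSON.stringify({
  input: { messages: [{ role: "user", content: "What is Einstein famous for?" }], tools: [] },
  preferred_output: [{ role: "assistant", content: "Einstein is renowned for his theory of relativity." }],
  non_preferred_output: [{ role: "assistant", content: "He was a famous scientist." }],
});
console.log(validateDpoLine(good)); // prints []
```

Returning a list of problems instead of throwing on the first one makes it easier to report every issue in a large file in a single pass.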

Save this dataset locally as a JSONL file, for example `einstein_dpo.jsonl`.
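
The example above is pretty-printed for readability, but in the saved file each training example must occupy exactly one line. As a rough sketch, the hypothetical helper `toJsonl` below serializes an array of examples into JSONL text and enforces the documented limits of 3 to 3 million examples; the helper name and constant names are illustrative, not part of any Fireworks tooling.

```javascript
const MIN_EXAMPLES = 3;
const MAX_EXAMPLES = 3_000_000;

// Serializes examples to JSONL: one compact JSON object per line.
function toJsonl(examples) {
  if (examples.length < MIN_EXAMPLES || examples.length > MAX_EXAMPLES) {
    throw new RangeError(
      `expected between ${MIN_EXAMPLES} and ${MAX_EXAMPLES} examples, got ${examples.length}`
    );
  }
  // JSON.stringify without an indent argument escapes newlines inside strings,
  // so every example stays on a single line.
  return examples.map((example) => JSON.stringify(example)).join("\n") + "\n";
}

// Hypothetical sample data: three examples, one with a newline inside its content.
const sampleExamples = ["first", "second", "multi\nline"].map((text) => ({
  input: { messages: [{ role: "user", content: text }] },
  preferred_output: [{ role: "assistant", content: "A detailed answer." }],
  non_preferred_output: [{ role: "assistant", content: "A vague answer." }],
}));
const jsonl = toJsonl(sampleExamples);
```

In Node.js, the resulting string could then be written to disk with `fs.writeFileSync("einstein_dpo.jsonl", jsonl)`.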
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Create and upload the dataset"></span>
There are several ways to upload the dataset to the Fireworks platform for fine-tuning: the `UI`, `firectl`, the `RESTful API`, or the `builder SDK`.

<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="UI"></span>
    * Navigate to the dataset tab, click `Create Dataset`, and follow the wizard.

              <img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/fine-tuning/dataset.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=406fa721650d41553f3adc5e4d372a68" alt="Dataset Pn" width="2972" height="2060" data-path="images/fine-tuning/dataset.png" />
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="firectl"></span>
    * Upload dataset using `firectl`

    ```bash
    firectl dataset create <dataset-id> /path/to/file.jsonl
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Restful API"></span>
    Make two separate HTTP requests: one to create the dataset entry and one to upload the dataset file. See the [Create dataset](/api-reference/create-dataset) reference for full details. The client must provide the `exampleCount` parameter.

    ```jsx
    // Create the dataset entry.
    // BASE_URL and HEADERS_WITH_CONTENT_TYPE are assumed to be defined elsewhere
    // (the API base URL for your account, and auth headers with a JSON content type).
    const createDatasetPayload = {
      datasetId: "trader-poe-sample-data",
      dataset: { userUploaded: {} }
      // Additional params such as exampleCount
    };
    const urlCreateDataset = `${BASE_URL}/datasets`;
    const response = await fetch(urlCreateDataset, {
      method: "POST",
      headers: HEADERS_WITH_CONTENT_TYPE,
      body: JSON.stringify(createDatasetPayload)
    });
    ```

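    ```jsx
    // Upload the JSONL file.
    // DATASET_ID must match the datasetId used when creating the entry.
    // HEADERS should not set Content-Type: fetch derives the multipart
    // boundary from the FormData body.
    const urlUpload = `${BASE_URL}/datasets/${DATASET_ID}:upload`;
    const files = new FormData();
    // localFileInput is assumed to be an <input type="file"> element.
    files.append("file", localFileInput.files[0]);

    const uploadResponse = await fetch(urlUpload, {
      method: "POST",
      headers: HEADERS,
      body: files
    });
    ```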
  <span class="tab-end"></span>
<span class="tab-group-end"></span>
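
For the RESTful API path, the client has to supply `exampleCount` itself. A minimal sketch of one way to compute it from the JSONL text follows; the helper name `countExamples` is hypothetical, and where the value belongs in the request payload (and whether it is sent as a number or a string) should be confirmed against the Create dataset reference.

```javascript
// Counts training examples in JSONL text.
// Blank lines (such as the one after a trailing newline) are skipped;
// any other line must parse as a JSON object.
function countExamples(jsonlText) {
  let count = 0;
  jsonlText.split(/\r?\n/).forEach((line, index) => {
    if (line.trim() === "") return;
    let parsed;
    try {
      parsed = JSON.parse(line);
    } catch {
      throw new SyntaxError(`line ${index + 1} is not valid JSON`);
    }
    if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
      throw new TypeError(`line ${index + 1} is not a JSON object`);
    }
    count += 1;
  });
  return count;
}

// Example: three examples, with a blank line and a trailing newline.
const exampleCount = countExamples('{"a":1}\n{"b":2}\n\n{"c":3}\n');
```

Counting parsed objects instead of raw newline characters avoids an off-by-one error when the file ends with a trailing newline, and surfaces malformed lines before the upload starts.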

All of the above approaches work, but the `UI` is better suited to smaller datasets (`< 500MB`), while `firectl` may work better for larger ones.

Ensure the dataset ID conforms to the [resource id restrictions](/getting-started/concepts#resource-names-and-ids).
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Create a DPO Job"></span>
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    firectl dpoj create \
      --base-model accounts/account-id/models/base-model-id \
      --dataset accounts/my-account-id/datasets/my-dataset-id \
      --output-model new-model-id
    ```

    For our example, we might run the following command:

    ```bash
    firectl dpoj create \
      --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
      --dataset accounts/pyroworks/datasets/einstein-dpo \
      --output-model einstein-dpo-model
    ```

    to fine-tune a [Llama 3.1 8b Instruct](https://fireworks.ai/models/fireworks/llama-v3p1-8b-instruct) model with our Einstein dataset.
  <span class="tab-end"></span>
<span class="tab-group-end"></span>
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Monitor the DPO Job"></span>
<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="firectl"></span>
    ```bash
    firectl dpoj get dpo-job-id
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

Once the job is complete, the `STATE` will be set to `JOB_STATE_COMPLETED`, and the fine-tuned model can be deployed.
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Deploy the DPO fine-tuned model"></span>
Once training completes, you can create a deployment to interact with the fine-tuned model. Refer to [deploying a fine-tuned model](/fine-tuning/fine-tuning-models#deploying-a-fine-tuned-model) for more details.
  <span class="step-end"></span>
<span class="steps-end"></span>

## Next Steps

Explore other fine-tuning methods to improve model output for different use cases.

<span class="card-group-start" data-cols="3"></span>
  <span class="card-start" data-card-title="Supervised Fine Tuning - Text" data-card-icon="message" data-card-href="/fine-tuning/fine-tuning-models"></span>
Train models on input-output examples to improve task-specific performance.
  <span class="card-end"></span>

  <span class="card-start" data-card-title="Reinforcement Fine Tuning" data-card-icon="brain" data-card-href="/fine-tuning/reinforcement-fine-tuning-models"></span>
Optimize models using AI feedback for complex reasoning and decision-making.
  <span class="card-end"></span>

  <span class="card-start" data-card-title="Supervised Fine Tuning - Vision" data-card-icon="eye" data-card-href="/fine-tuning/fine-tuning-vlm"></span>
Fine-tune vision-language models to understand both images and text.
  <span class="card-end"></span>
<span class="card-group-end"></span>
Source: Fireworks AI Docs
Link last verified: 2026-06-07