# Direct Preference Optimization
Direct Preference Optimization (DPO) fine-tunes models by training them on pairs of preferred and non-preferred responses to the same prompt. This teaches the model to generate more desirable outputs while reducing unwanted behaviors.
Use DPO when:
- Aligning model outputs with brand voice, tone, or style guidelines
- Reducing hallucinations or incorrect reasoning patterns
- Improving response quality where there’s no single “correct” answer
- Teaching models to follow specific formatting or structural preferences
## Fine-tuning with DPO
Datasets must adhere strictly to the JSONL format, where each line represents a complete JSON-formatted training example.
Dataset requirements:
- Minimum examples: 3
- Maximum examples: 3 million per dataset
- File format: JSONL (each line is a valid JSON object)
- Dataset schema: each training example must include the following fields:
  - An `input` field containing a `messages` array, where each message is an object with two fields:
    - `role`: one of `system`, `user`, or `assistant`
    - `content`: a string representing the message content
  - A `preferred_output` field containing an assistant message with an ideal response
  - A `non_preferred_output` field containing an assistant message with a suboptimal response
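The schema above can be checked locally before uploading. The following is a minimal sketch, not part of the Fireworks tooling: `validateDpoExample`, `isMessage`, and `asMessageList` are hypothetical helper names. It assumes that an output field may be either a single message object or an array of messages, and that `input.messages` should not be empty; neither assumption is confirmed by this guide.

```javascript
// Hypothetical helper: checks one parsed training example against the schema
// described in this guide. Returns a list of problems; empty means it passed.
const ALLOWED_ROLES = new Set(["system", "user", "assistant"]);

function isMessage(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    ALLOWED_ROLES.has(value.role) &&
    typeof value.content === "string"
  );
}

// Assumption: an output field may be a single message object or an array of
// message objects; either way it is normalized to an array here.
function asMessageList(value) {
  if (Array.isArray(value)) return value;
  return value === undefined || value === null ? [] : [value];
}

function validateDpoExample(example) {
  const problems = [];
  const messages = example?.input?.messages;
  // Assumption: an empty messages array is treated as invalid.
  if (!Array.isArray(messages) || messages.length === 0) {
    problems.push("input.messages must be a non-empty array");
  } else if (!messages.every(isMessage)) {
    problems.push("every input message needs a valid role and a string content");
  }
  for (const field of ["preferred_output", "non_preferred_output"]) {
    const output = asMessageList(example?.[field]);
    const last = output[output.length - 1];
    if (!isMessage(last) || last.role !== "assistant") {
      problems.push(`${field} must end with an assistant message`);
    }
  }
  return problems;
}
```

Running every example through a check like this before upload is cheaper than discovering a malformed line after a job has been submitted.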
Here’s an example conversation dataset (one training example):
```json
{
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "What is Einstein famous for?"
      }
    ],
    "tools": []
  },
  "preferred_output": [
    {
      "role": "assistant",
      "content": "Einstein is renowned for his theory of relativity, especially the equation E=mc²."
    }
  ],
  "non_preferred_output": [
    {
      "role": "assistant",
      "content": "He was a famous scientist."
    }
  ]
}
```
<span class="callout-start" data-callout-type="warning"></span>
Each example currently supports only single-turn conversations: the preferred and non-preferred outputs must each be the last assistant message.
<span class="callout-end"></span>
Save this dataset locally as a JSONL file, for example `einstein_dpo.jsonl`.
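Because JSONL requires one complete JSON object per line, a pretty-printed example has to be collapsed onto a single line before it is saved. The sketch below shows one way to do that; `toJsonl` is a hypothetical helper name, not something the platform provides.

```javascript
// Hypothetical helper: turns an array of training examples into JSONL text,
// i.e. one compact JSON object per line.
function toJsonl(examples) {
  // JSON.stringify without an indent argument never emits raw newlines,
  // so each example occupies exactly one line.
  return examples.map((example) => JSON.stringify(example)).join("\n") + "\n";
}

const example = {
  input: {
    messages: [{ role: "user", content: "What is Einstein famous for?" }],
    tools: []
  },
  preferred_output: [
    {
      role: "assistant",
      content: "Einstein is renowned for his theory of relativity, especially the equation E=mc²."
    }
  ],
  non_preferred_output: [
    { role: "assistant", content: "He was a famous scientist." }
  ]
};

const jsonl = toJsonl([example]);
```

In Node.js, the resulting string could then be written to disk with `fs.writeFileSync("einstein_dpo.jsonl", jsonl)`.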
<span class="step-end"></span>
<span class="step-marker" data-step-title="Create and upload the dataset"></span>
There are several ways to upload the dataset to the Fireworks platform for fine-tuning: `firectl`, the `Restful API`, the `builder SDK`, or the `UI`.
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="UI"></span>
* You can simply navigate to the dataset tab, click `Create Dataset` and follow the wizard.
<img src="https://mintcdn.com/fireworksai/XAK4ji8XrlzPoITj/images/fine-tuning/dataset.png?fit=max&auto=format&n=XAK4ji8XrlzPoITj&q=85&s=406fa721650d41553f3adc5e4d372a68" alt="Dataset Pn" width="2972" height="2060" data-path="images/fine-tuning/dataset.png" />
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="firectl"></span>
* Upload the dataset using `firectl`:
```bash
firectl dataset create <dataset-id> /path/to/file.jsonl
```
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="Restful API"></span>
You need to make two separate HTTP requests. One for creating the dataset entry and one for uploading the dataset. Full reference here: [Create dataset](/api-reference/create-dataset). Note that the `exampleCount` parameter needs to be provided by the client.
```jsx
// Create Dataset Entry
// BASE_URL and HEADERS_WITH_CONTENT_TYPE are assumed to be defined elsewhere.
const createDatasetPayload = {
  datasetId: "trader-poe-sample-data",
  dataset: { userUploaded: {} }
  // Additional params such as exampleCount
};
const urlCreateDataset = `${BASE_URL}/datasets`;
const response = await fetch(urlCreateDataset, {
  method: "POST",
  headers: HEADERS_WITH_CONTENT_TYPE,
  body: JSON.stringify(createDatasetPayload)
});
```
```jsx
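// Upload JSONL file
// DATASET_ID should match the datasetId used when creating the entry;
// localFileInput is assumed to be an <input type="file"> element.
const urlUpload = `${BASE_URL}/datasets/${DATASET_ID}:upload`;
const files = new FormData();
files.append("file", localFileInput.files[0]);
const uploadResponse = await fetch(urlUpload, {
  method: "POST",
  headers: HEADERS,
  body: files
});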
```
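Since the client has to supply `exampleCount`, it can be derived from the file contents before the first request is sent. This is a rough sketch under a few assumptions: `countJsonlExamples` and `checkExampleCount` are hypothetical helpers, blank lines are assumed not to count as examples, and the limits are the ones listed in the dataset requirements of this guide. Where the count belongs in the request payload is defined by the Create dataset reference, not by this sketch.

```javascript
// Limits taken from the dataset requirements in this guide.
const MIN_EXAMPLES = 3;
const MAX_EXAMPLES = 3000000;

// Hypothetical helper: counts training examples in JSONL text by parsing
// every non-blank line. JSON.parse throws a SyntaxError on a malformed line.
function countJsonlExamples(jsonlText) {
  let count = 0;
  for (const line of jsonlText.split(/\r?\n/)) {
    if (line.trim() === "") continue; // assumption: blank lines are skipped
    JSON.parse(line);
    count += 1;
  }
  return count;
}

// Returns true when the count falls within the documented limits.
function checkExampleCount(count) {
  return count >= MIN_EXAMPLES && count <= MAX_EXAMPLES;
}
```

Parsing each line, rather than only counting newlines, also surfaces malformed JSON before the upload starts.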
<span class="tab-end"></span>
<span class="tab-group-end"></span>
All of the above approaches work, but the `UI` is better suited to smaller datasets (`< 500MB`), while `firectl` may work better for larger ones.
Ensure the dataset ID conforms to the [resource id restrictions](/getting-started/concepts#resource-names-and-ids).
<span class="step-end"></span>
<span class="step-marker" data-step-title="Create a DPO Job"></span>
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="firectl"></span>
```bash
firectl dpoj create \
  --base-model accounts/account-id/models/base-model-id \
  --dataset accounts/my-account-id/datasets/my-dataset-id \
  --output-model new-model-id
```
For our example, we might run the following command:
```bash
firectl dpoj create \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset accounts/pyroworks/datasets/einstein-dpo \
  --output-model einstein-dpo-model
```
to fine-tune a [Llama 3.1 8b Instruct](https://fireworks.ai/models/fireworks/llama-v3p1-8b-instruct) model with our Einstein dataset.
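When jobs are launched from a script, the command above can be assembled from its parts. The following is only a sketch: `buildDpoCreateArgs` is a hypothetical helper, and it uses only the three flags shown in this guide. It builds the dataset's resource name from an account ID and a dataset ID, following the `accounts/<account-id>/datasets/<dataset-id>` pattern used in the commands above.

```javascript
// Hypothetical helper: assembles the argument list for `firectl dpoj create`
// using only the three flags shown in this guide.
function buildDpoCreateArgs({ baseModel, datasetAccount, datasetId, outputModel }) {
  return [
    "dpoj", "create",
    "--base-model", baseModel,
    "--dataset", `accounts/${datasetAccount}/datasets/${datasetId}`,
    "--output-model", outputModel
  ];
}

const args = buildDpoCreateArgs({
  baseModel: "accounts/fireworks/models/llama-v3p1-8b-instruct",
  datasetAccount: "pyroworks",
  datasetId: "einstein-dpo",
  outputModel: "einstein-dpo-model"
});

// Prints the same command as the Einstein example in this guide, on one line.
console.log(["firectl", ...args].join(" "));
```

Keeping the arguments as an array, rather than a single string, means they could be passed directly to Node's `child_process.spawn("firectl", args)` without shell quoting concerns.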
<span class="tab-end"></span>
<span class="tab-group-end"></span>
<span class="step-end"></span>
<span class="step-marker" data-step-title="Monitor the DPO Job"></span>
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="firectl"></span>
```bash
firectl dpoj get dpo-job-id
```
<span class="tab-end"></span>
<span class="tab-group-end"></span>
Once the job is complete, the `STATE` will be set to `JOB_STATE_COMPLETED`, and the fine-tuned model can be deployed.
<span class="step-end"></span>
<span class="step-marker" data-step-title="Deploy the DPO fine-tuned model"></span>
Once training completes, you can create a deployment to interact with the fine-tuned model. Refer to [deploying a fine-tuned model](/fine-tuning/fine-tuning-models#deploying-a-fine-tuned-model) for more details.
<span class="step-end"></span>
<span class="steps-end"></span>
## Next Steps
Explore other fine-tuning methods to improve model output for different use cases.
<span class="card-group-start" data-cols="3"></span>
<span class="card-start" data-card-title="Supervised Fine Tuning - Text" data-card-icon="message" data-card-href="/fine-tuning/fine-tuning-models"></span>
Train models on input-output examples to improve task-specific performance.
<span class="card-end"></span>
<span class="card-start" data-card-title="Reinforcement Fine Tuning" data-card-icon="brain" data-card-href="/fine-tuning/reinforcement-fine-tuning-models"></span>
Optimize models using AI feedback for complex reasoning and decision-making.
<span class="card-end"></span>
<span class="card-start" data-card-title="Supervised Fine Tuning - Vision" data-card-icon="eye" data-card-href="/fine-tuning/fine-tuning-vlm"></span>
Fine-tune vision-language models to understand both images and text.
<span class="card-end"></span>
<span class="card-group-end"></span>
Link last verified June 7, 2026.
Source: Fireworks AI Docs