Training Overview

Summary: Launch RFT jobs using the eval-protocol CLI

Reinforcement Fine-Tuning (RFT) is free for models under 16B parameters. When creating an RFT job in the UI, filter for free tuning models in the model selection area on the fine-tuning creation page. If kicking off jobs from the terminal, you can find the model ID from the Model Library. Note: SFT and DPO jobs are billed per training token for all model sizes—see the pricing page for details.

The Eval Protocol CLI provides the fastest, most reproducible way to launch RFT jobs. This page covers everything you need to know about using eval-protocol create rft.

Before launching, review Training Prerequisites & Validation for requirements, validation checks, and common errors.

Already familiar with firectl? Use it as an alternative to eval-protocol.

Installation and setup#

The following guide will help you:

  • Upload your evaluator to Fireworks. If you don’t have one yet, see Concepts > Evaluators
  • Upload your dataset to Fireworks
  • Create and launch the RFT job

Install the Eval Protocol CLI:

```bash
    pip install eval-protocol
    ```

Verify installation:

```bash
    eval-protocol --version
    ```
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Set up authentication"></span>
Configure your Fireworks API key:

```bash
    export FIREWORKS_API_KEY="fw_your_api_key_here"
    ```

Or create a `.env` file:

```bash
    FIREWORKS_API_KEY=fw_your_api_key_here
    ```
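
The two options above can be combined in a small pre-launch helper. This is a minimal sketch, not part of eval-protocol: `load_fireworks_key` is a hypothetical function name, and it assumes the `.env` file contains a plain `FIREWORKS_API_KEY=...` line like the one shown above.

```bash
# Hypothetical helper (not part of eval-protocol): use the exported key if present,
# otherwise read FIREWORKS_API_KEY from a .env file.
load_fireworks_key() {
  local env_file="${1:-.env}" value=""
  if [ -z "${FIREWORKS_API_KEY:-}" ] && [ -f "$env_file" ]; then
    # Keep the last matching line and drop everything up to the first "=".
    value="$(grep -E '^FIREWORKS_API_KEY=' "$env_file" | tail -n 1 | cut -d= -f2- || true)"
    # Strip one pair of surrounding double quotes, if any.
    value="${value#\"}"
    value="${value%\"}"
    if [ -n "$value" ]; then
      export FIREWORKS_API_KEY="$value"
    fi
  fi
  # Succeed only if a non-empty key is now available.
  [ -n "${FIREWORKS_API_KEY:-}" ]
}
```

A launch script could call `load_fireworks_key || echo "FIREWORKS_API_KEY is not set" >&2` before running any CLI command.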
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Test your evaluator locally"></span>
Before training, verify your evaluator works. This command discovers and runs your `@evaluation_test` with pytest. If a Dockerfile is present, it builds an image and runs the test in Docker; otherwise it runs on your host.

```bash
    cd evaluator_directory
    ep local-test
    ```

<span class="callout-start" data-callout-type="note"></span>
  If using a Dockerfile, it must use a Debian-based image (no Alpine or CentOS), be single-stage (no multi-stage builds), and only use supported instructions: `FROM`, `RUN`, `COPY`, `ADD`, `WORKDIR`, `USER`, `ENV`, `CMD`, `ENTRYPOINT`, `ARG`. Instructions like `EXPOSE` and `VOLUME` are ignored. See the [RFT quickstart guide](/fine-tuning/quickstart-svg-agent) for details.
<span class="callout-end"></span>
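
The Dockerfile constraints in the note can be checked roughly before running the local test. The sketch below is a hypothetical pre-check, not something eval-protocol ships: `check_dockerfile` is an invented name, the Alpine/CentOS test is a simple substring heuristic on the `FROM` line, and any instruction outside the supported list (including ignored ones such as `EXPOSE`) is reported for review.

```bash
# Hypothetical pre-check (not part of eval-protocol) for the constraints in the note above.
check_dockerfile() {
  local file="${1:-Dockerfile}"
  local supported=" FROM RUN COPY ADD WORKDIR USER ENV CMD ENTRYPOINT ARG "
  local from_count=0 status=0 continued=0 line instr lower

  while IFS= read -r line || [ -n "$line" ]; do
    if [ "$continued" -eq 1 ]; then
      # This line continues the previous instruction; only track whether it continues again.
      case "$line" in
        *\\) continued=1 ;;
        *) continued=0 ;;
      esac
      continue
    fi
    # Trim leading whitespace, then skip blank lines and comments.
    line="${line#"${line%%[![:space:]]*}"}"
    case "$line" in
      ''|'#'*) continue ;;
    esac
    # A trailing backslash means the next line belongs to this instruction.
    case "$line" in
      *\\) continued=1 ;;
    esac
    # The instruction is the first word, compared case-insensitively.
    instr="$(printf '%s' "${line%%[[:space:]]*}" | tr '[:lower:]' '[:upper:]')"
    if [ "$instr" = "FROM" ]; then
      from_count=$((from_count + 1))
      lower="$(printf '%s' "$line" | tr '[:upper:]' '[:lower:]')"
      case "$lower" in
        *alpine*|*centos*) echo "non-Debian base image: $line"; status=1 ;;
      esac
    fi
    case "$supported" in
      *" $instr "*) ;;
      *) echo "not in the supported list: $instr"; status=1 ;;
    esac
  done < "$file"

  # More than one FROM implies a multi-stage build.
  if [ "$from_count" -ne 1 ]; then
    echo "expected exactly one FROM, found $from_count"
    status=1
  fi
  return "$status"
}
```

Running `check_dockerfile ./Dockerfile` prints one line per finding and returns non-zero if there is at least one.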
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Create the RFT job"></span>
From the directory where your evaluator and dataset (`dataset.jsonl`) are located, run:

```bash
    eval-protocol create rft \
      --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
      --output-model my-model-name 
    ```

The CLI will:

* Upload evaluator code (if changed)
* Upload dataset (if changed)
* Create the RFT job
* Display dashboard links for monitoring

Expected output:

```
Created Reinforcement Fine-tuning Job
   name: accounts/your-account/reinforcementFineTuningJobs/abc123

Dashboard Links:
   Evaluator: https://app.fireworks.ai/dashboard/evaluators/your-evaluator
   Dataset:   https://app.fireworks.ai/dashboard/datasets/your-dataset
   RFT Job:   https://app.fireworks.ai/dashboard/fine-tuning/reinforcement/abc123
```

Click the RFT Job link to watch training progress in real-time. See Monitor Training for details.
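
If a script needs the job ID later, it can be taken from the `name:` line of the output. The sketch below assumes the output has the shape of the example on this page; the sample text stands in for captured CLI output, and the actual format may differ between versions.

```bash
# Sample text copied from the expected output on this page;
# in a real script this would be the captured CLI output.
output='Created Reinforcement Fine-tuning Job
   name: accounts/your-account/reinforcementFineTuningJobs/abc123'

# Take the resource name from the "name:" line, then keep the last path segment as the job ID.
job_name="$(printf '%s\n' "$output" | sed -n 's/^ *name: *//p' | head -n 1)"
job_id="${job_name##*/}"
echo "$job_id"
```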

Common CLI options#

Customize your RFT job with these flags:

Model and output:

```bash
--base-model accounts/fireworks/models/llama-v3p1-8b-instruct  # Base model to fine-tune
--output-model my-custom-name                                   # Name for fine-tuned model
```

Training parameters:

```bash
--epochs 2                    # Number of training epochs (default: 1)
--learning-rate 5e-5          # Learning rate (default: 1e-4)
--lora-rank 16                # LoRA rank (default: 8)
--batch-size 65536            # Batch size in tokens (default: 32768)
--chunk-size 200              # Prompts rolled out per GRPO training step (default: 200). -1 disables chunking.
```

Loss method:

```bash
--rl-loss-method dapo           # RL loss method: grpo (default), dapo, gspo-token
--rl-kl-beta 0.001              # KL beta override (only for grpo; rejected for dapo/gspo-token)
```
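
Because `--rl-kl-beta` is accepted only with `grpo`, a wrapper script can catch the invalid combination before calling the CLI. This is a minimal sketch: `validate_loss_flags` is a hypothetical helper, and the method names are the three listed above.

```bash
# Hypothetical check (not part of eval-protocol): reject a KL beta override
# for loss methods other than grpo.
validate_loss_flags() {
  local method="${1:-grpo}" kl_beta="${2:-}"
  case "$method" in
    grpo)
      return 0
      ;;
    dapo|gspo-token)
      if [ -n "$kl_beta" ]; then
        echo "--rl-kl-beta is only accepted with grpo (method: $method)" >&2
        return 1
      fi
      return 0
      ;;
    *)
      echo "unknown loss method: $method" >&2
      return 1
      ;;
  esac
}
```

For example, `validate_loss_flags dapo 0.001` returns non-zero, while `validate_loss_flags grpo 0.001` succeeds.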

Rollout (sampling) parameters:

```bash
--temperature 0.8               # Sampling temperature (default: 0.7)
--n 8                           # Number of rollouts per prompt (default: 4)
--response-candidates-count 8   # Alias for --n in firectl (default: 8, minimum: 2)
--max-tokens 4096               # Max tokens per response (default: 32768)
--top-p 0.95                    # Top-p sampling (default: 1.0)
--top-k 50                      # Top-k sampling (default: 40)
--max-concurrent-rollouts 64    # Max in-flight rollouts per job (default: 96, or the value set in @evaluation_test). Throughput only; no training effect.
```

Remote environments:

```bash
--remote-server-url https://your-evaluator.example.com  # For remote rollout processing
```

Force re-upload:

```bash
--force                       # Re-upload evaluator even if unchanged
```

See all options:

```bash
eval-protocol create rft --help
```
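
When several of these flags are combined, a bash array keeps the command readable and easy to review. The sketch below uses only flags and sample values from this page and prints the assembled command instead of running it; the variable names are illustrative.

```bash
# Collect the options in an array; every flag and value here comes from the lists above.
args=(
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct
  --output-model my-custom-name
  --epochs 2
  --learning-rate 5e-5
  --lora-rank 16
  --n 8
  --temperature 0.8
)

# Build the full command, then print it for review (dry run).
cmd=(eval-protocol create rft "${args[@]}")
echo "${cmd[*]}"
```

Replacing the final `echo` line with `"${cmd[@]}"` would execute the command instead of printing it.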

Advanced options#

<AccordionGroup>
  <Accordion title="Weights & Biases integration">
Track training metrics in W&B for deeper analysis:

```bash
    eval-protocol create rft \
      --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
      --wandb-project my-rft-experiments \
      --wandb-entity my-org
    ```

Set `WANDB_API_KEY` in your environment first.
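
A launch script can fail early when a required variable is missing. The following is a sketch with a hypothetical `require_env` helper; it relies on bash indirect expansion (`${!name}`), so it assumes bash rather than a plain POSIX shell.

```bash
# Hypothetical helper: report every named environment variable that is unset or empty.
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing required environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For a W&B-enabled job, `require_env FIREWORKS_API_KEY WANDB_API_KEY` would cover both keys mentioned on this page.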
  </Accordion>

  <Accordion title="Custom checkpoint frequency">
Save intermediate checkpoints during training:

```bash
    # Save a checkpoint every 500 steps
    firectl rftj create \
      --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
      --checkpoint-frequency 500 \
      ...
    ```

Available in `firectl` only.
  </Accordion>

  <Accordion title="Custom timeout">
For evaluators that need more time:

```bash
    # Allow 300 seconds (5 minutes) per rollout
    firectl rftj create \
      --rollout-timeout 300 \
      ...
    ```

Default is 60 seconds. Increase for complex evaluations.
  </Accordion>
</AccordionGroup>

For other tuning parameters — rollout concurrency, chunk size, loss method, and more — see [Parameter Tuning](/fine-tuning/parameter-tuning).

## Examples

**Fast experimentation** (small model, 1 epoch):

```bash
eval-protocol create rft \
  --base-model accounts/fireworks/models/qwen3-0p6b \
  --output-model quick-test
```

**High-quality training** (more rollouts, higher temperature):

```bash
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --output-model high-quality-model \
  --n 8 \
  --temperature 1.0
```

**Remote environment** (for multi-turn agents):

```bash
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --remote-server-url https://your-agent.example.com \
  --output-model remote-agent
```

**Multiple epochs with custom learning rate**:

```bash
eval-protocol create rft \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --epochs 3 \
  --learning-rate 5e-5 \
  --output-model multi-epoch-model
```
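
To compare several learning rates, the same command can be generated in a loop. This sketch only prints the commands (a dry run). `print_lr_sweep` and the `sweep-lr-...` output names are hypothetical, and this page does not state the naming rules for output models, so adjust the names as needed.

```bash
# Print (do not run) one launch command per learning rate passed as an argument.
print_lr_sweep() {
  local lr
  for lr in "$@"; do
    echo "eval-protocol create rft" \
      "--base-model accounts/fireworks/models/qwen3-0p6b" \
      "--epochs 3" \
      "--learning-rate $lr" \
      "--output-model sweep-lr-$lr"
  done
}

print_lr_sweep 1e-4 5e-5
```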

Using firectl CLI (Alternative)#

If you are already familiar with Fireworks `firectl`, you can use it to create RFT jobs directly:

```bash
firectl rftj create \
  --base-model accounts/fireworks/models/llama-v3p1-8b-instruct \
  --dataset accounts/your-account/datasets/my-dataset \
  --evaluator accounts/your-account/evaluators/my-evaluator \
  --output-model my-finetuned-model
```

Differences from eval-protocol:

  • Requires fully qualified resource names (accounts/…)
  • Must manually upload evaluators and datasets first
  • More verbose but offers finer control
  • Same underlying API as eval-protocol
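
Since `firectl` expects fully qualified names, a small helper can expand short IDs. This is a sketch: `qualify_name` is a hypothetical function, and the `accounts/<account>/<kind>/<id>` pattern is inferred from the example names on this page.

```bash
# Hypothetical helper: expand a short ID into accounts/<account>/<kind>/<id>,
# leaving names that are already fully qualified unchanged.
qualify_name() {
  local account="$1" kind="$2" id="$3"
  case "$id" in
    accounts/*) printf '%s\n' "$id" ;;
    *) printf 'accounts/%s/%s/%s\n' "$account" "$kind" "$id" ;;
  esac
}

qualify_name your-account datasets my-dataset
qualify_name your-account evaluators my-evaluator
```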

See firectl documentation for all options.

Next steps#

  • Training Prerequisites & Validation: review requirements, validation checks, and common errors
  • Monitor Training: track job progress, inspect rollouts, and debug issues
  • Parameter Tuning: learn how to adjust parameters for better results

Source: Fireworks AI Docs. Link last verified June 7, 2026.