Training Overview

yes

Editorial Notes

This overview frames Fireworks’ three fine-tuning paths — the autonomous Agent, semi-managed Managed Fine-Tuning, and the custom Training API — so it matters as the decision page before you commit compute. The key heuristic it offers is to reach for supervised fine-tuning when you have more than about a thousand quality labeled examples, and to switch to reinforcement fine-tuning for smaller datasets or reasoning-heavy tasks where ground-truth labels do not exist. A common mistake is defaulting to SFT on too little data. This is the Fireworks counterpart to Together AI’s fine-tuning flow; read the quickstart first if you are new to the platform.


Original Documentation

Documentation Index#

Fetch the complete documentation index at: https://docs.fireworks.ai/llms.txt Use this file to discover all available pages before exploring further.

Fireworks helps you fine-tune models to improve quality and performance for your product use cases, without the burden of building & maintaining your own training infrastructure.

Coming from OpenAI? Fireworks uses the same OpenAI-compatible chat completion format for training data — the same messages array with role, content, tool_calls, and weight fields. You can use your existing SFT datasets with no conversion required. See our OpenAI compatibility guide for more details.

Three ways to fine-tune#

Fireworks offers three approaches to fine-tuning, from fully autonomous to fully custom. Pick the one that fits how much control you want:

Describe what you want in plain English. Agent picks the base model, prepares the data, sweeps hyperparameters, evaluates, trains, and deploys. You approve a single plan and cost up front.

Best for the fastest path from dataset to deployed fine-tuned model — from the Fireworks dashboard or from inside Claude Code, Cursor, Codex, Aider, or Goose.

Give Fireworks your data and configuration. The platform handles scheduling, training, checkpointing, and model output. No custom code required.

Best for teams that want managed SFT, DPO, or RFT with LoRA or full-parameter tuning.

Write custom Python training loops. You control the loss function, optimizer step, checkpointing, and weight sync. Fireworks handles the distributed GPU infrastructure.

Best for research teams needing custom loops, custom rollout orchestration, or inference-in-the-loop evaluation.

Fireworks AgentManaged Fine-TuningTraining API
InterfaceNatural language (dashboard chat, firectl session, or via coding agent)UI, firectl, REST APIPython script
Who picks the modelAgent recommendsYouYou
Who tunes hyperparametersAgent runs a sweepYou set themYou set them
Cost approvalBuilt-in gate before any spendNone — you submit jobs directlyNone
Tuning methodFull-parameter or LoRAFull-parameter or LoRAFull-parameter or LoRA
Custom loss / training loopNot supportedNot supportedSupported
Inference-in-the-loop evalNot supportedNot supportedSupported (hotload)
Best forGetting a working fine-tuned model fast, without ML expertiseProduction fine-tuning with standard methodsResearch, custom RL, hybrid losses

When to use SFT vs. RFT#

In supervised fine-tuning, you provide a dataset with labeled examples of “good” outputs. In reinforcement fine-tuning, you provide a grader function that can be used to score the model’s outputs. The model is iteratively trained to produce outputs that maximize this score.

Supervised fine-tuning (SFT) works well for many common scenarios, especially when:

  • You have a sizable dataset (~1000+ examples) with high-quality, ground-truth labels.
  • The dataset covers most possible input scenarios.
  • Tasks are relatively straightforward, such as:
    • Classification
    • Content extraction

However, SFT may struggle in situations where:

  • Your dataset is small.
  • You lack ground-truth outputs (a.k.a. “golden generations”).
  • The task requires multi-step reasoning.

Here is a simple decision tree:

flowchart TD
        B{"Do you have labeled ground truth data?"}
        B --"Yes"--> C{"How much?"}
        C --"more than 1000 examples"--> D["SFT"]
        C --"100-1000 examples"-->F{"Does reasoning help?"}
        C --"~100s examples"--> E["RFT"]
        F --"No"-->D
        F -- "Yes" -->E
        B --"No"--> G{"Is this a verifiable task (see below)?"}
        G -- "Yes" -->E
        G -- "No"-->H["RLHF / LLM as judge"]

Verifiable refers to whether it is relatively easy to make a judgement on the quality of the model generation.

When to use the Training API instead#

Move from managed fine-tuning to the Training API when you need:

  • Custom training logic — hybrid objectives, custom reward shaping, or a non-standard algorithm beyond managed settings
  • Inference-in-the-loop evaluation — hotload checkpoints onto a serving deployment and sample mid-training
  • Per-step control — custom gradient accumulation, dynamic learning rate schedules, or algorithm research

Detailed capability comparison#

CapabilityManaged RFTTraining API
Launch trainingCLI or UIPython script
Loss functionsgrpo, dapo, gspo-token (built-in)Any custom loss via forward_backward_custom
Tuning modesFull-parameter or LoRAFull-parameter or LoRA
Context lengthFull context length supported by the selected training shapeFull context length supported by the selected training shape
Training loopFully managedYou write the loop
Per-step diagnosticsDashboard (reward, loss, rollouts)Full Python access to all metrics
Zero-variance filteringAutomaticYou implement
Checkpoint managementAutomaticYou control via save_weights_for_sampler_ext

Migrating from managed flow to Training API#

If you’ve been using managed RFT and want more control — custom loss functions, richer diagnostics, or algorithm experimentation — the Training API lets you implement your own training loop while keeping the same GPU infrastructure. Managed jobs and cookbook recipes now use the same core tuning capabilities, including LoRA or full-parameter tuning and the full context length supported by the selected training shape.

MoE models and Routing Replay#

For Mixture-of-Experts (MoE) models like Kimi K2 (384 experts), training stability benefits from Routing Replay — caching the expert routing assignments from the reference policy’s forward pass and replaying them during the training forward pass. This ensures that the same experts process the same tokens in both the reference and policy models, reducing gradient noise from routing changes.

Routing Replay is available in the Training API via the loss_fn_inputs mechanism — you can pass routing matrices from the reference forward pass into the training datum. Use the Training API when you need to inspect or customize those forward-pass inputs directly.

Link last verified June 7, 2026. View original ↗
Source: Fireworks AI Docs
Link last verified: 2026-06-07