The Cookbook ↗

Summary: Ready-to-run training recipes for GRPO, DPO, SFT, and distillation built on top of the Training API.

Original Documentation

Documentation Index#
Fetch the complete documentation index at: https://docs.fireworks.ai/llms.txt Use this file to discover all available pages before exploring further.

Ready-to-run training recipes for GRPO, DPO, SFT, and distillation built on top of the Training API.

What is the Cookbook?#

The Fireworks Cookbook is a collection of training recipes and utilities built on top of the Training API. It provides config-driven training loops that handle trainer provisioning, data loading, tokenization, gradient accumulation, checkpointing, and cleanup automatically.

The cookbook is optional — everything it does can be done with the API directly. Use the cookbook when you want a working training loop quickly; use the API when you need full control.

Installation#

git clone https://github.com/fw-ai/cookbook.git
cd cookbook/training && pip install -e .

Set your credentials:

export FIREWORKS_API_KEY="your-api-key"

Available recipes#

Recipe	Module	Use case
RL (primary, experimental)	`training.recipes.async_rl_loop`	Reinforcement learning — you write a rollout function, the recipe owns the loop. Async rollout/training overlap by default; fully synchronous on-policy with `synchronous_training=True`. GRPO, importance sampling, DAPO, DRO, GSPO, CISPO. See Cookbook RL. No backward-compatibility guarantee.
RL (simpler, synchronous)	`training.recipes.rl_loop`	Synchronous on-policy GRPO scaffold — reach for it when you want the server-side fast loss path or don’t need rollout/train overlap
IGPO	`training.recipes.igpo_loop`	Information Gain-based Policy Optimization — turn-level IG rewards for multi-turn agent trajectories (extends GRPO)
DPO	`training.recipes.dpo_loop`	Direct preference optimization from chosen/rejected pairs
SFT	`training.recipes.sft_loop`	Supervised fine-tuning with cross-entropy loss
Distillation	`training.recipes.distillation_loop`	On-policy sampled-token distillation with one teacher or routed multi-teacher MOPD
ORPO	`training.recipes.orpo_loop`	Odds ratio preference optimization

Each recipe follows the same pattern: import Config and main, set your config, and call main(cfg). Trainer and deployment provisioning is handled internally by the recipe — you describe what you want with TrainerConfig / DeployConfig, and the SDK attaches or creates the resources.

All launch examples below use trainer=TrainerConfig(training_shape_id=...) for explicit shape selection. Cookbook recipes can also auto-select validated shapes when training_shape_id is unset. The main run-level trainer knob you may set alongside a shape is replica_count for replicated HSDP launches; reference shapes can usually be left unset because the cookbook auto-selects or uses a shared-session reference when appropriate.

If you want field-level details about what a training shape controls and what stays configurable, see Training Shapes and the Cookbook Reference.

InfraConfig and the standalone setup_infra / ResourceCleanup helpers are deprecated and removed from the recipe surface. Recipes now take trainer=TrainerConfig(...) (and deployment=DeployConfig(...) for RL). See Migrating from the deprecated managed infra.

Quick example: SFT#

from training.recipes.sft_loop import Config, main
from training.utils import TrainerConfig

cfg = Config(
    log_path="./sft_quickstart",
    base_model="accounts/fireworks/models/qwen3-8b",
    dataset="/path/to/training_data.jsonl",
    tokenizer_model="Qwen/Qwen3-8B",
    max_seq_len=4096,
    epochs=1,
    batch_size=4,
    trainer=TrainerConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
)

main(cfg)

Quick example: GRPO#

from training.recipes.rl_loop import Config, main
from training.utils import DeployConfig, TrainerConfig

cfg = Config(
    log_path="./grpo_quickstart",
    base_model="accounts/fireworks/models/qwen3-8b",
    dataset="/path/to/prompts.jsonl",
    max_rows=100,
    trainer=TrainerConfig(
        training_shape_id="accounts/fireworks/trainingShapes/qwen3-8b-128k-h200",
    ),
    deployment=DeployConfig(
        deployment_id="grpo-serving",
        tokenizer_model="Qwen/Qwen3-8B",
    ),
    weight_sync_interval=1,
)

main(cfg)

W&B logging#

All cookbook recipes accept a WandBConfig to stream metrics to Weights & Biases:

from training.utils import WandBConfig

cfg = Config(
    # ... same config as above ...
    wandb=WandBConfig(
        entity="my-team",
        project="grpo-experiment",
        run_name="qwen3-8b-sft-v1",  # optional, auto-generated if omitted
    ),
)

main(cfg)

Vision-language model support#

All cookbook recipes support VLM fine-tuning. Use a VLM training shape and tokenizer, and provide multimodal datasets with image_url content. See Vision Inputs for dataset format and examples.

Next steps#

Cookbook SFT — supervised fine-tuning
Cookbook DPO — preference optimization with pairwise data
Cookbook RL (GRPO) — full GRPO walkthrough with reward functions
Cookbook Distillation — OPD and routed MOPD dataset format
Vision Inputs — fine-tune VLMs with image and text data
Cookbook Reference — all config classes and parameters

Link last verified June 7, 2026. View original ↗

Source: Fireworks AI Docs

Link last verified: 2026-06-07