RL Rollouts with Your Own Trainer

Summary: Integrate an external RL trainer with Fireworks inference: hot-load new checkpoints from your bucket and run rollouts via the OpenAI-compatible API.

Early Access Feature. External-bucket hot-load for RL rollouts is in private preview. Contact Fireworks to enable this path on your account before you use S3, MINIO, NEBIUS, or similar non-FW_HOSTED storage.

Using a code agent? Follow sections in order: Prerequisites → Quickstart checklist → Hot-load API. Required env: FIREWORKS_API_KEY. After your first full snapshot is serving, read Incremental snapshots before production training loops. For swap behavior and reset_prompt_cache, see Ledger & debugging.

This guide is for teams that already run their own RL trainer (PyTorch FSDP, Megatron, a custom Ray cluster, etc.) and want Fireworks for large-scale inference during rollouts.

Is this the right guide?#

Path	You own	Fireworks owns
This guide (BYOT rollouts)	Trainer, rewards, environment, checkpoint upload cadence	Hot-load deployment, distributed weight swap, inference, KV cache across rollouts
Training API	Training logic (recipes or SDK)	GPUs, trainer lifecycle, often `FW_HOSTED` bucket
Managed RFT	Dataset and evaluator	End-to-end hosted RL

Why BYOT rollout inference?

Disaggregated: Your trainer and rollout cluster can run in different regions or clouds; deployments can span multiple regions to pool capacity.
Full-parameter scale: Full (non-LoRA) tuning for large models is supported on Fireworks inference shapes.
Fast checkpoint transfer: Lossless compressed incremental snapshots (arc_v2, typically 20×+ compression) over standard object storage—no special RDMA networking between trainer and inference.
Async / off-policy friendly: Background download during rollouts; configurable swap semantics similar in spirit to PipelineRL—see checkpoint-swap behavior.

For Online RL (live user traffic as rollouts with rolling per-replica updates), the same hot-load infrastructure applies; contact Fireworks for production Online RL setup.

Placeholders#

Reuse these values in every command below:

Placeholder	Example
`<account_id>`	`my-team`
`<model_id>`	`qwen3-30b-a3b`
`<deployment_id>`	`rl-rollout-prod`
`<fireworks_api_key>`	From API keys
`<your_bucket>` / `<your_upload_path>`	Parent prefix configured on the deployment (no trailing slash)
`<checkpoint_id>`	Snapshot directory name, e.g. `version_001` (no slashes)

Prerequisites#

Complete this checklist before creating a deployment:

Fireworks account and API key — create a key and set export FIREWORKS_API_KEY="<key>".
Account ID — In the dashboard, open your account settings or any resource URL; the account slug is the segment after /accounts/ (for example accounts/<account_id>/...).
Feature enablement — Request external-bucket hot-load for RL rollouts on account <account_id>, including your bucket provider (S3, GCS/gs://, or NEBIUS).
Object storage read access for Fireworks — Fireworks needs read-only access to the bucket prefix you will pass as --hot-load-bucket-url. At enablement, Fireworks shares the IAM principal to grant access. Typical setup:
- Amazon S3: Grant the Fireworks principal s3:GetObject (and s3:ListBucket on the prefix) on s3://<your_bucket>/<your_upload_path>/*.
- Google Cloud Storage: Grant roles/storage.objectViewer on the bucket or prefix to the Fireworks service account provided at onboarding.
- Nebius / MinIO: Equivalent read-only credentials or access key scoped to the upload prefix.
firectl installed — See firectl.
Base model and deployment shape — An RL-capable shape for your model (GPU count, precision). If you omit --deployment-shape, firectl prompts you to pick one interactively.
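
For the Amazon S3 case in the checklist above, the read-only grant could be written as a bucket policy along these lines. This is a sketch under assumptions: the helper name `s3_read_policy`, the bucket and prefix values, and the principal ARN are placeholders (Fireworks shares the real principal at enablement), and your organization may prefer a role policy over a bucket policy.

```python
import json


def s3_read_policy(bucket: str, prefix: str, principal_arn: str) -> dict:
    """Build a read-only S3 bucket policy scoped to one upload prefix.

    principal_arn is a placeholder here; use the IAM principal that
    Fireworks shares at enablement.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects under the upload prefix only.
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Listing is a bucket-level action, narrowed by prefix.
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Example values only; none of these come from a real account.
policy = s3_read_policy(
    "my-bucket", "rl/checkpoints", "arn:aws:iam::111122223333:role/example"
)
print(json.dumps(policy, indent=2))
```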

Architecture#

flowchart LR
  trainer["Your RL Trainer"] -->|"1. Upload checkpoint"| bucket[("External bucket")]
  trainer -->|"2. Signal snapshot ready"| api["Fireworks Hot-Load API"]
  api -->|"3. Load weights"| deployment["Inference Deployment"]
  trainer -->|"4. Rollout via /v1/completions"| deployment
  deployment -->|"Tokens + optional routing_matrix"| trainer

You own: trainer, reward shaping, checkpoint cadence, rollout orchestration.

Fireworks owns: hot-load logistics, distributed weight swap, inference serving, KV cache across rollouts.

End-to-end loop#

Create a hot-load deployment.
Upload and hot-load an initial full snapshot.
Run rollouts against that snapshot.
For each training step: upload and hot-load the next incremental snapshot (see Incremental snapshots).
Run rollouts again.
Every 20th or 30th step, publish a full snapshot instead of an incremental one. If the incremental chain fails, fall back to a full snapshot.
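
The choice between a full and an incremental snapshot in the loop above can be sketched as a small pure function. The name `snapshot_kind`, the 1-based step numbering, and the default interval of 20 are illustrative assumptions; the guide only says "every 20th or 30th step".

```python
def snapshot_kind(step: int, full_every: int = 20, chain_ok: bool = True) -> str:
    """Return "full" or "incremental" for a 1-based training step.

    chain_ok=False models a broken incremental chain, which forces a
    fallback to a full snapshot.
    """
    if step < 1:
        raise ValueError("step must be >= 1")
    if full_every < 1:
        raise ValueError("full_every must be >= 1")
    # The initial snapshot is always full; so is every full_every-th step.
    if step == 1 or not chain_ok or step % full_every == 0:
        return "full"
    return "incremental"


plan = [snapshot_kind(s) for s in (1, 2, 19, 20, 21)]
print(plan)
```

Keeping the decision separate from the upload code makes the cadence easy to change without touching the trainer loop.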

Quickstart checklist#

Use this table for your first rollout end-to-end:

Step	Action	Done when
1	Create hot-load deployment	`firectl deployment get <deployment_id>` shows a healthy deployment
2	Upload full HF snapshot	All files exist under `.../<checkpoint_id>/` in object storage
3	`POST` signal snapshot	HTTP 200
4	`GET` poll status	Every replica has `readiness: true` and `current_snapshot_identity` matches your `identity`
5	Run rollouts	Chat/completions returns tokens

1. Create a hot-load deployment#

Create the deployment that will serve rollouts. During the preview, the hot-load flags (such as --enable-hot-load) may be hidden from the CLI help output, but you can still pass them explicitly.

firectl create deployment accounts/<account_id>/models/<model_id> \
  --deployment-shape <shape_name> \
  --deployment-id <deployment_id> \
  --enable-hot-load \
  --hot-load-bucket-type S3 \
  --hot-load-bucket-url s3://<your_bucket>/<your_upload_path> \
  --hot-load-transition-type ASYNC \
  --region US_OHIO_1

Flags

--deployment-shape — Optional. If omitted, firectl prompts you to pick one.
--hot-load-bucket-type — MINIO, S3, NEBIUS, or FW_HOSTED. This guide focuses on external buckets (S3, gs://, etc.). FW_HOSTED is for Fireworks-managed trainers.
--hot-load-bucket-url — Required when --enable-hot-load is set. Examples: s3://mybucket/path, gs://mybucket/path. No trailing slash. This is the parent prefix; each snapshot is a subdirectory named by identity (see snapshot layout).
--hot-load-transition-type — ASYNC (recommended for RL) or SYNC. Defaults to ASYNC when hot load is enabled. See checkpoint-swap behavior.
--region — Where the deployment runs (for example US_OHIO_1, US_VIRGINIA_1). Keep the trainer upload path geographically close to the bucket and deployment.

Save the account ID, deployment ID, and model ID from the output for hot-load and rollout calls.
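
If you script deployment creation from Python, the command above could be assembled as an argument list before handing it to `subprocess.run`. This is a sketch: `firectl_create_args` is a hypothetical helper, it uses only the flags and values listed in this section, and it does not verify what firectl itself accepts.

```python
import shlex

BUCKET_TYPES = {"MINIO", "S3", "NEBIUS", "FW_HOSTED"}
TRANSITION_TYPES = {"ASYNC", "SYNC"}


def firectl_create_args(account_id, model_id, deployment_id, bucket_url,
                        bucket_type="S3", transition="ASYNC",
                        region="US_OHIO_1", shape=None):
    """Build the firectl argument list for a hot-load deployment."""
    if bucket_type not in BUCKET_TYPES:
        raise ValueError(f"unknown bucket type: {bucket_type}")
    if transition not in TRANSITION_TYPES:
        raise ValueError(f"unknown transition type: {transition}")
    if bucket_url.endswith("/"):
        raise ValueError("--hot-load-bucket-url must not end with a slash")
    args = [
        "firectl", "create", "deployment",
        f"accounts/{account_id}/models/{model_id}",
        "--deployment-id", deployment_id,
        "--enable-hot-load",
        "--hot-load-bucket-type", bucket_type,
        "--hot-load-bucket-url", bucket_url,
        "--hot-load-transition-type", transition,
        "--region", region,
    ]
    # Omitting the shape makes firectl prompt interactively, so pass it
    # explicitly in non-interactive scripts.
    if shape is not None:
        args += ["--deployment-shape", shape]
    return args


args = firectl_create_args(
    "my-team", "qwen3-30b-a3b", "rl-rollout-prod", "s3://my-bucket/rl/checkpoints"
)
print(shlex.join(args))
```

To run it, pass the list to `subprocess.run(args, check=True)`; using a list rather than a shell string avoids quoting problems in bucket URLs.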

If you do not set a shape, the CLI shows an interactive shape picker.

2. Upload and hot-load an initial full snapshot#

Upload a full HuggingFace-format checkpoint, then signal Fireworks to load it.

Snapshot layout#

Place each snapshot under its own subdirectory. The identity you signal in the API must match the directory name (a single path segment—no slashes):

s3://<your_bucket>/<your_upload_path>/<checkpoint_id>/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── model-00000.safetensors
├── model-00001.safetensors
└── ...

Example with the recommended path pattern:

s3://<your_bucket>/<account_id>/<account_id>-<deployment_id>/version_001/

identity / <checkpoint_id> — Any opaque string (for example version_001 or step_00100).
Format — Same layout as the base model on HuggingFace: config.json, tokenizer files, and safetensors weights. No tensor-parallel sharding in uploaded files.
File size — Split weights into multiple .safetensors files, each under about 5 GB. Group weights by layer when possible; putting one layer per file minimizes load time.
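
Before uploading, the layout rules above can be checked against the local snapshot directory. The following is a minimal sketch, not a Fireworks tool: `check_snapshot_dir` is a hypothetical helper, the 5 GiB limit is one reading of "about 5 GB", and the tokenizer check only looks for any file whose name starts with `tokenizer`.

```python
import tempfile
from pathlib import Path

MAX_SHARD_BYTES = 5 * 1024**3  # assumed reading of "about 5 GB"


def check_snapshot_dir(path, identity, max_shard_bytes=MAX_SHARD_BYTES):
    """Return a list of problems found in a local snapshot directory."""
    problems = []
    root = Path(path)
    if not identity or "/" in identity:
        problems.append("identity must be a single path segment")
    if root.name != identity:
        problems.append("directory name must match identity")
    if not (root / "config.json").is_file():
        problems.append("missing config.json")
    if not any(root.glob("tokenizer*")):
        problems.append("missing tokenizer files")
    shards = sorted(root.glob("*.safetensors"))
    if not shards:
        problems.append("no .safetensors files")
    for shard in shards:
        if shard.stat().st_size >= max_shard_bytes:
            problems.append(f"{shard.name} is not under the shard size limit")
    return problems


# Demo on a tiny throwaway directory with placeholder file contents.
with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "version_001"
    snap.mkdir()
    for name in ("config.json", "tokenizer.json", "tokenizer_config.json"):
        (snap / name).write_text("{}")
    (snap / "model-00000.safetensors").write_bytes(b"\0" * 16)
    clean = check_snapshot_dir(snap, "version_001")
    oversized = check_snapshot_dir(snap, "version_001", max_shard_bytes=8)
print(clean, oversized)
```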

Optional: call the per-file hint API as each file lands to speed up loading on large models.

Signal and poll#

Use the Hot-load API below with { "identity": "<checkpoint_id>" } and poll until all replicas are ready.

Hot-load API#

All hot-load requests use these headers:

Header	Value
`Authorization`	`Bearer <fireworks_api_key>`
`fireworks-model`	`accounts/<account_id>/models/<model_id>`
`fireworks-deployment`	`accounts/<account_id>/deployments/<deployment_id>`
`Content-Type`	`application/json`

Operation	Method	URL
Signal snapshot ready	`POST`	`https://api.fireworks.ai/hot_load/v1/models/hot_load`
Poll load status	`GET`	`https://api.fireworks.ai/hot_load/v1/models/hot_load`
Per-file hint (optional)	`POST`	`https://api.fireworks.ai/hot_load/v1/models/hot_load/hint`

Signal snapshot ready#

Full snapshot body:

{ "identity": "version_001" }

Incremental snapshot bodies, compression, hints, and checksum_format are documented in Incremental snapshots.

Request body fields

identity: Snapshot directory name under the configured bucket prefix. Must not contain `/`.
incremental_snapshot_metadata: Required for incremental snapshots. Includes `previous_snapshot_identity`, `compression_format` (`arc_v2`), and `checksum_format` (`alder32`). See the incremental snapshots guide.
reset_prompt_cache: Prompt-cache policy after the swap: `all` (default), `none`, or `new_session`. See [prompt cache reset behavior](/fine-tuning/rl-rollout-debugging#prompt-cache-reset-behavior).
An additional optional field lists top-level `config.json` fields to ignore during snapshot validation. Only use it for known-safe metadata fields.

curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{ "identity": "version_001" }'

import os
import requests

API_KEY = os.environ["FIREWORKS_API_KEY"]
ACCOUNT = "<account_id>"
MODEL = f"accounts/{ACCOUNT}/models/<model_id>"
DEPLOYMENT = f"accounts/{ACCOUNT}/deployments/<deployment_id>"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "fireworks-model": MODEL,
    "fireworks-deployment": DEPLOYMENT,
    "Content-Type": "application/json",
}

resp = requests.post(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    json={"identity": "version_001"},
    timeout=60,
)
resp.raise_for_status()

Poll load status#

Poll until every replica has readiness: true and current_snapshot_identity equals the identity you signaled.

curl https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>"

status = requests.get(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    timeout=30,
).json()

replicas = status.get("replicas", [])
# Ready only when at least one replica is listed and every replica
# is serving the snapshot that was signaled.
ready = bool(replicas) and all(
    r.get("readiness") and r.get("current_snapshot_identity") == "version_001"
    for r in replicas
)
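
In a training loop the status check usually runs until the swap completes or a deadline passes. The sketch below wraps that in a loop. The names `all_replicas_on` and `wait_for_snapshot`, the timeout, and the poll interval are assumptions; `fetch_status` stands for any callable that returns the parsed JSON of the GET request above, which keeps the loop testable without network access.

```python
import time


def all_replicas_on(status: dict, identity: str) -> bool:
    """True when at least one replica exists and all serve `identity`."""
    replicas = status.get("replicas", [])
    return bool(replicas) and all(
        r.get("readiness") and r.get("current_snapshot_identity") == identity
        for r in replicas
    )


def wait_for_snapshot(fetch_status, identity, timeout_s=1800.0, interval_s=5.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll until every replica serves `identity`; False on timeout."""
    deadline = clock() + timeout_s
    while True:
        if all_replicas_on(fetch_status(), identity):
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Simulated responses standing in for successive GET results.
fake_responses = iter([
    {"replicas": []},
    {"replicas": [{"readiness": False, "current_snapshot_identity": "version_000"}]},
    {"replicas": [{"readiness": True, "current_snapshot_identity": "version_001"}]},
])
ok = wait_for_snapshot(lambda: next(fake_responses), "version_001",
                       sleep=lambda seconds: None)
print(ok)
```

In real use, `fetch_status` could be a lambda that issues the GET shown above with `requests` and returns `.json()`.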

When to start rollouts#

Default (on-policy): Wait until all replicas report readiness on the new identity.
Off-policy / higher utilization: You may start sending rollouts when a subset of replicas is ready—inspect each entry in replicas in the GET response. Stale-policy rollouts are expected; use async transition mode and monitor policy version in streaming responses (see Policy version in responses).
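
The two policies above differ only in how many replicas must be on the new snapshot before rollouts start. A sketch of that gate follows; `min_fraction` is a hypothetical knob that is not part of the Fireworks API, and the replica field names follow the GET response described earlier.

```python
def replicas_on(replicas, identity):
    """Replicas that are ready and serving the given snapshot identity."""
    return [
        r for r in replicas
        if r.get("readiness") and r.get("current_snapshot_identity") == identity
    ]


def may_start_rollouts(replicas, identity, min_fraction=1.0):
    """Gate rollouts on the share of replicas serving `identity`.

    min_fraction=1.0 matches the on-policy default; a lower value
    accepts stale-policy rollouts in exchange for utilization.
    """
    if not replicas:
        return False
    return len(replicas_on(replicas, identity)) / len(replicas) >= min_fraction


# Example replica list: one replica has swapped, one has not.
example = [
    {"readiness": True, "current_snapshot_identity": "version_002"},
    {"readiness": True, "current_snapshot_identity": "version_001"},
]
on_policy = may_start_rollouts(example, "version_002")
off_policy = may_start_rollouts(example, "version_002", min_fraction=0.5)
print(on_policy, off_policy)
```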

Per-file hints are optional but recommended for large checkpoints—see Incremental snapshots.

3. Run rollouts#

Call the OpenAI-compatible inference API. For multi-turn RL, set session headers so KV cache stays on one replica:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "x-multi-turn-session-id: <trajectory_id>" \
  -H "x-session-affinity: <trajectory_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/<account_id>/models/<model_id>",
    "messages": [{"role": "user", "content": "..."}]
  }'

See Inference for RL rollouts for session affinity, weight-swap behavior, MoE Router Replay (R3), and policy-version fields.
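
A Python version of the rollout request might build the headers and payload once per trajectory, so that every turn reuses the same session values. This is a sketch: `build_rollout_request` and the `traj-<uuid>` naming scheme are illustrative, and only the headers shown in the curl example are used.

```python
import uuid

INFERENCE_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_rollout_request(api_key, account_id, model_id, deployment_id,
                          trajectory_id, messages):
    """Return (headers, payload) for one turn of a rollout trajectory."""
    model = f"accounts/{account_id}/models/{model_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": model,
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        # Same value on both headers, matching the curl example.
        "x-multi-turn-session-id": trajectory_id,
        "x-session-affinity": trajectory_id,
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": list(messages)}
    return headers, payload


# One ID per trajectory, reused across all of its turns (assumed scheme).
trajectory_id = f"traj-{uuid.uuid4().hex}"
history = [{"role": "user", "content": "..."}]
headers, payload = build_rollout_request(
    "<fireworks_api_key>", "my-team", "qwen3-30b-a3b", "rl-rollout-prod",
    trajectory_id, history,
)
print(headers["x-session-affinity"], payload["model"])
```

Sending is then a single call such as `requests.post(INFERENCE_URL, headers=headers, json=payload, timeout=120)`; the timeout value is an example, not a recommendation from Fireworks.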

Steady-state training loop#

After the first full snapshot:

Intermediate steps — Build and upload an incremental snapshot (arc_v2), signal with incremental_snapshot_metadata, poll until ready, then run rollouts.
Every 20th or 30th step — Publish a new full snapshot for faster recovery and chain reset.
On failure — Fall back to a full snapshot; see Ledger & debugging.

Brief incremental signal example (full details on the incremental page):

curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "identity": "version_002",
    "incremental_snapshot_metadata": {
      "previous_snapshot_identity": "version_001",
      "compression_format": "arc_v2",
      "checksum_format": "alder32"
    }
  }'
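
The full and incremental signal bodies differ only in the `incremental_snapshot_metadata` object, so one builder can cover both. A minimal sketch, assuming the field names and values shown in the examples above; `build_signal_body` is a hypothetical helper.

```python
def build_signal_body(identity, previous_identity=None):
    """Build the POST body for a full or incremental snapshot signal."""
    if not identity or "/" in identity:
        raise ValueError("identity must be a single path segment")
    body = {"identity": identity}
    if previous_identity is not None:
        # Values copied verbatim from the curl example in this guide,
        # including the spelling of the checksum format.
        body["incremental_snapshot_metadata"] = {
            "previous_snapshot_identity": previous_identity,
            "compression_format": "arc_v2",
            "checksum_format": "alder32",
        }
    return body


full_body = build_signal_body("version_001")
incremental_body = build_signal_body("version_002", previous_identity="version_001")
print(full_body)
print(incremental_body)
```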

Numerics alignment#

For best training–inference alignment:

Match quantization / precision between trainer checkpoints and the deployment shape (work with Fireworks if you need a custom shape).
Measure logprob divergence between trainer forward passes and rollout inference on the same tokens.
For MoE models, use Router Replay (R3) during rollouts—see MoE Router Replay.
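
One way to measure the logprob divergence mentioned above is to compare per-token logprobs from the trainer's forward pass with those returned by rollout inference for the same sampled tokens. The sketch below is illustrative only: the function name and the choice of metrics are assumptions, and the guide does not prescribe thresholds.

```python
def logprob_divergence(trainer_logprobs, rollout_logprobs):
    """Summarize per-token logprob disagreement on the same tokens."""
    if len(trainer_logprobs) != len(rollout_logprobs):
        raise ValueError("sequences must cover the same tokens")
    if not trainer_logprobs:
        raise ValueError("need at least one token")
    diffs = [t - r for t, r in zip(trainer_logprobs, rollout_logprobs)]
    n = len(diffs)
    return {
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "max_abs_diff": max(abs(d) for d in diffs),
        # Tokens were sampled from the rollout policy, so the mean of
        # (rollout - trainer) logprobs is a sample estimate of
        # KL(rollout || trainer); it can be negative on small samples.
        "kl_rollout_vs_trainer": sum(-d for d in diffs) / n,
    }


# Made-up logprobs for two tokens, for illustration only.
stats = logprob_divergence([-1.0, -2.0], [-1.0, -2.5])
print(stats)
```

Tracking these numbers per training step makes it easier to notice a precision or quantization mismatch between trainer checkpoints and the deployment shape.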

Next steps#

Incremental snapshots: Build arc_v2 deltas, per-file hints, and incremental signal bodies.

Ledger & debugging: Inspect snapshot history, reset the ledger, and reason about request behavior during weight swaps.

Inference for RL rollouts: Session affinity headers, policy version in streams, weight-swap behavior, and MoE Router Replay (R3).

Training API: The alternative path where Fireworks runs the trainer.

Link last verified June 7, 2026.

Source: Fireworks AI Docs
