RL Rollouts with Your Own Trainer

Summary: Integrate an external RL trainer with Fireworks inference: hot-load new checkpoints from your bucket and run rollouts via the OpenAI-compatible API.

Early Access Feature. External-bucket hot-load for RL rollouts is a private preview. Contact Fireworks to enable this path on your account before you use S3, MINIO, NEBIUS, or similar non-FW_HOSTED storage.

Using a code agent? Follow the sections in order: Prerequisites → Quickstart checklist → Hot-load API. Required env: FIREWORKS_API_KEY. After your first full snapshot is serving, read Incremental snapshots before production training loops. For swap behavior and reset_prompt_cache, see Ledger & debugging.

This guide is for teams that already run their own RL trainer (PyTorch FSDP, Megatron, a custom Ray cluster, etc.) and want Fireworks for large-scale inference during rollouts.

Is this the right guide?#

| Path | You own | Fireworks owns |
| --- | --- | --- |
| This guide (BYOT rollouts) | Trainer, rewards, environment, checkpoint upload cadence | Hot-load deployment, distributed weight swap, inference, KV cache across rollouts |
| Training API | Training logic (recipes or SDK) | GPUs, trainer lifecycle, often FW_HOSTED bucket |
| Managed RFT | Dataset and evaluator | End-to-end hosted RL |

Why BYOT rollout inference?

  • Disaggregated: Your trainer and rollout cluster can run in different regions or clouds; deployments can span multiple regions to pool capacity.
  • Full-parameter scale: Full (non-LoRA) tuning for large models supported on Fireworks inference shapes.
  • Fast checkpoint transfer: Lossless compressed incremental snapshots (arc_v2, typically 20×+ compression) over standard object storage—no special RDMA networking between trainer and inference.
  • Async / off-policy friendly: Background download during rollouts; configurable swap semantics similar in spirit to PipelineRL—see checkpoint-swap behavior.

For Online RL (live user traffic as rollouts with rolling per-replica updates), the same hot-load infrastructure applies; contact Fireworks for production Online RL setup.

Placeholders#

Reuse these values in every command below:

| Placeholder | Example |
| --- | --- |
| <account_id> | my-team |
| <model_id> | qwen3-30b-a3b |
| <deployment_id> | rl-rollout-prod |
| <fireworks_api_key> | From API keys |
| <your_bucket> / <your_upload_path> | Parent prefix configured on the deployment (no trailing slash) |
| <checkpoint_id> | Snapshot directory name, e.g. version_001 (no slashes) |

Prerequisites#

Complete this checklist before creating a deployment:

  1. Fireworks account and API key: Create a key and set export FIREWORKS_API_KEY="<key>".
  2. Account ID: In the dashboard, open your account settings or any resource URL; the account slug is the segment after /accounts/ (for example accounts/<account_id>/...).
  3. Feature enablement: Request external-bucket hot-load for RL rollouts on account <account_id>, including your bucket provider (S3, GCS/gs://, or NEBIUS).
  4. Object storage read access for Fireworks: Fireworks needs read-only access to the bucket prefix you will pass as --hot-load-bucket-url. At enablement, Fireworks shares the IAM principal to which you grant access. Typical setup:
    • Amazon S3: Grant the Fireworks principal s3:GetObject (and s3:ListBucket on the prefix) on s3://<your_bucket>/<your_upload_path>/*.
    • Google Cloud Storage: Grant roles/storage.objectViewer on the bucket or prefix to the Fireworks service account provided at onboarding.
    • Nebius / MinIO: Equivalent read-only credentials or access key scoped to the upload prefix.
  5. firectl installed: See firectl.
  6. Base model and deployment shape: An RL-capable shape for your model (GPU count, precision). If you omit --deployment-shape, firectl prompts you to pick one interactively.
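
For the Amazon S3 case in item 4, the read-only grant can be sketched as a bucket policy generated from Python. This is a minimal sketch under assumptions: the principal ARN is a placeholder (Fireworks shares the real principal at enablement), and the helper name and statement IDs are illustrative rather than part of any Fireworks tooling.

```python
import json


def build_read_only_policy(bucket: str, prefix: str, principal_arn: str) -> dict:
    """Return an S3 bucket policy granting read-only access to one prefix.

    `principal_arn` is a placeholder for the principal Fireworks provides.
    """
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object reads are scoped to keys under the upload prefix.
                "Sid": "HotLoadReadObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # ListBucket applies to the bucket ARN; the condition
                # restricts listing to the upload prefix.
                "Sid": "HotLoadListPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Placeholder values for illustration only.
policy = build_read_only_policy(
    "my-bucket", "rl/checkpoints", "arn:aws:iam::123456789012:role/example"
)
print(json.dumps(policy, indent=2))
```

Review the generated JSON against your own security requirements before attaching it to the bucket.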

Architecture#

flowchart LR
  trainer["Your RL Trainer"] -->|"1. Upload checkpoint"| bucket[("External bucket")]
  trainer -->|"2. Signal snapshot ready"| api["Fireworks Hot-Load API"]
  api -->|"3. Load weights"| deployment["Inference Deployment"]
  trainer -->|"4. Rollout via /v1/completions"| deployment
  deployment -->|"Tokens + optional routing_matrix"| trainer

You own: trainer, reward shaping, checkpoint cadence, rollout orchestration.

Fireworks owns: hot-load logistics, distributed weight swap, inference serving, KV cache across rollouts.

End-to-end loop#

  1. Create a hot-load deployment.
  2. Upload and hot-load an initial full snapshot.
  3. Run rollouts against that snapshot.
  4. For each training step: upload and hot-load the next incremental snapshot (see Incremental snapshots).
  5. Run rollouts again.
  6. Every 20th or 30th step, publish a full snapshot instead of an incremental one. If the incremental chain fails, fall back to a full snapshot.
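
The cadence in steps 4 to 6 can be expressed as a small decision function. This is a sketch only: the function name, the `full_every` parameter, and the `chain_broken` flag are hypothetical names chosen for illustration, and the interval of 20 is one of the two values the list above suggests.

```python
def snapshot_kind(step: int, full_every: int = 20, chain_broken: bool = False) -> str:
    """Decide whether a training step publishes a full or incremental snapshot.

    Step 0 is the initial full snapshot. Every `full_every`-th step is also
    full, and any step after a failed incremental chain falls back to full.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if full_every < 1:
        raise ValueError("full_every must be at least 1")
    if step == 0 or chain_broken or step % full_every == 0:
        return "full"
    return "incremental"


# Plan the first 41 steps with a full snapshot every 20 steps.
plan = [snapshot_kind(step, full_every=20) for step in range(41)]
print([step for step, kind in enumerate(plan) if kind == "full"])
```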

Quickstart checklist#

Use this table for your first rollout end-to-end:

| Step | Action | Done when |
| --- | --- | --- |
| 1 | Create hot-load deployment | firectl deployment get <deployment_id> shows a healthy deployment |
| 2 | Upload full HF snapshot | All files exist under .../<checkpoint_id>/ in object storage |
| 3 | POST signal snapshot | HTTP 200 |
| 4 | GET poll status | Every replica has readiness: true and current_snapshot_identity matches your identity |
| 5 | Run rollouts | Chat/completions returns tokens |

1. Create a hot-load deployment#

Create the deployment that will serve rollouts. During preview, hot-load flags such as --enable-hot-load may be hidden from CLI help but can still be passed explicitly.

firectl create deployment accounts/<account_id>/models/<model_id> \
  --deployment-shape <shape_name> \
  --deployment-id <deployment_id> \
  --enable-hot-load \
  --hot-load-bucket-type S3 \
  --hot-load-bucket-url s3://<your_bucket>/<your_upload_path> \
  --hot-load-transition-type ASYNC \
  --region US_OHIO_1

Flags

  • --deployment-shape: Optional. If omitted, firectl prompts you to pick one.
  • --hot-load-bucket-type: MINIO, S3, NEBIUS, or FW_HOSTED. This guide focuses on external buckets (S3, gs://, etc.). FW_HOSTED is for Fireworks-managed trainers.
  • --hot-load-bucket-url: Required when --enable-hot-load is set. Examples: s3://mybucket/path, gs://mybucket/path. No trailing slash. This is the parent prefix; each snapshot is a subdirectory named by identity (see snapshot layout).
  • --hot-load-transition-type: ASYNC (recommended for RL) or SYNC. Defaults to ASYNC when hot load is enabled. See checkpoint-swap behavior.
  • --region: Where the deployment runs (for example US_OHIO_1, US_VIRGINIA_1). Keep the trainer upload path geographically close to the bucket and deployment.

Save the account ID, deployment ID, and model ID from the output for hot-load and rollout calls.

If you do not set a shape, the CLI shows a shape picker:

[Image: firectl deployment shape picker]

2. Upload and hot-load an initial full snapshot#

Upload a full HuggingFace-format checkpoint, then signal Fireworks to load it.

Snapshot layout#

Place each snapshot under its own subdirectory. The identity you signal in the API must match the directory name (a single path segment—no slashes):

s3://<your_bucket>/<your_upload_path>/<checkpoint_id>/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── model-00000.safetensors
├── model-00001.safetensors
└── ...
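
The two naming rules (no trailing slash on the bucket URL, and an identity that is a single path segment) can be checked before uploading. A minimal sketch, where `snapshot_uri` is a hypothetical helper and not part of any Fireworks SDK:

```python
def snapshot_uri(bucket_url: str, identity: str) -> str:
    """Return the directory URI for one snapshot under the deployment prefix."""
    if bucket_url.endswith("/"):
        raise ValueError("bucket URL must not end with a trailing slash")
    if not identity or "/" in identity:
        raise ValueError("identity must be a single, non-empty path segment")
    return f"{bucket_url}/{identity}/"


print(snapshot_uri("s3://mybucket/path", "version_001"))
```

Calling the helper once per checkpoint catches a malformed identity before any files are uploaded under the wrong prefix.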

Example with the recommended path pattern:

s3://<your_bucket>/<account_id>/<account_id>-<deployment_id>/version_001/
  • identity / <checkpoint_id> — Any opaque string (for example version_001 or step_00100).
  • Format — Same layout as the base model on HuggingFace: config.json, tokenizer files, and safetensors weights. No tensor-parallel sharding in uploaded files.
  • File size — Split weights into multiple .safetensors files, each under about 5 GB. Group weights by layer when possible; putting one layer per file minimizes load time.
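
The file-size guidance can be sketched as a shard planner that groups tensors by layer and starts a new file when a group would exceed the limit. The assumptions here are loud ones: the regular expression presumes HuggingFace-style tensor names containing `.layers.<n>.`, the exact byte limit is a guess at "about 5 GB", and `plan_shards` is a hypothetical helper. Writing the actual .safetensors files is left to your trainer.

```python
import re

MAX_SHARD_BYTES = 5 * 1024**3  # assumed reading of "about 5 GB" per file

# Assumed HuggingFace-style naming, e.g. "model.layers.12.mlp.up_proj.weight".
_LAYER_RE = re.compile(r"\.layers\.(\d+)\.")


def group_key(tensor_name: str) -> str:
    """Map a tensor name to its layer group, or "non_layer" if none matches."""
    match = _LAYER_RE.search(tensor_name)
    return f"layer_{int(match.group(1)):05d}" if match else "non_layer"


def plan_shards(tensor_sizes: dict, max_bytes: int = MAX_SHARD_BYTES) -> list:
    """Return a list of shards, each a list of tensor names.

    Tensors of one layer stay together unless the layer exceeds `max_bytes`,
    in which case the layer is split across consecutive shards. A single
    tensor larger than `max_bytes` still gets a shard of its own.
    """
    groups = {}
    for name in tensor_sizes:
        groups.setdefault(group_key(name), []).append(name)

    shards = []
    for key in sorted(groups):
        current, current_bytes = [], 0
        for name in groups[key]:
            size = tensor_sizes[name]
            if current and current_bytes + size > max_bytes:
                shards.append(current)
                current, current_bytes = [], 0
            current.append(name)
            current_bytes += size
        if current:
            shards.append(current)
    return shards
```

One layer per shard follows the "one layer per file" suggestion above; merging small layers into a shared file is a possible variation if the file count becomes unwieldy.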

Optional: call the per-file hint API as each file lands to speed up loading on large models.

Signal and poll#

Use the Hot-load API below with { "identity": "<checkpoint_id>" } and poll until all replicas are ready.

Hot-load API#

All hot-load requests use these headers:

| Header | Value |
| --- | --- |
| Authorization | Bearer <fireworks_api_key> |
| fireworks-model | accounts/<account_id>/models/<model_id> |
| fireworks-deployment | accounts/<account_id>/deployments/<deployment_id> |
| Content-Type | application/json |

| Operation | Method | URL |
| --- | --- | --- |
| Signal snapshot ready | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load |
| Poll load status | GET | https://api.fireworks.ai/hot_load/v1/models/hot_load |
| Per-file hint (optional) | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load/hint |

Signal snapshot ready#

Full snapshot body:

{ "identity": "version_001" }

Incremental snapshot bodies, compression, hints, and checksum_format are documented in Incremental snapshots.

Request body fields:

  • identity: Snapshot directory name under the configured bucket prefix. Must not contain `/`.
  • incremental_snapshot_metadata: Required for incremental snapshots. Includes `previous_snapshot_identity`, `compression_format` (`arc_v2`), and `checksum_format` (`alder32`). See the incremental snapshots guide.
  • reset_prompt_cache: Prompt-cache policy after the swap: `all` (default), `none`, or `new_session`. See prompt cache reset behavior (/fine-tuning/rl-rollout-debugging#prompt-cache-reset-behavior).
  • Top-level `config.json` fields to ignore during snapshot validation. Only use for known-safe metadata fields.

curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{ "identity": "version_001" }'

Python equivalent:

import os
import requests

API_KEY = os.environ["FIREWORKS_API_KEY"]
ACCOUNT = "<account_id>"
MODEL = f"accounts/{ACCOUNT}/models/<model_id>"
DEPLOYMENT = f"accounts/{ACCOUNT}/deployments/<deployment_id>"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "fireworks-model": MODEL,
    "fireworks-deployment": DEPLOYMENT,
    "Content-Type": "application/json",
}

resp = requests.post(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    json={"identity": "version_001"},
    timeout=60,
)
resp.raise_for_status()

Poll load status#

Poll until every replica has readiness: true and current_snapshot_identity equals the identity you signaled.

curl https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>"

Python equivalent (reuses HEADERS from the signal example):

status = requests.get(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    timeout=30,
).json()

replicas = status.get("replicas", [])
ready = (
    replicas
    and all(r.get("readiness") for r in replicas)
    and all(r.get("current_snapshot_identity") == "version_001" for r in replicas)
)
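
The check above runs once; a training loop usually needs to repeat it until the swap completes. The following is a minimal polling sketch, not a Fireworks client: `wait_until_ready`, the timeout, and the poll interval are illustrative assumptions, and the status fetcher is passed in as a callable so the loop itself stays independent of the HTTP library.

```python
import time
from typing import Callable


def all_replicas_ready(status: dict, identity: str) -> bool:
    """True when at least one replica exists and every replica serves `identity`."""
    replicas = status.get("replicas", [])
    return bool(replicas) and all(
        r.get("readiness") and r.get("current_snapshot_identity") == identity
        for r in replicas
    )


def wait_until_ready(
    fetch_status: Callable[[], dict],
    identity: str,
    timeout_s: float = 1800.0,   # assumed upper bound; tune for your model size
    interval_s: float = 5.0,     # assumed poll interval
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch_status` until all replicas are ready or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if all_replicas_ready(status, identity):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"snapshot {identity!r} not ready after {timeout_s}s")
        sleep(interval_s)
```

In practice `fetch_status` would wrap the GET request shown above, for example a function that returns its `.json()` result.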

When to start rollouts#

  • Default (on-policy): Wait until all replicas report readiness on the new identity.
  • Off-policy / higher utilization: You may start sending rollouts when a subset of replicas is ready—inspect each entry in replicas in the GET response. Stale-policy rollouts are expected; use async transition mode and monitor policy version in streaming responses (see Policy version in responses).
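
For the off-policy option, one way to decide when to start is a threshold on the share of replicas already serving the new identity. This is a sketch under assumptions: `ready_fraction`, `can_start_rollouts`, and the `min_fraction` threshold are hypothetical names, and the right threshold depends on how much policy staleness your algorithm tolerates.

```python
def ready_fraction(status: dict, identity: str) -> float:
    """Share of replicas that are ready and serving `identity` (0.0 if none exist)."""
    replicas = status.get("replicas", [])
    if not replicas:
        return 0.0
    ready = sum(
        1
        for r in replicas
        if r.get("readiness") and r.get("current_snapshot_identity") == identity
    )
    return ready / len(replicas)


def can_start_rollouts(status: dict, identity: str, min_fraction: float = 1.0) -> bool:
    """On-policy default waits for all replicas; lower `min_fraction` for off-policy."""
    fraction = ready_fraction(status, identity)
    return fraction > 0 and fraction >= min_fraction
```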

Per-file hints are optional but recommended for large checkpoints—see Incremental snapshots.

3. Run rollouts#

Call the OpenAI-compatible inference API. For multi-turn RL, set session headers so KV cache stays on one replica:

curl https://api.fireworks.ai/inference/v1/chat/completions \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "x-multi-turn-session-id: <trajectory_id>" \
  -H "x-session-affinity: <trajectory_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "accounts/<account_id>/models/<model_id>",
    "messages": [{"role": "user", "content": "..."}]
  }'
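
The same request can be assembled in Python. This sketch uses only the standard library; the function names are hypothetical, and the headers and body mirror the curl example above rather than documenting any additional API surface.

```python
import json
import urllib.request

INFERENCE_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_rollout_request(api_key, account_id, model_id, deployment_id,
                          trajectory_id, messages):
    """Return (headers, body) for one turn of a multi-turn trajectory."""
    model = f"accounts/{account_id}/models/{model_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": model,
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        # The same ID on every turn of a trajectory, as in the curl example.
        "x-multi-turn-session-id": trajectory_id,
        "x-session-affinity": trajectory_id,
        "Content-Type": "application/json",
    }
    body = {"model": model, "messages": messages}
    return headers, body


def send_turn(headers, body, timeout_s=120.0):
    """POST one turn and return the decoded JSON response."""
    request = urllib.request.Request(
        INFERENCE_URL,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes it easy to unit-test the headers for each trajectory without network access.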

See Inference for RL rollouts for session affinity, weight-swap behavior, MoE Router Replay (R3), and policy-version fields.

Steady-state training loop#

After the first full snapshot:

  1. Intermediate steps — Build and upload an incremental snapshot (arc_v2), signal with incremental_snapshot_metadata, poll until ready, then run rollouts.
  2. Every 20th or 30th step — Publish a new full snapshot for faster recovery and chain reset.
  3. On failure — Fall back to a full snapshot; see Ledger & debugging.

Brief incremental signal example (full details on the incremental page):

curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
  -H "Authorization: Bearer <fireworks_api_key>" \
  -H "fireworks-model: accounts/<account_id>/models/<model_id>" \
  -H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
  -H "Content-Type: application/json" \
  -d '{
    "identity": "version_002",
    "incremental_snapshot_metadata": {
      "previous_snapshot_identity": "version_001",
      "compression_format": "arc_v2",
      "checksum_format": "alder32"
    }
  }'
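
A trainer that alternates between full and incremental snapshots may find it convenient to build both signal bodies from one function. A minimal sketch, assuming only the fields shown in this guide; `build_signal_body` is a hypothetical helper, and the format strings are copied verbatim from the examples above.

```python
from typing import Optional


def build_signal_body(identity: str, previous_identity: Optional[str] = None) -> dict:
    """Build the POST body for a full snapshot, or an incremental one if
    `previous_identity` names the snapshot the delta was computed against."""
    if not identity or "/" in identity:
        raise ValueError("identity must be a single, non-empty path segment")
    body = {"identity": identity}
    if previous_identity is not None:
        body["incremental_snapshot_metadata"] = {
            "previous_snapshot_identity": previous_identity,
            "compression_format": "arc_v2",
            "checksum_format": "alder32",
        }
    return body


print(build_signal_body("version_001"))
print(build_signal_body("version_002", previous_identity="version_001"))
```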

Numerics alignment#

For best training–inference alignment:

  • Match quantization / precision between trainer checkpoints and the deployment shape (work with Fireworks if you need a custom shape).
  • Measure logprob divergence between trainer forward passes and rollout inference on the same tokens.
  • For MoE models, use Router Replay (R3) during rollouts—see MoE Router Replay.
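
Logprob divergence on the same tokens can be summarized with a few scalar statistics. The sketch below is one possible measurement, not a Fireworks-defined metric: it assumes you already have per-token logprobs from the trainer forward pass and from rollout inference, aligned position by position, and the function name is hypothetical.

```python
def logprob_divergence(trainer_logprobs, rollout_logprobs):
    """Compare per-token logprobs of the same sampled tokens.

    Returns the mean and max absolute difference, plus a simple Monte Carlo
    estimate of KL(rollout || trainer). That estimate assumes the tokens were
    sampled from the rollout policy, and it can be negative on small samples.
    """
    if len(trainer_logprobs) != len(rollout_logprobs):
        raise ValueError("logprob sequences must have the same length")
    if not trainer_logprobs:
        raise ValueError("logprob sequences must be non-empty")

    diffs = [t - r for t, r in zip(trainer_logprobs, rollout_logprobs)]
    n = len(diffs)
    return {
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "max_abs_diff": max(abs(d) for d in diffs),
        # mean of (rollout_logprob - trainer_logprob) over sampled tokens
        "kl_estimate": sum(-d for d in diffs) / n,
    }
```

Tracking these values per training step makes a precision or quantization mismatch visible as a sustained shift rather than noise.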

Next steps#

  • Incremental snapshots: Build ARC2 deltas, per-file hints, and incremental signal bodies.
  • Ledger & debugging: Inspect snapshot history, reset the ledger, and reason about request behavior during weight swaps.
  • Inference for RL rollouts: Session affinity headers, policy version in streams, weight-swap behavior, and MoE Router Replay (R3).
  • Training API: The alternative path where Fireworks runs the trainer through the Training API.

Link last verified June 7, 2026.
Source: Fireworks AI Docs