RL Rollouts with Your Own Trainer
Integrate an external RL trainer with Fireworks inference: hot-load new checkpoints from your bucket and run rollouts via the OpenAI-compatible API.
Early Access Feature. External-bucket hot-load for RL rollouts is in private preview. Contact Fireworks to enable this path on your account before you use `S3`, `MINIO`, `NEBIUS`, or similar non-`FW_HOSTED` storage.
Using a code agent? Follow sections in order: Prerequisites
→ Quickstart checklist → Hot-load API.
Required env: FIREWORKS_API_KEY. After your first full snapshot is serving,
read Incremental snapshots before
production training loops. For swap behavior and reset_prompt_cache, see
Ledger & debugging.
This guide is for teams that already run their own RL trainer (PyTorch FSDP, Megatron, a custom Ray cluster, etc.) and want Fireworks for large-scale inference during rollouts.
Is this the right guide?#
| Path | You own | Fireworks owns |
|---|---|---|
| This guide (BYOT rollouts) | Trainer, rewards, environment, checkpoint upload cadence | Hot-load deployment, distributed weight swap, inference, KV cache across rollouts |
| Training API | Training logic (recipes or SDK) | GPUs, trainer lifecycle, often FW_HOSTED bucket |
| Managed RFT | Dataset and evaluator | End-to-end hosted RL |
Why BYOT rollout inference?
- Disaggregated: Your trainer and rollout cluster can run in different regions or clouds; deployments can span multiple regions to pool capacity.
- Full-parameter scale: Full (non-LoRA) tuning for large models supported on Fireworks inference shapes.
- Fast checkpoint transfer: Lossless compressed incremental snapshots (`arc_v2`, typically 20×+ compression) over standard object storage, with no special RDMA networking between trainer and inference.
- Async / off-policy friendly: Background download during rollouts; configurable swap semantics similar in spirit to PipelineRL (see checkpoint-swap behavior).
For Online RL (live user traffic as rollouts with rolling per-replica updates), the same hot-load infrastructure applies; contact Fireworks for production Online RL setup.
Placeholders#
Reuse these values in every command below:
| Placeholder | Example |
|---|---|
| `<account_id>` | `my-team` |
| `<model_id>` | `qwen3-30b-a3b` |
| `<deployment_id>` | `rl-rollout-prod` |
| `<fireworks_api_key>` | From API keys |
| `<your_bucket>` / `<your_upload_path>` | Parent prefix configured on the deployment (no trailing slash) |
| `<checkpoint_id>` | Snapshot directory name, e.g. `version_001` (no slashes) |
Prerequisites#
Complete this checklist before creating a deployment:
- Fireworks account and API key: Create a key and set `export FIREWORKS_API_KEY="<key>"`.
- Account ID: In the dashboard, open your account settings or any resource URL; the account slug is the segment after `/accounts/` (for example `accounts/<account_id>/...`).
- Feature enablement: Request external-bucket hot-load for RL rollouts on account `<account_id>`, including your bucket provider (`S3`, `GCS`/`gs://`, or `NEBIUS`).
- Object storage read access for Fireworks: Fireworks needs read-only access to the bucket prefix you will pass as `--hot-load-bucket-url`. At enablement, Fireworks shares the IAM principal to grant access. Typical setup:
  - Amazon S3: Grant the Fireworks principal `s3:GetObject` (and `s3:ListBucket` on the prefix) on `s3://<your_bucket>/<your_upload_path>/*`.
  - Google Cloud Storage: Grant `roles/storage.objectViewer` on the bucket or prefix to the Fireworks service account provided at onboarding.
  - Nebius / MinIO: Equivalent read-only credentials or access key scoped to the upload prefix.
- `firectl` installed: See firectl.
- Base model and deployment shape: An RL-capable shape for your model (GPU count, precision). If you omit `--deployment-shape`, `firectl` prompts you to pick one interactively.
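For the Amazon S3 case, the read-only grant described above can be written as a bucket policy. The following is a minimal sketch, not an official Fireworks template: the function name, the `Sid` values, and the example principal ARN are assumptions made for illustration, and the real principal is whatever Fireworks shares at enablement.

```python
import json


def read_only_bucket_policy(bucket: str, prefix: str, principal_arn: str) -> dict:
    """Build an S3 bucket policy granting read-only access to one prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Lets the principal download objects under the upload prefix.
                "Sid": "ReadSnapshotObjects",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
            {
                # Lets the principal list keys, restricted to the same prefix.
                "Sid": "ListSnapshotPrefix",
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
        ],
    }


# Placeholder ARN (assumption): substitute the principal Fireworks provides.
policy = read_only_bucket_policy(
    "my-bucket", "rl/checkpoints", "arn:aws:iam::123456789012:role/example-reader"
)
print(json.dumps(policy, indent=2))
```

Scoping `s3:ListBucket` with the `s3:prefix` condition keeps the grant limited to the snapshot prefix instead of exposing every key in the bucket.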
Architecture#
flowchart LR
trainer["Your RL Trainer"] -->|"1. Upload checkpoint"| bucket[("External bucket")]
trainer -->|"2. Signal snapshot ready"| api["Fireworks Hot-Load API"]
api -->|"3. Load weights"| deployment["Inference Deployment"]
trainer -->|"4. Rollout via /v1/completions"| deployment
deployment -->|"Tokens + optional routing_matrix"| trainer

You own: trainer, reward shaping, checkpoint cadence, rollout orchestration.
Fireworks owns: hot-load logistics, distributed weight swap, inference serving, KV cache across rollouts.
End-to-end loop#
- Create a hot-load deployment.
- Upload and hot-load an initial full snapshot.
- Run rollouts against that snapshot.
- For each training step: upload and hot-load the next incremental snapshot (see Incremental snapshots).
- Run rollouts again.
- Every 20th or 30th step, publish a full snapshot instead of an incremental one. If the incremental chain fails, fall back to a full snapshot.
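The loop above can be sketched as a small driver. This is an illustration under stated assumptions, not part of any Fireworks SDK: `publish` and `rollout` are hypothetical callbacks your trainer would supply (upload plus hot-load, and rollout collection), the `step_00000` naming is arbitrary, and `full_every=20` is one choice from the "20th or 30th step" guidance.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback types; your trainer supplies the real implementations.
Publish = Callable[[str, Optional[str]], bool]  # (identity, previous) -> loaded OK?
Rollout = Callable[[str], None]                 # run rollouts against an identity


def run_loop(num_steps: int, publish: Publish, rollout: Rollout,
             full_every: int = 20) -> List[Tuple[str, str]]:
    """Drive the loop; return (identity, kind) for every snapshot served."""
    history: List[Tuple[str, str]] = []
    previous: Optional[str] = None
    for step in range(num_steps + 1):  # step 0 is the initial full snapshot
        identity = f"step_{step:05d}"
        want_full = previous is None or step % full_every == 0
        kind = "full" if want_full else "incremental"
        ok = publish(identity, None if want_full else previous)
        if not ok and not want_full:
            # Incremental chain failed: republish this step as a full snapshot.
            kind = "full"
            ok = publish(identity, None)
        if not ok:
            raise RuntimeError(f"could not hot-load {identity}")
        rollout(identity)
        history.append((identity, kind))
        previous = identity
    return history


def fake_publish(identity: str, previous: Optional[str]) -> bool:
    # Stand-in for upload + hot-load: pretend the incremental at step 3 fails.
    return not (identity == "step_00003" and previous is not None)


for identity, kind in run_loop(4, fake_publish, lambda identity: None):
    print(identity, kind)
```

Passing the callbacks in keeps the cadence logic testable without touching object storage or the hot-load API.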
Quickstart checklist#
Use this table for your first rollout end-to-end:
| Step | Action | Done when |
|---|---|---|
| 1 | Create hot-load deployment | firectl deployment get <deployment_id> shows a healthy deployment |
| 2 | Upload full HF snapshot | All files exist under .../<checkpoint_id>/ in object storage |
| 3 | POST signal snapshot | HTTP 200 |
| 4 | GET poll status | Every replica has readiness: true and current_snapshot_identity matches your identity |
| 5 | Run rollouts | Chat/completions returns tokens |
1. Create a hot-load deployment#
Create the deployment that will serve rollouts. During preview, --enable-hot-load flags may be hidden from CLI help but can still be passed explicitly.
firectl create deployment accounts/<account_id>/models/<model_id> \
--deployment-shape <shape_name> \
--deployment-id <deployment_id> \
--enable-hot-load \
--hot-load-bucket-type S3 \
--hot-load-bucket-url s3://<your_bucket>/<your_upload_path> \
--hot-load-transition-type ASYNC \
--region US_OHIO_1

Flags

- `--deployment-shape`: Optional. If omitted, `firectl` prompts you to pick one.
- `--hot-load-bucket-type`: `MINIO`, `S3`, `NEBIUS`, or `FW_HOSTED`. This guide focuses on external buckets (`S3`, `gs://`, etc.). `FW_HOSTED` is for Fireworks-managed trainers.
- `--hot-load-bucket-url`: Required when `--enable-hot-load` is set. Examples: `s3://mybucket/path`, `gs://mybucket/path`. No trailing slash. This is the parent prefix; each snapshot is a subdirectory named by `identity` (see snapshot layout).
- `--hot-load-transition-type`: `ASYNC` (recommended for RL) or `SYNC`. Defaults to `ASYNC` when hot load is enabled. See checkpoint-swap behavior.
- `--region`: Where the deployment runs (for example `US_OHIO_1`, `US_VIRGINIA_1`). Keep the trainer upload path geographically close to the bucket and deployment.
Save the account ID, deployment ID, and model ID from the output for hot-load and rollout calls.
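If you create deployments from a script instead of a shell, the command above can be assembled programmatically. The sketch below only builds the argument list and uses the flags shown in this guide; the function name and its defaults are assumptions, and the trailing-slash check reflects the bucket URL rule stated above.

```python
import shlex
from typing import List, Optional


def create_deployment_argv(account_id: str, model_id: str, deployment_id: str,
                           bucket_url: str, bucket_type: str = "S3",
                           region: str = "US_OHIO_1",
                           shape: Optional[str] = None) -> List[str]:
    """Return the firectl argument list for a hot-load deployment."""
    if bucket_url.endswith("/"):
        raise ValueError("--hot-load-bucket-url must not end with a slash")
    argv = [
        "firectl", "create", "deployment",
        f"accounts/{account_id}/models/{model_id}",
        "--deployment-id", deployment_id,
        "--enable-hot-load",
        "--hot-load-bucket-type", bucket_type,
        "--hot-load-bucket-url", bucket_url,
        "--hot-load-transition-type", "ASYNC",
        "--region", region,
    ]
    if shape is not None:
        # Without a shape, firectl falls back to its interactive picker.
        argv += ["--deployment-shape", shape]
    return argv


argv = create_deployment_argv(
    "my-team", "qwen3-30b-a3b", "rl-rollout-prod", "s3://my-bucket/rl/checkpoints"
)
print(shlex.join(argv))
# To actually create the deployment: subprocess.run(argv, check=True)
```

In non-interactive jobs, pass `shape` explicitly so the shape picker never blocks the script.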
If you do not set a shape, the CLI shows an interactive shape picker.

2. Upload and hot-load an initial full snapshot#
Upload a full HuggingFace-format checkpoint, then signal Fireworks to load it.
Snapshot layout#
Place each snapshot under its own subdirectory. The identity you signal in the API must match the directory name (a single path segment—no slashes):
s3://<your_bucket>/<your_upload_path>/<checkpoint_id>/
├── config.json
├── tokenizer.json
├── tokenizer_config.json
├── model-00000.safetensors
├── model-00001.safetensors
└── ...

Example with the recommended path pattern:

s3://<your_bucket>/<account_id>/<account_id>-<deployment_id>/version_001/

- `identity` / `<checkpoint_id>`: Any opaque string (for example `version_001` or `step_00100`).
- Format: Same layout as the base model on HuggingFace: `config.json`, tokenizer files, and safetensors weights. No tensor-parallel sharding in uploaded files.
- File size: Split weights into multiple `.safetensors` files, each under about 5 GB. Group weights by layer when possible; putting one layer per file minimizes load time.
Optional: call the per-file hint API as each file lands to speed up loading on large models.
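Before uploading, it can help to check a local snapshot directory against the layout rules above. The helper below is a hypothetical pre-upload check, not a Fireworks tool: the function name and the exact byte limit used for "about 5 GB" are assumptions, and it only inspects file names and sizes.

```python
from pathlib import Path
from typing import List

MAX_SHARD_BYTES = 5 * 1024**3  # assumed reading of "about 5 GB"


def check_snapshot_dir(path: Path, max_bytes: int = MAX_SHARD_BYTES) -> List[str]:
    """Return a list of problems found in a local snapshot directory."""
    if not path.is_dir():
        return [f"{path} is not a directory"]
    problems: List[str] = []
    if not (path / "config.json").is_file():
        problems.append("missing config.json")
    shards = sorted(path.glob("*.safetensors"))
    if not shards:
        problems.append("no .safetensors files found")
    for shard in shards:
        # Oversized shards slow down loading; split them before upload.
        if shard.stat().st_size > max_bytes:
            problems.append(f"{shard.name} exceeds {max_bytes} bytes")
    return problems


# The directory name doubles as the identity you later signal.
for problem in check_snapshot_dir(Path("checkpoints") / "version_001"):
    print("WARN:", problem)
```

An empty result means the directory passes these checks; it does not validate tensor contents or tokenizer files.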
Signal and poll#
Use the Hot-load API below with { "identity": "<checkpoint_id>" } and poll until all replicas are ready.
Hot-load API#
All hot-load requests use these headers:
| Header | Value |
|---|---|
| `Authorization` | `Bearer <fireworks_api_key>` |
| `fireworks-model` | `accounts/<account_id>/models/<model_id>` |
| `fireworks-deployment` | `accounts/<account_id>/deployments/<deployment_id>` |
| `Content-Type` | `application/json` |
| Operation | Method | URL |
|---|---|---|
| Signal snapshot ready | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load |
| Poll load status | GET | https://api.fireworks.ai/hot_load/v1/models/hot_load |
| Per-file hint (optional) | POST | https://api.fireworks.ai/hot_load/v1/models/hot_load/hint |
Signal snapshot ready#
Full snapshot body:

{ "identity": "version_001" }

Incremental snapshot bodies, compression, hints, and `checksum_format` are documented in Incremental snapshots.
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
-H "Authorization: Bearer <fireworks_api_key>" \
-H "fireworks-model: accounts/<account_id>/models/<model_id>" \
-H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
-H "Content-Type: application/json" \
-d '{ "identity": "version_001" }'

import os

import requests

API_KEY = os.environ["FIREWORKS_API_KEY"]
ACCOUNT = "<account_id>"
MODEL = f"accounts/{ACCOUNT}/models/<model_id>"
DEPLOYMENT = f"accounts/{ACCOUNT}/deployments/<deployment_id>"
HEADERS = {
    "Authorization": f"Bearer {API_KEY}",
    "fireworks-model": MODEL,
    "fireworks-deployment": DEPLOYMENT,
    "Content-Type": "application/json",
}
resp = requests.post(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    json={"identity": "version_001"},
    timeout=60,
)
resp.raise_for_status()

Poll load status#
Poll until every replica has readiness: true and current_snapshot_identity equals the identity you signaled.
curl https://api.fireworks.ai/hot_load/v1/models/hot_load \
-H "Authorization: Bearer <fireworks_api_key>" \
-H "fireworks-model: accounts/<account_id>/models/<model_id>" \
-H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>"

status = requests.get(
    "https://api.fireworks.ai/hot_load/v1/models/hot_load",
    headers=HEADERS,
    timeout=30,
).json()
replicas = status.get("replicas", [])
# Ready only when at least one replica is listed and all of them serve the new snapshot.
ready = bool(replicas) and all(
    r.get("readiness") and r.get("current_snapshot_identity") == "version_001"
    for r in replicas
)

When to start rollouts#
- Default (on-policy): Wait until all replicas report readiness on the new `identity`.
- Off-policy / higher utilization: You may start sending rollouts when a subset of replicas is ready; inspect each entry in `replicas` in the `GET` response. Stale-policy rollouts are expected; use async transition mode and monitor policy version in streaming responses (see Policy version in responses).
Per-file hints are optional but recommended for large checkpoints—see Incremental snapshots.
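The two readiness policies above can share one polling helper. The sketch below assumes the `GET` response has the `replicas`, `readiness`, and `current_snapshot_identity` fields used in this guide; the function names, the `min_fraction` knob, and the timeout and poll defaults are assumptions for illustration. `fetch_status` is any zero-argument callable that returns the parsed JSON status.

```python
import time
from typing import Callable, Dict, List


def ready_fraction(status: Dict, identity: str) -> float:
    """Fraction of replicas that are ready and serving `identity`."""
    replicas: List[Dict] = status.get("replicas", [])
    if not replicas:
        return 0.0
    ready = sum(
        1 for r in replicas
        if r.get("readiness") and r.get("current_snapshot_identity") == identity
    )
    return ready / len(replicas)


def wait_for_snapshot(fetch_status: Callable[[], Dict], identity: str,
                      min_fraction: float = 1.0, timeout_s: float = 1800.0,
                      poll_s: float = 5.0) -> float:
    """Poll until at least `min_fraction` of replicas serve `identity`."""
    deadline = time.monotonic() + timeout_s
    while True:
        fraction = ready_fraction(fetch_status(), identity)
        if fraction >= min_fraction:
            return fraction
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{identity}: only {fraction:.0%} of replicas ready")
        time.sleep(poll_s)


# Example wiring with the status request shown earlier (not executed here):
#   fetch = lambda: requests.get(URL, headers=HEADERS, timeout=30).json()
#   wait_for_snapshot(fetch, "version_001")                    # on-policy
#   wait_for_snapshot(fetch, "version_001", min_fraction=0.5)  # off-policy
```

Keeping `min_fraction` at `1.0` matches the on-policy default; lowering it trades policy freshness for utilization.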
3. Run rollouts#
Call the OpenAI-compatible inference API. For multi-turn RL, set session headers so KV cache stays on one replica:
curl https://api.fireworks.ai/inference/v1/chat/completions \
-H "Authorization: Bearer <fireworks_api_key>" \
-H "fireworks-model: accounts/<account_id>/models/<model_id>" \
-H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
-H "x-multi-turn-session-id: <trajectory_id>" \
-H "x-session-affinity: <trajectory_id>" \
-H "Content-Type: application/json" \
-d '{
  "model": "accounts/<account_id>/models/<model_id>",
  "messages": [{"role": "user", "content": "..."}]
}'

See Inference for RL rollouts for session affinity, weight-swap behavior, MoE Router Replay (R3), and policy-version fields.
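In a Python rollout worker, the same request can be assembled per trajectory. This is a minimal sketch that mirrors the curl call above: the helper name is an assumption, and it only builds the URL, headers, and payload so that you can send them with whichever HTTP client you use.

```python
from typing import Dict, List, Tuple

INFERENCE_URL = "https://api.fireworks.ai/inference/v1/chat/completions"


def build_rollout_request(api_key: str, account_id: str, model_id: str,
                          deployment_id: str, trajectory_id: str,
                          messages: List[Dict[str, str]]
                          ) -> Tuple[str, Dict[str, str], Dict]:
    """Return (url, headers, payload) for one turn of one trajectory."""
    model = f"accounts/{account_id}/models/{model_id}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "fireworks-model": model,
        "fireworks-deployment": f"accounts/{account_id}/deployments/{deployment_id}",
        # Reuse one ID for every turn so the trajectory stays on one replica.
        "x-multi-turn-session-id": trajectory_id,
        "x-session-affinity": trajectory_id,
        "Content-Type": "application/json",
    }
    payload = {"model": model, "messages": messages}
    return INFERENCE_URL, headers, payload


url, headers, payload = build_rollout_request(
    "<fireworks_api_key>", "my-team", "qwen3-30b-a3b", "rl-rollout-prod",
    "traj-0001", [{"role": "user", "content": "..."}],
)
# Send with, for example: requests.post(url, headers=headers, json=payload, timeout=120)
print(headers["x-session-affinity"], payload["model"])
```

Using the trajectory ID for both session headers follows the curl example; a fresh ID per trajectory lets different trajectories land on different replicas.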
Steady-state training loop#
After the first full snapshot:
- Intermediate steps: Build and upload an incremental snapshot (`arc_v2`), signal with `incremental_snapshot_metadata`, poll until ready, then run rollouts.
- Every 20th or 30th step: Publish a new full snapshot for faster recovery and chain reset.
- On failure: Fall back to a full snapshot; see Ledger & debugging.
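Full and incremental signals differ only in whether `incremental_snapshot_metadata` is present. The helper below is a sketch of that difference, assuming the field names and format values used in this guide's request examples; the function name is hypothetical, and the format strings are copied verbatim from this guide, so confirm them on the Incremental snapshots page.

```python
import json
from typing import Dict, Optional


def build_signal_body(identity: str,
                      previous_identity: Optional[str] = None) -> Dict:
    """Full-snapshot body when previous_identity is None, else incremental."""
    if not identity or "/" in identity:
        raise ValueError("identity must be a single path segment without slashes")
    body: Dict = {"identity": identity}
    if previous_identity is not None:
        body["incremental_snapshot_metadata"] = {
            "previous_snapshot_identity": previous_identity,
            # Values copied verbatim from this guide's example request.
            "compression_format": "arc_v2",
            "checksum_format": "alder32",
        }
    return body


print(json.dumps(build_signal_body("version_001")))
print(json.dumps(build_signal_body("version_002", "version_001")))
```

Falling back to a full snapshot after a failed incremental step then amounts to calling the helper again without `previous_identity`.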
Brief incremental signal example (full details on the incremental page):
curl -X POST https://api.fireworks.ai/hot_load/v1/models/hot_load \
-H "Authorization: Bearer <fireworks_api_key>" \
-H "fireworks-model: accounts/<account_id>/models/<model_id>" \
-H "fireworks-deployment: accounts/<account_id>/deployments/<deployment_id>" \
-H "Content-Type: application/json" \
-d '{
  "identity": "version_002",
  "incremental_snapshot_metadata": {
    "previous_snapshot_identity": "version_001",
    "compression_format": "arc_v2",
    "checksum_format": "alder32"
  }
}'

Numerics alignment#
For best training–inference alignment:
- Match quantization / precision between trainer checkpoints and the deployment shape (work with Fireworks if you need a custom shape).
- Measure logprob divergence between trainer forward passes and rollout inference on the same tokens.
- For MoE models, use Router Replay (R3) during rollouts—see MoE Router Replay.
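One way to measure the logprob divergence mentioned above is to compare per-token log-probabilities for the same sampled tokens. The sketch below is an assumption about which metrics are useful, not a Fireworks-defined procedure: it reports absolute differences and a sampled estimate of KL(rollout || trainer), which is meaningful when the tokens were drawn from the rollout policy.

```python
from typing import Dict, Sequence


def logprob_divergence(trainer_lp: Sequence[float],
                       rollout_lp: Sequence[float]) -> Dict[str, float]:
    """Compare per-token logprobs computed on the same token sequence."""
    if len(trainer_lp) != len(rollout_lp) or not trainer_lp:
        raise ValueError("need two equal-length, non-empty logprob sequences")
    diffs = [t - r for t, r in zip(trainer_lp, rollout_lp)]
    n = len(diffs)
    return {
        "mean_abs_diff": sum(abs(d) for d in diffs) / n,
        "max_abs_diff": max(abs(d) for d in diffs),
        # Mean of (rollout - trainer) over tokens sampled from the rollout policy.
        "kl_estimate": sum(-d for d in diffs) / n,
    }


# Made-up numbers for illustration only.
stats = logprob_divergence([-0.50, -1.20, -0.05], [-0.52, -1.18, -0.05])
print(stats)
```

Tracking these values per training step makes it easier to notice a precision or quantization mismatch between the trainer and the deployment shape.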
Next steps#
- Incremental snapshots: Build ARC2 deltas, per-file hints, and incremental signal bodies.
- Ledger & debugging: Inspect snapshot history, reset the ledger, and reason about request behavior during weight swaps.
- Inference for RL rollouts: Session affinity headers, policy version in streams, weight-swap behavior, and MoE Router Replay (R3).
- Training API: The alternative path where Fireworks runs the trainer through the Training API.