Overview
Train models using reinforcement learning in minutes
Reinforcement Fine-Tuning (RFT) is free for models under 16B parameters. When creating an RFT job in the UI, filter the model selection list on the fine-tuning creation page for free tuning models. If you launch jobs from the terminal, look up the model ID in the Model Library. Note: SFT and DPO jobs are billed per training token for all model sizes; see the pricing page for details.
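The billing rule above can be captured in a small helper. This is a minimal sketch that assumes only what this page states (RFT is free below 16B parameters; SFT and DPO are billed per training token). The function name and the `price_per_million_tokens` argument are hypothetical placeholders; take real prices from the pricing page.

```python
def estimate_training_cost(method: str, params_billions: float,
                           training_tokens: int,
                           price_per_million_tokens: float) -> float:
    """Hypothetical helper encoding the billing rule stated on this page.

    - RFT on models under 16B parameters is free.
    - SFT and DPO are billed per training token for every model size.
    `price_per_million_tokens` is a placeholder, not a real Fireworks price.
    """
    method = method.upper()
    if method not in {"RFT", "SFT", "DPO"}:
        raise ValueError(f"unknown fine-tuning method: {method}")
    if method == "RFT":
        if params_billions < 16:
            return 0.0
        # This page does not state RFT pricing for larger models, so don't guess.
        raise NotImplementedError("check the pricing page for RFT on models >= 16B")
    # SFT and DPO: every training token is billed.
    return training_tokens / 1_000_000 * price_per_million_tokens


print(estimate_training_cost("SFT", params_billions=8,
                             training_tokens=2_000_000,
                             price_per_million_tokens=0.5))  # 1.0 with this placeholder price
```

The helper deliberately refuses to price RFT on larger models rather than inventing a number the page does not give.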
Fireworks RFT uses reinforcement learning to help you train frontier open models such as DeepSeek V3 and Kimi K2 to outperform closed models on your product's use case. It is designed to be powerful and easy to use for both developers and enterprises:
- No infrastructure: Train frontier models without managing GPUs or RL infrastructure
- Production-ready: Built-in tracing, monitoring, security & one-click deploy
- Fast iteration: From evaluator setup to deployed model in hours, not weeks
See how Genspark and Vercel used Fireworks RFT to train open models for agentic use cases, outperforming leading closed models.
Quickstart: Pick Your Training Approach
- ⏱️ 15 minutes
  - Best for: testing locally and training on simple tasks
  - How it works: iterate on your evaluator, then use it to train a small model on Fireworks.
- ⏱️ 1-2 hours
  - Best for: agents, multi-turn workflows, and existing services
  - How it works: rollouts run in your own environment and connect via HTTP with tracing.
- ⏱️ 2-4 hours
  - Best for: sensitive data, compliance, and enterprise requirements
  - How it works: training data never leaves your GCS/S3 bucket, giving full data isolation.
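The 15-minute approach above centers on an evaluator: a function that scores a model's output so training can reward good responses. Below is a minimal sketch of one, assuming a simple task whose answer is a single number. The function name, its signature, the 0-to-1 score range, and the partial-credit value are illustrative assumptions, not the Fireworks evaluator API.

```python
import re


def evaluate(response: str, expected_answer: str) -> float:
    """Hypothetical evaluator: score a response in [0, 1].

    Full credit if the last number in the response equals the expected
    answer, small partial credit (0.1) if the response contains any number
    at all, and 0 otherwise.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0  # no numeric answer at all
    if float(numbers[-1]) == float(expected_answer):
        return 1.0  # final number matches
    return 0.1  # right format, wrong value


print(evaluate("3 apples plus 4 is 7", "7"))
```

Before training, it is worth running an evaluator like this on a handful of known-good and known-bad responses to confirm the scores separate them; a reward that gives every response the same score leaves reinforcement learning nothing to learn from.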
Launch Training
- Requirements, validation checks, and common errors to review before launching
- Fast, scriptable, reproducible: ideal for automation and iteration
- Visual, guided, beginner-friendly: great for exploring options
- Already familiar with firectl? You can create RFT jobs with it directly.
RFT Concepts
- The RL training loop explained
- How reward functions guide training
- Local vs. remote evaluation environments
- Optimize your training configuration
- Estimate and optimize your training costs
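To make the training loop and the role of the reward concrete, here is a toy policy-gradient loop. It is a generic illustration, not Fireworks' training algorithm: the "policy" is a softmax over three canned responses with fixed, hypothetical rewards. Each step samples a group of responses, scores them, and nudges the policy toward responses that scored above the group's mean reward (GRPO-style methods use a similar group-relative baseline; this sketch omits their normalization and clipping).

```python
import math
import random

# Toy setup (hypothetical): three canned responses with fixed rewards that
# stand in for an evaluator's scores.
CANDIDATES = ["wrong", "partially right", "correct"]
REWARDS = {"wrong": 0.0, "partially right": 0.5, "correct": 1.0}


def softmax(logits: list[float]) -> list[float]:
    """Convert logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps: int = 200, group_size: int = 8, lr: float = 0.5,
          seed: int = 0) -> list[float]:
    """Run a toy policy-gradient loop and return the final policy."""
    rng = random.Random(seed)
    logits = [0.0] * len(CANDIDATES)  # start from a uniform policy
    for _ in range(steps):
        probs = softmax(logits)
        # 1. Rollout: sample a group of responses from the current policy.
        group = rng.choices(range(len(CANDIDATES)), weights=probs, k=group_size)
        # 2. Reward: score every sampled response.
        rewards = [REWARDS[CANDIDATES[i]] for i in group]
        baseline = sum(rewards) / len(rewards)  # group-mean baseline
        # 3. Update: REINFORCE gradient, using d log p_i / d logit_j = 1[i == j] - p_j.
        grad = [0.0] * len(logits)
        for i, reward in zip(group, rewards):
            advantage = reward - baseline
            for j in range(len(logits)):
                grad[j] += advantage * ((1.0 if i == j else 0.0) - probs[j])
        logits = [logit + lr * g / group_size for logit, g in zip(logits, grad)]
    return softmax(logits)


if __name__ == "__main__":
    print([round(p, 3) for p in train()])
```

The "correct" response ends up with most of the probability mass because it always scores at or above the group average. The same arithmetic shows why the reward function matters: if every sampled response gets the same reward, every advantage is zero and the policy does not change.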