W&B Training

Documentation Index
Fetch the complete documentation index at https://docs.wandb.ai/llms.txt. Use this file to discover all available pages before exploring further.

Post-train your models using reinforcement learning and supervised fine-tuning

Now in public preview, W&B Training offers serverless post-training for large language models (LLMs), including both reinforcement learning (RL) and supervised fine-tuning (SFT).

Serverless RL: Improve model reliability on multi-turn, agentic tasks while increasing speed and reducing costs. RL is a training technique in which models learn to improve their behavior through feedback on their outputs.
Serverless SFT: Fine-tune models using curated datasets for distillation, teaching output style and format, or warming up before RL.
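
As a rough illustration of "learning through feedback on outputs," the toy sketch below trains a tiny softmax policy over three candidate outputs with a REINFORCE-style update. It is not how W&B Training, ART, or RULER implement RL; the function names, reward values, and hyperparameters are all hypothetical and chosen only to make the idea concrete.

```python
import math
import random


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(rewards, steps=5000, lr=0.2, seed=0):
    """Toy policy-gradient loop (hypothetical, for illustration only).

    rewards[i] is the feedback score the "model" receives when it
    produces candidate output i.
    """
    rng = random.Random(seed)
    logits = [0.0] * len(rewards)  # start with a uniform policy
    baseline = 0.0                 # running average of observed rewards

    for _ in range(steps):
        probs = softmax(logits)
        # Sample one output from the current policy.
        action = rng.choices(range(len(rewards)), weights=probs)[0]
        reward = rewards[action]   # feedback on that output

        # Outputs that score above the running average are reinforced;
        # outputs that score below it are discouraged.
        advantage = reward - baseline
        for i in range(len(logits)):
            grad = (1.0 if i == action else 0.0) - probs[i]
            logits[i] += lr * advantage * grad

        baseline += 0.1 * (reward - baseline)

    return softmax(logits)


# Assumed feedback scores for three candidate outputs.
probs = train([0.1, 0.9, 0.4])
print([round(p, 3) for p in probs])
```

After training, most of the probability mass should sit on the second output, the one with the highest assumed reward. SFT, by contrast, would fit the model directly to curated example outputs rather than to reward signals.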

W&B Training integrates with:

ART, a flexible fine-tuning framework.
RULER, a universal verifier.
A fully managed backend on CoreWeave Cloud.

To get started, complete the prerequisites for the service, then follow the Serverless RL quickstart or the Serverless SFT docs to post-train your models.

Link last verified June 7, 2026.

Source: Weights & Biases Docs

Link last verified: 2026-04-05