TRL

Documentation from TRL — Hugging Face’s Transformer Reinforcement Learning library.

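TRL covers post-training techniques for language models: supervised fine-tuning (SFT), reward modeling, Proximal Policy Optimization (PPO), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). These are the methods commonly used to align and instruction-tune open models.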
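The techniques listed above differ mainly in the training signal they consume, and a small dependency-free sketch may make that concrete. It does not call TRL itself. The record field names (`prompt`, `completion`, `chosen`, `rejected`) are assumed conventions rather than something stated on this page, and `toy_reward` and `group_relative_advantages` are hypothetical helpers written for illustration only. The last function shows the group-relative normalization idea behind GRPO, here using the population standard deviation as a simplifying choice.

```python
from statistics import mean, pstdev

# SFT-style record: a prompt and the completion the model should imitate.
# Field names are assumptions for illustration.
sft_example = {"prompt": "What is 2 + 2?", "completion": "4"}

# Preference record, the kind of data reward modeling and DPO learn from:
# one prompt with a preferred and a dispreferred answer.
preference_example = {
    "prompt": "What is 2 + 2?",
    "chosen": "4",
    "rejected": "5",
}


def toy_reward(completions, **kwargs):
    """Hypothetical reward: 1.0 if the completion is exactly '4', else 0.0."""
    return [1.0 if c.strip() == "4" else 0.0 for c in completions]


def group_relative_advantages(rewards, eps=1e-6):
    """Center and scale rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a simplifying assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for the same prompt, scored and then normalized.
group = ["4", "5", "four", "4"]
rewards = toy_reward(group)
advantages = group_relative_advantages(rewards)

for completion, reward, advantage in zip(group, rewards, advantages):
    print(f"{completion!r}: reward={reward:.1f}, advantage={advantage:+.3f}")
```

In this sketch, completions that score above the group average receive a positive advantage and the rest a negative one, which is the signal a policy-gradient method such as GRPO would reinforce. SFT needs only the first kind of record, while reward modeling and DPO need the paired preference form.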