TRL
Documentation from TRL, Hugging Face’s Transformer Reinforcement Learning library.
TRL covers post-training techniques for language models, including supervised fine-tuning (SFT), reward modeling, PPO, DPO, and GRPO. Together these form the toolkit used to align and instruction-tune open models.