Trl
Asynchronous GRPO (0)
BCO Trainer (0)
BEMA for Reference Model (0)
Callbacks (0)
Chat template utilities (0)
Chat Templates (0)
Command Line Interfaces (CLIs) (0)
Community Tutorials (0)
CPO Trainer (0)
Data Utilities (0)
Dataset formats and types (0)
DeepSpeed Integration (0)
Distillation Trainer (0)
Distributing Training (0)
DPO Trainer (0)
Examples (0)
Experimental (0)
General Online Logit Distillation (GOLD) Trainer (0)
Generalized Knowledge Distillation Trainer (0)
GFPO (0)
GRPO Trainer (0)
GRPO With Replay Buffer (0)
GSPO-token (0)
Installation (0)
Kernels Hub Integration and Usage (0)
KTO Trainer (0)
Liger Kernel Integration (0)
LoRA Without Regret (0)
MiniLLM Trainer (0)
Nash-MD Trainer (0)
NeMo Gym Integration (0)
Online DPO Trainer (0)
OpenEnv Integration for Training LLMs with Environments (0)
OpenReward Integration for Training LLMs with Environments (0)
ORPO Trainer (0)
Paper Index (0)
PAPO Trainer (0)
PEFT Integration (0)
Post-Training Toolkit Integration (0)
PPO Trainer (0)
PRM Trainer (0)
Quickstart (0)
RapidFire AI Integration (0)
Reducing Memory Usage (0)
Reward Functions (0)
Reward Modeling (0)
RLOO Trainer (0)
Scripts Utilities (0)
SDFT (0)
SDPO (0)
SFT Trainer (0)
Speeding Up Training (0)
SSD (0)
TPO Trainer (0)
Trackio Integration (0)
Training customization (0)
Training with Jobs (0)
TRL - Transformers Reinforcement Learning (0)
Unsloth Integration (0)
Usage Stats Collection (0)
Use model after training (0)
vLLM Integration (0)
XPO Trainer (0)