Examples

This directory contains a collection of examples that demonstrate how to use the TRL library for various applications. We provide both scripts for advanced use cases and notebooks for an easy start and interactive experimentation.

The notebooks are self-contained and can run on free Colab, while the scripts can run on a single GPU, multiple GPUs, or DeepSpeed setups.

Getting Started

Install TRL and additional dependencies as follows:

pip install --upgrade "trl[quantization]"

Check for additional optional dependencies here.

For scripts, you will also need a 🤗 Accelerate config (recommended for multi-GPU settings):

accelerate config # will prompt you to define the training configuration

This allows you to run scripts with accelerate launch in single or multi-GPU settings.

Notebooks

These notebooks are easier to run and are designed for quick experimentation with TRL. The list of notebooks can be found in the trl/examples/notebooks/ directory.

| Notebook | Description | Open in Colab |
| --- | --- | --- |
| grpo_trl_lora_qlora.ipynb | GRPO using QLoRA on free Colab | Open In Colab |
| grpo_agent.ipynb | GRPO for agent training | Not available due to OOM with Colab GPUs |
| grpo_rnj_1_instruct.ipynb | GRPO rnj-1-instruct with QLoRA using TRL on Colab to add reasoning capabilities | Open In Colab |
| sft_ministral3_vl.ipynb | Supervised Fine-Tuning (SFT) Ministral 3 with QLoRA using TRL on free Colab | Open In Colab |
| grpo_ministral3_vl.ipynb | GRPO Ministral 3 with QLoRA using TRL on free Colab | Open In Colab |
| sft_nemotron_3.ipynb | SFT with LoRA on NVIDIA Nemotron 3 models | Open In Colab |
| sft_trl_lora_qlora.ipynb | Supervised Fine-Tuning (SFT) using QLoRA on free Colab | Open In Colab |
| sft_qwen_vl.ipynb | Supervised Fine-Tuning (SFT) Qwen3-VL with QLoRA using TRL on free Colab | Open In Colab |
| sft_tool_calling.ipynb | Teaching tool calling to a model without native tool-calling support using SFT with QLoRA | Open In Colab |
| grpo_qwen3_vl.ipynb | GRPO Qwen3-VL with QLoRA using TRL on free Colab | Open In Colab |

OpenEnv Notebooks

These notebooks demonstrate how to train models with OpenEnv environments using GRPOTrainer’s environment_factory. The BrowserGym notebook uses the lower-level rollout_func API instead. See the OpenEnv Integration guide for more details.

| Notebook | Description | Open in Colab |
| --- | --- | --- |
| openenv_wordle_grpo.ipynb | GRPO to play Wordle on an OpenEnv environment | Open In Colab |
| openenv_sudoku_grpo.ipynb | GRPO to play Sudoku on an OpenEnv environment | Open In Colab |
| grpo_functiongemma_browsergym_openenv.ipynb | GRPO on FunctionGemma in the BrowserGym environment | Open In Colab |

Scripts

Scripts are maintained in the trl/scripts and examples/scripts directories. They show how to use different trainers such as SFTTrainer, PPOTrainer, DPOTrainer, GRPOTrainer, and more.

| File | Description |
| --- | --- |
| examples/scripts/bco.py | This script shows how to use the experimental.kto.KTOTrainer with the BCO loss to fine-tune a model to increase instruction-following, truthfulness, honesty, and helpfulness using the openbmb/UltraFeedback dataset. |
| examples/scripts/cpo.py | This script shows how to use the experimental.cpo.CPOTrainer to fine-tune a model to increase helpfulness and harmlessness using the Anthropic/hh-rlhf dataset. |
| trl/scripts/dpo.py | This script shows how to use the DPOTrainer to fine-tune a model. |
| examples/scripts/dpo_vlm.py | This script shows how to use the DPOTrainer to fine-tune a Vision Language Model to reduce hallucinations using the openbmb/RLAIF-V-Dataset dataset. |
| examples/scripts/gkd.py | This script shows how to use the experimental.gkd.GKDTrainer to fine-tune a model. |
| trl/scripts/grpo.py | This script shows how to use the GRPOTrainer to fine-tune a model. |
| trl/scripts/grpo_agent.py | This script shows how to use the GRPOTrainer to fine-tune a model to enable agentic usage. |
| examples/scripts/grpo_vlm.py | This script shows how to use the GRPOTrainer to fine-tune a multimodal model for reasoning using the lmms-lab/multimodal-open-r1-8k-verified dataset. |
| examples/scripts/gspo.py | This script shows how to use GSPO via the GRPOTrainer to fine-tune a model for reasoning using the AI-MO/NuminaMath-TIR dataset. |
| examples/scripts/gspo_vlm.py | This script shows how to use GSPO via the GRPOTrainer to fine-tune a multimodal model for reasoning using the lmms-lab/multimodal-open-r1-8k-verified dataset. |
| examples/scripts/kto.py | This script shows how to use the experimental.kto.KTOTrainer to fine-tune a model. |
| examples/scripts/mpo_vlm.py | This script shows how to use MPO via the DPOTrainer to align a model based on preferences using the HuggingFaceH4/rlaif-v_formatted dataset and a set of loss types with corresponding weights. |
| examples/scripts/nash_md.py | This script shows how to use the experimental.nash_md.NashMDTrainer to fine-tune a model. |
| examples/scripts/nemo_gym/train_multi_environment.py | This script shows how to use the GRPOTrainer to train language models in NVIDIA NeMo-Gym environments. Supports multi-turn and tool calling environments, and multi-environment training. See the NeMo-Gym Integration guide for setup and usage. |
| examples/scripts/online_dpo.py | This script shows how to use the experimental.online_dpo.OnlineDPOTrainer to fine-tune a model. |
| examples/scripts/online_dpo_vlm.py | This script shows how to use the experimental.online_dpo.OnlineDPOTrainer to fine-tune a Vision Language Model. |
| examples/scripts/orpo.py | This script shows how to use the experimental.orpo.ORPOTrainer to fine-tune a model to increase helpfulness and harmlessness using the Anthropic/hh-rlhf dataset. |
| examples/scripts/openreward/seta.py | This script shows how to use the GRPOTrainer to train a model against the SETA ORS environment on the openreward.ai catalog. See the OpenReward Integration guide for setup and usage. |
| examples/scripts/ppo/ppo.py | This script shows how to use the experimental.ppo.PPOTrainer to fine-tune a model to improve its ability to continue text with positive sentiment or physically descriptive language. |
| examples/scripts/ppo/ppo_tldr.py | This script shows how to use the experimental.ppo.PPOTrainer to fine-tune a model to improve its ability to generate TL;DR summaries. |
| examples/scripts/prm.py | This script shows how to use the experimental.prm.PRMTrainer to fine-tune a Process-supervised Reward Model (PRM). |
| examples/scripts/reward_modeling.py | This script shows how to use the RewardTrainer to train an Outcome Reward Model (ORM) on your own dataset. |
| examples/scripts/rloo.py | This script shows how to use the RLOOTrainer to fine-tune a model to improve its ability to solve math questions. |
| trl/scripts/sft.py | This script shows how to use the SFTTrainer to fine-tune a model. |
| examples/scripts/sft_gemma3.py | This script shows how to use the SFTTrainer to fine-tune a Gemma 3 model. |
| examples/scripts/sft_nemotron_3.py | This script shows how to use the SFTTrainer to fine-tune an NVIDIA Nemotron 3 model. |
| examples/scripts/sft_tiny_aya_tool_calling.py | This script shows how to use the SFTTrainer to teach tool calling to a model without native tool-calling support using the bebechien/SimpleToolCalling dataset. |
| examples/scripts/sft_video_llm.py | This script shows how to use the SFTTrainer to fine-tune a Video Language Model. |
| examples/scripts/sft_vlm.py | This script shows how to use the SFTTrainer to fine-tune a Vision Language Model in a chat setting. The script has only been tested with LLaVA 1.5, LLaVA 1.6, and Llama-3.2-11B-Vision-Instruct models, so users may see unexpected behaviour in other model architectures. |
| examples/scripts/sft_vlm_gemma3.py | This script shows how to use the SFTTrainer to fine-tune a Gemma 3 model on vision to text tasks. |
| examples/scripts/sft_vlm_smol_vlm.py | This script shows how to use the SFTTrainer to fine-tune a SmolVLM model. |
| examples/scripts/xpo.py | This script shows how to use the experimental.xpo.XPOTrainer to fine-tune a model. |

OpenEnv Scripts

These scripts demonstrate how to train models with OpenEnv environments using GRPOTrainer’s environment_factory. See the OpenEnv Integration guide for more details.

| File | Description |
| --- | --- |
| examples/scripts/openenv/echo.py | GRPO training with the Echo environment (minimal example). |
| examples/scripts/openenv/wordle.py | GRPO training with the Wordle (TextArena) environment. |
| examples/scripts/openenv/catch.py | GRPO training with the Catch (OpenSpiel) environment. |
| examples/scripts/openenv/sudoku.py | GRPO training with the Sudoku environment. |
| examples/scripts/openenv/multi_env.py | Multi-environment GRPO training: Wordle + Catch in the same training run. |
| examples/scripts/openenv/browsergym.py | GRPO training with the BrowserGym environment for VLMs. |
| examples/scripts/openenv/browsergym_llm.py | GRPO training with the BrowserGym environment for LLMs. |
| examples/scripts/openenv/carla.py | GRPO training with the CARLA environment for autonomous driving. |
| examples/scripts/openenv/carla_vlm.py | GRPO training with CARLA for VLMs with multimodal tool responses (camera images). |
| examples/scripts/openenv/carla_vlm_gemma.py | GRPO training with CARLA for Gemma 4 with multimodal tool responses (camera images). |

Distributed Training (for scripts)

You can run scripts on multiple GPUs with 🤗 Accelerate:

accelerate launch --config_file=examples/accelerate_configs/multi_gpu.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script

For DeepSpeed ZeRO-{1,2,3}:

accelerate launch --config_file=examples/accelerate_configs/deepspeed_zero{1,2,3}.yaml --num_processes {NUM_GPUS} path_to_script.py --all_arguments_of_the_script

Adjust NUM_GPUS and --all_arguments_of_the_script as needed.
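
To make the placeholder substitution concrete, here is a minimal Python sketch that assembles the launch commands shown above. The helper name build_launch_command and its parameters are hypothetical and not part of TRL or Accelerate. The config file paths are the ones named in this section, and the --output_dir value is only an illustrative script argument.

```python
import shlex


def build_launch_command(script, num_gpus, script_args=(), zero_stage=None):
    """Return the `accelerate launch` argument list for a script.

    Hypothetical helper, not a TRL or Accelerate API.
    zero_stage=None selects the plain multi-GPU config; 1, 2, or 3 selects
    the matching DeepSpeed ZeRO config file named in the text above.
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if zero_stage is None:
        config = "examples/accelerate_configs/multi_gpu.yaml"
    elif zero_stage in (1, 2, 3):
        config = f"examples/accelerate_configs/deepspeed_zero{zero_stage}.yaml"
    else:
        raise ValueError("zero_stage must be None, 1, 2, or 3")
    return [
        "accelerate", "launch",
        f"--config_file={config}",
        "--num_processes", str(num_gpus),
        script,
        *script_args,  # forwarded unchanged to the training script
    ]


if __name__ == "__main__":
    cmd = build_launch_command(
        "trl/scripts/sft.py",
        num_gpus=4,
        zero_stage=2,
        script_args=["--output_dir", "sft-output"],  # illustrative argument
    )
    # Print a shell-ready command line instead of running it.
    print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so the same list can be passed to subprocess.run without shell quoting concerns.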

Link last verified June 7, 2026.
Source: TRL Docs