Command Line Interfaces (CLIs)
TRL provides a powerful command-line interface (CLI) to fine-tune large language models (LLMs) using methods like Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and more. The CLI abstracts away much of the boilerplate, letting you launch training jobs quickly and reproducibly.
Commands#
Currently supported commands are:
Training Commands#
- trl dpo: fine-tune an LLM with DPO
- trl grpo: fine-tune an LLM with GRPO
- trl kto: fine-tune an LLM with KTO
- trl reward: train a Reward Model
- trl rloo: fine-tune an LLM with RLOO
- trl sft: fine-tune an LLM with SFT
Other Commands#
- trl env: get the system information
- trl vllm-serve: serve a model with vLLM
Fine-Tuning with the TRL CLI#
Basic Usage#
You can launch training directly from the CLI by specifying required arguments like the model and dataset:
trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name stanfordnlp/imdb

trl dpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name anthropic/hh-rlhf

trl reward \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/ultrafeedback_binarized

trl grpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name HuggingFaceH4/Polaris-Dataset-53K \
  --reward_funcs accuracy_reward

trl rloo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name HuggingFaceH4/Polaris-Dataset-53K \
  --reward_funcs accuracy_reward

trl kto \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/kto-mix-14k

Using Configuration Files#
To keep your CLI commands clean and reproducible, you can define all training arguments in a YAML configuration file:
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb

Launch with:
trl sft --config sft_config.yaml

# dpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: anthropic/hh-rlhf

Launch with:
trl dpo --config dpo_config.yaml

# reward_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: trl-lib/ultrafeedback_binarized

Launch with:
trl reward --config reward_config.yaml

# grpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: HuggingFaceH4/Polaris-Dataset-53K
reward_funcs:
  - accuracy_reward

Launch with:
trl grpo --config grpo_config.yaml

# rloo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: HuggingFaceH4/Polaris-Dataset-53K
reward_funcs:
  - accuracy_reward

Launch with:
trl rloo --config rloo_config.yaml

# kto_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: trl-lib/kto-mix-14k

Launch with:
trl kto --config kto_config.yaml

Scaling Up with Accelerate#
The TRL CLI natively supports 🤗 Accelerate, making it easy to scale training across multiple GPUs or machines, or to use advanced setups like DeepSpeed, all from the same CLI.
You can pass any accelerate launch arguments directly to trl, such as --num_processes. For more information, see Using accelerate launch.
trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name stanfordnlp/imdb \
  --num_processes 4

or, with a config file:
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
num_processes: 4

Launch with:
trl sft --config sft_config.yaml

trl dpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name anthropic/hh-rlhf \
  --num_processes 4

or, with a config file:
# dpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: anthropic/hh-rlhf
num_processes: 4

Launch with:
trl dpo --config dpo_config.yaml

trl reward \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --num_processes 4

or, with a config file:
# reward_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: trl-lib/ultrafeedback_binarized
num_processes: 4

Launch with:
trl reward --config reward_config.yaml

trl grpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name HuggingFaceH4/Polaris-Dataset-53K \
  --reward_funcs accuracy_reward \
  --num_processes 4

or, with a config file:
# grpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: HuggingFaceH4/Polaris-Dataset-53K
reward_funcs:
  - accuracy_reward
num_processes: 4

Launch with:
trl grpo --config grpo_config.yaml

trl rloo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name HuggingFaceH4/Polaris-Dataset-53K \
  --reward_funcs accuracy_reward \
  --num_processes 4

or, with a config file:
# rloo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: HuggingFaceH4/Polaris-Dataset-53K
reward_funcs:
  - accuracy_reward
num_processes: 4

Launch with:
trl rloo --config rloo_config.yaml

trl kto \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/kto-mix-14k \
  --num_processes 4

or, with a config file:
# kto_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: trl-lib/kto-mix-14k
num_processes: 4

Launch with:
trl kto --config kto_config.yaml

Using --accelerate_config for Accelerate Configuration#
The --accelerate_config flag lets you easily configure distributed training with 🤗 Accelerate. This flag accepts either:
- the name of a predefined config profile (built into TRL), or
- a path to a custom Accelerate YAML config file.
Predefined Config Profiles#
TRL provides several ready-to-use Accelerate configs to simplify common training setups:
| Name | Description |
|---|---|
| fsdp1 | Fully Sharded Data Parallel Stage 1 |
| fsdp2 | Fully Sharded Data Parallel Stage 2 |
| zero1 | DeepSpeed ZeRO Stage 1 |
| zero2 | DeepSpeed ZeRO Stage 2 |
| zero3 | DeepSpeed ZeRO Stage 3 |
| multi_gpu | Multi-GPU training |
| single_gpu | Single-GPU training |
To use one of these, just pass the name to --accelerate_config. TRL will automatically load the corresponding config file from trl/accelerate_config/.
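The name-or-path behaviour described above can be pictured with a small shell sketch. The function `resolve_accelerate_config` is hypothetical and is not part of TRL. It only illustrates one plausible lookup order, under the assumption that an existing file is treated as a custom config and anything else must be one of the profile names from the table. TRL's actual resolution logic may differ.

```shell
# Hypothetical helper, NOT part of TRL: classifies an --accelerate_config value.
# Assumption: a value that points to an existing file is a custom config;
# otherwise it must match one of the predefined profile names listed above.
resolve_accelerate_config() {
    value="$1"
    if [ -f "$value" ]; then
        printf 'file:%s\n' "$value"
        return 0
    fi
    case "$value" in
        fsdp1|fsdp2|zero1|zero2|zero3|multi_gpu|single_gpu)
            printf 'profile:%s\n' "$value"
            ;;
        *)
            # Neither an existing file nor a known profile name
            printf 'unknown accelerate config: %s\n' "$value" >&2
            return 1
            ;;
    esac
}

# Classify a profile name (reported as a profile unless a file named
# "zero2" happens to exist in the current directory)
resolve_accelerate_config zero2
```

A check like this can be useful in a wrapper script to catch a mistyped profile name or a missing config path before a long job is queued.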
Example Usage#
trl sft \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name stanfordnlp/imdb \
  --accelerate_config zero2 # or path/to/my/accelerate/config.yaml

or, with a config file:
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: stanfordnlp/imdb
accelerate_config: zero2 # or path/to/my/accelerate/config.yaml

Launch with:
trl sft --config sft_config.yaml

trl dpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name anthropic/hh-rlhf \
  --accelerate_config zero2 # or path/to/my/accelerate/config.yaml

or, with a config file:
# dpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: anthropic/hh-rlhf
accelerate_config: zero2 # or path/to/my/accelerate/config.yaml

Launch with:
trl dpo --config dpo_config.yaml

trl reward \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/ultrafeedback_binarized \
  --accelerate_config zero2 # or path/to/my/accelerate/config.yaml

or, with a config file:
# reward_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: trl-lib/ultrafeedback_binarized
accelerate_config: zero2 # or path/to/my/accelerate/config.yaml

Launch with:
trl reward --config reward_config.yaml

trl grpo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name HuggingFaceH4/Polaris-Dataset-53K \
  --reward_funcs accuracy_reward \
  --accelerate_config zero2 # or path/to/my/accelerate/config.yaml

or, with a config file:
# grpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: HuggingFaceH4/Polaris-Dataset-53K
reward_funcs:
  - accuracy_reward
accelerate_config: zero2 # or path/to/my/accelerate/config.yaml

Launch with:
trl grpo --config grpo_config.yaml

trl rloo \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name HuggingFaceH4/Polaris-Dataset-53K \
  --reward_funcs accuracy_reward \
  --accelerate_config zero2 # or path/to/my/accelerate/config.yaml

or, with a config file:
# rloo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: HuggingFaceH4/Polaris-Dataset-53K
reward_funcs:
  - accuracy_reward
accelerate_config: zero2 # or path/to/my/accelerate/config.yaml

Launch with:
trl rloo --config rloo_config.yaml

trl kto \
  --model_name_or_path Qwen/Qwen2.5-0.5B \
  --dataset_name trl-lib/kto-mix-14k \
  --accelerate_config zero2 # or path/to/my/accelerate/config.yaml

or, with a config file:
# kto_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
dataset_name: trl-lib/kto-mix-14k
accelerate_config: zero2 # or path/to/my/accelerate/config.yaml

Launch with:
trl kto --config kto_config.yaml

Using dataset mixtures#
You can use dataset mixtures to combine multiple datasets into a single training dataset. This is useful for training on diverse data sources or when you want to mix different types of data.
# sft_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
datasets:
  - path: stanfordnlp/imdb
  - path: roneneldan/TinyStories

Launch with:
trl sft --config sft_config.yaml

# dpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
datasets:
  - path: BAAI/Infinity-Preference
  - path: argilla/Capybara-Preferences

Launch with:
trl dpo --config dpo_config.yaml

# reward_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
datasets:
  - path: trl-lib/tldr-preference
  - path: trl-lib/lm-human-preferences-sentiment

Launch with:
trl reward --config reward_config.yaml

# grpo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
datasets:
  - path: HuggingFaceH4/Polaris-Dataset-53K
  - path: trl-lib/DeepMath-103K
reward_funcs:
  - accuracy_reward

Launch with:
trl grpo --config grpo_config.yaml

# rloo_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
datasets:
  - path: HuggingFaceH4/Polaris-Dataset-53K
  - path: trl-lib/DeepMath-103K
reward_funcs:
  - accuracy_reward

Launch with:
trl rloo --config rloo_config.yaml

# kto_config.yaml
model_name_or_path: Qwen/Qwen2.5-0.5B
datasets:
  - path: trl-lib/kto-mix-14k
  - path: argilla/ultrafeedback-binarized-preferences-cleaned

Launch with:
trl kto --config kto_config.yaml

To see all the available keywords for defining dataset mixtures, refer to the scripts.utils.DatasetConfig and DatasetMixtureConfig classes.
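As a rough illustration of per-dataset keywords in a mixture, the sketch below writes a config that selects a split and keeps only a shared `text` column from each dataset, so the two sources end up with the same schema. The `split` and `columns` keys are assumptions about the DatasetConfig fields and should be checked against the class definition in your installed TRL version. The file name `sft_mixture_config.yaml` is hypothetical.

```shell
# Write a mixture config that narrows each dataset before mixing.
# Assumption: `split` and `columns` are DatasetConfig fields; confirm them
# against the class definition in your installed TRL version.
cat > sft_mixture_config.yaml <<'EOF'
# sft_mixture_config.yaml (hypothetical file name)
model_name_or_path: Qwen/Qwen2.5-0.5B
datasets:
  - path: stanfordnlp/imdb
    split: train
    columns:
      - text
  - path: roneneldan/TinyStories
    split: train
    columns:
      - text
EOF

# Show the generated file
cat sft_mixture_config.yaml
```

If those keys are accepted, the file can be launched the same way as the configs above, with trl sft --config sft_mixture_config.yaml.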
Getting the System Information#
You can get the system information by running the following command:
trl env

This will print out the system information, including the GPU information, the CUDA version, the PyTorch version, the transformers version, the TRL version, and any optional dependencies that are installed.
Copy-paste the following information when reporting an issue:
- Platform: Linux-5.15.0-1048-aws-x86_64-with-glibc2.31
- Python version: 3.11.9
- PyTorch version: 2.4.1
- accelerator(s): NVIDIA H100 80GB HBM3
- Transformers version: 4.45.0.dev0
- Accelerate version: 0.34.2
- Accelerate config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: DEEPSPEED
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 4
    - machine_rank: 0
    - num_machines: 1
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - deepspeed_config: {'gradient_accumulation_steps': 4, 'offload_optimizer_device': 'none', 'offload_param_device': 'none', 'zero3_init_flag': False, 'zero_stage': 2}
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
- Datasets version: 3.0.0
- HF Hub version: 0.24.7
- TRL version: 0.12.0.dev0+acb4d70
- bitsandbytes version: 0.41.1
- DeepSpeed version: 0.15.1
- Diffusers version: 0.30.3
- Liger-Kernel version: 0.3.0
- LLM-Blender version: 0.0.2
- OpenAI version: 1.46.0
- PEFT version: 0.12.0
- vLLM version: not installed

This information is required when reporting an issue.
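When a script needs a single value from this report, for example to record the TRL version next to a training run, the "- Key: value" layout is easy to filter. The sketch below is a minimal example. `env_field` is a hypothetical helper rather than a TRL command, and it assumes the output keeps the layout shown in the listing above.

```shell
# Hypothetical helper, NOT part of TRL: print the value of one
# "- Key: value" line from `trl env`-style output read on stdin.
env_field() {
    key="$1"
    # Strip optional indentation and the "- Key: " prefix; keep the first match
    sed -n "s/^[[:space:]]*- ${key}: //p" | head -n 1
}

# Sample lines copied from the listing above
sample='- PyTorch version: 2.4.1
- TRL version: 0.12.0.dev0+acb4d70
- vLLM version: not installed'

printf '%s\n' "$sample" | env_field "TRL version"   # prints 0.12.0.dev0+acb4d70
```

With TRL installed, the same function can read the live report: trl env | env_field "TRL version".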