Scripts Utilities


ScriptArguments

trl.ScriptArguments

Arguments common to all scripts.

Parameters:

dataset_name (str, optional) : Path or name of the dataset to load. If datasets is provided, this will be ignored.

dataset_config (str, optional) : Dataset configuration name. Corresponds to the name argument of the load_dataset function. If datasets is provided, this will be ignored.

dataset_train_split (str, optional, defaults to "train") : Dataset split to use for training. If datasets is provided, this will be ignored.

dataset_test_split (str, optional, defaults to "test") : Dataset split to use for evaluation. If datasets is provided, this will be ignored.

dataset_streaming (bool, optional, defaults to False) : Whether to stream the dataset. If True, the dataset will be loaded in streaming mode. If datasets is provided, this will be ignored.

ignore_bias_buffers (bool, optional, defaults to False) : Debug argument for distributed training. Works around DDP issues with LM bias/mask buffers (invalid scalar type and in-place operation errors). See https://github.com/huggingface/transformers/issues/22482#issuecomment-1595790992.
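A minimal sketch of the precedence rule repeated in the parameters above: when a datasets list is supplied (for example through DatasetMixtureConfig), the single-dataset fields are ignored. ScriptArgsSketch and resolve_dataset_source are hypothetical stand-ins that only mirror the documented field names and defaults; treating an empty list as "not provided" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScriptArgsSketch:
    # Hypothetical mirror of the documented ScriptArguments fields and defaults.
    dataset_name: Optional[str] = None
    dataset_config: Optional[str] = None
    dataset_train_split: str = "train"
    dataset_test_split: str = "test"
    dataset_streaming: bool = False
    ignore_bias_buffers: bool = False


def resolve_dataset_source(args: ScriptArgsSketch, datasets: Optional[list] = None) -> dict:
    """Hypothetical helper: decide which dataset settings take effect."""
    if datasets:
        # A non-empty mixture wins; every dataset_* field is ignored.
        return {"source": "mixture", "datasets": datasets}
    # Otherwise fall back to the single-dataset fields.
    return {
        "source": "single",
        "name": args.dataset_name,
        "config": args.dataset_config,
        "train_split": args.dataset_train_split,
        "test_split": args.dataset_test_split,
        "streaming": args.dataset_streaming,
    }


args = ScriptArgsSketch(dataset_name="trl-lib/tldr")
print(resolve_dataset_source(args)["name"])  # trl-lib/tldr
print(resolve_dataset_source(args, datasets=[{"path": "trl-lib/tldr"}])["source"])  # mixture
```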

TrlParser

trl.TrlParser


A subclass of transformers.HfArgumentParser designed for parsing command-line arguments with dataclass-backed configurations, while also supporting configuration file loading and environment variable management.

Examples:

# config.yaml
env:
    VAR1: value1
arg1: 23

# main.py
import os
from dataclasses import dataclass
from trl import TrlParser

@dataclass
class MyArguments:
    arg1: int
    arg2: str = "alpha"

parser = TrlParser(dataclass_types=[MyArguments])
training_args = parser.parse_args_and_config()

print(training_args, os.environ.get("VAR1"))

$ python main.py --config config.yaml
(MyArguments(arg1=23, arg2='alpha'),) value1

$ python main.py --arg1 5 --arg2 beta
(MyArguments(arg1=5, arg2='beta'),) None

parse_args_and_config(args: collections.abc.Iterable[str] | None = None, return_remaining_strings: bool = False, fail_with_unknown_args: bool = True, separate_remaining_strings: bool = False)

Source: https://github.com/huggingface/trl/blob/v1.5.1/trl/scripts/utils.py#L295

Parse command-line args and config file into instances of the specified dataclass types.

This method wraps transformers.HfArgumentParser.parse_args_into_dataclasses and also parses the config file specified with the --config flag. The config file (in YAML format) supplies argument values that replace the default values in the dataclasses, and command-line arguments override values set in the config file. The method also sets any environment variables listed under the env field of the config file.
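A stdlib-only sketch of the precedence described above (dataclass defaults, then config-file values, then command-line flags), plus the export of the env section. It is not TRL's implementation: the config is passed as an already-loaded dict instead of a YAML file, arg1 gets a default to keep the example short, and parse_with_config is a hypothetical helper.

```python
import argparse
import os
from dataclasses import dataclass, fields


@dataclass
class MyArguments:
    arg1: int = 0
    arg2: str = "alpha"


def parse_with_config(argv, config):
    """Hypothetical sketch: dataclass defaults < config values < CLI args.

    `config` stands in for the already-loaded YAML file (a plain dict here).
    Keys other than `env` are assumed to match MyArguments fields.
    """
    config = dict(config)  # avoid mutating the caller's dict
    # Export the `env` section as environment variables.
    for key, value in config.pop("env", {}).items():
        os.environ[key] = str(value)

    parser = argparse.ArgumentParser()
    for f in fields(MyArguments):
        parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    # Config values replace the dataclass defaults...
    parser.set_defaults(**config)
    # ...and explicit CLI flags still override them.
    namespace = parser.parse_args(argv)
    return MyArguments(**vars(namespace))


print(parse_with_config([], {"env": {"VAR1": "value1"}, "arg1": 23}), os.environ["VAR1"])
# MyArguments(arg1=23, arg2='alpha') value1
print(parse_with_config(["--arg1", "5", "--arg2", "beta"], {"arg1": 23}))
# MyArguments(arg1=5, arg2='beta')
```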

Parameters (of the TrlParser constructor):

dataclass_types (DataClassType | Iterable[DataClassType], optional) : Dataclass types to use for argument parsing.

**kwargs : Additional keyword arguments passed to the transformers.HfArgumentParser constructor.

parse_args_into_dataclasses


Parse command-line args into instances of the specified dataclass types.

This relies on argparse’s ArgumentParser.parse_known_args. See the documentation at https://docs.python.org/3/library/argparse.html#argparse.ArgumentParser.parse_args.

Parameters:

args : List of strings to parse. Defaults to sys.argv (same as argparse.ArgumentParser).

return_remaining_strings : If True, also returns a list of the remaining argument strings.

look_for_args_file : If True, looks for a “.args” file with the same base name as the entry-point script of this process and appends its content, if any, to the command-line args.

args_filename : If not None, uses this file instead of the “.args” file described in the previous argument.

args_file_flag : If not None, looks for files passed on the command line with this flag. The flag can be specified multiple times, and precedence is determined by the order (the last one wins).
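A sketch of how args-file contents can be merged with the command line, assuming (as in recent transformers releases) that file contents are placed before the explicit CLI args. With argparse's default store action the last occurrence of a flag wins, so later files override earlier ones and the CLI overrides all files. merge_args_files is a hypothetical helper, and skipping missing files is an assumption.

```python
import argparse
import tempfile
from pathlib import Path


def merge_args_files(cli_args, args_files):
    """Hypothetical sketch: prepend args-file contents to the CLI args.

    Files are read in order and CLI args come last, so the last
    occurrence of a flag (and therefore the CLI) takes precedence.
    """
    file_args = []
    for path in args_files:
        path = Path(path)
        if path.exists():  # missing files are silently skipped
            file_args += path.read_text().split()
    return file_args + list(cli_args)


parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=1e-3)
parser.add_argument("--epochs", type=int, default=1)

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp) / "base.args"
    base.write_text("--lr 0.1 --epochs 3")
    override = Path(tmp) / "override.args"
    override.write_text("--epochs 5")
    argv = merge_args_files(["--lr", "0.01"], [base, override])
    ns = parser.parse_args(argv)

print(ns.lr, ns.epochs)  # 0.01 5
```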

Returns:

Tuple consisting of:

- the dataclass instances, in the same order as they were passed to the initializer;
- if applicable, an additional namespace for more (non-dataclass-backed) arguments added to the parser after initialization;
- the potential list of remaining argument strings (same as argparse.ArgumentParser.parse_known_args).
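A stdlib-only sketch of the idea behind parse_args_into_dataclasses: register every dataclass field as a flag, parse with parse_known_args, then rebuild one instance per dataclass in the order the types were given, followed by the remaining strings. ModelArgs, TrainArgs and parse_into_dataclasses are hypothetical; the real HfArgumentParser also handles booleans, enums, help text and the optional extra namespace, and this sketch assumes field names are unique across dataclasses.

```python
import argparse
from dataclasses import dataclass, fields


@dataclass
class ModelArgs:
    model_name: str = "gpt2"


@dataclass
class TrainArgs:
    lr: float = 1e-3
    epochs: int = 1


def parse_into_dataclasses(argv, dataclass_types):
    """Hypothetical sketch of dataclass-backed argument parsing."""
    parser = argparse.ArgumentParser()
    for dtype in dataclass_types:
        for f in fields(dtype):
            parser.add_argument(f"--{f.name}", type=f.type, default=f.default)
    # parse_known_args tolerates unknown flags and returns them separately.
    namespace, remaining = parser.parse_known_args(argv)
    values = vars(namespace)
    # One instance per dataclass, in the order the types were passed.
    instances = tuple(
        dtype(**{f.name: values[f.name] for f in fields(dtype)})
        for dtype in dataclass_types
    )
    return (*instances, remaining)


model_args, train_args, remaining = parse_into_dataclasses(
    ["--lr", "0.01", "--unknown_flag", "x"], [ModelArgs, TrainArgs]
)
print(model_args, train_args, remaining)
```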

set_defaults_with_config


Overrides the parser’s default values with those provided via keyword arguments, including for subparsers.

Any argument with an updated default will also be marked as not required if it was previously required.

Returns a list of strings that were not consumed by the parser.
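A plain-argparse sketch of the behavior described above: known keys update the matching argument's default and clear its required flag, and leftover keys come back as "--key value" strings. set_defaults_from_config is hypothetical, it ignores subparsers for brevity, and returning leftovers in "--key value" form is an assumption. It reads argparse's private _actions list, since argparse has no public way to clear required.

```python
import argparse


def set_defaults_from_config(parser, **kwargs):
    """Hypothetical sketch (not TRL's code); subparsers are not handled."""
    # _actions is a private argparse attribute; used here to reach each argument.
    for action in parser._actions:
        if action.dest in kwargs:
            action.default = kwargs.pop(action.dest)
            action.required = False  # a config value now satisfies it
    # Anything left over was not consumed by this parser.
    return [item for key, value in kwargs.items() for item in (f"--{key}", str(value))]


parser = argparse.ArgumentParser()
parser.add_argument("--model_name", required=True)
parser.add_argument("--lr", type=float, default=1e-3)
leftover = set_defaults_from_config(parser, model_name="gpt2", lr=0.01, extra=1)
ns = parser.parse_args([])  # no longer fails: model_name has a config default
print(ns.model_name, ns.lr, leftover)  # gpt2 0.01 ['--extra', '1']
```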

get_dataset

trl.get_dataset


Load a mixture of datasets based on the configuration.

Example:

from trl import DatasetMixtureConfig, get_dataset
from trl.scripts.utils import DatasetConfig

mixture_config = DatasetMixtureConfig(datasets=[DatasetConfig(path="trl-lib/tldr")])
dataset = get_dataset(mixture_config)
print(dataset)

DatasetDict({
    train: Dataset({
        features: ['prompt', 'completion'],
        num_rows: 116722
    })
})

Parameters:

mixture_config (DatasetMixtureConfig) : Script arguments containing dataset configuration.

Returns:

DatasetDict

Combined dataset(s) from the mixture configuration, with optional train/test split if test_split_size is set.
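A library-free sketch of what "combining" a mixture can mean: select each dataset's columns, concatenate the rows in order, and optionally carve out a shuffled test split. Plain lists of dicts stand in for loaded datasets, and mix_datasets is hypothetical. Treating a float test_split_size as a fraction follows the test_size convention of train_test_split; rounding the test count up and shuffling with a fixed seed are assumptions of this sketch.

```python
import math
import random


def mix_datasets(loaded, columns_per_dataset, test_split_size=None, seed=0):
    """Hypothetical sketch: combine already-loaded datasets (lists of dicts)."""
    combined = []
    for rows, columns in zip(loaded, columns_per_dataset):
        for row in rows:
            # Keep only the requested columns (None keeps every column).
            combined.append(row if columns is None else {c: row[c] for c in columns})
    if test_split_size is None:
        return {"train": combined}
    # A float size is a fraction of the combined rows; round the test count up.
    n_test = math.ceil(test_split_size * len(combined))
    shuffled = combined[:]
    random.Random(seed).shuffle(shuffled)
    return {"train": shuffled[n_test:], "test": shuffled[:n_test]}


a = [{"prompt": "p1", "completion": "c1", "id": 1}, {"prompt": "p2", "completion": "c2", "id": 2}]
b = [{"prompt": "p3", "completion": "c3"}, {"prompt": "p4", "completion": "c4"}]
splits = mix_datasets([a, b], [["prompt", "completion"], None], test_split_size=0.25)
print({name: len(rows) for name, rows in splits.items()})  # {'train': 3, 'test': 1}
```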

DatasetConfig

trl.scripts.utils.DatasetConfig


Configuration for a dataset.

This class matches the signature of load_dataset, and its arguments are passed directly to the load_dataset function. Refer to the load_dataset documentation for more details.

Parameters:

path (str) : Path or name of the dataset.

name (str, optional) : Name of the dataset configuration.

data_dir (str, optional) : The data_dir of the dataset configuration. If it is specified for the generic builders (csv, text, etc.) or for Hub datasets, and data_files is None, the behavior is equivalent to passing os.path.join(data_dir, "**") as data_files, which references all the files in the directory.

data_files (str or Sequence or Mapping, optional) : Path(s) to source data file(s).

split (str, optional, defaults to "train") : Which split of the data to load.

columns (list[str], optional) : List of column names to select from the dataset. If None, all columns are selected.
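A sketch of the data_dir rule documented above, using a hypothetical DatasetConfigSketch that mirrors the listed fields and defaults. The effective_data_files method is an illustration only; DatasetConfig itself does not necessarily expose such a method, and load_dataset applies the rule internally.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Sequence, Union


@dataclass
class DatasetConfigSketch:
    # Hypothetical mirror of the documented DatasetConfig fields.
    path: str
    name: Optional[str] = None
    data_dir: Optional[str] = None
    data_files: Optional[Union[str, Sequence[str], Mapping[str, Union[str, Sequence[str]]]]] = None
    split: str = "train"
    columns: Optional[list] = None

    def effective_data_files(self):
        """Apply the documented data_dir rule when data_files is unset."""
        if self.data_files is None and self.data_dir is not None:
            # Equivalent to referencing every file under data_dir.
            return os.path.join(self.data_dir, "**")
        return self.data_files


cfg = DatasetConfigSketch(path="csv", data_dir="my_data")
print(cfg.effective_data_files())  # my_data/** on POSIX systems
```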

DatasetMixtureConfig

trl.DatasetMixtureConfig


Configuration class for a mixture of datasets.

Using HfArgumentParser, this class can be turned into argparse arguments that can be specified on the command line.

Usage:

When using the CLI, you can add the following section to your YAML config file:

datasets:
  - path: ...
    name: ...
    data_dir: ...
    data_files: ...
    split: ...
    columns: ...
  - path: ...
    name: ...
    data_dir: ...
    data_files: ...
    split: ...
    columns: ...
streaming: ...
test_split_size: ...

Parameters:

datasets (list[DatasetConfig]) : List of dataset configurations to include in the mixture.

streaming (bool, optional, defaults to False) : Whether to stream the datasets. If True, the datasets will be loaded in streaming mode.

test_split_size (float, optional) : Size of the test split. Refer to the test_size parameter in the train_test_split function for more details. If None, the dataset will not be split into train and test sets.
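A sketch of how the YAML section shown earlier could become typed objects: a YAML loader yields plain dicts, and a post-init hook converts each datasets entry into a config object. MixtureSketch and DatasetEntry are hypothetical, trimmed stand-ins for DatasetMixtureConfig and DatasetConfig; that the real class performs this conversion in __post_init__ is an assumption.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DatasetEntry:
    # Hypothetical, trimmed stand-in for DatasetConfig.
    path: str
    split: str = "train"
    columns: Optional[list] = None


@dataclass
class MixtureSketch:
    # Hypothetical stand-in for DatasetMixtureConfig.
    datasets: list = field(default_factory=list)
    streaming: bool = False
    test_split_size: Optional[float] = None

    def __post_init__(self):
        # Config files yield plain dicts; convert them into typed entries.
        self.datasets = [
            d if isinstance(d, DatasetEntry) else DatasetEntry(**d) for d in self.datasets
        ]


# Roughly what a loaded `datasets:` YAML section would look like as Python data.
loaded_yaml = {
    "datasets": [{"path": "trl-lib/tldr", "columns": ["prompt", "completion"]}],
    "test_split_size": 0.1,
}
mixture = MixtureSketch(**loaded_yaml)
print(mixture.datasets[0])
# DatasetEntry(path='trl-lib/tldr', split='train', columns=['prompt', 'completion'])
```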

Link last verified June 7, 2026.

Source: TRL Docs