KTO Trainer

As of TRL v1.0, KTOTrainer and KTOConfig have been moved to the trl.experimental.kto module.
The KTO API is experimental and may change at any time. Promoting KTO back to the stable API is a high-priority task: the trainer is slated for refactoring to align with the standard core trainer architecture.
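
Because the import location changed in v1.0, code that has to run against several TRL versions may want a fallback import. The following is a minimal sketch rather than an official TRL utility: the helper name import_names and the candidate list are hypothetical, and it assumes that older releases expose the same class names at the top level of the trl package.

```python
import importlib

# Candidate modules, newest location first. "trl.experimental.kto" comes from
# the note above; the top-level "trl" fallback is an assumption about
# pre-v1.0 releases.
KTO_MODULE_CANDIDATES = ("trl.experimental.kto", "trl")


def import_names(names, module_candidates=KTO_MODULE_CANDIDATES):
    """Return {name: object} from the first candidate module exposing all names."""
    for module_name in module_candidates:
        try:
            module = importlib.import_module(module_name)
            return {name: getattr(module, name) for name in names}
        except (ImportError, AttributeError):
            # Module missing, or it lacks one of the names: try the next one.
            continue
    raise ImportError(
        f"none of {tuple(module_candidates)} provides all of {tuple(names)}"
    )
```

With TRL installed, import_names(["KTOConfig", "KTOTrainer"]) would return both classes from whichever location the installed version provides. If only v1.0 or later needs to be supported, a direct import from trl.experimental.kto is simpler.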

Overview

Kahneman-Tversky Optimization (KTO) was introduced in the paper "KTO: Model Alignment as Prospect Theoretic Optimization" by Kawin Ethayarajh, Winnie Xu, Niklas Muennighoff, Dan Jurafsky, and Douwe Kiela.

The abstract from the paper is the following:

Kahneman & Tversky’s prospect theory tells us that humans perceive random variables in a biased but well-defined manner; for example, humans are famously loss-averse. We show that objectives for aligning LLMs with human feedback implicitly incorporate many of these biases – the success of these objectives (e.g., DPO) over cross-entropy minimization can partly be ascribed to them being human-aware loss functions (HALOs). However, the utility functions these methods attribute to humans still differ from those in the prospect theory literature. Using a Kahneman-Tversky model of human utility, we propose a HALO that directly maximizes the utility of generations instead of maximizing the log-likelihood of preferences, as current methods do. We call this approach Kahneman-Tversky Optimization (KTO), and it matches or exceeds the performance of preference-based methods at scales from 1B to 30B. Crucially, KTO does not need preferences – only a binary signal of whether an output is desirable or undesirable for a given input. This makes it far easier to use in the real world, where preference data is scarce and expensive.

The official code can be found in ContextualAI/HALOs.

This post-training method was contributed by Kashif Rasul, Younes Belkada, Lewis Tunstall and Pablo Vicente.
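
To illustrate the "binary signal" mentioned in the abstract, the sketch below contrasts a paired preference example with the binary-labelled examples that KTO needs. It is an illustration, not TRL code: the helper unpair_preference is hypothetical, the sample texts are invented, and the prompt / completion / label field names are an assumption based on TRL's unpaired preference dataset format.

```python
def unpair_preference(example):
    """Split one paired preference example into two binary-labelled examples."""
    return [
        # The preferred answer becomes a desirable example.
        {"prompt": example["prompt"], "completion": example["chosen"], "label": True},
        # The dispreferred answer becomes an undesirable example.
        {"prompt": example["prompt"], "completion": example["rejected"], "label": False},
    ]


# A paired example, as used by preference-based methods such as DPO.
paired = {
    "prompt": "What color is the sky?",
    "chosen": "It is blue.",
    "rejected": "It is green.",
}

# KTO only needs (prompt, completion, label) rows; they do not have to come in pairs.
unpaired_rows = unpair_preference(paired)
```

The conversion only goes one way in general: binary feedback such as a thumbs-up or thumbs-down on a single response can be used directly, without ever collecting a second response to compare against.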

Quick start

This example demonstrates how to train a model with KTO, using the Qwen 0.5B model as the base model and the preference data from the KTO Mix 14k dataset. You can view the data here: