Community Tutorials

Community tutorials are written by active members of the Hugging Face community who want to share their knowledge and expertise. They are a good way to learn about TRL and its features, and to get started with its core classes and modalities.

Language Models

Tutorials

Task	Class	Description	Author	Tutorial
Reinforcement Learning	GRPOTrainer	Efficient Online Training with GRPO and vLLM in TRL	Sergio Paniego	Link
Reinforcement Learning	GRPOTrainer	Post training an LLM for reasoning with GRPO in TRL	Sergio Paniego	Link
Reinforcement Learning	GRPOTrainer	Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial	Philipp Schmid	Link
Reinforcement Learning	GRPOTrainer	RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations	Andrea Manzoni	Link
Instruction tuning	SFTTrainer	Fine-tuning Google Gemma LLMs using ChatML format with QLoRA	Philipp Schmid	Link
Structured Generation	SFTTrainer	Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT	Mohammadreza Esmaeilian	Link
Preference Optimization	DPOTrainer	Align Mistral-7b using Direct Preference Optimization for human preference alignment	Maxime Labonne	Link
Preference Optimization	experimental.orpo.ORPOTrainer	Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment	Maxime Labonne	Link
Instruction tuning	SFTTrainer	How to fine-tune open LLMs in 2025 with Hugging Face	Philipp Schmid	Link
Step-Level Reasoning	GRPOTrainer	Supervised Reinforcement Learning (SRL) for step-by-step reasoning with vLLM	Deepak Swaminathan	Link

Videos

Task	Title	Author	Video
Instruction tuning	Fine-tuning open AI models using Hugging Face TRL	Wietse Venema
Instruction tuning	How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset	Mayurji

⚠️ Deprecated features notice for “How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset”

The tutorial uses two deprecated features:
SFTTrainer(..., tokenizer=tokenizer): Use SFTTrainer(..., processing_class=tokenizer) instead, or simply omit it (it will be inferred from the model).
setup_chat_format(model, tokenizer): Use SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B"), where chat_template_path specifies the model whose chat template you want to copy.
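
The two replacements above can be sketched as follows. This is a minimal illustration, not code from the tutorial: `migrate_trainer_kwargs` and `build_trainer` are hypothetical helper names, `output_dir="smol-sft"` is an arbitrary placeholder, and the sketch assumes a TRL release recent enough to accept `processing_class` and `chat_template_path`.

```python
def migrate_trainer_kwargs(kwargs: dict) -> dict:
    """Return a copy of SFTTrainer keyword arguments in which the deprecated
    `tokenizer` key is renamed to `processing_class`.

    Hypothetical helper, written only to illustrate the rename.
    """
    migrated = dict(kwargs)
    if "tokenizer" in migrated:
        tokenizer = migrated.pop("tokenizer")
        # Keep an explicit `processing_class` if the caller already set one.
        migrated.setdefault("processing_class", tokenizer)
    return migrated


def build_trainer(model_id, train_dataset, template_source="Qwen/Qwen3-0.6B"):
    """Build an SFTTrainer without the two deprecated features.

    Hypothetical helper; the imports are lazy so that the function above
    can be used without TRL installed.
    """
    from transformers import AutoTokenizer
    from trl import SFTConfig, SFTTrainer

    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Replaces setup_chat_format(model, tokenizer): the chat template is
    # copied from the model named by `chat_template_path`.
    config = SFTConfig(
        output_dir="smol-sft",  # placeholder output directory (assumption)
        chat_template_path=template_source,
    )

    # Old-style arguments, as the tutorial would have passed them.
    old_kwargs = {
        "model": model_id,
        "args": config,
        "train_dataset": train_dataset,
        "tokenizer": tokenizer,  # deprecated keyword
    }
    return SFTTrainer(**migrate_trainer_kwargs(old_kwargs))
```

`build_trainer(model_id, train_dataset)` returns a trainer on which `.train()` can be called. As the notice says, `processing_class` can also be left out entirely, in which case it is inferred from the model.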

Vision Language Models

Tutorials

Task	Class	Description	Author	Tutorial
Visual QA	SFTTrainer	Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset	Sergio Paniego	Link
Visual QA	SFTTrainer	Fine-tuning SmolVLM with TRL on a consumer GPU	Sergio Paniego	Link
SEO Description	SFTTrainer	Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images	Philipp Schmid	Link
Visual QA	DPOTrainer	PaliGemma 🤝 Direct Preference Optimization	Merve Noyan	Link
Visual QA	DPOTrainer	Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU	Sergio Paniego	Link
Object Detection Grounding	SFTTrainer	Fine tuning a VLM for Object Detection Grounding using TRL	Sergio Paniego	Link
Visual QA	DPOTrainer	Fine-Tuning a Vision Language Model with TRL using MPO	Sergio Paniego	Link
Reinforcement Learning	GRPOTrainer	Post training a VLM for reasoning with GRPO using TRL	Sergio Paniego	Link

Speech Language Models

Tutorials

Task	Class	Description	Author	Tutorial
Text-to-Speech	GRPOTrainer	Post training a Speech Language Model with GRPO using TRL	Steven Zheng	Link

Contributing

If you have a tutorial that you would like to add to this list, please open a PR. We will review it and merge it if it is relevant to the community.

Link last verified June 7, 2026.

Source: TRL Docs
