Community Tutorials


Community tutorials are made by active members of the Hugging Face community who want to share their knowledge and expertise with others. They are a great way to learn about the library and its features, and to get started with core classes and modalities.

Language Models

Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
|---|---|---|---|---|---|
| Reinforcement Learning | GRPOTrainer | Efficient Online Training with GRPO and vLLM in TRL | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | GRPOTrainer | Post training an LLM for reasoning with GRPO in TRL | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | GRPOTrainer | Mini-R1: Reproduce Deepseek R1 „aha moment“ a RL tutorial | Philipp Schmid | Link | Open In Colab |
| Reinforcement Learning | GRPOTrainer | RL on LLaMA 3.1-8B with GRPO and Unsloth optimizations | Andrea Manzoni | Link | Open In Colab |
| Instruction tuning | SFTTrainer | Fine-tuning Google Gemma LLMs using ChatML format with QLoRA | Philipp Schmid | Link | Open In Colab |
| Structured Generation | SFTTrainer | Fine-tuning Llama-2-7B to generate Persian product catalogs in JSON using QLoRA and PEFT | Mohammadreza Esmaeilian | Link | Open In Colab |
| Preference Optimization | DPOTrainer | Align Mistral-7b using Direct Preference Optimization for human preference alignment | Maxime Labonne | Link | Open In Colab |
| Preference Optimization | experimental.orpo.ORPOTrainer | Fine-tuning Llama 3 with ORPO combining instruction tuning and preference alignment | Maxime Labonne | Link | Open In Colab |
| Instruction tuning | SFTTrainer | How to fine-tune open LLMs in 2025 with Hugging Face | Philipp Schmid | Link | Open In Colab |
| Step-Level Reasoning | GRPOTrainer | Supervised Reinforcement Learning (SRL) for step-by-step reasoning with vLLM | Deepak Swaminathan | Link | Open In Colab |

Videos

| Task | Title | Author | Video |
|---|---|---|---|
| Instruction tuning | Fine-tuning open AI models using Hugging Face TRL | Wietse Venema | |
| Instruction tuning | How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset | Mayurji | |

⚠️ Deprecated features notice for “How to fine-tune a smol-LM with Hugging Face, TRL, and the smoltalk Dataset”

The tutorial uses two deprecated features:

  • SFTTrainer(..., tokenizer=tokenizer): Use SFTTrainer(..., processing_class=tokenizer) instead, or simply omit it (it will be inferred from the model).
  • setup_chat_format(model, tokenizer): Use SFTConfig(..., chat_template_path="Qwen/Qwen3-0.6B"), where chat_template_path specifies the model whose chat template you want to copy.
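
The two replacements above can be sketched as a small argument-migration step. This is only an illustration under stated assumptions: `migrate_sft_arguments` is a hypothetical helper, not part of TRL, and the string values stand in for real model and tokenizer objects. It renames the deprecated `tokenizer` keyword to `processing_class` and turns a former `setup_chat_format(model, tokenizer)` call into a `chat_template_path` entry for `SFTConfig`.

```python
# Hypothetical helper (NOT part of TRL): rewrites the two deprecated call
# patterns from the notice above into their current equivalents.

def migrate_sft_arguments(trainer_kwargs, chat_template_source=None):
    """Return (trainer_kwargs, config_kwargs) using non-deprecated names.

    trainer_kwargs: keyword arguments originally written for SFTTrainer.
    chat_template_source: model ID or path whose chat template should be
        copied; stands in for a former setup_chat_format(model, tokenizer).
    """
    updated = dict(trainer_kwargs)  # copy so the caller's dict is untouched
    if "tokenizer" in updated:
        tokenizer = updated.pop("tokenizer")
        # Keep an explicit processing_class if one was already provided.
        updated.setdefault("processing_class", tokenizer)

    config_kwargs = {}
    if chat_template_source is not None:
        config_kwargs["chat_template_path"] = chat_template_source
    return updated, config_kwargs


# Placeholder strings are assumptions; real code would pass loaded objects.
legacy = {"model": "my-model", "tokenizer": "my-tokenizer"}
trainer_kwargs, config_kwargs = migrate_sft_arguments(legacy, "Qwen/Qwen3-0.6B")
print(trainer_kwargs)
print(config_kwargs)
```

With TRL installed, the results would presumably be passed on as `SFTTrainer(args=SFTConfig(**config_kwargs), **trainer_kwargs)`. Since the notice says the processing class can be inferred from the model, dropping the `tokenizer` argument entirely is an equally valid migration.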

Vision Language Models

Tutorials

| Task | Class | Description | Author | Tutorial | Colab |
|---|---|---|---|---|---|
| Visual QA | SFTTrainer | Fine-tuning Qwen2-VL-7B for visual question answering on ChartQA dataset | Sergio Paniego | Link | Open In Colab |
| Visual QA | SFTTrainer | Fine-tuning SmolVLM with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |
| SEO Description | SFTTrainer | Fine-tuning Qwen2-VL-7B for generating SEO-friendly descriptions from images | Philipp Schmid | Link | Open In Colab |
| Visual QA | DPOTrainer | PaliGemma 🤝 Direct Preference Optimization | Merve Noyan | Link | Open In Colab |
| Visual QA | DPOTrainer | Fine-tuning SmolVLM using direct preference optimization (DPO) with TRL on a consumer GPU | Sergio Paniego | Link | Open In Colab |
| Object Detection Grounding | SFTTrainer | Fine tuning a VLM for Object Detection Grounding using TRL | Sergio Paniego | Link | Open In Colab |
| Visual QA | DPOTrainer | Fine-Tuning a Vision Language Model with TRL using MPO | Sergio Paniego | Link | Open In Colab |
| Reinforcement Learning | GRPOTrainer | Post training a VLM for reasoning with GRPO using TRL | Sergio Paniego | Link | Open In Colab |

Speech Language Models

Tutorials

| Task | Class | Description | Author | Tutorial |
|---|---|---|---|---|
| Text-to-Speech | GRPOTrainer | Post training a Speech Language Model with GRPO using TRL | Steven Zheng | Link |

Contributing

If you have a tutorial that you would like to add to this list, please open a PR to add it. We will review it and merge it if it is relevant to the community.

Link last verified June 7, 2026.
Source: TRL Docs