Ragas CLI
The Ragas Command Line Interface (CLI) provides tools for quickly setting up evaluation projects and running experiments from the terminal.
Installation#
The CLI is included with the ragas package:
pip install ragas
Or use uvx to run without installation:
uvx ragas --help
Available Commands#
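A script that wraps the commands in this section has to decide whether to call an installed `ragas` executable or go through `uvx ragas`. The following is a minimal sketch of that choice; the helper name `ragas_launcher` is hypothetical and not part of Ragas, and it assumes `uvx` is available whenever `ragas` is not.

```python
import shutil
from typing import List


def ragas_launcher() -> List[str]:
    """Return the argv prefix for invoking the Ragas CLI (hypothetical helper)."""
    # Prefer an installed `ragas` executable when one is on PATH.
    if shutil.which("ragas"):
        return ["ragas"]
    # Otherwise fall back to uvx. This sketch does not check that uvx exists.
    return ["uvx", "ragas"]


# Build the argument list for the help command; nothing is executed here.
print(ragas_launcher() + ["--help"])
```

The resulting list can be passed to `subprocess.run` as-is, which avoids shell quoting issues.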
ragas quickstart#
Create a complete evaluation project from a template. This is the fastest way to get started with Ragas.
ragas quickstart [TEMPLATE] [OPTIONS]
Arguments:
TEMPLATE: Template name (optional). Omit it to list the available templates.
Options:
-o, --output-dir: Directory to create the project in (default: current directory)
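When project creation is scripted, the optional argument and option map naturally onto an argument list. This is only a sketch: `quickstart_args` is a hypothetical helper, and it uses nothing beyond the `TEMPLATE` argument and the `--output-dir` option described above.

```python
from typing import List, Optional


def quickstart_args(template: Optional[str] = None,
                    output_dir: Optional[str] = None) -> List[str]:
    """Build the argv for `ragas quickstart` (hypothetical helper)."""
    args = ["ragas", "quickstart"]
    # Without a template, the command lists the available templates.
    if template:
        args.append(template)
    # Only pass --output-dir when a directory was requested;
    # otherwise the CLI uses the current directory.
    if output_dir:
        args += ["--output-dir", output_dir]
    return args


print(quickstart_args("rag_eval", "./my-project"))
```

Passing the list to `subprocess.run(args, check=True)` would run the command and raise if it exits with a non-zero status.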
Examples:
# List available templates
ragas quickstart
# Create a RAG evaluation project
ragas quickstart rag_eval
# Create project in a specific directory
ragas quickstart rag_eval --output-dir ./my-project
ragas evals#
Run evaluations on a dataset using an evaluation file.
ragas evals EVAL_FILE [OPTIONS]
Arguments:
EVAL_FILE: Path to the evaluation file (required)
Options:
--dataset: Name of the dataset in the project (required)
--metrics: Comma-separated list of metric field names to evaluate (required)
--baseline: Baseline experiment name to compare against (optional)
--name: Name of the experiment run (optional)
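Because `--metrics` takes a comma-separated list and two of the options are required, a wrapper can validate its inputs before the CLI is called. The sketch below assumes only the options listed above; `evals_args` is a hypothetical helper, not a Ragas API.

```python
from typing import List, Optional, Sequence


def evals_args(eval_file: str,
               dataset: str,
               metrics: Sequence[str],
               baseline: Optional[str] = None,
               name: Optional[str] = None) -> List[str]:
    """Build the argv for `ragas evals` (hypothetical helper)."""
    # --dataset and --metrics are required, so reject empty values early.
    if not dataset:
        raise ValueError("dataset is required")
    if not metrics:
        raise ValueError("at least one metric is required")
    args = ["ragas", "evals", eval_file,
            "--dataset", dataset,
            # The CLI expects metric names joined by commas.
            "--metrics", ",".join(metrics)]
    # Optional flags are added only when a value was given.
    if baseline:
        args += ["--baseline", baseline]
    if name:
        args += ["--name", name]
    return args


print(evals_args("evals.py", "test_data", ["accuracy", "relevance"]))
```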
Example:
ragas evals evals.py --dataset test_data --metrics accuracy,relevance
ragas hello_world#
Create a simple hello world example to verify your installation.
ragas hello_world [DIRECTORY]
Arguments:
DIRECTORY: Directory to create the example in (default: current directory)
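Since this command is meant for verifying an installation, it can double as a smoke test in a setup script. The sketch below is one possible approach under stated assumptions: `smoke_test_hello_world` is a hypothetical helper, the 120-second timeout is arbitrary, and a zero exit status is taken to mean success. It makes no assumption about which files the command creates.

```python
import shutil
import subprocess
import tempfile
from typing import Optional


def smoke_test_hello_world(timeout: float = 120.0) -> Optional[bool]:
    """Run `ragas hello_world` in a scratch directory (hypothetical helper).

    Returns None when no `ragas` executable is on PATH, otherwise True or
    False depending on whether the command exited with status 0.
    """
    if shutil.which("ragas") is None:
        return None  # CLI not installed; nothing to verify.
    # A temporary directory keeps the generated example out of the project.
    with tempfile.TemporaryDirectory() as scratch:
        try:
            result = subprocess.run(
                ["ragas", "hello_world", scratch],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        return result.returncode == 0


print(smoke_test_hello_world())
```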
Quickstart Templates#
RAG & Retrieval#
- RAG Evaluation (rag_eval) - Evaluate RAG systems with custom metrics
- Improve RAG (improve_rag) - Compare naive vs agentic RAG approaches
Agent Evaluation#
- Agent Evaluation (agent_evals) - Evaluate AI agents solving math problems
- LlamaIndex Agent Evaluation (llamaIndex_agent_evals) - Evaluate LlamaIndex agents with tool call metrics
Specialized Use Cases#
- Text-to-SQL Evaluation (text2sql) - Evaluate text-to-SQL systems with execution accuracy
- Workflow Evaluation (workflow_eval) - Evaluate complex LLM workflows
- Prompt Evaluation (prompt_evals) - Compare different prompt variations
LLM Testing#
- Judge Alignment (judge_alignment) - Measure LLM-as-judge alignment with human standards
- LLM Benchmarking (benchmark_llm) - Benchmark and compare different LLM models
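A wrapper script can catch a mistyped template name before it calls `ragas quickstart`. The table below copies the names and descriptions from the lists above and may drift from what the installed CLI actually offers, so running `ragas quickstart` with no arguments remains the authoritative listing. `check_template` is a hypothetical helper.

```python
import difflib

# Template names and descriptions as listed in this document.
QUICKSTART_TEMPLATES = {
    "rag_eval": "Evaluate RAG systems with custom metrics",
    "improve_rag": "Compare naive vs agentic RAG approaches",
    "agent_evals": "Evaluate AI agents solving math problems",
    "llamaIndex_agent_evals": "Evaluate LlamaIndex agents with tool call metrics",
    "text2sql": "Evaluate text-to-SQL systems with execution accuracy",
    "workflow_eval": "Evaluate complex LLM workflows",
    "prompt_evals": "Compare different prompt variations",
    "judge_alignment": "Measure LLM-as-judge alignment with human standards",
    "benchmark_llm": "Benchmark and compare different LLM models",
}


def check_template(name: str) -> str:
    """Return `name` if it is a listed template, else raise ValueError."""
    if name in QUICKSTART_TEMPLATES:
        return name
    # Suggest the closest listed name, if any is similar enough.
    close = difflib.get_close_matches(name, list(QUICKSTART_TEMPLATES), n=1)
    hint = f" Did you mean '{close[0]}'?" if close else ""
    raise ValueError(f"Unknown template '{name}'.{hint}")


for template, description in QUICKSTART_TEMPLATES.items():
    print(f"{template}: {description}")
```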
Quick Start#
Get running in 60 seconds:
# Create project
uvx ragas quickstart rag_eval
cd rag_eval
# Install dependencies
uv sync
# Set API key
export OPENAI_API_KEY="your-key"
# Run evaluation
uv run python evals.py
Next Steps#
- RAG Evaluation Guide - Detailed walkthrough of the rag_eval template
- Improve RAG Guide - Compare naive vs agentic RAG approaches
- Custom Metrics - Create your own evaluation metrics
Link last verified
June 7, 2026.
Source: RAGAS Docs
Link last verified: 2026-03-04