Quick Start
Get started with Ragas in minutes. Create a complete evaluation project with just a few commands.
Step 1: Create Your Project#
Choose one of the following methods:
No installation required. uvx automatically downloads and runs ragas:
uvx ragas quickstart rag_eval
cd rag_eval
Alternatively, install ragas first, then create the project:
pip install ragas
ragas quickstart rag_eval
cd rag_eval
Step 2: Install Dependencies#
Install the project dependencies:
uv sync
Or, if you prefer pip:
pip install -e .
Step 3: Set Your API Key#
By default, the quickstart example uses OpenAI, so setting your OpenAI API key is enough to get started. Other providers need only a minor change to evals.py.
export OPENAI_API_KEY="your-openai-key"
The quickstart project is already configured to use OpenAI. You’re all set!
To use Anthropic instead, set your Anthropic API key:
export ANTHROPIC_API_KEY="your-anthropic-key"
Then update the LLM initialization in evals.py:
import os
from anthropic import Anthropic
from ragas.llms import llm_factory
client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
llm = llm_factory("claude-3-5-sonnet-20241022", provider="anthropic", client=client)
To use Google Gemini, set up your Google credentials:
export GOOGLE_API_KEY="your-google-api-key"
Then update the LLM initialization in evals.py:
Option 1: Using Google’s Official Library (Recommended)
import os
import google.generativeai as genai
from ragas.llms import llm_factory
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
client = genai.GenerativeModel("gemini-2.0-flash")
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)
# Adapter is auto-detected as "litellm" for google provider
For more Gemini options and detailed setup, see the Google Gemini Integration Guide.
Install and run Ollama locally, then update the LLM initialization in evals.py:
from openai import OpenAI
from ragas.llms import llm_factory
# Create an OpenAI-compatible client for Ollama
client = OpenAI(
    api_key="ollama",  # Ollama doesn't require a real key
    base_url="http://localhost:11434/v1",
)
llm = llm_factory("mistral", provider="openai", client=client)
For any LLM with an OpenAI-compatible API:
from openai import OpenAI
from ragas.llms import llm_factory
client = OpenAI(
    api_key="your-api-key",
    base_url="https://your-api-endpoint",
)
llm = llm_factory("model-name", provider="openai", client=client)
For more details, learn about LLM integrations.
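Since each provider above reads its key from a different environment variable, a quick check of which keys are set can catch a missing export before the evaluation runs. The following is a minimal sketch using only the standard library; the helper name detect_provider and the returned labels are illustrative assumptions, not part of the Ragas API.

```python
import os

# Environment variable -> provider label, in the order the options appear above.
# The labels are illustrative, not values defined by Ragas.
PROVIDER_ENV_VARS = {
    "OPENAI_API_KEY": "openai",
    "ANTHROPIC_API_KEY": "anthropic",
    "GOOGLE_API_KEY": "google",
}


def detect_provider(env=None):
    """Return the label of the first provider whose API key is set, or None."""
    env = os.environ if env is None else env
    for var, provider in PROVIDER_ENV_VARS.items():
        if env.get(var):  # treat an empty string as "not set"
            return provider
    return None


if __name__ == "__main__":
    provider = detect_provider()
    if provider is None:
        print("No API key found; export one of:", ", ".join(PROVIDER_ENV_VARS))
    else:
        print(f"Found credentials for: {provider}")
```

Ollama is left out of the mapping because, as noted above, it does not require a real key.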
Project Structure#
Your generated project includes:
rag_eval/
├── README.md # Project documentation
├── pyproject.toml # Project configuration
├── rag.py # Your RAG application
├── evals.py # Evaluation workflow
├── __init__.py # Makes this a Python package
└── evals/
    ├── datasets/ # Test data files
    ├── experiments/ # Evaluation results
    └── logs/ # Execution logs
Step 4: Run Your Evaluation#
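Before running the evaluation, it can help to confirm that the generated project matches the structure listed under Project Structure. The sketch below checks for those paths with pathlib; the helper name missing_paths is hypothetical, and it assumes the script is run from the project root. If some directories are only created on the first run (an assumption worth verifying for your version), they will simply be reported as missing.

```python
from pathlib import Path

# Paths taken from the project tree, relative to the project root.
EXPECTED_PATHS = [
    "README.md",
    "pyproject.toml",
    "rag.py",
    "evals.py",
    "__init__.py",
    "evals/datasets",
    "evals/experiments",
    "evals/logs",
]


def missing_paths(root="."):
    """Return the expected paths that do not exist under root."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_paths()
    if missing:
        print("Missing from project:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```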
Run the evaluation script:
uv run python evals.py
Or, if you installed with pip:
python evals.py
The evaluation will:
- Load test data from the load_dataset() function in evals.py
- Query your RAG application with test questions
- Evaluate responses
- Display results in the console
- Save results to CSV in the evals/experiments/ directory
Congratulations! You have a complete evaluation setup running. 🎉
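Because results are saved as CSV files in evals/experiments/, they can be inspected with the standard library alone. The sketch below finds the most recently modified CSV and counts the values in one column. The helper names are hypothetical, and the column name "score" is an assumption: check the header row of your own results file for the actual column names.

```python
import csv
from collections import Counter
from pathlib import Path


def latest_csv(directory="evals/experiments"):
    """Return the most recently modified CSV file in directory, or None."""
    files = sorted(Path(directory).glob("*.csv"), key=lambda p: p.stat().st_mtime)
    return files[-1] if files else None


def tally_column(csv_path, column):
    """Count how often each value appears in one column of a CSV file."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return Counter(row[column] for row in csv.DictReader(f) if column in row)


if __name__ == "__main__":
    path = latest_csv()
    if path is None:
        print("No experiment results found yet.")
    else:
        # "score" is a hypothetical column name; check the header of your CSV.
        print(path.name, dict(tally_column(path, "score")))
```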
Customize Your Evaluation#
Add More Test Cases#
Edit the load_dataset() function in evals.py to add more test questions:
from ragas import Dataset
def load_dataset():
    """Load test dataset for evaluation."""
    dataset = Dataset(
        name="test_dataset",
        backend="local/csv",
        root_dir=".",
    )
    data_samples = [
        {
            "question": "What is Ragas?",
            "grading_notes": "Ragas is an evaluation framework for LLM applications",
        },
        {
            "question": "How do metrics work?",
            "grading_notes": "Metrics evaluate the quality and performance of LLM responses",
        },
        # Add more test cases here
    ]
    for sample in data_samples:
        dataset.append(sample)
    dataset.save()
    return dataset
Customize Evaluation Metrics#
The template includes a DiscreteMetric for custom evaluation logic. You can customize the evaluation in three ways:
- Modify the metric prompt - Change the evaluation criteria
- Adjust allowed values - Update valid output categories
- Add more metrics - Create additional metrics for different aspects
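To illustrate how the prompt and the allowed values relate, here is a plain-Python sketch of filling a prompt template and checking an answer against a fixed set of categories. It is not how Ragas implements DiscreteMetric; render_prompt and normalize_verdict are hypothetical helpers, and the prompt text mirrors the template's example. The point is that changing the allowed values means updating the prompt wording too, so the model is asked for exactly the categories you accept.

```python
PROMPT = (
    "Evaluate this response: {response} based on: {context}. "
    "Return 'excellent', 'good', or 'poor'."
)
ALLOWED_VALUES = ["excellent", "good", "poor"]


def render_prompt(template, **fields):
    """Fill the {placeholders} in a prompt template with concrete values."""
    return template.format(**fields)


def normalize_verdict(raw, allowed):
    """Map a raw model answer onto an allowed value, or raise ValueError."""
    # Drop surrounding whitespace, quotes, and a trailing period, then lowercase.
    cleaned = raw.strip().strip("'\".").lower()
    if cleaned not in allowed:
        raise ValueError(f"{raw!r} is not one of {allowed}")
    return cleaned


if __name__ == "__main__":
    print(render_prompt(PROMPT, response="Paris", context="capital of France"))
    print(normalize_verdict(" 'Good'. ", ALLOWED_VALUES))
```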
Example of modifying the metric:
from ragas.metrics import DiscreteMetric
from ragas.llms import llm_factory
my_metric = DiscreteMetric(
    name="custom_evaluation",
    prompt="Evaluate this response: {response} based on: {context}. Return 'excellent', 'good', or 'poor'.",
    allowed_values=["excellent", "good", "poor"],
)
What’s Next?#
- Learn the concepts: Read the Evaluate a Simple LLM Application guide for a deeper understanding
- Custom metrics: Create your own metrics using simple decorators
- Production integration: Integrate evaluations into your CI/CD pipeline
- RAG evaluation: Evaluate RAG systems with specialized metrics
- Agent evaluation: Explore AI agent evaluation
- Test data generation: Generate synthetic test datasets for your evaluations