Google Gemini

This guide covers setting up and using Google’s Gemini models with Ragas for evaluation.

Overview#

Ragas supports Google Gemini models with automatic adapter selection. The framework works with both the new google-genai SDK (recommended) and the legacy google-generativeai SDK.

Setup#

Prerequisites#

Google API Key with Gemini API access
Python 3.8+
Ragas installed

Installation#

Install required dependencies:

# Recommended: New Google GenAI SDK
pip install ragas google-genai

# Legacy (deprecated, support ends Aug 2025)
pip install ragas google-generativeai
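
To confirm which of the two SDKs is importable in the current environment, a small check like the following can help. This is a minimal sketch using only the standard library; the helper name `detect_gemini_sdk` is hypothetical and not part of Ragas or either SDK.

```python
import importlib.util


def detect_gemini_sdk():
    """Return which Gemini SDK is importable, preferring the new one."""
    candidates = [
        ("google.genai", "google-genai"),                # new SDK
        ("google.generativeai", "google-generativeai"),  # legacy SDK
    ]
    for module_name, package_name in candidates:
        try:
            spec = importlib.util.find_spec(module_name)
        except (ImportError, ValueError):
            # Raised when the parent "google" package is missing or malformed.
            spec = None
        if spec is not None:
            return package_name
    return None


print(f"Detected Gemini SDK: {detect_gemini_sdk()}")
```

A result of `None` means neither package is installed.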

Configuration#

Option 1: Using New Google GenAI SDK (Recommended)#

Create a genai.Client and pass it to llm_factory:

import os
from google import genai
from ragas.llms import llm_factory

# Create client with API key
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))

# Create LLM - adapter is auto-detected for google provider
llm = llm_factory(
    "gemini-2.0-flash",
    provider="google",
    client=client
)

Option 2: Using Legacy SDK (Deprecated)#

The old google-generativeai SDK still works but is deprecated (support ends Aug 2025):

import os
import google.generativeai as genai
from ragas.llms import llm_factory

# Configure with your API key
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))

# Create client
client = genai.GenerativeModel("gemini-2.0-flash")

# Create LLM
llm = llm_factory(
    "gemini-2.0-flash",
    provider="google",
    client=client
)

Option 3: Using LiteLLM Proxy (Advanced)#

For advanced use cases where you need LiteLLM’s proxy capabilities, set up the LiteLLM proxy server first, then use:

import os
from openai import OpenAI
from ragas.llms import llm_factory

# Requires running: litellm --model gemini-2.0-flash
client = OpenAI(
    api_key="anything",
    base_url="http://0.0.0.0:4000"  # LiteLLM proxy endpoint
)

# Create LLM with explicit adapter selection
llm = llm_factory("gemini-2.0-flash", client=client, adapter="litellm")

Supported Models#

Ragas works with all Gemini models:

Latest: gemini-2.0-flash (recommended)
1.5 Series: gemini-1.5-pro, gemini-1.5-flash
1.0 Series: gemini-1.0-pro

For the latest models and pricing, see Google AI Studio.

Embeddings Configuration#

Ragas metrics fall into two categories:

LLM-only metrics (don’t require embeddings): ContextPrecision, ContextRecall, Faithfulness, AspectCritic
Embedding-dependent metrics (require embeddings): AnswerCorrectness, AnswerRelevancy, AnswerSimilarity, SemanticSimilarity, ContextEntityRecall
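
The two categories above can be encoded as a simple lookup to decide whether an embeddings object is needed for a given set of metrics. This is a sketch only: the metric names are copied from the lists above, while the set names and the `needs_embeddings` helper are hypothetical and not part of Ragas.

```python
# Metric names copied from the two lists above.
LLM_ONLY_METRICS = {"ContextPrecision", "ContextRecall", "Faithfulness", "AspectCritic"}
EMBEDDING_METRICS = {
    "AnswerCorrectness", "AnswerRelevancy", "AnswerSimilarity",
    "SemanticSimilarity", "ContextEntityRecall",
}


def needs_embeddings(metric_names):
    """Return True if any named metric is in the embedding-dependent list."""
    unknown = set(metric_names) - LLM_ONLY_METRICS - EMBEDDING_METRICS
    if unknown:
        raise ValueError(f"Metrics not covered by this guide's lists: {sorted(unknown)}")
    return any(name in EMBEDDING_METRICS for name in metric_names)


print(needs_embeddings(["Faithfulness", "ContextRecall"]))      # False
print(needs_embeddings(["Faithfulness", "AnswerCorrectness"]))  # True
```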

Automatic Provider Matching#

When using Ragas with Gemini, the embedding provider is automatically matched to your LLM provider. If you provide a Gemini LLM, Ragas will default to using Google embeddings. No OpenAI API key is needed.

Option 1: Default Embeddings (Recommended)#

Let Ragas automatically select the right embeddings based on your LLM:

import os
from datasets import Dataset
from google import genai
from ragas import evaluate
from ragas.llms import llm_factory
from ragas.metrics import (
    AnswerCorrectness,
    ContextPrecision,
    ContextRecall,
    Faithfulness
)

# Initialize Gemini client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Create sample evaluation data
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"]
}

dataset = Dataset.from_dict(data)

# Define metrics - embeddings are auto-configured for Google
metrics = [
    ContextPrecision(llm=llm),
    ContextRecall(llm=llm),
    Faithfulness(llm=llm),
    AnswerCorrectness(llm=llm)  # Uses Google embeddings automatically
]

# Run evaluation
results = evaluate(dataset, metrics=metrics)
print(results)
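
Before handing the sample dictionary to Dataset.from_dict, it can be worth checking that every column has the same number of rows and that each contexts entry is a list of strings. The sketch below assumes the four column names used in this guide's sample data; `validate_eval_data` is a hypothetical helper, not a Ragas function.

```python
def validate_eval_data(data):
    """Check the column layout used in this guide's sample data."""
    required = ("question", "answer", "contexts", "ground_truth")
    missing = [col for col in required if col not in data]
    if missing:
        raise ValueError(f"Missing columns: {missing}")

    lengths = {col: len(data[col]) for col in required}
    if len(set(lengths.values())) != 1:
        raise ValueError(f"Columns have different lengths: {lengths}")

    for i, ctx in enumerate(data["contexts"]):
        # Each row's contexts must be a list of strings, not a bare string.
        if not isinstance(ctx, list) or not all(isinstance(c, str) for c in ctx):
            raise ValueError(f"contexts[{i}] must be a list of strings")
    return lengths["question"]


data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"],
}
print(f"{validate_eval_data(data)} row(s) look well-formed")
```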

Option 2: Explicit Embeddings#

For explicit control over embeddings, you can create them separately. Google embeddings work with multiple configuration options:

import os
from google import genai
from ragas.llms import llm_factory
from ragas.embeddings import GoogleEmbeddings
from ragas.embeddings.base import embedding_factory
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import AnswerCorrectness, ContextPrecision, ContextRecall, Faithfulness

# Initialize Gemini client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Initialize Google embeddings. Options A-C are alternatives: keep ONE.
# As written, each assignment overwrites the previous one.

# Option A: Using the same client (recommended for new SDK)
embeddings = GoogleEmbeddings(client=client, model="gemini-embedding-001")

# Option B: Using embedding factory
embeddings = embedding_factory("google", model="gemini-embedding-001")

# Option C: Auto-import (creates client automatically)
embeddings = GoogleEmbeddings(model="gemini-embedding-001")

# Create sample evaluation data
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"]
}

dataset = Dataset.from_dict(data)

# Define metrics with explicit embeddings
metrics = [
    ContextPrecision(llm=llm),
    ContextRecall(llm=llm),
    Faithfulness(llm=llm),
    AnswerCorrectness(llm=llm, embeddings=embeddings)
]

# Run evaluation
results = evaluate(dataset, metrics=metrics)
print(results)

Example: Complete Evaluation#

Here’s a complete example evaluating a RAG application with Gemini (using automatic embedding provider matching):

import os
from datasets import Dataset
from google import genai
from ragas import evaluate
from ragas.llms import llm_factory
from ragas.metrics import (
    AnswerCorrectness,
    ContextPrecision,
    ContextRecall,
    Faithfulness
)

# Initialize Gemini client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Create sample evaluation data
data = {
    "question": ["What is the capital of France?"],
    "answer": ["Paris is the capital of France."],
    "contexts": [["France is a country in Western Europe. Paris is its capital."]],
    "ground_truth": ["Paris"]
}

dataset = Dataset.from_dict(data)

# Define metrics - embeddings automatically use Google provider
metrics = [
    ContextPrecision(llm=llm),
    ContextRecall(llm=llm),
    Faithfulness(llm=llm),
    AnswerCorrectness(llm=llm)
]

# Run evaluation
results = evaluate(dataset, metrics=metrics)
print(results)

Performance Considerations#

Model Selection#

gemini-2.0-flash: Best for speed and efficiency
gemini-1.5-pro: Better reasoning for complex evaluations
gemini-1.5-flash: Good balance of speed and cost

Cost Optimization#

To keep costs down in large-scale evaluations:

Use gemini-2.0-flash for most metrics
Consider batch processing for multiple evaluations
Cache prompts when possible (Gemini supports prompt caching)
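
One way to approach the batch-processing suggestion above is to split the evaluation rows into fixed-size chunks and evaluate each chunk separately. This is a minimal, standard-library sketch; `chunked` is a hypothetical helper, and the batch size of 3 is an arbitrary value chosen for illustration.

```python
def chunked(rows, batch_size):
    """Yield consecutive slices of `rows` with at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]


questions = [f"question {i}" for i in range(7)]
for batch_number, batch in enumerate(chunked(questions, batch_size=3), start=1):
    # In a real run, each batch would be turned into a Dataset and evaluated.
    print(f"batch {batch_number}: {len(batch)} rows")
```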

Async Support#

For high-throughput evaluations, use async operations:

import os
from google import genai
from ragas.llms import llm_factory

# Create client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Use in async evaluation
# response = await llm.agenerate(prompt, ResponseModel)
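
For high-throughput runs it is usually worth capping the number of requests in flight so that concurrency does not trip rate limits. The sketch below shows the pattern with asyncio only; `score_all` and `fake_score` are hypothetical names, and `fake_score` stands in for a real async metric or LLM call.

```python
import asyncio


async def score_all(samples, score_one, max_concurrency=4):
    """Run `score_one` over all samples with a cap on in-flight calls."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(sample):
        async with semaphore:
            return await score_one(sample)

    # gather preserves input order in its result list.
    return await asyncio.gather(*(guarded(s) for s in samples))


async def fake_score(sample):
    # Hypothetical stand-in for a real async metric or LLM call.
    await asyncio.sleep(0.01)
    return len(sample)


results = asyncio.run(score_all(["a", "bb", "ccc"], fake_score, max_concurrency=2))
print(results)  # [1, 2, 3]
```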

Adapter Selection#

Ragas automatically selects the appropriate adapter based on your setup:

# Auto-detection happens automatically
# For Gemini: uses LiteLLM adapter
# For other providers: uses Instructor adapter

# Explicit selection (if needed)
llm = llm_factory(
    "gemini-2.0-flash",
    client=client,
    adapter="litellm"  # Explicit adapter selection
)

# Check auto-detected adapter
from ragas.llms.adapters import auto_detect_adapter
adapter_name = auto_detect_adapter(client, "google")
print(f"Using adapter: {adapter_name}")  # Output: Using adapter: litellm

Troubleshooting#

API Key Issues#

# Make sure your API key is set
import os
if not os.environ.get("GOOGLE_API_KEY"):
    raise ValueError("GOOGLE_API_KEY environment variable not set")
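
The same check can be wrapped in a reusable helper that also rejects keys consisting only of whitespace, a common result of copy-paste mistakes. This is a sketch; `require_api_key` is a hypothetical helper, and the `env` parameter exists only so the example can run without a real key.

```python
import os


def require_api_key(env=None, name="GOOGLE_API_KEY"):
    """Return the key from `env`, raising if it is missing or blank."""
    env = os.environ if env is None else env
    key = (env.get(name) or "").strip()
    if not key:
        raise ValueError(f"{name} environment variable not set or empty")
    return key


# Demonstrated with plain dicts so the sketch runs without a real key.
print(require_api_key({"GOOGLE_API_KEY": "  example-key  "}))  # example-key
try:
    require_api_key({"GOOGLE_API_KEY": "   "})
except ValueError as exc:
    print(f"rejected: {exc}")
```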

Known Issue: Instructor Safety Settings (New SDK)#

There is a known upstream issue with the instructor library where it sends invalid safety settings to the Gemini API when using the new google-genai SDK. This may cause errors like:

Invalid value at 'safety_settings[5].category'... "HARM_CATEGORY_JAILBREAK"

Workarounds:

Use the OpenAI-compatible endpoint (recommended for now):

import os
from openai import OpenAI
from ragas.llms import llm_factory

# Point the OpenAI client at Gemini's OpenAI-compatible endpoint
client = OpenAI(
    api_key=os.environ.get("GOOGLE_API_KEY"),
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/"
)
llm = llm_factory("gemini-2.0-flash", provider="openai", client=client)

Track the upstream issue: instructor#1658

Note: Embeddings work correctly with the new SDK - this issue only affects LLM generation.

Rate Limits#

The Gemini API enforces rate limits. In production, the LLM adapter handles retries and timeouts automatically. For finer control, configure timeouts on the underlying HTTP client.
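
If custom retry behavior is needed around your own calls, exponential backoff with jitter is a common pattern. The sketch below is a generic illustration and not the mechanism the Ragas adapter uses; `call_with_backoff` and `flaky` are hypothetical names, and real code should catch only the SDK's rate-limit error rather than every exception.

```python
import random
import time


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call `fn`, retrying on any exception with exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            # Delay doubles each attempt: base, 2*base, 4*base, ... plus jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))


attempts = {"count": 0}


def flaky():
    # Hypothetical call that fails twice (as a rate-limited request might).
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("simulated rate limit")
    return "ok"


# A no-op sleep keeps the demonstration instant.
print(call_with_backoff(flaky, sleep=lambda seconds: None))  # ok
```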

Model Availability#

If a model isn’t available:

Check your region/quota in Google Cloud Console
Try a different model from the supported list
Verify your API key has access to the Generative AI API
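
The "try a different model" step can be automated by walking a list of candidates until one works. This is a sketch under assumptions: `first_working_model` and `fake_build` are hypothetical, and in real use `build` would need to create the LLM and make a small test request, since an unavailable model may only surface as an error on the first call.

```python
def first_working_model(candidates, build):
    """Return (model_name, llm) for the first candidate `build` accepts."""
    errors = {}
    for name in candidates:
        try:
            return name, build(name)
        except Exception as exc:  # narrow this to the SDK's error type in real code
            errors[name] = str(exc)
    raise RuntimeError(f"No candidate model was available: {errors}")


def fake_build(name):
    # Hypothetical factory: pretend only gemini-1.5-flash is available.
    if name != "gemini-1.5-flash":
        raise LookupError(f"{name} not available")
    return f"<llm for {name}>"


name, llm = first_working_model(["gemini-2.0-flash", "gemini-1.5-flash"], fake_build)
print(name)  # gemini-1.5-flash
```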

Migration from Other Providers#

From OpenAI#

import os
from ragas.llms import llm_factory

# Before: OpenAI-only
from openai import OpenAI
client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
llm = llm_factory("gpt-4o", client=client)

# After: Gemini with new SDK
from google import genai
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

From Anthropic#

import os
from ragas.llms import llm_factory

# Before: Anthropic
from anthropic import Anthropic
client = Anthropic(api_key=os.environ.get("ANTHROPIC_API_KEY"))
llm = llm_factory("claude-3-sonnet", provider="anthropic", client=client)

# After: Gemini with new SDK
from google import genai
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

From Legacy google-generativeai SDK#

import os
from ragas.llms import llm_factory

# Before: Legacy SDK (deprecated)
import google.generativeai as genai
genai.configure(api_key=os.environ.get("GOOGLE_API_KEY"))
client = genai.GenerativeModel("gemini-2.0-flash")
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# After: New SDK (recommended)
from google import genai
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

Using with Metrics Collections (Modern Approach)#

With the modern metrics collections API, you must explicitly create both the LLM and the embeddings:

import asyncio
import os
from google import genai
from ragas.llms import llm_factory
from ragas.embeddings import GoogleEmbeddings
from ragas.metrics.collections import AnswerCorrectness, ContextPrecision

# Create client (new SDK)
client = genai.Client(api_key=os.environ.get("GOOGLE_API_KEY"))

# Create LLM
llm = llm_factory("gemini-2.0-flash", provider="google", client=client)

# Create embeddings using the same client
embeddings = GoogleEmbeddings(client=client, model="gemini-embedding-001")

# Create metrics with explicit LLM and embeddings
metrics = [
    ContextPrecision(llm=llm),  # LLM-only metric
    AnswerCorrectness(llm=llm, embeddings=embeddings),  # Needs both
]

# ascore() is a coroutine: wrap it in an async function for scripts,
# or await it directly in a notebook cell.
async def main():
    result = await metrics[1].ascore(
        user_input="What is the capital of France?",
        response="Paris",
        reference="Paris is the capital of France."
    )
    print(result)

asyncio.run(main())

Key difference from legacy approach:

Legacy evaluate(): Auto-creates embeddings from LLM provider
Modern collections: You explicitly pass embeddings to each metric

Passing embeddings explicitly gives you more control over which embedding model each metric uses.

Supported Metrics#

All Ragas metrics work with Gemini:

Answer Correctness
Answer Relevancy
Answer Similarity
Aspect Critique
Context Precision
Context Recall
Context Entities Recall
Faithfulness
NLI Eval
Response Relevancy

See Metrics Reference for details.

Advanced: Custom Model Parameters#

Pass custom parameters to Gemini:

llm = llm_factory(
    "gemini-2.0-flash",
    client=client,
    temperature=0.5,
    max_tokens=2048,
    top_p=0.9,
    top_k=40,
)
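
Sampling parameters can be sanity-checked before they are passed on. The sketch below is illustrative only: `check_sampling_params` is a hypothetical helper, and the accepted ranges (for example, temperature between 0 and 2) are assumptions that should be confirmed against Google's documentation for the specific model.

```python
def check_sampling_params(temperature=None, top_p=None, top_k=None, max_tokens=None):
    """Return the non-None parameters as a dict after basic range checks."""
    # The ranges below are assumptions; confirm them in Google's model docs.
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature is assumed to lie in [0, 2]")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must lie in [0, 1]")
    for label, value in (("top_k", top_k), ("max_tokens", max_tokens)):
        if value is not None and (not isinstance(value, int) or value < 1):
            raise ValueError(f"{label} must be a positive integer")
    params = {"temperature": temperature, "top_p": top_p,
              "top_k": top_k, "max_tokens": max_tokens}
    return {key: value for key, value in params.items() if value is not None}


params = check_sampling_params(temperature=0.5, max_tokens=2048, top_p=0.9, top_k=40)
print(params)
```

The resulting dictionary could then be unpacked into the factory call shown above, for example `llm_factory("gemini-2.0-flash", client=client, **params)`.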

Resources#

Link last verified June 7, 2026.

Source: RAGAS Docs

Link last verified: 2026-03-04