Text-to-SQL Evaluation Quickstart
The text2sql template evaluates text-to-SQL systems by comparing SQL execution results.
Create the Project#
ragas quickstart text2sql
cd text2sql
Install Dependencies#
uv sync
Set Your API Key#
export OPENAI_API_KEY="your-openai-key"
Run the Evaluation#
uv run python evals.py
Project Structure#
text2sql/
├── README.md # Project documentation
├── pyproject.toml # Project configuration
├── text2sql_agent.py # Text-to-SQL agent
├── db_utils.py # Database utilities
├── evals.py # Evaluation workflow
├── prompt.txt # Base prompt template
├── prompt_v2.txt # Improved prompt v2
├── prompt_v3.txt # Improved prompt v3
├── __init__.py # Python package marker
└── evals/
    ├── datasets/
    │   └── booksql_sample.csv # Sample book database queries
    ├── experiments/ # Evaluation results
    └── logs/ # Execution logs
What It Evaluates#
The template evaluates text-to-SQL generation:
- Agent: Converts natural language to SQL queries
- Database: Sample book database with authors, titles, genres
- Test Cases: Natural language questions → expected SQL queries
- Metric: Execution accuracy by comparing query results using datacompy
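As a rough illustration of the execution-accuracy idea, the sketch below runs an expected and a predicted query against an in-memory SQLite database and compares the returned rows. The `books` table, its rows, and the `results_match` helper are assumptions made up for this example. The template itself compares results with datacompy, which this sketch replaces with a plain order-insensitive row comparison.

```python
import sqlite3


def results_match(conn, expected_sql, predicted_sql):
    """Return "correct" if both queries yield the same rows, else "incorrect"."""
    expected_rows = conn.execute(expected_sql).fetchall()
    try:
        predicted_rows = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to execute cannot match the expected result.
        return "incorrect"
    # Sort both sides so that row order does not affect the comparison.
    same = sorted(expected_rows, key=repr) == sorted(predicted_rows, key=repr)
    return "correct" if same else "incorrect"


# Hypothetical sample data, not the template's actual schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [
        ("Emma", "Jane Austen"),
        ("Persuasion", "Jane Austen"),
        ("Dracula", "Bram Stoker"),
    ],
)

expected = "SELECT title FROM books WHERE author = 'Jane Austen'"
predicted = "SELECT title FROM books WHERE author LIKE 'Jane Austen' ORDER BY title DESC"
print(results_match(conn, expected, predicted))
```

The two queries are written differently and return rows in a different order, yet they select the same rows, which is the case that execution-based comparison is meant to accept and string comparison of the SQL text would reject.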
Understanding the Code#
The Agent (text2sql_agent.py)#
Converts natural language to SQL:
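from text2sql_agent import Text2SQLAgent

# openai_client is an OpenAI client instance that you create beforehand.
agent = Text2SQLAgent(client=openai_client)
# generate_sql is awaited, so call it from inside an async function.
sql = await agent.generate_sql("Find all books by Jane Austen")
The Evaluation (evals.py)#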
Compares execution results:
from ragas.metrics import discrete_metric

@discrete_metric(name="execution_accuracy", allowed_values=["correct", "incorrect"])
def execution_accuracy(expected_sql: str, predicted_success: bool, predicted_result):
    # Executes both SQLs and compares results using datacompy
    # Returns "correct" if results match, "incorrect" otherwise
    ...  # body omitted in this excerpt
Test Data#
The template includes evals/datasets/booksql_sample.csv with sample questions and expected SQL queries for a book database.
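A dataset of this kind can be read with Python's standard `csv` module. The sketch below parses an inline stand-in for the file. The column names `question` and `expected_sql`, the `load_test_cases` helper, and both rows are hypothetical, so check the header of booksql_sample.csv for the real column names.

```python
import csv
import io

# Inline stand-in for a CSV file; columns and rows are hypothetical.
SAMPLE_CSV = '''question,expected_sql
Find all books by Jane Austen,"SELECT title FROM books WHERE author = 'Jane Austen'"
How many books are there?,SELECT COUNT(*) FROM books
'''


def load_test_cases(handle):
    """Read rows into dicts, skipping any row with an empty or missing field."""
    return [row for row in csv.DictReader(handle) if all(row.values())]


test_cases = load_test_cases(io.StringIO(SAMPLE_CSV))
for case in test_cases:
    print(case["question"], "->", case["expected_sql"])
```

To read the real file instead, the same helper could be given `open("evals/datasets/booksql_sample.csv", newline="")`, with the dictionary keys adjusted to the file's actual header.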
Customization#
Use Your Own Database#
Update db_utils.py to connect to your database:
import sqlite3

def get_db_connection():
    return sqlite3.connect("your_database.db")
Try Different Prompts#
The template includes three prompt versions in prompt.txt, prompt_v2.txt, and prompt_v3.txt. Test each to see which works best.
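One simple way to compare the prompt versions is to compute the share of "correct" labels that each run produces. The sketch below assumes you have collected the metric's labels per prompt yourself. The `accuracy` helper and the label lists are placeholders for illustration, not results from the template.

```python
def accuracy(labels):
    """Fraction of labels equal to "correct"; 0.0 for an empty run."""
    if not labels:
        return 0.0
    return sum(label == "correct" for label in labels) / len(labels)


# Placeholder labels for illustration only, not real evaluation results.
runs = {
    "prompt.txt": ["correct", "incorrect", "incorrect", "correct"],
    "prompt_v2.txt": ["correct", "correct", "incorrect", "correct"],
    "prompt_v3.txt": ["correct", "correct", "correct", "correct"],
}

scores = {name: accuracy(labels) for name, labels in runs.items()}
best = max(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
print("best:", best)
```

With a small sample, differences of one or two test cases between prompts may not be meaningful, so it can help to inspect the individual failures as well as the aggregate score.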
Next Steps#
- Agent Evaluation - Evaluate AI agents
- Workflow Evaluation - Evaluate complex workflows
Link last verified: June 7, 2026.
Source: RAGAS Docs
Link last verified: 2026-03-04