LangSmith Evaluation
LangSmith supports two types of evaluations based on when and where they run:
 Test before you ship">
* Offline Evaluation: test before you ship. Run evaluations on curated datasets during development to compare versions, benchmark performance, and catch regressions.
 Monitor in production">
* Online Evaluation: monitor in production. Evaluate real user interactions in real time to detect issues and measure quality on live traffic.
Evaluation workflow
Offline evaluation: create a dataset of curated examples, each with inputs and, optionally, reference outputs.
Create evaluators to score your application's outputs:
* [Human](/langsmith/evaluation-concepts#human) review
* [Code](/langsmith/evaluation-concepts#code) rules
* [LLM-as-judge](/langsmith/llm-as-judge)
* [Pairwise](/langsmith/evaluate-pairwise) comparison
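As a rough illustration of the code-rule and pairwise evaluator types listed above, the sketch below uses plain Python rather than the LangSmith SDK. The function names (`exact_match`, `pairwise_contains_reference`) and the result shapes are hypothetical, chosen only to show the difference: a code rule scores one output against a reference, while a pairwise evaluator compares two outputs for the same input and picks a winner.

```python
from typing import Optional


def exact_match(output: str, reference: str) -> dict:
    """Hypothetical code-rule evaluator: 1.0 if the normalized output equals the reference."""
    score = 1.0 if output.strip().lower() == reference.strip().lower() else 0.0
    return {"key": "exact_match", "score": score}


def pairwise_contains_reference(output_a: str, output_b: str, reference: str) -> Optional[str]:
    """Hypothetical pairwise evaluator: prefer the output that mentions the reference answer.

    Returns "A" or "B" for a clear winner, or None when both or neither qualify.
    """
    a_ok = reference.lower() in output_a.lower()
    b_ok = reference.lower() in output_b.lower()
    if a_ok and not b_ok:
        return "A"
    if b_ok and not a_ok:
        return "B"
    return None  # tie: a human reviewer or LLM-as-judge could break it


print(exact_match(" Paris ", "paris"))
print(pairwise_contains_reference("It is Paris.", "It is Lyon.", "Paris"))
```

Deterministic rules like these are cheap and repeatable; the human and LLM-as-judge types cover qualities that a string comparison cannot capture.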
Execute your application on the dataset to create an experiment.
Compare experiments for benchmarking, unit tests, regression tests, or backtesting.
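The offline steps above can be sketched without any SDK: a dataset of examples, a code evaluator, one experiment per application version, and a per-example comparison that flags regressions. Everything here (`DATASET`, `correct`, `run_experiment`, `app_v1`, `app_v2`) is hypothetical; LangSmith stores datasets and experiments and provides the comparison view itself, so this only mirrors the underlying logic.

```python
import operator
from statistics import mean
from typing import Callable, List

# Hypothetical dataset: each example has inputs and a reference output.
DATASET = [
    {"inputs": {"question": "2 + 2"}, "reference": "4"},
    {"inputs": {"question": "3 * 3"}, "reference": "9"},
    {"inputs": {"question": "10 - 7"}, "reference": "3"},
]

OPS = {"+": operator.add, "*": operator.mul, "-": operator.sub}


def correct(output: str, reference: str) -> float:
    """Code-rule evaluator: 1.0 for an exact match, else 0.0."""
    return 1.0 if output.strip() == reference else 0.0


def run_experiment(app: Callable[[dict], str], dataset: List[dict]) -> dict:
    """Run the app on every example and record per-example and mean scores."""
    scores = [correct(app(ex["inputs"]), ex["reference"]) for ex in dataset]
    return {"scores": scores, "mean": mean(scores)}


def app_v1(inputs: dict) -> str:
    """Hypothetical baseline version of the application."""
    a, op, b = inputs["question"].split()
    return str(OPS[op](int(a), int(b)))


def app_v2(inputs: dict) -> str:
    """Hypothetical candidate version with a simulated bug in subtraction."""
    a, op, b = inputs["question"].split()
    if op == "-":
        return str(int(b) - int(a))  # bug: operands swapped
    return str(OPS[op](int(a), int(b)))


baseline = run_experiment(app_v1, DATASET)
candidate = run_experiment(app_v2, DATASET)

# A regression is any example that scored lower in the candidate experiment.
regressions = [
    i for i, (b, c) in enumerate(zip(baseline["scores"], candidate["scores"])) if c < b
]
print(f"baseline={baseline['mean']:.2f} candidate={candidate['mean']:.2f} regressions={regressions}")
```

Comparing per-example scores, not just the means, is what lets a regression test point at the specific inputs that broke.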
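Online evaluation: each interaction with your deployed application creates a run, typically without a reference output to compare against.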
Set up evaluators to run automatically on production traces: safety checks, format validation, quality heuristics, and reference-free LLM-as-judge. Apply filters and sampling rates to control costs.
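To make the filter-and-sampling idea concrete, here is a minimal sketch of a reference-free online check (JSON format validation) gated first by a filter and then by a sampling rate. The run dictionaries, the `error` field, and `maybe_evaluate` are assumptions for illustration; in LangSmith, filters and sampling rates are configured on the online evaluator rather than written as code like this.

```python
import json
import random
from typing import Optional


def is_valid_json(output: str) -> float:
    """Reference-free format check: 1.0 if the output parses as JSON, else 0.0."""
    try:
        json.loads(output)
        return 1.0
    except json.JSONDecodeError:
        return 0.0


def maybe_evaluate(run: dict, sampling_rate: float, rng: random.Random) -> Optional[float]:
    """Apply a filter, then a sampling rate, before running the evaluator.

    Returns None when the run is skipped, so no evaluation cost is incurred.
    """
    if run.get("error"):  # filter: skip runs that already failed
        return None
    if rng.random() >= sampling_rate:  # sample: keep about `sampling_rate` of runs
        return None
    return is_valid_json(run["output"])


rng = random.Random(0)  # seeded so the sketch is reproducible
runs = [
    {"output": '{"answer": 42}'},
    {"output": "not json"},
    {"output": "{}", "error": "timeout"},
]
results = [maybe_evaluate(r, sampling_rate=1.0, rng=rng) for r in runs]
print(results)  # [1.0, 0.0, None]

# At a 10% sampling rate, only about one run in ten is actually evaluated.
sampled = sum(maybe_evaluate({"output": "{}"}, 0.1, rng) is not None for _ in range(1000))
print(sampled)
```

Filtering before sampling means the sampling budget is spent only on runs that are worth scoring.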
Evaluators then run automatically on incoming runs or threads, scoring live traffic in real time.
Add failing production traces to your dataset, create targeted evaluators, validate fixes with offline experiments, and redeploy.
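The first step of that feedback loop, turning failing production traces into new dataset examples, might look like the sketch below. The run and example shapes, the 0.5 threshold, and `harvest_failures` are hypothetical. Because production runs carry no reference output, the new examples are added with `reference` set to `None`, to be filled in by a reviewer before they are used in reference-based offline experiments.

```python
import json
from typing import List


def harvest_failures(runs: List[dict], dataset: List[dict], threshold: float = 0.5) -> int:
    """Copy production runs scoring below `threshold` into the dataset, skipping duplicates."""

    def key(inputs: dict) -> str:
        # Stable identity for an input dict, independent of key order.
        return json.dumps(inputs, sort_keys=True)

    existing = {key(ex["inputs"]) for ex in dataset}
    added = 0
    for run in runs:
        k = key(run["inputs"])
        if run["score"] < threshold and k not in existing:
            dataset.append({"inputs": run["inputs"], "reference": None})
            existing.add(k)
            added += 1
    return added


dataset = [{"inputs": {"question": "2 + 2"}, "reference": "4"}]
runs = [
    {"inputs": {"question": "2 + 2"}, "score": 0.0},               # already in the dataset
    {"inputs": {"question": "capital of France?"}, "score": 0.2},  # failing: added
    {"inputs": {"question": "capital of Italy?"}, "score": 0.9},   # passing: skipped
    {"inputs": {"question": "capital of France?"}, "score": 0.1},  # duplicate: skipped
]
added = harvest_failures(runs, dataset)
print(added, len(dataset))  # 1 2
```

Deduplicating on the inputs keeps a recurring failure from flooding the dataset with copies of the same example.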
For more on the differences between offline and online evaluation, refer to the Evaluation concepts page.
Get started
* Create and manage datasets for evaluation through the UI or SDK.
* Explore evaluation types, techniques, and frameworks for comprehensive testing.
* View and analyze evaluation results, compare experiments, filter data, and export findings.
* Monitor production quality in real time from the Observability tab.
* Learn by following step-by-step tutorials, from simple chatbots to complex agent evaluations.
To set up a LangSmith instance, visit the Platform setup section and choose a cloud, hybrid, or self-hosted deployment. All options include observability, evaluation, prompt engineering, and deployment.