<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Testing on AI Knowledge Base</title><link>https://learn-ai.blindshot.kz/topics/testing/</link><description>Recent content in Testing on AI Knowledge Base</description><generator>Hugo</generator><language>en-us</language><atom:link href="https://learn-ai.blindshot.kz/topics/testing/index.xml" rel="self" type="application/rss+xml"/><item><title>Evaluating and Debugging Generative AI Models</title><link>https://learn-ai.blindshot.kz/courses/dlai-eval-debug-genai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/courses/dlai-eval-debug-genai/</guid><description>&lt;p&gt;Covers evaluation metrics, debugging techniques, and systematic testing for generative AI applications using Weights &amp;amp; Biases. The practical companion to the Evaluation &amp;amp; Testing learning path — the course provides hands-on practice with evaluation tools, while the path covers the full evaluation landscape across providers.&lt;/p&gt;</description></item><item><title>TDD vs BDD vs SDD</title><link>https://learn-ai.blindshot.kz/docs/sdd/methodology/tdd-bdd-sdd-comparison/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/sdd/methodology/tdd-bdd-sdd-comparison/</guid><description>Comparison of test-driven, behavior-driven, and specification-driven development, highlighting when each approach is most appropriate and how they complement each other.</description></item><item><title>Evaluation &amp; Testing</title><link>https://learn-ai.blindshot.kz/paths/evaluation-testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/paths/evaluation-testing/</guid><description>&lt;p&gt;Build a comprehensive evaluation practice for AI applications. 
This path spans 7 sources to cover the full evaluation landscape: foundational concepts, practical implementation, RAG-specific metrics, LLM-as-judge patterns, and agent evaluation challenges.&lt;/p&gt;
&lt;p&gt;Evaluation is the most cross-cutting concern in AI development — every provider and framework has a different take. OpenAI provides hosted evals, RAGAS specializes in RAG metrics, DSPy uses metrics for optimization, LangSmith offers traceability, and W&amp;amp;B Weave treats evaluation as a core development primitive. This path helps you pick the right tools and combine them.&lt;/p&gt;</description></item><item><title>Adding to your CI pipeline with Pytest</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/add_to_ci/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/add_to_ci/_overview/</guid><description/></item><item><title>Agent evals</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/agent-evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/agent-evals/</guid><description>Use agent evals to create datasets, configure graders, and track evaluation runs for your agents.</description></item><item><title>Agent Evaluation Quickstart</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/agent_evals/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/agent_evals/_overview/</guid><description/></item><item><title>AI Evaluations UI</title><link>https://learn-ai.blindshot.kz/docs/together-ai/docs/ai-evaluations-ui/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/together-ai/docs/ai-evaluations-ui/</guid><description>Guide to using the AI Evaluations UI for model assessment</description></item><item><title>Aligning LLM Evaluators with Human Judgment</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/vertexai_alignment/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/vertexai_alignment/_overview/</guid><description/></item><item><title>An Overview of the Developer Playground</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/playground-overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/playground-overview/</guid><description>The Cohere Playground is a powerful visual interface for testing Cohere&amp;rsquo;s generation and embedding language models without coding.</description></item><item><title>Application-specific evaluation approaches</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-approaches/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-approaches/</guid><description/></item><item><title>Automatically run evaluators on experiments</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/bind-evaluator-to-dataset/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/bind-evaluator-to-dataset/</guid><description/></item><item><title>Basic RAG</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/basic-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/basic-rag/</guid><description>Learn how to build a basic RAG system by combining retrieval and generation for AI-powered knowledge-based responses</description></item><item><title>Basic RAG: Retrieval-Augmented Generation with Cohere</title><link>https://learn-ai.blindshot.kz/docs/cohere/page/basic-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/page/basic-rag/</guid><description>This page describes how to work with Cohere&amp;rsquo;s basic retrieval-augmented generation 
functionality.</description></item><item><title>Braintrust</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/observability/braintrust/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/observability/braintrust/</guid><description>Braintrust integration for CrewAI with OpenTelemetry tracing and evaluation</description></item><item><title>Build an evaluation</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/tutorial-eval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/tutorial-eval/</guid><description>Learn how to build an evaluation pipeline with Weave Models and Evaluations</description></item><item><title>Building RAG models with Cohere</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/rag-with-cohere/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/rag-with-cohere/</guid><description>This page walks through building a retrieval-augmented generation model with Cohere.</description></item><item><title>Built-in Evaluators</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/built-in/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/built-in/_overview/</guid><description/></item><item><title>Case Lifecycle Hooks</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/lifecycle/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/lifecycle/_overview/</guid><description/></item><item><title>CI/CD with Pinecone Local and GitHub Actions</title><link>https://learn-ai.blindshot.kz/docs/pinecone/guides/production/automated-testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pinecone/guides/production/automated-testing/</guid><description>Test Pinecone integration with CI/CD workflows.</description></item><item><title>Code Embeddings</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/capabilities/embeddings/code_embeddings/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/capabilities/embeddings/code_embeddings/</guid><description>Code embeddings enable retrieval, clustering, and analytics for code databases and coding assistants using Mistral AI&amp;rsquo;s API</description></item><item><title>Cohere's Command R7B Model</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/command-r7b/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/command-r7b/</guid><description>Command R7B is the smallest, fastest, and final model in our R family of enterprise-focused large language models. 
It excels at RAG, tool use, and agents.</description></item><item><title>Collect and track datasets</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/core-types/datasets/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/core-types/datasets/</guid><description>Organize, collect, track, and version examples for LLM application evaluation</description></item><item><title>Command Line Interface</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/command-line-interface/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/command-line-interface/</guid><description/></item><item><title>Common workflows</title><link>https://learn-ai.blindshot.kz/docs/anthropic/claude-code/common-workflows/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/claude-code/common-workflows/</guid><description>Step-by-step guides for exploring codebases, fixing bugs, refactoring, testing, and other everyday tasks with Claude Code.</description></item><item><title>Compare and rank models</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/core-types/leaderboards/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/core-types/leaderboards/</guid><description>Compare and rank different model versions based on evaluation metrics</description></item><item><title>Compare LLMs using Ragas Evaluations</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/compare_llms/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/compare_llms/_overview/</guid><description/></item><item><title>Compare model performance using the Evaluation 
Playground</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tools/evaluation_playground/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tools/evaluation_playground/</guid><description>Compare and evaluate model performance without code using Weave&amp;rsquo;s interactive playground, running evaluations with custom datasets and LLM judges to test system prompts, models, and scoring criteria in a visual interface.</description></item><item><title>Comparison Testing</title><link>https://learn-ai.blindshot.kz/docs/deepseek/guides/comparison_testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepseek/guides/comparison_testing/</guid><description/></item><item><title>Concurrency &amp; Performance</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/concurrency/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/concurrency/_overview/</guid><description/></item><item><title>Conversation Simulator</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator/</guid><description/></item><item><title>Conversation Simulator Custom Templates</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-custom-templates/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-custom-templates/</guid><description/></item><item><title>Conversation Simulator Lifecycle Hooks</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-lifecycle-hooks/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-lifecycle-hooks/</guid><description/></item><item><title>Conversation Simulator Model Callback</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-model-callback/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-model-callback/</guid><description/></item><item><title>Conversation Simulator Simulation Graph</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-simulation-graph/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-simulation-graph/</guid><description/></item><item><title>Conversation Simulator Stopping Logic</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-stopping-logic/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/conversation-simulator-stopping-logic/</guid><description/></item><item><title>Core Concepts</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/core-concepts/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/core-concepts/_overview/</guid><description/></item><item><title>Create and manage saved views</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tools/saved-views/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tools/saved-views/</guid><description>Customize how you interact with traced function calls and evaluations</description></item><item><title>Create dynamic Leaderboards in Evaluations</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/dynamic_leaderboards/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/dynamic_leaderboards/</guid><description>Dynamic Leaderboards let you configure, customize, save, and update Leaderboard views directly from an evaluation.</description></item><item><title>Crew Studio</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/enterprise/features/crew-studio/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/enterprise/features/crew-studio/</guid><description>Build new automations with AI assistance, a visual editor, and integrated testing.</description></item><item><title>Criteria</title><link>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/criteria/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/criteria/_overview/</guid><description/></item><item><title>CSV RAG Search</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/csvsearchtool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/csvsearchtool/</guid><description>The &amp;lsquo;CSVSearchTool&amp;rsquo; is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within a CSV file&amp;rsquo;s content.</description></item><item><title>Custom Evaluators</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/custom/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/custom/_overview/</guid><description/></item><item><title>Custom Metrics</title><link>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/custom_metrics/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/custom_metrics/_overview/</guid><description/></item><item><title>Custom 
Multi-hop Query</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_testgen-customisation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_testgen-customisation/_overview/</guid><description/></item><item><title>Custom Single-hop Query</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_testgen-custom-single-hop/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_testgen-custom-single-hop/_overview/</guid><description/></item><item><title>Customizing Test Data Generation</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_overview/</guid><description/></item><item><title>Data Handling</title><link>https://learn-ai.blindshot.kz/docs/dspy/learn/evaluation/data/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/dspy/learn/evaluation/data/_overview/</guid><description/></item><item><title>Data modeling</title><link>https://learn-ai.blindshot.kz/docs/pinecone/guides/index-data/data-modeling/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pinecone/guides/index-data/data-modeling/</guid><description>Learn how to structure records for efficient data retrieval and management in Pinecone.</description></item><item><title>Data Privacy</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/data-privacy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/data-privacy/</guid><description/></item><item><title>Data retrieval with GPT 
Actions</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/actions/data-retrieval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/actions/data-retrieval/</guid><description>Learn about performing data retrieval using APIs, relational databases, and vector databases with GPT Actions.</description></item><item><title>Dataset Management</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/dataset-management/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/dataset-management/_overview/</guid><description/></item><item><title>Dataset Serialization</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/dataset-serialization/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/dataset-serialization/_overview/</guid><description/></item><item><title>Deep Dive Into Evaluating RAG Outputs</title><link>https://learn-ai.blindshot.kz/docs/cohere/page/rag-evaluation-deep-dive/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/page/rag-evaluation-deep-dive/</guid><description>This page contains information on evaluating the output of RAG systems.</description></item><item><title>DeepEval</title><link>https://learn-ai.blindshot.kz/docs/chroma/integrations/frameworks/deepeval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/chroma/integrations/frameworks/deepeval/</guid><description/></item><item><title>Define and log attributes</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tools/attributes/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tools/attributes/</guid><description>Use attributes to add metadata to 
your traces and evaluations.</description></item><item><title>Deploying Models in Private Environments</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/single-container-on-private-clouds/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/single-container-on-private-clouds/</guid><description>Learn how to pull and test Cohere&amp;rsquo;s container images using a license with Docker and Kubernetes.</description></item><item><title>DeploymentSampler</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/training-api/reference/deployment-sampler/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/training-api/reference/deployment-sampler/</guid><description>Client-side tokenized sampling from inference deployments for training and evaluation.</description></item><item><title>Develop Tests</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/develop-tests/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/develop-tests/</guid><description/></item><item><title>Development</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/development/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/development/</guid><description/></item><item><title>Development</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/development/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/development/</guid><description/></item><item><title>Development</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/development/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/development/</guid><description/></item><item><title>Different Types of API Keys and Rate Limits</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/rate-limits/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/rate-limits/</guid><description>This page describes Cohere API rate limits for production and evaluation keys.</description></item><item><title>Directory RAG Search</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/directorysearchtool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/directorysearchtool/</guid><description>The &amp;lsquo;DirectorySearchTool&amp;rsquo; is a powerful RAG (Retrieval-Augmented Generation) tool designed for semantic searches within a directory&amp;rsquo;s content.</description></item><item><title>End-to-end example of RAG with Chat, Embed, and Rerank</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/rag-complete-example/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/rag-complete-example/</guid><description>Guide on using Cohere&amp;rsquo;s Retrieval Augmented Generation (RAG) capabilities covering the Chat, Embed, and Rerank endpoints (API v2).</description></item><item><title>Environment Variables</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/environment-variables/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/environment-variables/</guid><description/></item><item><title>Eval Tool</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/eval-tool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/eval-tool/</guid><description/></item><item><title>Evals</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/evals/</guid><description>Evaluate agent trajectories using deterministic matching or LLM-as-judge evaluators with AgentEvals and LangSmith.</description></item><item><title>Evals</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/evals/</guid><description>Evaluate agent trajectories using deterministic matching or LLM-as-judge evaluators with AgentEvals and LangSmith.</description></item><item><title>Evals In Prod</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/evals-in-prod/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/evals-in-prod/</guid><description/></item><item><title>Evals In Prod</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/evals-in-prod/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/evals-in-prod/</guid><description/></item><item><title>Evals In Prod</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/evals-in-prod/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/evals-in-prod/</guid><description/></item><item><title>Evaluate a chatbot</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-chatbot-tutorial/</link><pubDate>Mon, 01 Jan 
0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-chatbot-tutorial/</guid><description/></item><item><title>Evaluate a complex agent</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-complex-agent/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-complex-agent/</guid><description/></item><item><title>Evaluate a hosted API model</title><link>https://learn-ai.blindshot.kz/docs/wandb/models/launch/evaluate-hosted-model/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/models/launch/evaluate-hosted-model/</guid><description>Evaluate a hosted API model using infrastructure managed by CoreWeave</description></item><item><title>Evaluate a model checkpoint</title><link>https://learn-ai.blindshot.kz/docs/wandb/models/launch/evaluate-model-checkpoint/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/models/launch/evaluate-model-checkpoint/</guid><description>Evaluate a VLLM-compatible model checkpoint using infrastructure managed by CoreWeave</description></item><item><title>Evaluate a New LLM</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/benchmark_llm/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/benchmark_llm/_overview/</guid><description/></item><item><title>Evaluate a prompt</title><link>https://learn-ai.blindshot.kz/docs/ragas/tutorials/prompt/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/tutorials/prompt/_overview/</guid><description/></item><item><title>Evaluate a RAG application</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-rag-tutorial/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-rag-tutorial/</guid><description/></item><item><title>Evaluate a simple LLM application</title><link>https://learn-ai.blindshot.kz/docs/ragas/getstarted/evals/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/getstarted/evals/_overview/</guid><description>&lt;p&gt;This is the canonical first hands-on with RAGAS, and it matters because it shows the metric-driven evaluation loop on a simple LLM app before you add retrieval complexity. Focus on how RAGAS frames a sample, a metric, and a score — the same abstractions scale up to full RAG evaluation. A subtle gotcha is that many RAGAS metrics call an LLM under the hood, so scores carry cost and run-to-run variance you must account for. Read this before the RAG tutorial, which layers retrieval metrics on top.&lt;/p&gt;</description></item><item><title>Evaluate a simple RAG system</title><link>https://learn-ai.blindshot.kz/docs/ragas/getstarted/rag_eval/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/getstarted/rag_eval/_overview/</guid><description/></item><item><title>Evaluate a simple RAG system</title><link>https://learn-ai.blindshot.kz/docs/ragas/tutorials/rag/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/tutorials/rag/_overview/</guid><description>&lt;p&gt;This is the practical RAG evaluation walkthrough and the page most teams should run first when they need to measure a retrieval pipeline rather than guess at it. Pay attention to the distinction between retrieval metrics like context precision and recall and generation metrics like faithfulness and answer relevancy, because a RAG system can fail at either stage and the fix differs entirely. 
A common mistake is optimizing answer quality while ignoring context recall, leaving the model fluent but ungrounded. Start with the simple-evals page first if you are new to RAGAS.&lt;/p&gt;</description></item><item><title>Evaluate a Text-to-SQL Agent</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/text2sql/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/text2sql/_overview/</guid><description/></item><item><title>Evaluate an AI Agent</title><link>https://learn-ai.blindshot.kz/docs/ragas/tutorials/agent/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/tutorials/agent/_overview/</guid><description/></item><item><title>Evaluate an AI Workflow</title><link>https://learn-ai.blindshot.kz/docs/ragas/tutorials/workflow/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/tutorials/workflow/_overview/</guid><description/></item><item><title>Evaluate and Improve a RAG App</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/evaluate-and-improve-rag/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/evaluate-and-improve-rag/_overview/</guid><description/></item><item><title>Evaluate answers</title><link>https://learn-ai.blindshot.kz/docs/pinecone/guides/assistant/evaluate-answers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pinecone/guides/assistant/evaluate-answers/</guid><description>Measure assistant response quality with LLM-based evaluation.</description></item><item><title>Evaluate external models</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/external-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/external-models/</guid><description>Learn how to run evals on non-OpenAI models, using the OpenAI platform.</description></item><item><title>Evaluate RAG applications</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/tutorial-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/tutorial-rag/</guid><description>Build and evaluate RAG applications using Weave with LLM judges</description></item><item><title>Evaluate using local scorers</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/weave_local_scorers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/weave_local_scorers/</guid><description>Small language models that run locally to evaluate AI system safety and quality</description></item><item><title>Evaluating Multi-turn Conversations</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/evaluating_multi_turn_conversations/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/evaluating_multi_turn_conversations/_overview/</guid><description/></item><item><title>Evaluating Text Summarization Models</title><link>https://learn-ai.blindshot.kz/docs/cohere/page/summarization-evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/page/summarization-evals/</guid><description>This page discusses how to evaluate a model&amp;rsquo;s text summarization.</description></item><item><title>Evaluation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/evaluation/</guid><description/></item><item><title>Evaluation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/evaluation/</guid><description/></item><item><title>Evaluation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/evaluation/</guid><description/></item><item><title>Evaluation</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/evaluation/</guid><description>Guide to evaluating LLMs for specific tasks with metrics, human, and LLM-based methods</description></item><item><title>Evaluation Arena Test Cases</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-arena-test-cases/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-arena-test-cases/</guid><description/></item><item><title>Evaluation benchmark catalog</title><link>https://learn-ai.blindshot.kz/docs/wandb/models/launch/evaluations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/models/launch/evaluations/</guid><description>Browse the evaluation benchmarks available through LLM Evaluation Jobs</description></item><item><title>Evaluation best practices</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/evaluation-best-practices/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/evaluation-best-practices/</guid><description>Advanced evaluation patterns for production AI systems — handling ambiguous cases, scaling eval suites, avoiding eval gaming, and integrating evals into CI/CD pipelines.</description></item><item><title>Evaluation Component Level Llm Evals</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-component-level-llm-evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-component-level-llm-evals/</guid><description/></item><item><title>Evaluation concepts</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-concepts/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-concepts/</guid><description/></item><item><title>Evaluation Dataset</title><link>https://learn-ai.blindshot.kz/docs/ragas/concepts/components/eval_dataset/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/concepts/components/eval_dataset/_overview/</guid><description/></item><item><title>Evaluation Datasets</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-datasets/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-datasets/</guid><description/></item><item><title>Evaluation End To End Llm Evals</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-end-to-end-llm-evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-end-to-end-llm-evals/</guid><description/></item><item><title>Evaluation End To End Multi Turn</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-end-to-end-multi-turn/</link><pubDate>Mon, 01 Jan 0001 
00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-end-to-end-multi-turn/</guid><description/></item><item><title>Evaluation End To End Single Turn</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-end-to-end-single-turn/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-end-to-end-single-turn/</guid><description/></item><item><title>Evaluation Flags And Configs</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-flags-and-configs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-flags-and-configs/</guid><description/></item><item><title>Evaluation Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-introduction/</guid><description/></item><item><title>Evaluation Llm Tracing</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-llm-tracing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-llm-tracing/</guid><description/></item><item><title>Evaluation Mcp</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-mcp/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-mcp/</guid><description/></item><item><title>Evaluation Multiturn Test Cases</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-multiturn-test-cases/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-multiturn-test-cases/</guid><description/></item><item><title>Evaluation 
overview</title><link>https://learn-ai.blindshot.kz/docs/pinecone/guides/assistant/evaluation-overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pinecone/guides/assistant/evaluation-overview/</guid><description>Learn about evaluating the correctness and completeness of assistant responses.</description></item><item><title>Evaluation Overview</title><link>https://learn-ai.blindshot.kz/docs/dspy/learn/evaluation/overview/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/dspy/learn/evaluation/overview/_overview/</guid><description>&lt;p&gt;This is the foundational concept page for DSPy&amp;rsquo;s evaluate-then-optimize workflow, and it is essential reading before you touch any teleprompter. The key insight is that DSPy treats evaluation as a first-class input to compilation rather than an afterthought — your dev set and metric become the signal the optimizer uses to rewrite prompts. Start here, then move to the metrics page to define what good actually means for your task. Watch out for evaluating on the same examples you optimize against, which inflates scores and hides overfitting.&lt;/p&gt;</description></item><item><title>Evaluation Prompts</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-prompts/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-prompts/</guid><description/></item><item><title>Evaluation quickstart</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-quickstart/</guid><description>&lt;p&gt;This is the fastest path into LangSmith evaluation and the right starting point before the deeper evaluator guides. 
The key takeaway is the core loop of dataset, target function, evaluator, and run, which is the mental model every other LangSmith eval feature builds on. Pay attention to how examples and the evaluation client are wired up, since that boilerplate carries over to LLM-as-judge work. A common beginner mistake is evaluating against a dataset that does not represent production traffic, which produces reassuring but meaningless scores.&lt;/p&gt;</description></item><item><title>Evaluation Sample</title><link>https://learn-ai.blindshot.kz/docs/ragas/concepts/components/eval_sample/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/concepts/components/eval_sample/_overview/</guid><description/></item><item><title>Evaluation Test Cases</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-test-cases/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-test-cases/</guid><description/></item><item><title>Evaluation types</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-types/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-types/</guid><description/></item><item><title>Evaluation Unit Testing In Ci Cd</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-unit-testing-in-ci-cd/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/evaluation-unit-testing-in-ci-cd/</guid><description/></item><item><title>Evaluations overview</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/core-types/evaluations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/core-types/evaluations/</guid><description>Evaluation-driven LLM application development to systematically improve 
applications</description></item><item><title>Evaluations with Vertex AI models</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/vertexai_x_ragas/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/vertexai_x_ragas/_overview/</guid><description/></item><item><title>Evaluators</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/evaluators/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/evaluators/</guid><description>Understand the fundamentals of evaluators and reward functions in reinforcement fine-tuning</description></item><item><title>Export evaluation data</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/export_eval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/export_eval/</guid><description>Programmatically export evaluation results using the Evaluation REST API.</description></item><item><title>Faq</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/faq/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/faq/</guid><description/></item><item><title>Fireworks Agent: Classification</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/classification/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/classification/</guid><description>Benchmark base models, fine-tune on labeled data, and pick the best classifier — automatically.</description></item><item><title>Fireworks Agent: Evaluator Authoring</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/evaluators/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/evaluators/</guid><description>Have Fireworks Agent generate a reusable evaluator from your dataset — for scoring candidates in an SFT sweep, or for use with Managed RFT.</description></item><item><title>Fireworks Agent: Preference Learning (DPO/ORPO)</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/dpo/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/dpo/</guid><description>Run preference fine-tuning end-to-end with optional base-model sweep, automatic pair generation, and pairwise evaluation.</description></item><item><title>Fireworks Agent: Supervised Fine-Tuning</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/sft/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/agent/sft/</guid><description>Run end-to-end SFT with Fireworks Agent — dataset inspection, hyperparameter sweep, evaluator-guided selection, and a deployed winner.</description></item><item><title>Galileo</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/observability/galileo/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/observability/galileo/</guid><description>Galileo integration for CrewAI tracing and evaluation</description></item><item><title>Generate Parallel Queries for Better RAG Retrieval</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/generating-parallel-queries/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/generating-parallel-queries/</guid><description>Build an agentic RAG system that can expand a user query into a more optimized set of queries for retrieval.</description></item><item><title>Get latest invocations by 
keys</title><link>https://learn-ai.blindshot.kz/docs/chroma/reference/sync-api/invocation/get-latest-invocations-by-keys/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/chroma/reference/sync-api/invocation/get-latest-invocations-by-keys/</guid><description>Returns the latest invocations for the given keys on a source.</description></item><item><title>Getting Started</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started/</guid><description>&lt;p&gt;This five-minute quickstart is the fastest way into DeepEval: install it, write a test case, pick a metric, and run deepeval test run, which feels like pytest for LLM outputs. The critical thing to set up first is an OPENAI_API_KEY, because nearly all DeepEval metrics are LLM-as-a-judge evaluators that call a model under the hood. If a run appears stuck, suspect rate limits or quota rather than a framework bug, the most common early gotcha. 
DeepEval covers similar ground to RAGAS but with a pytest-style assertion workflow; read the metrics introduction next.&lt;/p&gt;</description></item><item><title>Getting Started Agents</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-agents/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-agents/</guid><description/></item><item><title>Getting Started Chatbots</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-chatbots/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-chatbots/</guid><description/></item><item><title>Getting Started Llm Arena</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-llm-arena/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-llm-arena/</guid><description/></item><item><title>Getting Started Mcp</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-mcp/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-mcp/</guid><description/></item><item><title>Getting Started Rag</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/getting-started-rag/</guid><description/></item><item><title>Getting started with datasets</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/evaluation-getting-started/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/evaluation-getting-started/</guid><description>Introduction to evaluation datasets — the foundation for systematic AI testing and the first step in eval-driven 
development.</description></item><item><title>Getting started with GPT Actions</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/actions/getting-started/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/actions/getting-started/</guid><description>Learn how to set up and test GPT actions from scratch with the OpenAI API.</description></item><item><title>Golden Synthesizer</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/golden-synthesizer/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/golden-synthesizer/</guid><description/></item><item><title>Graders</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/graders/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/graders/</guid><description>Learn about graders used for evals and fine-tuning.</description></item><item><title>Guides Ai Agent Evaluation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-ai-agent-evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-ai-agent-evaluation/</guid><description/></item><item><title>Guides Ai Agent Evaluation Metrics</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-ai-agent-evaluation-metrics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-ai-agent-evaluation-metrics/</guid><description/></item><item><title>Guides Answer Correctness Metric</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-answer-correctness-metric/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-answer-correctness-metric/</guid><description/></item><item><title>Guides Building 
Custom Metrics</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-building-custom-metrics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-building-custom-metrics/</guid><description/></item><item><title>Guides Llm As A Judge</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-llm-as-a-judge/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-llm-as-a-judge/</guid><description/></item><item><title>Guides Llm Observability</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-llm-observability/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-llm-observability/</guid><description/></item><item><title>Guides Multi Turn Evaluation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-multi-turn-evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-multi-turn-evaluation/</guid><description/></item><item><title>Guides Multi Turn Evaluation Metrics</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-multi-turn-evaluation-metrics/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-multi-turn-evaluation-metrics/</guid><description/></item><item><title>Guides Multi Turn Simulation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-multi-turn-simulation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-multi-turn-simulation/</guid><description/></item><item><title>Guides Optimizing Hyperparameters</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-optimizing-hyperparameters/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-optimizing-hyperparameters/</guid><description/></item><item><title>Guides Rag Evaluation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-rag-evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-rag-evaluation/</guid><description/></item><item><title>Guides Rag Triad</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-rag-triad/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-rag-triad/</guid><description/></item><item><title>Guides Red Teaming</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-red-teaming/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-red-teaming/</guid><description/></item><item><title>Guides Regression Testing In Cicd</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-regression-testing-in-cicd/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-regression-testing-in-cicd/</guid><description/></item><item><title>Guides Tracing Ai Agents</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-tracing-ai-agents/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-tracing-ai-agents/</guid><description/></item><item><title>Guides Tracing Multi Turn</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-tracing-multi-turn/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-tracing-multi-turn/</guid><description/></item><item><title>Guides Tracing 
Rag</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-tracing-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-tracing-rag/</guid><description/></item><item><title>Guides Using Custom Embedding Models</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-using-custom-embedding-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-using-custom-embedding-models/</guid><description/></item><item><title>Guides Using Custom Llms</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-using-custom-llms/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-using-custom-llms/</guid><description/></item><item><title>Guides Using Synthesizer</title><link>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-using-synthesizer/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/guides/guides-using-synthesizer/</guid><description/></item><item><title>Handle Streaming Refusals</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/handle-streaming-refusals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/handle-streaming-refusals/</guid><description>&lt;p&gt;Streaming refusals present a unique UX challenge: tokens have already been sent to the client before the model decides to refuse, so you cannot simply suppress the response. This guide covers detection strategies and graceful recovery patterns for when Claude determines mid-stream that a request violates safety guidelines. 
Pay close attention to the stop reason codes and how they differ from normal completion events — your streaming parser needs to handle refusal signals without crashing or displaying partial unsafe content. Implement these patterns early in development rather than retrofitting them after users encounter jarring truncated responses in production.&lt;/p&gt;</description></item><item><title>Haystack and Cohere (Integration Guide)</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/haystack-and-cohere/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/haystack-and-cohere/</guid><description>Build custom LLM applications with Haystack, now integrated with Cohere for embedding, generation, chat, and retrieval.</description></item><item><title>Hierarchical Process</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/learn/hierarchical-process/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/learn/hierarchical-process/</guid><description>A comprehensive guide to understanding and applying the hierarchical process within your CrewAI projects, updated to reflect the latest coding practices and functionalities.</description></item><item><title>How to add evaluators to an existing experiment (Python only)</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-existing-experiment/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-existing-experiment/</guid><description/></item><item><title>How to audit evaluator scores</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/audit-evaluator-scores/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/audit-evaluator-scores/</guid><description/></item><item><title>How to create a composite 
evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/composite-evaluators-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/composite-evaluators-sdk/</guid><description/></item><item><title>How to create a composite evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/composite-evaluators-ui/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/composite-evaluators-ui/</guid><description/></item><item><title>How to define a code evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/code-evaluator-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/code-evaluator-sdk/</guid><description/></item><item><title>How to define a code evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/code-evaluator-ui/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/code-evaluator-ui/</guid><description/></item><item><title>How to define a summary evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/summary/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/summary/</guid><description/></item><item><title>How to define a target function to evaluate</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/define-target-function/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/define-target-function/</guid><description/></item><item><title>How to define an LLM-as-a-judge evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/llm-as-judge-sdk/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/llm-as-judge-sdk/</guid><description/></item><item><title>How to define an LLM-as-a-judge evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/llm-as-judge/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/llm-as-judge/</guid><description>&lt;p&gt;LLM-as-judge is the workhorse evaluator for open-ended outputs where exact-match scoring is impossible, so this page becomes essential the moment you move past trivial test cases. Pay close attention to how you define the judge prompt and scoring schema — vague rubrics produce noisy, irreproducible scores, the most common pitfall here. This is conceptually the same technique RAGAS and OpenAI&amp;rsquo;s agent evals implement, but LangSmith binds the judge directly to traced runs. Read the evaluation quickstart first to understand datasets and runs.&lt;/p&gt;</description></item><item><title>How to evaluate a graph</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-graph/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-graph/</guid><description/></item><item><title>How to evaluate a runnable</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/langchain-runnable/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/langchain-runnable/</guid><description/></item><item><title>How to evaluate an application's intermediate steps</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-on-intermediate-steps/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-on-intermediate-steps/</guid><description/></item><item><title>How to evaluate an LLM 
application</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-llm-application/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-llm-application/</guid><description/></item><item><title>How to evaluate with OpenTelemetry</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-with-opentelemetry/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-with-opentelemetry/</guid><description/></item><item><title>How to evaluate with repetitions</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/repetition/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/repetition/</guid><description/></item><item><title>How to evaluate your agent with trajectory evaluations</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/trajectory-evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/trajectory-evals/</guid><description/></item><item><title>How to improve your evaluator with few-shot examples</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/create-few-shot-evaluators/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/create-few-shot-evaluators/</guid><description/></item><item><title>How to retry failed runs in experiments (Python only)</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-with-retry/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-with-retry/</guid><description/></item><item><title>How to return multiple scores in one 
evaluator</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/multiple-scores/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/multiple-scores/</guid><description/></item><item><title>How to run a pairwise evaluation</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-pairwise/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-pairwise/</guid><description/></item><item><title>How to run an evaluation asynchronously</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-async/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation-async/</guid><description/></item><item><title>How to run an evaluation locally (Python only)</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/local/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/local/</guid><description/></item><item><title>How to run evaluations with pytest</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/pytest/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/pytest/</guid><description/></item><item><title>How to run evaluations with Vitest/Jest</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/vitest-jest/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/vitest-jest/</guid><description/></item><item><title>How to use prebuilt evaluators</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/prebuilt-evaluators/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/prebuilt-evaluators/</guid><description/></item><item><title>How to use the REST API</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-evals-api-only/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-evals-api-only/</guid><description/></item><item><title>HuggingFace Dataset Evaluations</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/cookbooks/hf_dataset_evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/cookbooks/hf_dataset_evals/</guid><description>Learn how to run evaluations on Hugging Face datasets with W&amp;amp;B Weave</description></item><item><title>Implement a CI/CD pipeline using LangSmith Deployment and Evaluation</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/cicd-pipeline-example/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/cicd-pipeline-example/</guid><description/></item><item><title>Improve LLM-as-judge evaluators using human feedback</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/improve-judge-evaluator-feedback/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/improve-judge-evaluator-feedback/</guid><description/></item><item><title>Improvement</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/improvement/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/improvement/</guid><description/></item><item><title>Improvement</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/improvement/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/improvement/</guid><description/></item><item><title>Improvement</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/improvement/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/improvement/</guid><description/></item><item><title>Increase Consistency</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/increase-consistency/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/increase-consistency/</guid><description>&lt;p&gt;Output consistency matters most when Claude powers automated pipelines where downstream code parses its responses. This guide covers techniques like temperature reduction, few-shot examples, structured output formats, and explicit schemas that make Claude&amp;rsquo;s responses more deterministic. The single biggest lever is providing concrete output examples in your prompt &amp;ndash; this anchors the model&amp;rsquo;s formatting far more reliably than verbal instructions alone. 
Read this before building any system that pipes Claude output into JSON parsers, database inserts, or multi-step agent workflows.&lt;/p&gt;</description></item><item><title>Instructor</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/instructor/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/instructor/</guid><description>Trace and evaluate structured data extraction from LLMs with Weave&amp;rsquo;s Instructor integration, capturing Pydantic model validation, retry logic, and JSON schema enforcement for reliable structured output workflows.</description></item><item><title>Integration testing</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/integration-testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/integration-testing/</guid><description>Test agents with real LLM APIs by organizing tests, managing keys, handling flakiness, and controlling costs.</description></item><item><title>Integration testing</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/integration-testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/integration-testing/</guid><description>Test agents with real LLM APIs by organizing tests, managing keys, handling flakiness, and controlling costs.</description></item><item><title>Intro to Retrieval</title><link>https://learn-ai.blindshot.kz/docs/chroma/guides/build/intro-to-retrieval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/chroma/guides/build/intro-to-retrieval/</guid><description>Ground LLMs in your own data using retrieval-augmented 
generation.</description></item><item><title>Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/introduction/</guid><description/></item><item><title>Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/medical-chatbot/introduction/</guid><description/></item><item><title>Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/rag-qa-agent/introduction/</guid><description/></item><item><title>Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/summarization-agent/introduction/</guid><description/></item><item><title>Introduction Comparisons</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/introduction-comparisons/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/introduction-comparisons/</guid><description/></item><item><title>Introduction Design Philosophy</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/introduction-design-philosophy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/introduction-design-philosophy/</guid><description/></item><item><title>Introduction to Evaluations</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/cookbooks/intro_to_weave_hello_eval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/cookbooks/intro_to_weave_hello_eval/</guid><description>An introduction to running evaluations with W&amp;amp;B Weave</description></item><item><title>La Plateforme</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/deployment/laplateforme/overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/deployment/laplateforme/overview/</guid><description>Mistral AI&amp;rsquo;s La Plateforme offers pay-as-you-go API access to its latest models with flexible deployment options</description></item><item><title>LangGraph</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/integrations/_langgraph_agent_evaluation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/integrations/_langgraph_agent_evaluation/_overview/</guid><description/></item><item><title>LangSmith CLI</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/langsmith-cli/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/langsmith-cli/</guid><description>Query and manage LangSmith projects, traces, runs, datasets, evaluators, experiments, and threads from the terminal</description></item><item><title>LangSmith Evaluation</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluation/</guid><description/></item><item><title>LangSmith Polly</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/polly-evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/polly-evaluation/</guid><description/></item><item><title>LangSmith 
skills</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/skills/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/skills/</guid><description>Use Agent Skills to work with LangSmith traces, datasets, and evaluators from your coding agent.</description></item><item><title>Learning DSPy</title><link>https://learn-ai.blindshot.kz/docs/dspy/learn/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/dspy/learn/_overview/</guid><description>Three stages of building AI systems - programming, evaluation, and optimization</description></item><item><title>Let Claude use your computer from the CLI</title><link>https://learn-ai.blindshot.kz/docs/anthropic/claude-code/computer-use/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/claude-code/computer-use/</guid><description>Enable computer use in the Claude Code CLI so Claude can open apps, click, type, and see your screen on macOS. 
Test native apps, debug visual issues, and automate GUI-only tools without leaving your terminal.</description></item><item><title>LlamaIndex</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/llamaindex/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/llamaindex/</guid><description>Automatically trace and debug LlamaIndex applications with Weave, capturing all LLM calls, RAG pipelines, agent steps, and evaluations for comprehensive observability of your data-connected AI workflows.</description></item><item><title>LlamaIndex Agent Evaluation Quickstart</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/llamaindex_agent_evals/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/llamaindex_agent_evals/_overview/</guid><description/></item><item><title>LLM Benchmarking Quickstart</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/benchmark_llm/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/benchmark_llm/_overview/</guid><description/></item><item><title>LLM Evaluations</title><link>https://learn-ai.blindshot.kz/docs/together-ai/docs/ai-evaluations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/together-ai/docs/ai-evaluations/</guid><description>Learn how to run LLM-as-a-Judge evaluations</description></item><item><title>LLM Judge</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/llm-judge/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/llm-judge/_overview/</guid><description/></item><item><title>Local development &amp; 
testing</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/local-dev-testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/local-dev-testing/</guid><description/></item><item><title>Log evaluation data from your code</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/evaluation_logger/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/evaluation_logger/</guid><description>Flexible, incremental way to log evaluation data from Python and TypeScript code</description></item><item><title>Logfire Integration</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/logfire-integration/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/logfire-integration/_overview/</guid><description/></item><item><title>Manage Weave Projects</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/platform/weave-projects/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/platform/weave-projects/</guid><description>Use Weave projects to organize related assets like traces, prompts, evaluations, models, and dashboards.</description></item><item><title>Maxim Integration</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/observability/maxim/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/observability/maxim/</guid><description>Start Agent monitoring, evaluation, and observability</description></item><item><title>Metrics</title><link>https://learn-ai.blindshot.kz/docs/dspy/learn/evaluation/metrics/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/dspy/learn/evaluation/metrics/_overview/</guid><description>&lt;p&gt;In DSPy a metric is the objective function that drives both evaluation and optimization, so this page matters more than a typical reference — your metric definition directly shapes how teleprompters compile and improve a program. Pay close attention to the difference between simple answer-matching metrics and metrics that themselves call an LM to judge quality, since the latter adds cost and variance you have to control. A common pitfall is returning a bare boolean where an optimizer expects a float score. Read the evaluation overview first, then pair this with the optimizers documentation.&lt;/p&gt;</description></item><item><title>Metrics &amp; Attributes</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/metrics-attributes/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/metrics-attributes/_overview/</guid><description/></item><item><title>Metrics Answer Relevancy</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-answer-relevancy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-answer-relevancy/</guid><description/></item><item><title>Metrics Arena G Eval</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-arena-g-eval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-arena-g-eval/</guid><description/></item><item><title>Metrics Argument Correctness</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-argument-correctness/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-argument-correctness/</guid><description/></item><item><title>Metrics 
Bias</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-bias/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-bias/</guid><description/></item><item><title>Metrics Contextual Precision</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-contextual-precision/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-contextual-precision/</guid><description/></item><item><title>Metrics Contextual Recall</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-contextual-recall/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-contextual-recall/</guid><description/></item><item><title>Metrics Contextual Relevancy</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-contextual-relevancy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-contextual-relevancy/</guid><description/></item><item><title>Metrics Conversation Completeness</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-conversation-completeness/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-conversation-completeness/</guid><description/></item><item><title>Metrics Conversational Dag</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-conversational-dag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-conversational-dag/</guid><description/></item><item><title>Metrics Conversational G Eval</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-conversational-g-eval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-conversational-g-eval/</guid><description/></item><item><title>Metrics Custom</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-custom/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-custom/</guid><description/></item><item><title>Metrics Dag</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-dag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-dag/</guid><description/></item><item><title>Metrics Exact Match</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-exact-match/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-exact-match/</guid><description/></item><item><title>Metrics Faithfulness</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-faithfulness/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-faithfulness/</guid><description/></item><item><title>Metrics Goal Accuracy</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-goal-accuracy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-goal-accuracy/</guid><description/></item><item><title>Metrics Hallucination</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-hallucination/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-hallucination/</guid><description/></item><item><title>Metrics Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-introduction/</guid><description>&lt;p&gt;This page introduces DeepEval&amp;rsquo;s fifty-plus metrics, each scored from 0 to 1 with reasoning, and it matters because choosing the right metrics is the whole game in LLM evaluation. The key discipline the docs push is restraint: use no more than about five metrics, roughly two or three generic plus one or two custom to your use case, so you prioritize what truly matters instead of drowning in numbers. Because the metrics are LLM-as-a-judge, expect real cost and some run-to-run variance. This parallels RAGAS&amp;rsquo;s metric suite; read getting-started first if you have not run an evaluation yet.&lt;/p&gt;</description></item><item><title>Metrics Json Correctness</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-json-correctness/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-json-correctness/</guid><description/></item><item><title>Metrics Knowledge Retention</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-knowledge-retention/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-knowledge-retention/</guid><description/></item><item><title>Metrics Llm Evals</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-llm-evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-llm-evals/</guid><description/></item><item><title>Metrics Mcp Task Completion</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-mcp-task-completion/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-mcp-task-completion/</guid><description/></item><item><title>Metrics Mcp 
Use</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-mcp-use/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-mcp-use/</guid><description/></item><item><title>Metrics Misuse</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-misuse/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-misuse/</guid><description/></item><item><title>Metrics Multi Turn Mcp Use</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-multi-turn-mcp-use/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-multi-turn-mcp-use/</guid><description/></item><item><title>Metrics Non Advice</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-non-advice/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-non-advice/</guid><description/></item><item><title>Metrics Pattern Match</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-pattern-match/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-pattern-match/</guid><description/></item><item><title>Metrics Pii Leakage</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-pii-leakage/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-pii-leakage/</guid><description/></item><item><title>Metrics Plan Adherence</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-plan-adherence/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-plan-adherence/</guid><description/></item><item><title>Metrics Plan 
Quality</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-plan-quality/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-plan-quality/</guid><description/></item><item><title>Metrics Prompt Alignment</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-prompt-alignment/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-prompt-alignment/</guid><description/></item><item><title>Metrics Ragas</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-ragas/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-ragas/</guid><description/></item><item><title>Metrics Role Adherence</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-role-adherence/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-role-adherence/</guid><description/></item><item><title>Metrics Role Violation</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-role-violation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-role-violation/</guid><description/></item><item><title>Metrics Step Efficiency</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-step-efficiency/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-step-efficiency/</guid><description/></item><item><title>Metrics Summarization</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-summarization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-summarization/</guid><description/></item><item><title>Metrics Task 
Completion</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-task-completion/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-task-completion/</guid><description/></item><item><title>Metrics Tool Correctness</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-tool-correctness/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-tool-correctness/</guid><description/></item><item><title>Metrics Tool Use</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-tool-use/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-tool-use/</guid><description/></item><item><title>Metrics Topic Adherence</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-topic-adherence/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-topic-adherence/</guid><description/></item><item><title>Metrics Toxicity</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-toxicity/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-toxicity/</guid><description/></item><item><title>Metrics Turn Contextual Precision</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-contextual-precision/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-contextual-precision/</guid><description/></item><item><title>Metrics Turn Contextual Recall</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-contextual-recall/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-contextual-recall/</guid><description/></item><item><title>Metrics Turn Contextual Relevancy</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-contextual-relevancy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-contextual-relevancy/</guid><description/></item><item><title>Metrics Turn Faithfulness</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-faithfulness/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-faithfulness/</guid><description/></item><item><title>Metrics Turn Relevancy</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-relevancy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/metrics-turn-relevancy/</guid><description/></item><item><title>Miscellaneous</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/miscellaneous/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/miscellaneous/</guid><description/></item><item><title>Mitigate Jailbreaks</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/mitigate-jailbreaks/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/mitigate-jailbreaks/</guid><description>&lt;p&gt;Jailbreak mitigation is essential for any production deployment where Claude interacts with untrusted user input. This guide covers defense-in-depth strategies including system prompt hardening, input validation, and output filtering. 
A common pitfall is relying solely on system prompt instructions for safety &amp;ndash; attackers routinely bypass single-layer defenses, so layering multiple techniques is critical. Read this alongside the harmlessness screens documentation to understand how Anthropic&amp;rsquo;s built-in protections complement your application-level guardrails.&lt;/p&gt;</description></item><item><title>Model optimization</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/model-optimization/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/model-optimization/</guid><description>Ensure quality model outputs with evals and fine-tuning in the OpenAI platform.</description></item><item><title>Models Benchmarks</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/getting-started/models/benchmark/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/getting-started/models/benchmark/</guid><description>Mistral&amp;rsquo;s benchmarked models excel in reasoning, multilingual tasks, coding, and multimodal capabilities, outperforming competitors in key benchmarks</description></item><item><title>Multi-Run Evaluation</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/multi-run/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/multi-run/_overview/</guid><description/></item><item><title>Multimodal Metrics Image Coherence</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-coherence/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-coherence/</guid><description/></item><item><title>Multimodal Metrics Image 
Editing</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-editing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-editing/</guid><description/></item><item><title>Multimodal Metrics Image Helpfulness</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-helpfulness/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-helpfulness/</guid><description/></item><item><title>Multimodal Metrics Image Reference</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-reference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-image-reference/</guid><description/></item><item><title>Multimodal Metrics Text To Image</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-text-to-image/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/multimodal-metrics-text-to-image/</guid><description/></item><item><title>Non-English Testset Generation</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_language_adaptation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_language_adaptation/_overview/</guid><description/></item><item><title>Observability</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/observability/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/observability/</guid><description>Observability for LLMs ensures visibility, debugging, and performance optimization across prototyping, testing, and 
production</description></item><item><title>Online Evaluation</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/online-evaluation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/online-evaluation/_overview/</guid><description/></item><item><title>OpenAI</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/openai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/openai/</guid><description>Integrate OpenAI with Weave for tracing, evaluation, and monitoring</description></item><item><title>Opik Integration</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/observability/opik/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/observability/opik/</guid><description>Learn how to use Comet Opik to debug, evaluate, and monitor your CrewAI applications with comprehensive tracing, automated evaluations, and production-ready dashboards.</description></item><item><title>Optimizing LLM Accuracy</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/optimizing-llm-accuracy/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/optimizing-llm-accuracy/</guid><description>Learn strategies to enhance the accuracy of large language models using techniques like prompt engineering, retrieval-augmented generation, and fine-tuning.</description></item><item><title>Overview</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/observability/overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/observability/overview/</guid><description>Monitor, evaluate, and optimize your CrewAI agents with comprehensive observability 
tools</description></item><item><title>Overview</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/_overview/</guid><description/></item><item><title>Overview</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/overview/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/overview/_overview/</guid><description/></item><item><title>Patronus AI Evaluation</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/observability/patronus-evaluation/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/observability/patronus-evaluation/</guid><description>Monitor and evaluate CrewAI agent performance using Patronus AI&amp;rsquo;s comprehensive evaluation platform for LLM outputs and agent behaviors.</description></item><item><title>Performance</title><link>https://learn-ai.blindshot.kz/docs/chroma/guides/deploy/performance/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/chroma/guides/deploy/performance/</guid><description>Single-node Chroma performance benchmarks and limitations.</description></item><item><title>Performance benchmarking</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/deployments/benchmarking/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/deployments/benchmarking/</guid><description>Measure and optimize your deployment&amp;rsquo;s performance with load testing</description></item><item><title>Persona Generation</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_persona_generator/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/_persona_generator/_overview/</guid><description/></item><item><title>Pin and compare runs</title><link>https://learn-ai.blindshot.kz/docs/wandb/models/runs/compare-runs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/models/runs/compare-runs/</guid><description>Learn how to use pinned and baseline runs to keep track of important runs and efficiently evaluate model experiments.</description></item><item><title>Prompt Evaluation Quickstart</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/prompt_evals/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/prompt_evals/_overview/</guid><description/></item><item><title>Prompt Optimization Copro</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-copro/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-copro/</guid><description/></item><item><title>Prompt Optimization Gepa</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-gepa/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-gepa/</guid><description/></item><item><title>Prompt Optimization Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-introduction/</guid><description/></item><item><title>Prompt Optimization Miprov2</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-miprov2/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-miprov2/</guid><description/></item><item><title>Prompt Optimization Simba</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-simba/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/prompt-optimization-simba/</guid><description/></item><item><title>Prompting capabilities</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/prompting-capabilities/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/guides/prompting-capabilities/</guid><description>Learn effective prompting techniques for classification, summarization, personalization, and evaluation with Mistral models</description></item><item><title>Prune Threads</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/agent-server-api/threads/prune-threads/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/agent-server-api/threads/prune-threads/</guid><description>Prune threads by ID. The &amp;lsquo;delete&amp;rsquo; strategy removes threads entirely. 
The &amp;lsquo;keep_latest&amp;rsquo; strategy prunes old checkpoints but keeps threads and their latest state.</description></item><item><title>Quick Start</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/quick-start/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/quick-start/_overview/</guid><description/></item><item><title>Quickstart: Retrieval Augmented Generation (RAG)</title><link>https://learn-ai.blindshot.kz/docs/together-ai/docs/quickstart-retrieval-augmented-generation-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/together-ai/docs/quickstart-retrieval-augmented-generation-rag/</guid><description>How to build a RAG workflow in under 5 mins!</description></item><item><title>RAG Evaluation</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/rag_eval/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/rag_eval/_overview/</guid><description/></item><item><title>RAG Tool</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/tools/ai-ml/ragtool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/tools/ai-ml/ragtool/</guid><description>The &amp;lsquo;RagTool&amp;rsquo; is a dynamic knowledge base tool for answering questions using Retrieval-Augmented Generation.</description></item><item><title>Red teaming</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/red-teaming/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/red-teaming/</guid><description>Learn how red teaming fits into AI evaluation, including Promptfoo open source and OpenAI Red Teaming for enterprise teams.</description></item><item><title>Reduce 
Hallucinations</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/reduce-hallucinations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/reduce-hallucinations/</guid><description>&lt;p&gt;Hallucination reduction is arguably the most impactful guardrail topic for practitioners building retrieval-augmented or factual applications with Claude. The guide covers grounding techniques such as providing source documents, instructing the model to quote directly, and asking it to flag uncertainty. A key gotcha is that simply telling Claude &amp;ldquo;don&amp;rsquo;t hallucinate&amp;rdquo; is far less effective than structuring prompts so the model can cite or decline &amp;ndash; give it an explicit escape hatch like &amp;ldquo;say I don&amp;rsquo;t know if the answer isn&amp;rsquo;t in the provided context.&amp;rdquo; Pair this with the evaluation techniques in the testing docs to measure hallucination rates systematically.&lt;/p&gt;</description></item><item><title>Reduce Latency</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/reduce-latency/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/reduce-latency/</guid><description>&lt;p&gt;Latency optimization directly impacts user experience and cost in production Claude deployments. This guide walks through techniques like prompt length reduction, streaming, model selection trade-offs, and caching strategies that can cut response times significantly. Start with the quick wins &amp;ndash; enabling streaming and trimming unnecessary context from prompts &amp;ndash; before moving to architectural changes like prompt caching. 
Be aware that some latency reduction techniques (such as using smaller models or shorter prompts) trade off against output quality, so always measure both metrics together.&lt;/p&gt;</description></item><item><title>Reduce Prompt Leak</title><link>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/platform/test-and-evaluate/strengthen-guardrails/reduce-prompt-leak/</guid><description>&lt;p&gt;Prompt leakage is one of the most common security concerns in production LLM applications, and this guide provides concrete techniques for preventing Claude from revealing system prompts to end users. Focus on the layered defense approach — no single technique is sufficient, so you need to combine prompt structure, output filtering, and behavioral instructions. A frequent mistake is relying solely on &amp;ldquo;do not reveal your instructions&amp;rdquo; directives, which are trivially bypassed by indirect extraction attacks. 
Read this alongside the general guardrails documentation to build a comprehensive safety posture before shipping user-facing agents.&lt;/p&gt;</description></item><item><title>Remote Environment Setup</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/connect-environments/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/connect-environments/</guid><description>Implement the /init endpoint to run evaluations in your infrastructure</description></item><item><title>Replay Tasks from Latest Crew Kickoff</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/learn/replay-tasks-from-latest-crew-kickoff/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/learn/replay-tasks-from-latest-crew-kickoff/</guid><description>Replay tasks from the latest crew.kickoff(&amp;hellip;)</description></item><item><title>Report Evaluators</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/report-evaluators/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/report-evaluators/_overview/</guid><description/></item><item><title>Retrieval</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/retrieval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/retrieval/</guid><description/></item><item><title>Retrieval</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/retrieval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/retrieval/</guid><description>&lt;p&gt;LangChain&amp;rsquo;s retrieval guide covers the foundational abstractions for document loading, splitting, embedding, and querying that underpin every RAG 
application built on the framework. Understanding the Retriever interface is critical because it is the common contract that vector stores, BM25 indexes, and custom retrieval strategies all implement. Focus on how retrievers compose with chains and agents, since the retrieval step is often the performance bottleneck in production RAG pipelines. Read this before the RAG-specific guide to ensure you understand the building blocks before seeing them assembled into a full application.&lt;/p&gt;</description></item><item><title>Retrieval</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/retrieval/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/retrieval/</guid><description>Learn how to search your data using semantic similarity with the OpenAI API.</description></item><item><title>Retrieval Augmented Generation (RAG)</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/retrieval-augmented-generation-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/retrieval-augmented-generation-rag/</guid><description>Guide on using Cohere&amp;rsquo;s Retrieval Augmented Generation (RAG) capabilities such as document grounding and citations.</description></item><item><title>Retrieval augmented generation (RAG) - Cohere on Azure AI Foundry</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/cohere-on-azure/azure-ai-rag/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/cohere-on-azure/azure-ai-rag/</guid><description>A guide for performing retrieval augmented generation (RAG) with Cohere&amp;rsquo;s Command models on Azure AI Foundry (API v2).</description></item><item><title>Retrieval augmented generation (RAG) - quickstart</title><link>https://learn-ai.blindshot.kz/docs/cohere/docs/rag-quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/docs/rag-quickstart/</guid><description>A quickstart guide for performing retrieval augmented generation (RAG) with Cohere&amp;rsquo;s Command models (v2 API).</description></item><item><title>Retrieval evaluation using LLM-as-a-judge via Pydantic AI</title><link>https://learn-ai.blindshot.kz/docs/cohere/page/retrieval-eval-pydantic-ai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/cohere/page/retrieval-eval-pydantic-ai/</guid><description>This page contains a tutorial on how to evaluate retrieval systems using LLMs as judges via Pydantic AI.</description></item><item><title>Retrieval-Augmented Generation (RAG)</title><link>https://learn-ai.blindshot.kz/docs/dspy/tutorials/rag/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/dspy/tutorials/rag/_overview/</guid><description/></item><item><title>Retry Strategies</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/retry-strategies/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/how-to/retry-strategies/_overview/</guid><description/></item><item><title>Review items in an annotation queue</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tracking/annotation-review/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/tracking/annotation-review/</guid><description>Evaluate trace items and submit structured feedback using a simplified review interface.</description></item><item><title>Routers</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/deployments/routers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/deployments/routers/</guid><description>Distribute traffic across multiple deployments for A/B 
testing, traffic migration, and load distribution.</description></item><item><title>Rubric-Based Evaluation</title><link>https://learn-ai.blindshot.kz/docs/ragas/concepts/metrics/available_metrics/rubrics_based/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/concepts/metrics/available_metrics/rubrics_based/_overview/</guid><description/></item><item><title>Run an evaluation from the Playground</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-evaluation-from-playground/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-evaluation-from-playground/</guid><description/></item><item><title>Run an evaluation from the prompt playground</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-evaluation-from-prompt-playground/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-evaluation-from-prompt-playground/</guid><description/></item><item><title>Run an evaluation with multimodal content</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-with-attachments/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/evaluate-with-attachments/</guid><description/></item><item><title>Run backtests on a new version of an agent</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-backtests-new-agent/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/run-backtests-new-agent/</guid><description/></item><item><title>Safety best practices</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/safety-best-practices/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/safety-best-practices/</guid><description>Comprehensive safety practices for responsible AI deployment — covering moderation, adversarial testing, human oversight, prompt engineering for safety, and production monitoring.</description></item><item><title>Scoring Overview</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/scorers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/scorers/</guid><description>Evaluate AI outputs and return evaluation metrics with Weave Scorers</description></item><item><title>Set Latest Assistant Version</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/agent-server-api/assistants/set-latest-assistant-version/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/agent-server-api/assistants/set-latest-assistant-version/</guid><description>Set the latest version for an assistant.</description></item><item><title>Set up automations</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/automations/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/automations/</guid><description>Create event-driven automations that trigger actions based on monitor metrics and trace activity.</description></item><item><title>Set up composite online evaluators</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-composite/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-composite/</guid><description/></item><item><title>Set up guardrails</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/guardrails/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/guardrails/</guid><description>Ensure LLM safety and measure output quality in production applications</description></item><item><title>Set up LLM-as-a-judge online evaluators</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-llm-as-judge/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-llm-as-judge/</guid><description/></item><item><title>Set up monitors</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/monitors/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/monitors/</guid><description>Passively score production traffic to surface trends and issues</description></item><item><title>Set up multi-turn online evaluators</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-multi-turn/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-multi-turn/</guid><description/></item><item><title>Set up online code evaluators</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-code/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/online-evaluations-code/</guid><description/></item><item><title>Simple Validation</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/examples/simple-validation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/examples/simple-validation/_overview/</guid><description/></item><item><title>Single-hop Query 
Testset</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/singlehop_testset_gen/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/applications/singlehop_testset_gen/_overview/</guid><description/></item><item><title>Single-Node Performance</title><link>https://learn-ai.blindshot.kz/docs/chroma/guides/performance/single-node/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/chroma/guides/performance/single-node/</guid><description>Single-node Chroma performance benchmarks and limitations.</description></item><item><title>Span-Based</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/span-based/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/evals/evaluators/span-based/_overview/</guid><description/></item><item><title>Supported Models</title><link>https://learn-ai.blindshot.kz/docs/together-ai/docs/evaluations-supported-models/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/together-ai/docs/evaluations-supported-models/</guid><description>Supported models for Evaluations</description></item><item><title>Swarm</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/integrations/swarm_agent_evaluation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/integrations/swarm_agent_evaluation/_overview/</guid><description/></item><item><title>Synthesizer Generate From Contexts</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-contexts/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-contexts/</guid><description/></item><item><title>Synthesizer Generate From 
Docs</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-docs/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-docs/</guid><description/></item><item><title>Synthesizer Generate From Goldens</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-goldens/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-goldens/</guid><description/></item><item><title>Synthesizer Generate From Scratch</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-scratch/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthesizer-generate-from-scratch/</guid><description/></item><item><title>Synthetic Data Generation Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthetic-data-generation-introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/synthetic-data-generation-introduction/</guid><description/></item><item><title>Test</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/_overview/</guid><description>Strategies for testing LangChain agents, including unit tests, integration tests, and trajectory evaluations.</description></item><item><title>Test</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langgraph/test/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langgraph/test/</guid><description/></item><item><title>Test</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/_overview/</guid><description>Strategies for testing LangChain agents, including unit tests, integration tests, and trajectory evaluations.</description></item><item><title>Test</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langgraph/test/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langgraph/test/</guid><description/></item><item><title>Test a ReAct agent with Pytest/Vitest and LangSmith</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/test-react-agent-pytest/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/test-react-agent-pytest/</guid><description/></item><item><title>Test Agent Card</title><link>https://learn-ai.blindshot.kz/docs/a2a/a2a-protocol-validator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/a2a/a2a-protocol-validator/</guid><description>Online tool to validate if a domain supports the A2A protocol and visualize agent card information. 
Enter any URL to check for A2A protocol support and parse the agent.json file.</description></item><item><title>Test deployed agents</title><link>https://learn-ai.blindshot.kz/docs/google/adk/deploy/agent-engine/test/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/google/adk/deploy/agent-engine/test/_overview/</guid><description/></item><item><title>Test multi-turn conversations</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/multiple-messages/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/multiple-messages/</guid><description/></item><item><title>Test Pinecone at scale</title><link>https://learn-ai.blindshot.kz/docs/pinecone/guides/get-started/test-at-scale/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pinecone/guides/get-started/test-at-scale/</guid><description>Test Pinecone with a real-world dataset and semantic search workload.</description></item><item><title>Testing</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/concepts/testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/concepts/testing/</guid><description>Learn how to test your CrewAI Crew and evaluate their performance.</description></item><item><title>Testing</title><link>https://learn-ai.blindshot.kz/docs/pydantic-ai/testing/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pydantic-ai/testing/_overview/</guid><description/></item><item><title>Testset Generation</title><link>https://learn-ai.blindshot.kz/docs/ragas/concepts/test_data_generation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/concepts/test_data_generation/_overview/</guid><description/></item><item><title>Testset Generation for Agents or Tool 
use cases</title><link>https://learn-ai.blindshot.kz/docs/ragas/concepts/test_data_generation/agents/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/concepts/test_data_generation/agents/_overview/</guid><description/></item><item><title>Testset Generation for RAG</title><link>https://learn-ai.blindshot.kz/docs/ragas/concepts/test_data_generation/rag/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/concepts/test_data_generation/rag/_overview/</guid><description/></item><item><title>Testset Generation for RAG</title><link>https://learn-ai.blindshot.kz/docs/ragas/getstarted/rag_testset_generation/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/getstarted/rag_testset_generation/_overview/</guid><description/></item><item><title>Text Embeddings</title><link>https://learn-ai.blindshot.kz/docs/mistral/docs/capabilities/embeddings/text_embeddings/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/mistral/docs/capabilities/embeddings/text_embeddings/</guid><description>Generate and use text embeddings with Mistral AI&amp;rsquo;s API for NLP tasks like similarity, classification, and retrieval</description></item><item><title>Text-to-SQL Evaluation Quickstart</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/text2sql/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/text2sql/_overview/</guid><description/></item><item><title>Together AI</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/together_ai/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/together_ai/</guid><description>Track and evaluate Together AI&amp;rsquo;s open source LLMs 
using Weave&amp;rsquo;s OpenAI SDK compatibility for seamless integration with model calls, fine-tuning workflows, and hosted models.</description></item><item><title>Trace and Evaluate a Computer Vision Pipeline with Weave</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/cookbooks/ocr-pipeline/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/cookbooks/ocr-pipeline/</guid><description>Learn how to trace and evaluate a computer vision pipeline with W&amp;amp;B Weave</description></item><item><title>Trace grading</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/trace-grading/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/trace-grading/</guid><description>Use trace grading to create datasets, configure graders, and track evaluation runs for your models.</description></item><item><title>Tracing and logging evaluations with Observability tools</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/metrics/tracing/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/metrics/tracing/_overview/</guid><description/></item><item><title>Training Overview</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/cli-reference/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/cli-reference/</guid><description>Launch RFT jobs using the eval-protocol CLI</description></item><item><title>Troubleshooting</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/troubleshooting/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/troubleshooting/</guid><description/></item><item><title>TruLens</title><link>https://learn-ai.blindshot.kz/docs/pinecone/integrations/trulens/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/pinecone/integrations/trulens/</guid><description>Using TruLens and Pinecone to evaluate grounded LLM applications</description></item><item><title>Tutorial Introduction</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/tutorial-introduction/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/tutorial-introduction/</guid><description/></item><item><title>Tutorial Setup</title><link>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/tutorial-setup/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/tutorials/tutorial-setup/</guid><description/></item><item><title>TXT RAG Search</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/txtsearchtool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/txtsearchtool/</guid><description>The &amp;lsquo;TXTSearchTool&amp;rsquo; is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a text file.</description></item><item><title>Unit testing</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/unit-testing/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/langchain/test/unit-testing/</guid><description>Test agent logic without API calls using fake chat models and in-memory persistence.</description></item><item><title>Unit testing</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/unit-testing/</link><pubDate>Mon, 
01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/python/langchain/test/unit-testing/</guid><description>Test agent logic without API calls using fake chat models and in-memory persistence.</description></item><item><title>Use builtin scorers</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/builtin_scorers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/evaluation/builtin_scorers/</guid><description>Use Weave&amp;rsquo;s predefined scorers for evaluating your AI applications</description></item><item><title>Use Claude Code with Chrome (beta)</title><link>https://learn-ai.blindshot.kz/docs/anthropic/claude-code/chrome/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/anthropic/claude-code/chrome/</guid><description>Connect Claude Code to your Chrome browser to test web apps, debug with console logs, automate form filling, and extract data from web pages.</description></item><item><title>Use server-side caching</title><link>https://learn-ai.blindshot.kz/docs/langchain/langsmith/caching/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/langsmith/caching/</guid><description>Cache values server-side in your agent deployment using stale-while-revalidate and key-value cache APIs.</description></item><item><title>User Simulation</title><link>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/user-sim/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/user-sim/_overview/</guid><description/></item><item><title>Using GPT-5.2</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/latest-model/</link><pubDate>Mon, 01 Jan 0001 00:00:00 
+0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/latest-model/</guid><description>Learn about how to use and migrate to GPT-5.2 and the GPT-5 model family, the latest models in the OpenAI API.</description></item><item><title>Using Pre-chunked Data</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/prechunked_data/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/customizations/testgenerator/prechunked_data/_overview/</guid><description/></item><item><title>Using Secrets</title><link>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/using-secret-in-evaluator/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/fireworks-ai/fine-tuning/using-secret-in-evaluator/</guid><description>Learn how to create secrets that can be utilized within your reward function.</description></item><item><title>Using standard tests</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/contributing/standard-tests-langchain/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/javascript/contributing/standard-tests-langchain/</guid><description/></item><item><title>Using standard tests</title><link>https://learn-ai.blindshot.kz/docs/langchain/oss/python/contributing/standard-tests-langchain/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/langchain/oss/python/contributing/standard-tests-langchain/</guid><description/></item><item><title>Verdict</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/verdict/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/verdict/</guid><description>Use Verdict evaluation framework with Weave to trace and monitor your LLM evaluation 
pipelines</description></item><item><title>Verifiers</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/verifiers/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/guides/integrations/verifiers/</guid><description>Track and debug Verifiers RL environments and LLM agent training with Weave, capturing multi-round conversations, evaluation rollouts, and model performance metrics for comprehensive observability of reinforcement learning workflows.</description></item><item><title>Vibe Coder Quickstart</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/vibe-coder-quickstart/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/vibe-coder-quickstart/</guid><description/></item><item><title>Vibe Coding</title><link>https://learn-ai.blindshot.kz/docs/deepeval/docs/vibe-coding/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/deepeval/docs/vibe-coding/</guid><description/></item><item><title>Weave Integration</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/observability/weave/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/observability/weave/</guid><description>Learn how to use Weights &amp;amp; Biases (W&amp;amp;B) Weave to track, experiment with, evaluate, and improve your CrewAI applications.</description></item><item><title>Website RAG Search</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/tools/search-research/websitesearchtool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/tools/search-research/websitesearchtool/</guid><description>The &amp;lsquo;WebsiteSearchTool&amp;rsquo; is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a website.</description></item><item><title>What is 
Weave?</title><link>https://learn-ai.blindshot.kz/docs/wandb/weave/concepts/what-is-weave/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/wandb/weave/concepts/what-is-weave/</guid><description>Learn about W&amp;amp;B Weave and how it helps you build, evaluate, and improve LLM applications</description></item><item><title>What's New</title><link>https://learn-ai.blindshot.kz/docs/ag-ui/development/updates/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ag-ui/development/updates/</guid><description>The latest updates and improvements to AG-UI</description></item><item><title>Why Evaluate Agents</title><link>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/google/adk/evaluate/_overview/</guid><description/></item><item><title>Workflow Evaluation Quickstart</title><link>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/workflow_eval/_overview/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/ragas/howtos/cli/workflow_eval/_overview/</guid><description/></item><item><title>Working with evals</title><link>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/evals/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/openai/api/api/docs/guides/evals/</guid><description>Build, run, and iterate on evaluations to systematically test and improve AI model outputs — OpenAI&amp;rsquo;s practical guide to eval-driven development.</description></item><item><title>XML RAG Search</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/xmlsearchtool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/tools/file-document/xmlsearchtool/</guid><description>The &amp;lsquo;XMLSearchTool&amp;rsquo; is 
designed to perform a RAG (Retrieval-Augmented Generation) search within the content of an XML file.</description></item><item><title>YouTube Channel RAG Search</title><link>https://learn-ai.blindshot.kz/docs/crewai/en/tools/search-research/youtubechannelsearchtool/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://learn-ai.blindshot.kz/docs/crewai/en/tools/search-research/youtubechannelsearchtool/</guid><description>The &amp;lsquo;YoutubeChannelSearchTool&amp;rsquo; is designed to perform a RAG (Retrieval-Augmented Generation) search within the content of a YouTube channel.</description></item></channel></rss>