How to run an evaluation asynchronously ↗

langchain guide intermediate testing workflows

Original Documentation

Documentation Index#
Fetch the complete documentation index at: https://docs.langchain.com/llms.txt Use this file to discover all available pages before exploring further.

Evaluations | Evaluators | Datasets | Experiments

We can run evaluations asynchronously via the SDK using aevaluate(), which accepts all of the same arguments as evaluate() but expects the application function to be asynchronous. You can learn more about how to use the evaluate() function here.

This guide is only relevant when using the Python SDK. In JS/TS the evaluate() function is already async. You can see how to use it here.

Use `aevaluate()`#

Python

Requires langsmith>=0.3.13

from langsmith import wrappers, Client
from openai import AsyncOpenAI

# Optionally wrap the OpenAI client to trace all model calls.
oai_client = wrappers.wrap_openai(AsyncOpenAI())

# Optionally add the 'traceable' decorator to trace the inputs/outputs of this function.
@traceable
async def researcher_app(inputs: dict) -> str:
    instructions = """You are an excellent researcher. Given a high-level research idea, \
list 5 concrete questions that should be investigated to determine if the idea is worth pursuing."""

    response = await oai_client.chat.completions.create(
        model="gpt-4.1-mini",
        messages=[
            {"role": "system", "content": instructions},
            {"role": "user", "content": inputs["idea"]},
        ],
    )
    return response.choices[0].message.content

# Evaluator functions can be sync or async
def concise(inputs: dict, outputs: dict) -> bool:
    return len(outputs["output"]) < 3 * len(inputs["idea"])

ls_client = Client()
ideas = [
    "universal basic income",
    "nuclear fusion",
    "hyperloop",
    "nuclear powered rockets",
]
dataset = ls_client.create_dataset("research ideas")
ls_client.create_examples(
    dataset_name=dataset.name,
    examples=[{"inputs": {"idea": i}} for i in ideas],
)

# Can equivalently use the 'aevaluate' function directly:
# from langsmith import aevaluate
# await aevaluate(...)
results = await ls_client.aevaluate(
    researcher_app,
    data=dataset,
    evaluators=[concise],
    # Optional, add concurrency.
    max_concurrency=2,  # Optional, add concurrency.
    experiment_prefix="gpt-4.1-mini-baseline"  # Optional, random by default.
)

Edit this page on GitHub or file an issue.

Connect these docs to Claude, VSCode, and more via MCP for real-time answers.

Link last verified June 7, 2026. View original ↗

Source: LangChain Docs

Link last verified: 2026-03-04

How to run an evaluation asynchronously ↗

Original Documentation

Documentation Index#

Use aevaluate()#

Related#

Use `aevaluate()`#