How to run an evaluation asynchronously ↗
noOriginal Documentation
Documentation Index#
Fetch the complete documentation index at: https://docs.langchain.com/llms.txt Use this file to discover all available pages before exploring further.
Evaluations | Evaluators | Datasets | Experiments
We can run evaluations asynchronously via the SDK using aevaluate(), which accepts all of the same arguments as evaluate() but expects the application function to be asynchronous. You can learn more about how to use the evaluate() function here.
This guide is only relevant when using the Python SDK. In JS/TS the evaluate() function is already async. You can see how to use it here.
Use aevaluate()#
- Python
Requires langsmith>=0.3.13
from langsmith import wrappers, Client
from openai import AsyncOpenAI
# Optionally wrap the OpenAI client to trace all model calls.
oai_client = wrappers.wrap_openai(AsyncOpenAI())
# Optionally add the 'traceable' decorator to trace the inputs/outputs of this function.
@traceable
async def researcher_app(inputs: dict) -> str:
instructions = """You are an excellent researcher. Given a high-level research idea, \
list 5 concrete questions that should be investigated to determine if the idea is worth pursuing."""
response = await oai_client.chat.completions.create(
model="gpt-4.1-mini",
messages=[
{"role": "system", "content": instructions},
{"role": "user", "content": inputs["idea"]},
],
)
return response.choices[0].message.content
# Evaluator functions can be sync or async
def concise(inputs: dict, outputs: dict) -> bool:
return len(outputs["output"]) < 3 * len(inputs["idea"])
ls_client = Client()
ideas = [
"universal basic income",
"nuclear fusion",
"hyperloop",
"nuclear powered rockets",
]
dataset = ls_client.create_dataset("research ideas")
ls_client.create_examples(
dataset_name=dataset.name,
examples=[{"inputs": {"idea": i}} for i in ideas],
)
# Can equivalently use the 'aevaluate' function directly:
# from langsmith import aevaluate
# await aevaluate(...)
results = await ls_client.aevaluate(
researcher_app,
data=dataset,
evaluators=[concise],
# Optional, add concurrency.
max_concurrency=2, # Optional, add concurrency.
experiment_prefix="gpt-4.1-mini-baseline" # Optional, random by default.
)Related#
Edit this page on GitHub or file an issue.
Connect these docs to Claude, VSCode, and more via MCP for real-time answers.
Link last verified
June 7, 2026.
View original ↗
Source: LangChain Docs
Link last verified: 2026-03-04