LangSmith Deployment

Summary: Deploy and manage agents with durable execution, real-time streaming, and horizontal scaling.

LangSmith Deployment is a workflow orchestration runtime purpose-built for agent workloads. It handles the infrastructure that agents need to run reliably: long-running stateful execution, human-in-the-loop pauses, real-time streaming, and horizontal scaling. It also ships with a first-class development environment, Studio.

Start here if you’re building or operating agent applications. This section is about deploying your application. If you need to set up LangSmith infrastructure, the Platform setup section covers infrastructure options (cloud, hybrid, self-hosted) and setup guides.

A typical deployment workflow:

1. Test locally. Run your application on a local development server.

2. Configure app for deployment. Set up dependencies, project structure, and environment configuration.

3. Choose hosting (required for deployment). Select Cloud, Hybrid, or Self-hosted:
   Cloud: Push code from a git repository.
   Hybrid or Self-hosted with control plane: Build and push Docker images, then deploy via the UI.
   Standalone servers: Deploy directly without a control plane.

4. Monitor & manage. Track traces, alerts, and dashboards.

Durable execution

At its core, LangSmith Deployment is a durable execution engine. Your agents run on a managed task queue with automatic checkpointing, so any run can be retried, replayed, or resumed from the exact point of interruption instead of starting over.
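To make the checkpoint-and-resume idea concrete, here is a toy model in plain Python with no LangGraph dependency. It is a sketch under stated assumptions, not how Agent Server is implemented: `Checkpointer`, `RetryPolicy`, and `run` are hypothetical names, the "durable store" is an in-memory dict, and a run is a fixed list of nodes. State is saved after every node, so a transient error is retried in place with exponential backoff, and a non-retryable crash can be resumed on the same thread without re-running nodes that already finished.

```python
# Toy model only: hypothetical names, not the LangGraph or Agent Server API.
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RetryPolicy:
    max_attempts: int = 3                 # total attempts per node, including the first
    initial_interval: float = 0.01        # seconds to wait before the first retry
    backoff_factor: float = 2.0           # wait multiplier after each failed attempt
    retry_on: tuple = (ConnectionError,)  # exception types treated as transient


@dataclass
class Checkpointer:
    # In-memory stand-in for a durable store: thread_id -> (next node index, state).
    saved: dict = field(default_factory=dict)

    def save(self, thread_id: str, next_index: int, state: dict) -> None:
        self.saved[thread_id] = (next_index, dict(state))

    def load(self, thread_id: str) -> tuple:
        next_index, state = self.saved.get(thread_id, (0, {}))
        return next_index, dict(state)


def run(nodes: list, thread_id: str, checkpointer: Checkpointer,
        policy: RetryPolicy, initial: Optional[dict] = None) -> dict:
    """Run (name, fn) nodes in order, resuming from the thread's last checkpoint."""
    index, state = checkpointer.load(thread_id)
    if index == 0 and initial is not None:
        state = dict(initial)  # fresh thread: start from the caller's input
    while index < len(nodes):
        _name, fn = nodes[index]
        delay = policy.initial_interval
        for attempt in range(1, policy.max_attempts + 1):
            try:
                state = fn(state)
                break
            except policy.retry_on:
                if attempt == policy.max_attempts:
                    raise  # out of attempts; the checkpoint still points at this node
                time.sleep(delay)
                delay *= policy.backoff_factor
        index += 1
        checkpointer.save(thread_id, index, state)  # durable progress marker
    return state


# Demo: `summarize` first hits a transient error (retried in place), then a crash.
calls = {"fetch": 0, "summarize": 0}
pending_errors = [RuntimeError("worker crashed"), ConnectionError("timeout")]


def fetch(state: dict) -> dict:
    calls["fetch"] += 1
    return {**state, "doc": "raw text"}


def summarize(state: dict) -> dict:
    calls["summarize"] += 1
    if pending_errors:
        raise pending_errors.pop()  # pops ConnectionError first, then RuntimeError
    return {**state, "summary": state["doc"].upper()}


nodes = [("fetch", fetch), ("summarize", summarize)]
cp = Checkpointer()
try:
    run(nodes, "thread-1", cp, RetryPolicy(), initial={"topic": "q3"})
except RuntimeError as exc:
    print("run failed:", exc, "| resume point:", cp.load("thread-1")[0])

# Resuming the same thread skips `fetch`, which already completed.
result = run(nodes, "thread-1", cp, RetryPolicy())
print(result, calls)
```

Because the checkpoint is written after each node rather than only at the end, the second call to `run` skips `fetch` entirely and continues at `summarize`.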

Because execution is durable, agents can do things that would be fragile or impossible in a stateless runtime:

Wait for external input. An agent calls interrupt() and the runtime checkpoints its state, frees resources, and waits: for a human to approve a transaction, for a reviewer to edit a draft, or for another system to return results. When Command(resume=...) arrives hours or days later, execution picks up exactly where it stopped. This checkpoint-and-resume mechanism is the primitive underneath human-in-the-loop workflows and time-travel debugging.
Run in the background. Background runs execute without blocking the caller. The runtime manages the full lifecycle (queuing, execution, checkpointing, and completion) while the client moves on.
Run on a schedule. Cron jobs trigger agent execution on a recurring cadence, such as a daily summary agent, a weekly report, or a periodic data sync. Each scheduled execution carries the same durability guarantees as any other run.
Handle concurrent input. When a user sends new input while an agent is mid-run (double-texting), the runtime can queue it, cancel the in-progress run, or process both in parallel, without data races or corrupted state.
Retry on failure. Configurable retry policies control backoff, max attempts, and which exceptions trigger retries on a per-node basis. Runs survive process restarts, infrastructure failures, and code revisions mid-execution.
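The wait-for-input pattern from the list above can be modeled in a few lines of plain Python. This is a hypothetical toy runtime, not LangGraph: `ToyRuntime`, `Interrupted`, `start`, and `resume` are invented names. It mirrors the described shape: the node calls `interrupt(...)`, the runtime records the pending question and the thread's state and returns, and a later `resume(...)` (playing the role of `Command(resume=...)`) re-enters the node so that `interrupt(...)` returns the supplied value. Re-running the paused node from its beginning and allowing one interrupt per node are simplifying assumptions of the sketch.

```python
# Toy model only: hypothetical names, not the LangGraph API.
from typing import Any, Callable


class Interrupted(Exception):
    """Raised by ToyRuntime.interrupt() to pause the current node."""

    def __init__(self, payload: Any):
        super().__init__(payload)
        self.payload = payload


class ToyRuntime:
    def __init__(self):
        self.threads: dict = {}  # thread_id -> saved node, state, and status
        self._resume_value: Any = None
        self._has_resume = False

    def interrupt(self, payload: Any) -> Any:
        # On resume, hand back the supplied value; otherwise pause the node.
        if self._has_resume:
            self._has_resume = False
            return self._resume_value
        raise Interrupted(payload)

    def start(self, thread_id: str, node: Callable, state: dict) -> dict:
        self.threads[thread_id] = {"node": node, "state": dict(state)}
        return self._execute(thread_id)

    def resume(self, thread_id: str, value: Any) -> dict:
        # Plays the role of Command(resume=value): re-enter the paused node.
        self._resume_value, self._has_resume = value, True
        return self._execute(thread_id)

    def _execute(self, thread_id: str) -> dict:
        saved = self.threads[thread_id]
        try:
            saved["state"] = saved["node"](saved["state"], self)
            saved["status"] = "done"
            saved.pop("pending", None)
        except Interrupted as pause:
            # Nothing keeps running; only the saved record survives the pause.
            saved["status"] = "interrupted"
            saved["pending"] = pause.payload
        # Return a snapshot (without the node function) of this step's outcome.
        return {k: v for k, v in saved.items() if k != "node"}


def approve_transfer(state: dict, rt: ToyRuntime) -> dict:
    decision = rt.interrupt({"question": f"Approve transfer of {state['amount']}?"})
    return {**state, "approved": decision == "yes"}


rt = ToyRuntime()
first = rt.start("thread-42", approve_transfer, {"amount": 500})
print(first["status"], first["pending"])
# ...hours or days later, a human answers...
final = rt.resume("thread-42", "yes")
print(final["status"], final["state"])
```

In this model the node re-runs from its first line on resume, so any side effect placed before `interrupt(...)` would execute twice; keeping such code idempotent is the safe choice.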

For details on how containers, processes, and the task queue work together, see Agent Server: Runtime architecture. For scaling and throughput tuning, see Configure Agent Server for scale.
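The three double-texting behaviors listed under durable execution can be illustrated with `asyncio`. This is a sketch under assumptions, not the runtime's implementation: the labels `enqueue`, `cancel`, and `parallel` are placeholders for the behaviors described (queue the new input, cancel the in-progress run, process both), not the option names of any SDK, and an agent run is simulated with `asyncio.sleep`. In each case, run A is in progress when input B arrives.

```python
# Toy model only: strategy labels are placeholders, not SDK option names.
import asyncio


async def agent_run(name: str, seconds: float, log: list) -> None:
    # Stand-in for an agent run: "work" for a while, then record the outcome.
    try:
        await asyncio.sleep(seconds)
        log.append(f"{name}:done")
    except asyncio.CancelledError:
        log.append(f"{name}:cancelled")
        raise


async def double_text(strategy: str) -> list:
    """Start run A, then send input B while A is still running."""
    log: list = []
    thread_lock = asyncio.Lock()  # one lock per thread serializes queued runs

    async def queued(name: str, seconds: float) -> None:
        async with thread_lock:
            await agent_run(name, seconds, log)

    if strategy == "enqueue":
        a = asyncio.create_task(queued("A", 0.1))
        await asyncio.sleep(0.01)          # B arrives while A is mid-run
        b = asyncio.create_task(queued("B", 0.01))
        await asyncio.gather(a, b)         # B waits for A to release the thread
    elif strategy == "cancel":
        a = asyncio.create_task(agent_run("A", 0.1, log))
        await asyncio.sleep(0.01)
        a.cancel()                         # drop the in-progress run...
        await asyncio.wait([a])            # ...let it unwind...
        await agent_run("B", 0.01, log)    # ...then handle B
    elif strategy == "parallel":
        a = asyncio.create_task(agent_run("A", 0.1, log))
        await asyncio.sleep(0.01)
        b = asyncio.create_task(agent_run("B", 0.01, log))
        await asyncio.gather(a, b)         # both runs proceed concurrently
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return log


for strategy in ("enqueue", "cancel", "parallel"):
    print(strategy, asyncio.run(double_text(strategy)))
```

The per-thread lock is what keeps queued runs from touching the same thread at the same time. The parallel case would need some separate way to reconcile the two runs' writes, which this toy does not model.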

Streaming

Agents need to show their work in real time. The runtime provides resumable streaming: if a client disconnects mid-stream (a network switch, a sleeping browser tab, an app moved to the background on mobile), it can reconnect and pick up where it left off. Multiple streaming modes let you control granularity, from full state snapshots after each step to token-by-token LLM output as it arrives from the provider.
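Resumable streaming rests on a simple idea: the server keeps each run's events in order under increasing IDs, and a reconnecting client says which ID it saw last. The sketch below models that in plain Python. `RunStream`, `publish`, and `subscribe` are hypothetical names, the mode labels `messages` (tokens) and `values` (full state) are illustrative, and the real network transport is left out entirely.

```python
# Toy model only: hypothetical names, no network transport.
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class RunStream:
    """Server-side, append-only log of one run's events, each with an increasing ID."""
    events: list = field(default_factory=list)

    def publish(self, mode: str, data) -> None:
        self.events.append((len(self.events) + 1, mode, data))

    def subscribe(self, last_event_id: int = 0,
                  modes: Optional[set] = None) -> Iterator[tuple]:
        # Replay only events after last_event_id, optionally filtered by mode.
        for event_id, mode, data in self.events:
            if event_id > last_event_id and (modes is None or mode in modes):
                yield event_id, mode, data


stream = RunStream()
received, last_id = [], 0

# The run emits two tokens; the client reads them and remembers the last ID.
stream.publish("messages", "Hel")
stream.publish("messages", "lo")
for event_id, _mode, data in stream.subscribe(last_id):
    received.append(data)
    last_id = event_id

# The client drops offline; the run keeps producing events meanwhile.
stream.publish("messages", "!")
stream.publish("values", {"answer": "Hello!"})

# On reconnect the client sends its last ID and receives only what it missed.
for event_id, _mode, data in stream.subscribe(last_id):
    received.append(data)
    last_id = event_id

print(received)
```

Filtering by mode in `subscribe` is one way to model per-client granularity: a UI that only renders tokens can ask for `messages` alone.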

Studio

LangGraph Studio connects to any Agent Server, local or deployed, and gives you an interactive environment for developing and debugging agents. Visualize execution graphs, inspect state at any checkpoint, step through runs, modify state mid-execution, and branch to explore alternative paths.
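Inspecting and editing past state is possible because checkpoints form a history rather than a single mutable record. A minimal sketch, assuming a hypothetical `History` class rather than Studio's or LangGraph's actual data model: editing an earlier checkpoint writes a new child checkpoint, so the original path stays intact and the edit becomes a separate branch.

```python
# Toy model only: hypothetical names, not the LangGraph checkpoint API.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    id: int
    parent: Optional[int]
    state: dict


class History:
    """Append-only tree of checkpoints supporting inspect, edit, and branch."""

    def __init__(self):
        self.checkpoints: dict = {}

    def save(self, state: dict, parent: Optional[int]) -> int:
        cp_id = len(self.checkpoints) + 1
        self.checkpoints[cp_id] = Checkpoint(cp_id, parent, dict(state))
        return cp_id

    def inspect(self, cp_id: int) -> dict:
        return dict(self.checkpoints[cp_id].state)

    def edit(self, cp_id: int, **updates) -> int:
        # Never overwrite history: the edited state becomes a new child checkpoint.
        return self.save({**self.inspect(cp_id), **updates}, parent=cp_id)

    def path(self, cp_id: Optional[int]) -> list:
        chain = []
        while cp_id is not None:
            chain.append(cp_id)
            cp_id = self.checkpoints[cp_id].parent
        return chain[::-1]


h = History()
c1 = h.save({"plan": "search the web"}, parent=None)
c2 = h.save({"plan": "search the web", "draft": "v1"}, parent=c1)

# Go back to the first step, change the plan, and continue on a new branch.
c3 = h.edit(c1, plan="search internal docs")
c4 = h.save({**h.inspect(c3), "draft": "v1-internal"}, parent=c3)

print(h.path(c2), h.path(c4))
```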

Agent composition

Agents don’t run in isolation. RemoteGraph lets any agent call other deployed agents using the same interface you use locally: a research agent can delegate to a search agent on a different deployment, and a routing agent can dispatch to specialized sub-agents. The calling agent doesn’t need to know whether the target is local or remote.
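The location-transparency point can be shown with a structural interface. In the sketch below, which is plain Python and not the `RemoteGraph` API, `Agent` is a hypothetical protocol with a single `invoke` method, and `FakeRemoteAgent` simulates a network hop by round-tripping JSON through a stand-in server function. The research agent is written against the protocol only, so swapping a local search agent for a remote one changes nothing in its code.

```python
# Toy model only: hypothetical names, not the RemoteGraph API.
import json
from typing import Callable, Protocol


class Agent(Protocol):
    def invoke(self, payload: dict) -> dict: ...


class LocalAgent:
    def __init__(self, fn: Callable[[dict], dict]):
        self.fn = fn

    def invoke(self, payload: dict) -> dict:
        return self.fn(payload)


class FakeRemoteAgent:
    """Stands in for a deployed agent: payloads cross a simulated wire as JSON."""

    def __init__(self, server: Callable[[str], str]):
        self.server = server  # hypothetical transport: request body in, response body out

    def invoke(self, payload: dict) -> dict:
        return json.loads(self.server(json.dumps(payload)))


def search_server(body: str) -> str:
    # What the (simulated) remote deployment runs.
    query = json.loads(body)["query"]
    return json.dumps({"results": [f"doc about {query}"]})


def research_agent(question: str, search: Agent) -> str:
    # Relies only on .invoke(); it cannot tell local from remote.
    hits = search.invoke({"query": question})["results"]
    return f"{question}: {len(hits)} source(s), first = {hits[0]!r}"


local_search = LocalAgent(lambda p: {"results": [f"doc about {p['query']}"]})
remote_search = FakeRemoteAgent(search_server)

print(research_agent("durable execution", local_search))
print(research_agent("durable execution", remote_search))
```

A real remote call adds latency and failure modes (timeouts, authentication) that this in-process fake hides.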

Native support for MCP (Model Context Protocol) and A2A (Agent2Agent) means your deployed agents can both expose and consume tools and agent interfaces over the same protocols the broader ecosystem uses.

Deployment options

Cloud: Fully managed. Push from a git repository.
Hybrid: Runs in your cloud, managed by the LangSmith control plane.
Self-hosted: Fully self-managed in your own infrastructure.

Same runtime, same APIs. What changes is who manages the infrastructure. See Platform setup for a comparison.

Go deeper

Securing and customizing your server

Custom auth: Authentication and multi-tenant access control
Server customization: Custom routes, middleware, lifespan hooks, encryption

Operations

CI/CD pipelines
TTL configuration for state and thread management
Semantic search

Reference

Agent Server: Runtime architecture reference

Link last verified June 7, 2026.

Source: LangChain Docs
