Going to production

Summary: Take your Deep Agent to production with persistent memory, sandboxes, resilience middleware, and deployment options

This guide covers considerations for taking a Deep Agent from a local prototype to a production deployment. It walks through scoping memory, configuring execution environments, adding guardrails, and connecting a frontend.

Overview#

Agents use information from memory and their execution environment to accomplish tasks. In production, there are a few primitives that determine how information is shared and accessed:

Thread: a single conversation. Message history and scratch files are scoped to the thread by default and don’t carry over.
User: someone interacting with your agent. Memory and files can be private to a user or shared across users. Identity and authorization come from your auth layer.
Assistant: a configured agent instance. Memory and files can be tied to one assistant or shared across all of them.

This page covers:

LangSmith Deployments: managed infrastructure with auth, webhooks, and cron
Production considerations: multi-tenancy, authentication, credentials, async, and durability
Memory: persist information across conversations
Execution environment: file storage and code execution
Guardrails: rate limiting, error handling, and data privacy
Frontend: connect your UI to a deployed agent

LangSmith Deployments#

The fastest way to get a Deep Agent into production is LangSmith Deployments. It provisions the infrastructure your agent needs: assistants, threads, runs, a store, and a checkpointer, so you don’t have to set these up yourself. It also gives you authentication, webhooks, cron jobs, and observability out of the box, and can expose your agent via MCP or A2A.

For setup instructions, see the LangSmith Deployments quickstart.

All code snippets on this page use the following langgraph.json unless otherwise specified:

{
  "dependencies": ["."],
  "graphs": {
    "agent": "./src/agent.ts:agent"
  },
  "env": ".env"
}

langgraph.json is the configuration file that tells the LangGraph platform how to build and run your application. It lives at the root of your project and is required for both local development (with langgraph dev) and production deployment. The key fields are:

| Field | Description |
| ----- | ----------- |
| `dependencies` | Packages to install. `["."]` installs the current directory as a package (reads from `requirements.txt`, `pyproject.toml`, or `package.json`). |
| `graphs` | Maps graph IDs to their code locations. Each entry is `"<id>": "./<file>:<variable>"`, where `<id>` is the name you use to invoke the graph via the API, and `<variable>` is the compiled graph or constructor function exported from `<file>`. |
| `env` | Path to a `.env` file with environment variables (API keys, secrets). These are set at build time and available at runtime. |

For the full set of configuration options (custom Docker steps, store indexing, auth handlers, and more), see application structure.
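
To make the `"<id>": "./<file>:<variable>"` format concrete, here is a minimal sketch that splits each `graphs` entry into its file path and exported variable. The `parseGraphSpec` helper and `GraphSpec` type are hypothetical names used for illustration only; they are not part of the LangGraph tooling, which does its own parsing.

```typescript
// Hypothetical helper (not a LangGraph API): splits a `graphs` entry such as
// "./src/agent.ts:agent" into its file path and exported variable name.
interface GraphSpec {
  file: string;
  exportName: string;
}

function parseGraphSpec(spec: string): GraphSpec {
  // Split on the last colon so the variable name is always the final segment.
  const idx = spec.lastIndexOf(":");
  if (idx <= 0 || idx === spec.length - 1) {
    throw new Error(`Expected "<file>:<variable>", got "${spec}"`);
  }
  return { file: spec.slice(0, idx), exportName: spec.slice(idx + 1) };
}

// The same configuration as the langgraph.json shown above.
const langgraphConfig = {
  dependencies: ["."],
  graphs: { agent: "./src/agent.ts:agent" } as Record<string, string>,
  env: ".env",
};

const parsedGraphs = Object.entries(langgraphConfig.graphs).map(([id, spec]) => ({
  id,
  ...parseGraphSpec(spec),
}));
console.log(parsedGraphs);
```

Here the graph ID `agent` is what API clients use to invoke the graph, while `./src/agent.ts` and the `agent` export tell the platform where to find it.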

Production considerations#

Multi-tenancy#

When your agent serves multiple users, you need to handle three concerns: verifying who each user is, controlling what they can access, and managing the credentials the agent uses to act on their behalf.

User identity and access control#

LangSmith Deployments supports custom authentication to establish user identity and authorization handlers to control access to resources like threads, assistants, and store namespaces. Authorization handlers run after authentication succeeds and can:

Tag resources with ownership metadata (e.g., owner: user_id)
Return filters so users only see their own resources
Deny access with HTTP 403 for unauthorized operations
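
The three behaviors above can be sketched as a single decision function. This is an illustration of the pattern only: `AuthUser`, `ResourceRequest`, `AuthDecision`, and `authorizeResource` are hypothetical names, not the LangSmith authorization handler signatures.

```typescript
// Hypothetical types for illustration; not the LangSmith auth handler API.
interface AuthUser {
  identity: string;
}

interface ResourceRequest {
  action: "create" | "read" | "delete";
  metadata: Record<string, string>;
}

type AuthDecision =
  | { allowed: true; filter: Record<string, string> }
  | { allowed: false; status: 403 };

function authorizeResource(user: AuthUser, req: ResourceRequest): AuthDecision {
  const owner: string | undefined = req.metadata["owner"];
  if (req.action === "create") {
    // Tag new resources with ownership metadata.
    req.metadata["owner"] = user.identity;
  } else if (owner !== undefined && owner !== user.identity) {
    // Deny operations on resources owned by someone else.
    return { allowed: false, status: 403 };
  }
  // Return a filter so list and search calls only see the caller's resources.
  return { allowed: true, filter: { owner: user.identity } };
}

const newThread: ResourceRequest = { action: "create", metadata: {} };
console.log(authorizeResource({ identity: "alice" }, newThread), newThread.metadata);
console.log(
  authorizeResource({ identity: "bob" }, { action: "read", metadata: { owner: "alice" } }),
);
```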

For a step-by-step tutorial, see Make conversations private.

How you scope memory and execution environments determines what data is shared between users. See the sections below for details.

Team access control (RBAC)#

LangSmith’s role-based access control governs who on your team can deploy, configure, and monitor agents. This is separate from end-user authorization above.

Role	Access
Workspace Admin	Full permissions including settings and member management
Workspace Editor	Create and modify resources, but cannot delete runs or manage members
Workspace Viewer	Read-only access

Custom roles with granular permissions are available on Enterprise plans. See the RBAC reference for the full permission model.

End-user credentials#

When your agent needs to call external APIs on behalf of a user (e.g., reading their GitHub repos, sending Slack messages, querying their data warehouse), you need a way to pass the user’s credentials through to the agent without hardcoding them.

OAuth via Agent Auth. Agent Auth provides a managed OAuth 2.0 flow. Configure an OAuth provider, and the agent can request tokens scoped to each user. On first use, the agent interrupts execution and presents an OAuth consent URL. After the user authenticates, the agent resumes with a valid token. Tokens are stored and refreshed automatically.


const authClient = new Client();

// Inside your agent's tool:
const authResult = await authClient.authenticate({
  provider: "github",
  scopes: ["repo", "read:org"],
  userId: config.configurable.langgraph_auth_user_id,
});
// Use authResult.token for GitHub API calls on the user's behalf

Credential injection for sandboxes. If your agent runs code inside a sandbox that calls external APIs, the sandbox auth proxy can inject credentials into outbound requests automatically, so sandbox code never receives raw API keys. See Managing secrets for setup details.

Workspace secrets. For API keys shared across all users (for example, your organization’s LLM provider keys or search API keys), store them as workspace secrets in LangSmith. See Managing secrets for details.

Async#

LLM-based applications are heavily I/O-bound: calling language models, databases, and external services. Async programming lets these operations run concurrently instead of blocking, improving throughput and responsiveness.

LangChain follows the convention of prefixing async method names with the letter "a" (e.g., ainvoke, abefore_agent, astream). Sync and async variants live in the same class or namespace.

When building for production:

Create async tools. LangChain runs sync tools in a separate thread to avoid blocking, but native async avoids the threading overhead entirely.
Use async middleware methods. Custom middleware should implement async hooks (e.g., abefore_agent instead of before_agent).
Use async for external resource lifecycle. Creating sandboxes or connecting to MCP servers involves network calls and should be awaited. This is why graph factories that provision these resources are async.
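
As a rough illustration of why this matters for I/O-bound work, the sketch below compares awaiting two calls one after the other with running them concurrently. `fakeIo` is a hypothetical stand-in for a model, database, or external API call, and the timings in the comments are approximate.

```typescript
// Hypothetical stand-in for an I/O-bound call (model, database, external API).
function fakeIo(label: string, ms: number): Promise<string> {
  return new Promise((resolve) => setTimeout(() => resolve(label), ms));
}

// Each call waits for the previous one: roughly 100 ms in total.
async function sequential(): Promise<string[]> {
  const first = await fakeIo("model", 50);
  const second = await fakeIo("search", 50);
  return [first, second];
}

// Both calls are in flight at once: roughly 50 ms in total.
async function concurrent(): Promise<string[]> {
  return Promise.all([fakeIo("model", 50), fakeIo("search", 50)]);
}

concurrent().then((results) => console.log(results));
```

`Promise.all` preserves input order, so both functions return the same array; only the elapsed time differs.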

Durability#

Deep Agents run on LangGraph, which provides durable execution out of the box. The persistence layer checkpoints state at each step, so a run interrupted by a failure, timeout, or human-in-the-loop pause resumes from its last recorded state without reprocessing previous steps. For long-running deep agents that spawn many subagents, this means a mid-run failure doesn’t lose completed work.

Checkpointing also enables:

Indefinite interrupts. Human-in-the-loop workflows can pause for minutes or days and resume exactly where they left off.
Time travel. Every checkpointed step is a snapshot you can rewind to, letting you replay from an earlier state if something goes wrong.
Safe handling of sensitive operations. For workflows involving payments or other irreversible actions, checkpoints provide an audit trail and a recovery point to inspect the exact state that led to an action.

LangSmith Deployments configure a persistent checkpointer automatically. If you are self-hosting, see persistence for setup instructions.
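
The idea of resuming from the last recorded state can be shown with a small in-memory sketch. This is not LangGraph's checkpointer: `runWithCheckpoints`, `Checkpoint`, and the `savedCheckpoints` map are hypothetical names, and a real checkpointer persists to durable storage rather than process memory.

```typescript
type Step = (state: number) => number;

interface Checkpoint {
  nextStep: number; // index of the first step that has not completed
  state: number;    // state recorded after the last completed step
}

// Hypothetical in-memory checkpointer keyed by thread ID.
const savedCheckpoints = new Map<string, Checkpoint>();

function runWithCheckpoints(threadId: string, steps: Step[], initial: number): number {
  // Resume from the last recorded checkpoint, if there is one.
  const saved = savedCheckpoints.get(threadId) ?? { nextStep: 0, state: initial };
  let state = saved.state;
  steps.slice(saved.nextStep).forEach((step, offset) => {
    state = step(state); // may throw, leaving the previous checkpoint intact
    savedCheckpoints.set(threadId, { nextStep: saved.nextStep + offset + 1, state });
  });
  return state;
}

// Usage: the second step fails once, then succeeds on the resumed run.
let stepCalls = 0;
let failOnce = true;
const demoSteps: Step[] = [
  (s) => { stepCalls++; return s + 1; },
  (s) => {
    stepCalls++;
    if (failOnce) { failOnce = false; throw new Error("transient failure"); }
    return s * 10;
  },
];

try {
  runWithCheckpoints("thread-1", demoSteps, 1);
} catch {
  // The run was interrupted during the second step.
}
const resumedResult = runWithCheckpoints("thread-1", demoSteps, 1);
console.log(resumedResult, stepCalls);
```

The first step runs only once across both attempts, because the resumed run starts from the checkpoint recorded after it.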

Memory#

Without memory, every conversation starts from scratch. Memory lets your agent retain information across conversations (user preferences, learned instructions, past experiences) so it can personalize its behavior over time. For an overview of memory types, see the memory concepts guide.

Scoping#

Memory is always persistent across conversations. The main question is how it’s scoped across user and assistant boundaries. The right scope depends on who should see and modify the data:

| Scope | Namespace | Use case | Example |
| ----- | --------- | -------- | ------- |
| User (recommended default) | `(user_id)` | Per-user preferences and context | “I prefer concise responses” |
| Assistant | `(assistant_id)` | Shared instructions for one assistant | “Cap posts at 280 characters” |
| Global | `(org_id)` | Read-only policies for all users and assistants | “Never disclose internal pricing” |

Shared memory (assistant or organization scope) is a vector for prompt injection. If one user can write to memory that another user’s conversation reads, a malicious user could inject instructions into that shared state. Enforce read-only access where appropriate. For example, make organization-wide policies writable only through application code, not by the agent itself.
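
The sketch below illustrates how the scope choice changes who can see a memory, and how a read-only rule for global memory might be enforced. It uses a plain in-memory `Map` rather than the LangGraph Store; `namespaceFor`, `agentWrite`, `readMemory`, and `ScopeContext` are hypothetical names for illustration.

```typescript
type Scope = "user" | "assistant" | "global";

interface ScopeContext {
  userId: string;
  assistantId: string;
  orgId: string;
}

// One namespace per scope: (user_id), (assistant_id), or (org_id).
function namespaceFor(scope: Scope, ctx: ScopeContext): string[] {
  switch (scope) {
    case "user":
      return [ctx.userId];
    case "assistant":
      return [ctx.assistantId];
    case "global":
      return [ctx.orgId];
  }
}

// Hypothetical in-memory store keyed by "<namespace>/<file>".
const memoryStore = new Map<string, string>();

function memoryKey(scope: Scope, ctx: ScopeContext, file: string): string {
  return [...namespaceFor(scope, ctx), file].join("/");
}

function agentWrite(scope: Scope, ctx: ScopeContext, file: string, content: string): void {
  // Global memory is writable only by application code, never by the agent.
  if (scope === "global") {
    throw new Error("Global memory is read-only for the agent");
  }
  memoryStore.set(memoryKey(scope, ctx, file), content);
}

function readMemory(scope: Scope, ctx: ScopeContext, file: string): string | undefined {
  return memoryStore.get(memoryKey(scope, ctx, file));
}

const alice: ScopeContext = { userId: "alice", assistantId: "support-bot", orgId: "acme" };
const bob: ScopeContext = { userId: "bob", assistantId: "support-bot", orgId: "acme" };

agentWrite("user", alice, "prefs.txt", "I prefer concise responses");
agentWrite("assistant", alice, "style.txt", "Cap posts at 280 characters");

console.log(readMemory("user", bob, "prefs.txt"));      // Bob cannot see Alice's preferences
console.log(readMemory("assistant", bob, "style.txt")); // Bob does see assistant-wide memory
```

The second read is exactly the cross-user path that makes shared scopes an injection risk: anything one user's conversation writes at assistant scope is read by every other user of that assistant.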

Configuration#

In Deep Agents, memory is stored as files in a virtual filesystem. By default, files only last for a single conversation. To persist them, route a path like /memories/ to a StoreBackend that writes to the LangGraph Store. Use a CompositeBackend to give the agent both ephemeral scratch space and persistent long-term memory.

<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="User"></span>
Namespace by `user_id`. Each user gets their own private memory. This is the recommended default since most applications deploy a single assistant.

```typescript
    import { createDeepAgent, CompositeBackend, StateBackend, StoreBackend } from "deepagents";

    const agent = createDeepAgent({
      backend: (rt) => new CompositeBackend(
        new StateBackend(rt),
        {
          "/memories/": new StoreBackend(rt, {
            namespace: (ctx) => [ctx.runtime.context.userId],
          }),
        },
      ),
      systemPrompt: `You have persistent memory at /memories/.

      Read /memories/instructions.txt at the start of each conversation for
      accumulated knowledge and preferences. When you learn something that
      should persist, update that file.`,
    });

    export { agent };
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Assistant"></span>
Namespace by `assistant_id`. Memory is shared across all users of the same assistant, so any user can read or update it. Use this for shared instructions or knowledge that applies to everyone using a given assistant (e.g., "always reply in formal tone").

```typescript
    import { getConfig } from "@langchain/langgraph";
    import { createDeepAgent, CompositeBackend, StateBackend, StoreBackend } from "deepagents";

    const agent = createDeepAgent({
      backend: (rt) => new CompositeBackend(
        new StateBackend(rt),
        {
          "/memories/": new StoreBackend(rt, {
            namespace: (ctx) => {
              const config = getConfig();
              return [config.metadata.assistant_id];
            },
          }),
        },
      ),
    });

    export { agent };
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Global"></span>
Namespace by `org_id`. Memory is shared across all users and all assistants. Typically used for organization-wide policies (compliance rules, brand guidelines) that should be read-only for the agent. Write access should be restricted to application code to prevent prompt injection.

```typescript
    import { createDeepAgent, CompositeBackend, StateBackend, StoreBackend } from "deepagents";

    const agent = createDeepAgent({
      backend: (rt) => new CompositeBackend(
        new StateBackend(rt),
        {
          "/memories/": new StoreBackend(rt, {
            namespace: (ctx) => [ctx.runtime.context.orgId],
          }),
        },
      ),
    });

    export { agent };
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

You can also read and write to the store from your application code using the [Store API](/langsmith/custom-store). See [accessing memories from external code](/oss/javascript/deepagents/memory#accessing-memories-from-external-code) for examples.

For the full namespace factory API, see [namespace factories](/oss/javascript/deepagents/backends#namespace-factories). For memory patterns like self-improving instructions and knowledge bases, see [long-term memory](/oss/javascript/deepagents/memory).

## Execution environment

Locally, agents can read and write files on disk and run shell commands directly. In production, you need to think about isolation and persistence. The right setup depends on whether your agent needs to execute code:

* **Filesystem backends** are enough if your agent only reads and writes files. Choose a backend that matches your persistence needs: ephemeral scratch space, persistent storage, or a mix of both.
* **Sandboxes** add an isolated container with an `execute` tool for running shell commands. Use a sandbox if your agent needs to run code, install packages, or do anything beyond file I/O.

### Filesystem

Choose a backend based on what needs to persist:

* [StateBackend](https://reference.langchain.com/javascript/deepagents/backends/StateBackend) (default): ephemeral scratch space, scoped to a single conversation. Checkpointed at every step, so avoid writing large files.
* [StoreBackend](https://reference.langchain.com/javascript/deepagents/backends/StoreBackend): persistent storage that survives across conversations. Scope with a [namespace factory](/oss/javascript/deepagents/backends#namespace-factories).
* [CompositeBackend](https://reference.langchain.com/javascript/deepagents/backends/CompositeBackend): mix both. Ephemeral scratch space by default with persistent routes for specific paths like `/memories/`.

For the full list of backends and how to build custom ones, see [backends](/oss/javascript/deepagents/backends).
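
To show how a composite setup might decide where a file goes, here is a minimal routing sketch. It assumes longest-prefix matching over the configured routes, which is an assumption about the behavior rather than a statement of how `CompositeBackend` is implemented; `routePath` and `BackendName` are hypothetical names.

```typescript
type BackendName = "state" | "store";

// Hypothetical router: picks the backend whose route prefix matches the path,
// preferring the longest matching prefix and falling back to the default.
function routePath(
  path: string,
  routes: Record<string, BackendName>,
  fallback: BackendName,
): BackendName {
  let bestPrefix = "";
  let chosen = fallback;
  for (const [prefix, backend] of Object.entries(routes)) {
    if (path.startsWith(prefix) && prefix.length > bestPrefix.length) {
      bestPrefix = prefix;
      chosen = backend;
    }
  }
  return chosen;
}

const demoRoutes: Record<string, BackendName> = { "/memories/": "store" };

console.log(routePath("/memories/instructions.txt", demoRoutes, "state")); // persistent
console.log(routePath("/scratch/draft.md", demoRoutes, "state"));          // ephemeral
```

Note that under this sketch a path must include the trailing slash of the route to match, so `/memories` on its own falls through to the default backend.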

<span class="callout-start" data-callout-type="warning"></span>
  `FilesystemBackend` and `LocalShellBackend` access the host directly. Don't use them in deployed agents.
<span class="callout-end"></span>

### Sandboxes

If your agent needs to run code (not just read and write files), use a [sandbox](/oss/javascript/deepagents/sandboxes). Sandboxes provide both a filesystem and an `execute` tool for running shell commands, all inside an isolated container. This isolation also protects your host: if the agent's code exhausts memory or crashes, only the sandbox is affected. Your server keeps running.

#### Lifecycle

The key decision is how long a sandbox lives. Does each conversation get a fresh one, or do conversations share a persistent environment?

| Scope                | Sandbox ID stored on                      | Lifecycle                                 | Example use case                                                     |
| -------------------- | ----------------------------------------- | ----------------------------------------- | -------------------------------------------------------------------- |
| **Thread-scoped**    | [Thread](/langsmith/use-threads) metadata | Fresh per conversation, cleaned up on TTL | A data analysis bot where each conversation starts clean             |
| **Assistant-scoped** | [Assistant](/langsmith/assistants) config | Shared across all conversations           | A coding assistant that maintains a cloned repo across conversations |

<span class="callout-start" data-callout-type="note"></span>
  The examples below use an async [graph factory](/langsmith/graph-rebuild) instead of a static graph because the sandbox must be looked up or created from the `thread_id` or `assistant_id` in the runtime config, which is only available at invocation time. A graph factory receives the config on each run, so it can resolve the sandbox before building the agent. The factory is async because looking up or creating a sandbox is an I/O-bound network call.
<span class="callout-end"></span>

<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="Thread-scoped (most common)"></span>
Each conversation gets its own sandbox. The [graph factory](/langsmith/graph-rebuild) reads `thread_id` from the config, so each [thread](/langsmith/use-threads) automatically gets its own isolated environment. The provider's label-based lookup handles deduplication across runs. The sandbox is cleaned up when its [TTL](/langsmith/configure-ttl) expires.

```typescript
    import { Daytona } from "@daytonaio/sdk";
    import { DaytonaSandbox } from "@langchain/daytona";
    import { createDeepAgent } from "deepagents";
    import type { RunnableConfig } from "@langchain/core/runnables";

    const client = new Daytona();

    export async function agent(config: RunnableConfig) {
      const threadId = config.configurable!.thread_id;
      let sandbox;
      try {
        sandbox = await client.findOne({ labels: { thread_id: threadId } });
      } catch {
        sandbox = await client.create({
          labels: { thread_id: threadId },
          autoDeleteInterval: 3600, // TTL: clean up when idle
        });
      }
      return createDeepAgent({ backend: await DaytonaSandbox.fromId(sandbox.id) });
    }
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Assistant-scoped"></span>
All conversations share one sandbox. The [graph factory](/langsmith/graph-rebuild) reads the [assistant](/langsmith/assistants) ID from config metadata, so every thread on the same assistant returns to the same environment. Files, installed packages, and cloned repositories persist across conversations.

```typescript
    import { Daytona } from "@daytonaio/sdk";
    import { DaytonaSandbox } from "@langchain/daytona";
    import { createDeepAgent } from "deepagents";
    import type { RunnableConfig } from "@langchain/core/runnables";

    const client = new Daytona();

    export async function agent(config: RunnableConfig) {
      const assistantId = config.metadata!.assistant_id;
      let sandbox;
      try {
        sandbox = await client.findOne({ labels: { assistant_id: assistantId } });
      } catch {
        sandbox = await client.create({ labels: { assistant_id: assistantId } });
      }
      return createDeepAgent({ backend: await DaytonaSandbox.fromId(sandbox.id) });
    }
    ```

<span class="callout-start" data-callout-type="warning"></span>
  Assistant-scoped sandboxes accumulate files, installed packages, and other in-sandbox state over time. Configure a TTL with your sandbox provider, use snapshots to reset periodically, or implement cleanup logic to prevent the sandbox's disk and memory from growing unbounded.
<span class="callout-end"></span>
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

Because the `agent` variable is an async function (not a compiled graph), the server treats it as a [graph factory](/langsmith/graph-rebuild) and calls it on each run, injecting the config. The factory looks up or creates the sandbox via the provider's label-based search and returns a fresh agent graph wired to that sandbox.
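
The look-up-or-create step can be isolated in a small sketch that runs without any sandbox provider. `FakeProvider` is a hypothetical in-memory stand-in, not the Daytona SDK, and it is synchronous for brevity, whereas real provider calls are async network requests.

```typescript
interface SandboxRecord {
  id: string;
  labels: Record<string, string>;
}

// Hypothetical in-memory provider standing in for a real sandbox SDK.
class FakeProvider {
  private sandboxes: SandboxRecord[] = [];

  findOne(labels: Record<string, string>): SandboxRecord | undefined {
    return this.sandboxes.find((s) =>
      Object.entries(labels).every(([key, value]) => s.labels[key] === value),
    );
  }

  create(labels: Record<string, string>): SandboxRecord {
    const sandbox = { id: `sbx-${this.sandboxes.length + 1}`, labels: { ...labels } };
    this.sandboxes.push(sandbox);
    return sandbox;
  }
}

// The label key determines the scope: one sandbox per thread or per assistant.
function resolveSandbox(
  provider: FakeProvider,
  scopeKey: "thread_id" | "assistant_id",
  value: string,
): SandboxRecord {
  const labels = { [scopeKey]: value };
  return provider.findOne(labels) ?? provider.create(labels);
}

const provider = new FakeProvider();
const firstRun = resolveSandbox(provider, "thread_id", "thread-a");
const followUp = resolveSandbox(provider, "thread_id", "thread-a");
const otherThread = resolveSandbox(provider, "thread_id", "thread-b");
console.log(firstRun.id === followUp.id, firstRun.id === otherThread.id);
```

Repeated runs on the same thread resolve to the same sandbox, while a new thread gets a new one. Switching the label key to `assistant_id` gives the shared, assistant-scoped behavior.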

Once deployed with `langgraph deploy`, invoke the agent from your application code using the SDK. The client-side code is the same regardless of scope, because scoping is handled entirely in the agent factory above. Only the resulting behavior differs:

<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="Thread-scoped"></span>
Each thread gets its own sandbox. Follow-up messages within the same thread reuse the same sandbox, but a new thread always starts fresh with no leftover files or installed packages from previous conversations.

```typescript
    import { Client } from "@langchain/langgraph-sdk";

    const client = new Client({ apiUrl: "<DEPLOYMENT_URL>", apiKey: "<LANGSMITH_API_KEY>" });

    // Conversation 1: install pandas and analyze data
    const thread1 = await client.threads.create();
    for await (const chunk of client.runs.stream(
      thread1.thread_id,
      "agent",
      { input: { messages: [{ role: "human", content: "Install pandas and analyze sales_data.csv" }] } },
    )) {
      console.log(chunk.data);
    }

    // Follow-up in the same conversation — pandas is still installed
    for await (const chunk of client.runs.stream(
      thread1.thread_id,
      "agent",
      { input: { messages: [{ role: "human", content: "Now plot the results" }] } },
    )) {
      console.log(chunk.data);
    }

    // Conversation 2: fresh sandbox — pandas is NOT installed, no files from conversation 1
    const thread2 = await client.threads.create();
    for await (const chunk of client.runs.stream(
      thread2.thread_id,
      "agent",
      { input: { messages: [{ role: "human", content: "What packages are installed?" }] } },
    )) {
      console.log(chunk.data);
    }
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Assistant-scoped"></span>
All threads share one sandbox. This is useful when the sandbox has state that's expensive to recreate, such as a cloned repo, installed dependencies, or build artifacts. Any conversation on the same assistant picks up where the last one left off without repeating setup.

```typescript
    import { Client } from "@langchain/langgraph-sdk";

    const client = new Client({ apiUrl: "<DEPLOYMENT_URL>", apiKey: "<LANGSMITH_API_KEY>" });

    // Conversation 1: clone and set up the project
    const thread1 = await client.threads.create();
    for await (const chunk of client.runs.stream(
      thread1.thread_id,
      "agent",
      { input: { messages: [{ role: "human", content: "Clone https://github.com/org/repo and install dependencies" }] } },
    )) {
      console.log(chunk.data);
    }

    // Conversation 2: repo and dependencies are still there
    const thread2 = await client.threads.create();
    for await (const chunk of client.runs.stream(
      thread2.thread_id,
      "agent",
      { input: { messages: [{ role: "human", content: "Run the test suite and fix any failures" }] } },
    )) {
      console.log(chunk.data);
    }
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

#### File transfers

Sandboxes are isolated containers, so your application code can't directly access files inside them. Use `uploadFiles()` and `downloadFiles()` to move data across the sandbox boundary:

* **Seed the sandbox before the agent runs**: upload user files, [skill](/oss/javascript/deepagents/skills) scripts, configuration, or [persistent memories](/oss/javascript/deepagents/memory) so the agent has what it needs from the start
* **Retrieve results after the agent finishes**: download generated artifacts (reports, plots, exports) and sync updated memories back for future conversations

For provider-specific file transfer examples, see [working with files](/oss/javascript/deepagents/sandboxes#working-with-files). For provider setup, security, and lifecycle patterns, see the full [sandboxes guide](/oss/javascript/deepagents/sandboxes).

<Accordion title="Example: syncing skills and memories with custom middleware">
  [Skill](/oss/javascript/deepagents/skills) scripts that the agent needs to execute must be uploaded into the sandbox before the agent runs. You may also want to sync [memories](/oss/javascript/deepagents/memory) so the agent can read and update them inside the container. Use [custom middleware](/oss/javascript/langchain/middleware/custom) with `beforeAgent` and `afterAgent` hooks to move files across the sandbox boundary:

  ```typescript
  import { createMiddleware } from "langchain";
  import { createDeepAgent, CompositeBackend, StoreBackend } from "deepagents";
  import { DaytonaSandbox } from "@langchain/daytona";

  function safeFilename(key: string): string {
    const name = key.split("/").pop()!;
    if (name.includes("..") || /[*?]/.test(name)) {
      throw new Error(`Invalid key: ${key}`);
    }
    return name;
  }

  const createSandboxSyncMiddleware = (backend: CompositeBackend) => {
    return createMiddleware({
      name: "SandboxSyncMiddleware",
      beforeAgent: async (state, runtime) => {
        // Upload skill scripts and memories into the sandbox
        const userId = runtime.context.userId;
        const store = runtime.store;
        const encoder = new TextEncoder();
        const files: [string, Uint8Array][] = [];
        for (const item of await store.search(["skills", userId])) {
          const name = safeFilename(item.key);
          files.push([`/skills/${name}`, encoder.encode(item.value.content)]);
        }
        for (const item of await store.search(["memories", userId])) {
          const name = safeFilename(item.key);
          files.push([`/memories/${name}`, encoder.encode(item.value.content)]);
        }
        if (files.length > 0) {
          await backend.uploadFiles(files);
        }
      },
      afterAgent: async (state, runtime) => {
        // Sync updated memories back to the store
        const userId = runtime.context.userId;
        const store = runtime.store;
        const items = await store.search(["memories", userId]);
        const results = await backend.downloadFiles(
          items.map((item) => `/memories/${item.key}`),
        );
        const decoder = new TextDecoder();
        for (const result of results) {
          if (result.content) {
            await store.put(
              ["memories", userId],
              result.path.split("/").pop()!,
              { content: decoder.decode(result.content) },
            );
          }
        }
      },
    });
  };

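  // Assumes `sandbox` (resolved by a graph factory like the ones shown earlier)
  // and `rt` (the backend runtime) are already in scope.
  const backend = new CompositeBackend(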
    await DaytonaSandbox.fromId(sandbox.id),
    {
      "/skills/": new StoreBackend(rt, {
        namespace: (ctx) => ["skills", ctx.runtime.context.userId],
      }),
      "/memories/": new StoreBackend(rt, {
        namespace: (ctx) => ["memories", ctx.runtime.context.userId],
      }),
    },
  );

  const agent = createDeepAgent({
    backend,
    middleware: [createSandboxSyncMiddleware(backend)],
  });

  export { agent };
  ```
</Accordion>

Managing secrets#

Sandboxes are isolated containers, so environment variables from your host aren’t available inside them. There are two ways to provide API keys and other secrets to sandbox code:

Auth proxy (recommended). The sandbox auth proxy intercepts outbound requests from the sandbox and injects authentication headers automatically. Sandbox code calls external APIs normally, and the proxy adds the correct credentials based on the destination host. This means API keys never appear in sandbox code, environment variables, or logs.

{
  "proxy_config": {
    "rules": [
      {
        "name": "openai-api",
        "match_hosts": ["api.openai.com"],
        "inject_headers": {
          "Authorization": "Bearer ${OPENAI_API_KEY}"
        }
      },
      {
        "name": "anthropic-api",
        "match_hosts": ["api.anthropic.com"],
        "inject_headers": {
          "x-api-key": "${ANTHROPIC_API_KEY}"
        }
      }
    ]
  }
}

The ${SECRET_KEY} references resolve against secrets stored in your LangSmith workspace settings. Configure secrets there before creating a template that references them.
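
The matching and injection logic implied by this configuration can be sketched in a few lines. This is an illustration only, not the proxy's implementation: `headersFor` and `resolveSecrets` are hypothetical helpers, and the secret value is a placeholder.

```typescript
interface ProxyRule {
  name: string;
  match_hosts: string[];
  inject_headers: Record<string, string>;
}

// Replace ${NAME} references with values from the secret store.
function resolveSecrets(template: string, secrets: Record<string, string>): string {
  return template.replace(/\$\{(\w+)\}/g, (_match: string, key: string) => {
    const value = secrets[key];
    if (value === undefined) throw new Error(`Missing secret: ${key}`);
    return value;
  });
}

// Compute the headers to inject for an outbound request, based on its host.
function headersFor(
  url: string,
  rules: ProxyRule[],
  secrets: Record<string, string>,
): Record<string, string> {
  const host = new URL(url).hostname;
  const headers: Record<string, string> = {};
  for (const rule of rules) {
    if (!rule.match_hosts.includes(host)) continue;
    for (const [header, template] of Object.entries(rule.inject_headers)) {
      headers[header] = resolveSecrets(template, secrets);
    }
  }
  return headers;
}

const proxyRules: ProxyRule[] = [
  {
    name: "openai-api",
    match_hosts: ["api.openai.com"],
    inject_headers: { Authorization: "Bearer ${OPENAI_API_KEY}" },
  },
];
const demoSecrets = { OPENAI_API_KEY: "placeholder-key" }; // placeholder, not a real key

console.log(headersFor("https://api.openai.com/v1/models", proxyRules, demoSecrets));
console.log(headersFor("https://example.com/data", proxyRules, demoSecrets));
```

Requests to hosts with no matching rule get no injected headers, so credentials are only attached to the destinations you list.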

Workspace secrets. For API keys that don’t need proxy-based injection (e.g., keys used by the agent server itself, not sandbox code), store them as workspace secrets in LangSmith. These are available as environment variables at runtime for all agents in the workspace.

Avoid passing secrets into sandboxes via environment variables or file uploads. Agents can read any accessible file or environment variable inside the sandbox, including credentials. The auth proxy keeps secrets out of the sandbox entirely.

Guardrails#

Agents in production run autonomously, which means they can loop indefinitely, hit rate limits, or process user data that contains sensitive information. Deep Agents support middleware that wraps model and tool calls to handle these concerns.

Rate limiting#

Rate limiting here refers to capping the agent’s own LLM and tool usage within a run, not API gateway rate limiting for incoming requests.

Without limits, a confused agent can burn through your LLM API budget in minutes by looping on the same tool call or making hundreds of model calls. Set caps on both model calls and tool executions per run:


import { createAgent, modelCallLimitMiddleware, toolCallLimitMiddleware } from "langchain";

const agent = createAgent({
  model: "claude-sonnet-4-6",
  middleware: [
    modelCallLimitMiddleware({ runLimit: 50 }),
    toolCallLimitMiddleware({ runLimit: 200 }),
  ],
});

Use runLimit to cap calls within a single invocation (resets each turn). Use threadLimit to cap calls across an entire conversation (requires a checkpointer). See ModelCallLimitMiddleware and ToolCallLimitMiddleware for the full configuration.
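
The difference between a per-run cap and a per-thread cap can be seen in a small counter sketch. `CallLimiter` is a hypothetical class written for illustration; it is not how the limit middleware is implemented, and in a real deployment the thread count would come from checkpointed state.

```typescript
// Hypothetical limiter showing how per-run and per-thread caps interact.
class CallLimiter {
  private runCount = 0;
  private threadCount = 0;
  private readonly runLimit: number;
  private readonly threadLimit: number;

  constructor(limits: { runLimit: number; threadLimit: number }) {
    this.runLimit = limits.runLimit;
    this.threadLimit = limits.threadLimit;
  }

  // Called at the start of each invocation: only the run counter resets.
  startRun(): void {
    this.runCount = 0;
  }

  // Called before each model or tool call.
  recordCall(): void {
    if (this.runCount >= this.runLimit) throw new Error("run limit exceeded");
    if (this.threadCount >= this.threadLimit) throw new Error("thread limit exceeded");
    this.runCount++;
    this.threadCount++;
  }
}

const limiter = new CallLimiter({ runLimit: 2, threadLimit: 3 });
limiter.startRun();
limiter.recordCall();
limiter.recordCall(); // a third call in this run would exceed the run limit
limiter.startRun();
limiter.recordCall(); // a second call in this run would exceed the thread limit
```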

Handling errors#

Not all errors should be handled the same way. Transient failures (network timeouts, rate limits) should be retried automatically. Errors the LLM can recover from (bad tool output, parsing failures) should be fed back to the model. Errors that need human input should pause the agent. For the full breakdown with code examples, see Handle errors appropriately.

Middleware handles the transient case. Model calls and tool calls each have their own retry middleware with exponential backoff. If your primary model provider goes down entirely, the fallback middleware switches to an alternative:


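import {
  createAgent,
  modelFallbackMiddleware,
  modelRetryMiddleware,
  toolRetryMiddleware,
} from "langchain";

const agent = createAgent({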
  model: "claude-sonnet-4-6",
  middleware: [
    // Retry model calls on rate limits, timeouts, and 5xx errors
    modelRetryMiddleware({ maxRetries: 3, backoffFactor: 2.0, initialDelayMs: 1000 }),
    // If the primary model is fully down, fall back to an alternative
    modelFallbackMiddleware("gpt-4.1"),
    // Retry specific tools that hit external APIs (not all tools)
    toolRetryMiddleware({
      maxRetries: 2,
      tools: ["search", "fetch_url"],
      retryOn: [TimeoutError, TypeError],
    }),
  ],
});

Scope ToolRetryMiddleware to specific tools rather than retrying everything. A filesystem read_file that fails won’t benefit from a retry, but a web search that times out probably will. See ModelRetryMiddleware and ModelFallbackMiddleware for the full configuration.
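
For intuition about what exponential backoff means with the settings above, here is a minimal sketch. It assumes the delay is `initialDelayMs * backoffFactor ** attempt` with no jitter or maximum delay, which may differ from the middleware's actual schedule; `backoffDelays` and `withRetry` are hypothetical helpers.

```typescript
// Delay before each retry, assuming pure exponential growth with no jitter.
function backoffDelays(maxRetries: number, initialDelayMs: number, backoffFactor: number): number[] {
  return Array.from({ length: maxRetries }, (_, attempt) => initialDelayMs * backoffFactor ** attempt);
}

// Retry `fn` on transient errors, waiting the given delay before each retry.
async function withRetry<T>(
  fn: () => Promise<T>,
  isTransient: (error: unknown) => boolean,
  delays: number[],
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (error) {
      const delay = delays[attempt];
      // Give up when retries are exhausted or the error is not worth retrying.
      if (delay === undefined || !isTransient(error)) throw error;
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}

console.log(backoffDelays(3, 1000, 2.0)); // [1000, 2000, 4000]
```

With `maxRetries: 3`, `initialDelayMs: 1000`, and `backoffFactor: 2.0`, a call that keeps failing would wait about 7 seconds in total before the error surfaces, under these assumptions. The `isTransient` predicate plays the role of scoping retries to errors (and tools) that can actually benefit from them.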

Data privacy#

If your agent processes user input that might contain emails, credit card numbers, or other PII, you can detect and handle it before it reaches the model or gets stored in logs:


import { createAgent, piiMiddleware } from "langchain";

const agent = createAgent({
  model: "claude-sonnet-4-6",
  middleware: [
    piiMiddleware("email", { strategy: "redact", applyToInput: true }),
    piiMiddleware("credit_card", { strategy: "mask", applyToInput: true }),
  ],
});

Strategies include redact (replace with [REDACTED_EMAIL]), mask (partial masking like ****-****-****-1234), hash (deterministic hash), and block (raise an error). You can also write custom detectors for domain-specific patterns. See PIIMiddleware for the full configuration.
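
To make the redact and mask strategies concrete, here is a minimal sketch using regular expressions. The patterns are deliberately simplified assumptions (for example, there is no Luhn check on card numbers) and are not the detectors the middleware uses; `redactEmails` and `maskCards` are hypothetical helpers.

```typescript
// Simplified patterns for illustration; real detectors are stricter.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const CARD_PATTERN = /\b(?:\d{4}[- ]?){3}(\d{4})\b/g;

// "redact": replace the whole match with a placeholder.
function redactEmails(text: string): string {
  return text.replace(EMAIL_PATTERN, "[REDACTED_EMAIL]");
}

// "mask": keep only the last four digits.
function maskCards(text: string): string {
  return text.replace(CARD_PATTERN, (_match: string, last4: string) => `****-****-****-${last4}`);
}

console.log(redactEmails("Contact jane@example.com for access"));
// Contact [REDACTED_EMAIL] for access
console.log(maskCards("Card 4111-1111-1111-1234 is on file"));
// Card ****-****-****-1234 is on file
```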

For the complete list of available middleware, see prebuilt middleware.

Frontend#

Deep Agents use useStream to connect your UI to the agent backend. useStream is a frontend hook (available for React, Vue, Svelte, and Angular) that streams messages, subagent progress, and custom state from your agent in real time.

Locally, useStream points at http://localhost:2024. In production, point it at your LangSmith Deployment and configure reconnection so users don’t lose progress if their connection drops.


function App() {
  const stream = useStream<typeof agent>({
    apiUrl: "https://your-deployment.langsmith.dev",
    assistantId: "agent",
    reconnectOnMount: true,    // Resume stream after page refresh or navigation
    fetchStateHistory: true,   // Load full thread history on mount
  });
}

reconnectOnMount picks up an in-progress run automatically. If a user refreshes while the agent is working, they’ll see it continue rather than a blank screen. fetchStateHistory loads the full conversation history for the thread, so returning users see previous messages.

For deep agent workflows that spawn many subagents, set a high recursionLimit when submitting to avoid cutting off long-running executions:

stream.submit(
  { messages: [{ type: "human", content: text }] },
  {
    streamSubgraphs: true,
    config: { recursionLimit: 10000 },
  },
);

For UI patterns specific to deep agents, such as subagent cards, todo lists, and custom state rendering, see the frontend guide.

Link last verified June 7, 2026.

Source: LangChain Docs

Link last verified: 2026-04-05