Model providers

no

Original Documentation

Documentation Index#

Fetch the complete documentation index at: https://docs.langchain.com/llms.txt Use this file to discover all available pages before exploring further.

The LangSmith prompt playground supports a wide range of model providers. You can select a provider, configure your preferred settings, and save these configurations to reuse across multiple prompts.

Use this page for a list of the available providers and their configuration options:

Amazon Bedrock

Anthropic

Azure OpenAI

DeepSeek

Fireworks

Google Gemini

Google Vertex AI

Groq

Mistral AI

OpenAI

OpenAI compatible endpoint

XAI

For details on creating and managing model configurations, refer to the Configure prompt settings page.

Amazon Bedrock#

Before you use this model, ensure you have AWS credentials or IAM role.

Available models#

AWS Bedrock provides access to foundation models from multiple providers:

  • Anthropic: Claude models.
  • Amazon: Titan models.
  • Cohere: Command models.
  • Meta: Llama models.
  • Others: Additional providers available based on region.

For the current list of available models, refer to the AWS Bedrock documentation.

Configuration parameters#

Parameters depend on the underlying model provider:

For Anthropic models#

Uses Anthropic configuration (see Anthropic section above).

For Amazon Titan#

ParameterRangeDescription
Temperature0.0 - 1.0Response randomness
Max Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling

AWS-specific settings#

  • Region: AWS region for model deployment.
  • IAM Role: Use role-based authentication instead of keys.

Tool calling#

Depends on underlying model:

  • Anthropic models: auto, any.
  • Cohere models: auto.

Anthropic#

Before you use this model, ensure you have an Anthropic API key.

Available models#

Anthropic offers three tiers of models across their Claude generations:

  • Opus: Highest intelligence and capability.
  • Sonnet: Balanced performance and cost.
  • Haiku: Fast and cost-effective.

Recent Claude models support extended thinking capabilities for showing reasoning processes.

For the current list of available models, refer to the Anthropic documentation.

Configuration parameters#

ParameterRangeDefaultDescription
Temperature0.0 - 1.0OptionalRandomness control (uncheck to use model default)
Max Output Tokens1+1024Maximum response length
Top P0.0 - 1.0OptionalNucleus sampling (uncheck for model default)
Top K1+OptionalLimits to top K tokens (uncheck for model default)

Temperature, Top P, and Top K are optional. When unchecked, Claude uses its internal defaults.

Extended Thinking#

Available on supported Claude models. Enable the model to show reasoning before responding, similar to OpenAI’s o-series.

ParameterRangeDescription
Enable Extended ThinkingToggleShow/hide thinking process
Budget Tokens1+Max tokens for thinking (default: 1024)

When enabled, responses include:

  1. A “thinking” section with the model’s reasoning.
  2. The final response.

Advanced options#

  • Base URL: Override API endpoint for custom deployments.

Tool calling#

  • Supported Tool Choices: auto, any (requires at least one tool).
  • Parallel Execution: No (sequential only).

Azure OpenAI#

Before you use this model, ensure you have Azure OpenAI credentials (endpoint + API key).

Available models#

Azure OpenAI provides the same model families as OpenAI:

  • GPT series: General-purpose chat models.
  • o-series: Reasoning-focused models.
  • Legacy models: GPT-3.5 and GPT-4 variants.

Model availability varies by Azure region and requires deployment before use.

For the current list of available models, refer to the Azure OpenAI documentation.

Configuration parameters#

Azure OpenAI supports the same parameters as OpenAI:

Standard parameters#

ParameterRangeDescription
Temperature0.0 - 2.0Controls randomness. Lower = more focused, higher = more creative.
Max Output Tokens1+Maximum length of the response
Top P0.0 - 1.0Nucleus sampling threshold. Alternative to temperature.
Presence Penalty-2.0 - 2.0Penalize new topics (positive) or encourage them (negative)
Frequency Penalty-2.0 - 2.0Penalize repetition (positive) or allow it (negative)
SeedIntegerFor reproducible outputs

Advanced parameters#

Reasoning Effort: Available on reasoning-optimized models (o-series and newer GPT models).

Service Tier: Available on newer models.

Other parameters:

  • JSON Mode: Force valid JSON responses.
  • Parallel Tool Calls: Execute multiple tools concurrently.

Azure-specific features#

  • Deployment Management: Models must be deployed before use.
  • Regional Availability: Choose Azure regions for data residency.
  • Content Filtering: Built-in content moderation and safety features.
  • Managed Identity: Azure AD authentication support.
  • Private Endpoints: VNet integration for secure access.

Tool calling#

  • Supported Tool Choices: auto, required, none, or specific tool name.
  • Parallel Execution: Yes.

DeepSeek#

Before you use this model, ensure you have a DeepSeek API key.

Available models#

DeepSeek offers general-purpose models, reasoning-optimized models (R-series), and coding-specialized models.

For the current list of available models, refer to DeepSeek’s documentation.

Configuration parameters#

ParameterRangeDescription
Temperature0.0 - 2.0Response randomness
Max Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling
Presence Penalty-2.0 - 2.0
Frequency Penalty-2.0 - 2.0

Fireworks#

Before you use this model, ensure you have a Fireworks API key.

Available models#

Fireworks provides high-speed inference for popular open-source models and fine-tuned variants, including:

  • Llama: Meta’s Llama models in various sizes.
  • Mixtral: Mistral’s mixture-of-experts models.
  • Qwen: Alibaba’s multilingual models.
  • DeepSeek: DeepSeek models.
  • Other open models: Gemma, Phi, and more.

For the current list of available models, refer to Fireworks’ model documentation.

Configuration parameters#

ParameterRangeDescription
Temperature0.0 - 2.0Response randomness
Max Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling

Tool calling#

  • Supported Tool Choices: auto, required, none.
  • Parallel Execution: Yes.

Google Gemini#

Before you use this model, ensure you have a Google AI API key.

Available models#

Google offers Gemini models in multiple tiers (Ultra, Pro, Flash) optimized for different use cases.

For the current list of available models, refer to Google’s Gemini documentation.

Configuration parameters#

ParameterRangeDescription
Temperature0.0 - 2.0Response randomness
Max Output Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling
Top K1+Top-k sampling

Tool calling#

  • Supported Tool Choices: auto, any, none.
  • Parallel Execution: No.

Google Vertex AI#

Before you use this model, ensure you have Google Cloud credentials.

Available models#

Google offers Gemini models in multiple tiers (Ultra, Pro, Flash) optimized for different use cases, plus other models available through Vertex AI.

For the current list of available models, refer to the Vertex AI documentation.

Configuration parameters#

ParameterRangeDescription
Temperature0.0 - 2.0Response randomness
Max Output Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling
Top K1+Top-k sampling

Advanced options#

  • Region Selection: Deploy in specific Google Cloud regions.
  • Safety Settings: Configure content filtering thresholds.

Tool calling#

  • Supported Tool Choices: auto, any, none.
  • Parallel Execution: No.

Groq#

Before you use this model, ensure you have a Groq API key.

Available models#

Groq provides high-speed inference for popular open-source models including Llama, Mixtral, and Gemma variants.

For the current list of available models, refer to Groq’s model documentation.

Configuration parameters#

ParameterRangeDescription
Temperature0.0 - 2.0Response randomness
Max Tokens1+Maximum response length

Tool calling#

  • Supported Tool Choices: auto, required, none.
  • Parallel Execution: Yes.

Mistral AI#

Before you use this model, ensure you have a Mistral AI API key.

Available models#

Mistral offers models in multiple tiers (Large, Medium, Small) optimized for different performance and cost requirements.

For the current list of available models, refer to Mistral’s documentation.

Configuration parameters#

ParameterRangeDescription
Temperature0.0 - 1.0Response randomness
Max Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling

Tool calling#

  • Supported Tool Choices: auto, any, none.
  • Parallel Execution: No.

OpenAI#

Before you use this model, ensure you have an OpenAI API key or Azure OpenAI credentials.

Available models#

OpenAI offers several model families with different capabilities and price points:

  • GPT series: General-purpose chat models with various size/capability tiers.
  • o-series: Reasoning-focused models optimized for complex problem-solving.
  • Legacy models: Older GPT-3.5 and GPT-4 variants.

For the current list of available models, refer to the OpenAI documentation.

Configuration parameters#

Standard:

ParameterRangeDescription
Temperature0.0 - 2.0Controls randomness. Lower = more focused, higher = more creative.
Max Output Tokens1+Maximum length of the response
Top P0.0 - 1.0Nucleus sampling threshold. Alternative to temperature.
Presence Penalty-2.0 - 2.0Penalize new topics (positive) or encourage them (negative)
Frequency Penalty-2.0 - 2.0Penalize repetition (positive) or allow it (negative)
SeedIntegerFor reproducible outputs

Advanced:

Reasoning Effort: Available on reasoning-optimized models (o-series and newer GPT models).

Controls reasoning depth before responding. Higher effort = better quality for complex tasks, longer latency.

ValueDescription
noneDisables reasoning (standard chat behavior)
minimalMinimal reasoning
lowLight reasoning
mediumModerate reasoning (default)
highDeep reasoning
xhighExtra deep reasoning (if supported by model)

When reasoning_effort is active (not none), temperature, top_p, and penalties are automatically disabled.

Service Tier: Available on newer models.

Controls request priority and processing allocation.

ValueDescription
autoSystem decides based on load (default)
defaultStandard processing queue
flexLower cost, variable latency (if supported by model)
priorityHigh-priority queue, lower latency, higher cost

Other parameters:

  • JSON Mode: Force valid JSON responses.
  • Responses API: Improved streaming (default: enabled).
  • Parallel Tool Calls: Execute multiple tools concurrently.

Tool calling#

  • Supported Tool Choices: auto, required, none, or specific tool name
  • Parallel Execution: Yes

OpenAI Compatible Endpoint#

Authentication varies by endpoint (often API key or none).

Configuration#

Required:

  • Base URL: Your endpoint URL (e.g., https://your-endpoint.com/v1).
  • Model Name: Your model identifier.

Works with any framework or service that implements the OpenAI-compatible API format, including:

  • Self-hosted open-source inference servers
  • Model routing proxies
  • Custom model endpoints

Configuration parameters#

All OpenAI-compatible parameters:

ParameterRangeDescription
Temperature0.0 - 2.0Response randomness
Max Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling
Frequency Penalty-2.0 - 2.0Reduce repetition
Presence Penalty-2.0 - 2.0Encourage new topics

Advanced:

  • JSON Mode: If endpoint supports it.
  • Streaming: If endpoint supports it.
  • Function Calling: If endpoint implements OpenAI format.

Tool calling#

  • Supported Tool Choices: auto, required, none (if endpoint supports).
  • Parallel Execution: Yes (if endpoint supports).

Example endpoints#

Local Ollama:

Base URL: http://localhost:11434/v1
Model: llama3.1

vLLM Server:

Base URL: https://your-server.com/v1
Model: mistral-7b-instruct

LiteLLM Proxy:

Base URL: https://litellm.example.com
Model: gpt-4 (routes to configured backend)

XAI#

Before you use this model, ensure you have an xAI API key.

Available models#

xAI offers Grok models in multiple sizes for different use cases.

For the current list of available models, refer to xAI’s documentation.

Configuration parameters#

Standard OpenAI-compatible parameters:

ParameterRangeDescription
Temperature0.0 - 2.0Response randomness
Max Tokens1+Maximum response length
Top P0.0 - 1.0Nucleus sampling
Presence Penalty0 - 2.0Hidden on reasoning models
Frequency Penalty0 - 2.0Hidden on reasoning models

Tool calling#

  • Supported Tool Choices: OpenAI-compatible.
  • Parallel Execution: Yes (if supported).

Common Configuration Across All Providers#

Extra Parameters#

All providers support a JSON editor for extra parameters not exposed in the UI:

{
  "logprobs": true,
  "top_logprobs": 5,
  "custom_parameter": "value"
}

Use cases:

  • Provider-specific beta features
  • Advanced parameters not yet in UI
  • Custom metadata for tracking

Limitation: Cannot override parameters already in the UI (e.g., can’t set temperature here if it’s set above)

Rate Limiting#

Requests Per Second (RPS) - Available for all providers when running over datasets:

  • Range: 0 - 500 RPS
  • Purpose: Respect API rate limits, control costs
  • Default: Varies by provider

Set this when running experiments or evaluations to avoid hitting rate limits.

Next steps#

Learn how to create and manage model configurations in the playground.

Get started building prompts with your chosen model provider.


Edit this page on GitHub or file an issue.

Connect these docs to Claude, VSCode, and more via MCP for real-time answers.

Link last verified June 7, 2026. View original ↗
Source: LangChain Docs
Link last verified: 2026-03-04