Anthropic
Install#
To use AnthropicModel models, you need to either install pydantic-ai, or install pydantic-ai-slim with the anthropic optional group:

pip install "pydantic-ai-slim[anthropic]"

uv add "pydantic-ai-slim[anthropic]"

Configuration#
To use Anthropic through their API, go to console.anthropic.com/settings/keys to generate an API key.
AnthropicModelName contains a list of available Anthropic models.
Environment variable#
Once you have the API key, you can set it as an environment variable:

export ANTHROPIC_API_KEY='your-api-key'

You can then use AnthropicModel by name. With Pydantic AI Gateway:

from pydantic_ai import Agent

agent = Agent('gateway/anthropic:claude-sonnet-4-6')
...

Directly to the Anthropic API:

from pydantic_ai import Agent

agent = Agent('anthropic:claude-sonnet-4-6')
...

Or initialise the model directly with just the model name:

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel

model = AnthropicModel('claude-sonnet-4-5')
agent = Agent(model)
...

provider argument#
You can provide a custom Provider via the provider argument:

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

model = AnthropicModel(
    'claude-sonnet-4-5', provider=AnthropicProvider(api_key='your-api-key')
)
agent = Agent(model)
...

Custom HTTP Client#
You can customize the AnthropicProvider with a custom httpx.AsyncClient:

from httpx import AsyncClient

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

custom_http_client = AsyncClient(timeout=30)
model = AnthropicModel(
    'claude-sonnet-4-5',
    provider=AnthropicProvider(api_key='your-api-key', http_client=custom_http_client),
)
agent = Agent(model)
...

Cloud Platform Integrations#
You can use Anthropic models through cloud platforms by passing a custom client to AnthropicProvider.
AWS Bedrock#
To use Claude models via AWS Bedrock, follow the Anthropic documentation on how to set up an AsyncAnthropicBedrock client and then pass it to AnthropicProvider:

from anthropic import AsyncAnthropicBedrock

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

bedrock_client = AsyncAnthropicBedrock()  # Uses AWS credentials from environment
provider = AnthropicProvider(anthropic_client=bedrock_client)
model = AnthropicModel('us.anthropic.claude-sonnet-4-5-20250929-v1:0', provider=provider)
agent = Agent(model)
...

Bedrock vs BedrockConverseModel
This approach uses Anthropic’s SDK with AWS Bedrock credentials. For an alternative that uses the AWS SDK (boto3) directly, see BedrockConverseModel.
Google Vertex AI#
To use Claude models via Google Cloud Vertex AI, follow the Anthropic documentation on how to set up an AsyncAnthropicVertex client and then pass it to AnthropicProvider:

from anthropic import AsyncAnthropicVertex

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

vertex_client = AsyncAnthropicVertex(region='us-east5', project_id='your-project-id')
provider = AnthropicProvider(anthropic_client=vertex_client)
model = AnthropicModel('claude-sonnet-4-5', provider=provider)
agent = Agent(model)
...

Vertex vs GoogleModel
This approach uses Anthropic’s SDK with Vertex AI credentials. For an alternative using Google’s Vertex AI SDK directly, see GoogleModel.
Microsoft Foundry#
To use Claude models via Microsoft Foundry, follow the Anthropic documentation on how to set up an AsyncAnthropicFoundry client and then pass it to AnthropicProvider:

from anthropic import AsyncAnthropicFoundry

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModel
from pydantic_ai.providers.anthropic import AnthropicProvider

foundry_client = AsyncAnthropicFoundry(
    api_key='your-foundry-api-key',  # Or set ANTHROPIC_FOUNDRY_API_KEY
    resource='your-resource-name',
)
provider = AnthropicProvider(anthropic_client=foundry_client)
model = AnthropicModel('claude-sonnet-4-5', provider=provider)
agent = Agent(model)
...

See Anthropic’s Microsoft Foundry documentation for setup instructions, including Entra ID authentication.
Prompt Caching#
Anthropic supports prompt caching to reduce costs by caching parts of your prompts. Pydantic AI provides four ways to use prompt caching:
- Cache User Messages with CachePoint: Insert a CachePoint marker in your user messages to cache everything before it
- Cache System Instructions: Set AnthropicModelSettings.anthropic_cache_instructions to True (uses 5m TTL by default) or specify '5m' / '1h' directly
- Cache Tool Definitions: Set AnthropicModelSettings.anthropic_cache_tool_definitions to True (uses 5m TTL by default) or specify '5m' / '1h' directly
- Cache All Messages: Set AnthropicModelSettings.anthropic_cache_messages to True to automatically cache all messages
Amazon Bedrock
When using AsyncAnthropicBedrock, the TTL parameter is automatically omitted from all cache control settings (including CachePoint, anthropic_cache_instructions, anthropic_cache_tool_definitions, and anthropic_cache_messages) because Bedrock doesn’t support explicit TTL.
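
The TTL rules described above can be pictured with a small standalone sketch. This is not Pydantic AI's implementation: resolve_ttl and build_cache_control are hypothetical helpers, and the {'type': 'ephemeral', 'ttl': ...} dictionary shape is an assumption about what ends up in the request. It only illustrates how True maps to the default 5m TTL and how the TTL is dropped for Bedrock.

```python
from typing import Literal, Optional, Union

TTL = Literal['5m', '1h']


def resolve_ttl(setting: Union[bool, TTL, None]) -> Optional[TTL]:
    """Map a cache setting value to a TTL, or None when caching is off."""
    if setting is True:
        return '5m'  # True means "cache with the default 5m TTL"
    if setting in ('5m', '1h'):
        return setting  # an explicit TTL is used as given
    return None  # False or None: caching is disabled


def build_cache_control(
    setting: Union[bool, TTL, None], *, is_bedrock: bool = False
) -> Optional[dict]:
    """Build a cache-control dict (hypothetical shape), omitting TTL for Bedrock."""
    ttl = resolve_ttl(setting)
    if ttl is None:
        return None
    control = {'type': 'ephemeral'}
    if not is_bedrock:
        control['ttl'] = ttl  # Bedrock does not accept an explicit TTL
    return control


print(build_cache_control(True))
print(build_cache_control('1h', is_bedrock=True))
```

Keeping the Bedrock check in one place means every cache source (markers and settings alike) gets the same treatment.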
Example 1: Automatic Message Caching#
Use anthropic_cache_messages to automatically cache all messages up to and including the newest user message:

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'gateway/anthropic:claude-sonnet-4-6',
    instructions='You are a helpful assistant.',
    model_settings=AnthropicModelSettings(
        anthropic_cache_messages=True,  # Automatically caches the last message
    ),
)

# The last message is automatically cached - no need for manual CachePoint
result1 = agent.run_sync('What is the capital of France?')

# Subsequent calls with similar conversation benefit from cache
result2 = agent.run_sync('What is the capital of Germany?')
print(f'Cache write: {result1.usage().cache_write_tokens}')
print(f'Cache read: {result2.usage().cache_read_tokens}')

The example above goes through Pydantic AI Gateway. To call the Anthropic API directly, use 'anthropic:claude-sonnet-4-6' as the model name; the rest of the code is unchanged.

Example 2: Comprehensive Caching Strategy#
Combine multiple cache settings for maximum savings:

from pydantic_ai import Agent, RunContext
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'gateway/anthropic:claude-sonnet-4-6',
    instructions='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # Cache system instructions
        anthropic_cache_tool_definitions='1h',  # Cache tool definitions with 1h TTL
        anthropic_cache_messages=True,  # Also cache the last message
    ),
)


@agent.tool
def search_docs(ctx: RunContext, query: str) -> str:
    """Search documentation."""
    return f'Results for {query}'


result = agent.run_sync('Search for Python best practices')
print(result.output)

The example above goes through Pydantic AI Gateway. To call the Anthropic API directly, use 'anthropic:claude-sonnet-4-6' as the model name; the rest of the code is unchanged.

Example 3: Fine-Grained Control with CachePoint#
Use manual CachePoint markers to control cache locations precisely:

from pydantic_ai import Agent, CachePoint

agent = Agent(
    'gateway/anthropic:claude-sonnet-4-6',
    instructions='Instructions...',
)

# Manually control cache points for specific content blocks
result = agent.run_sync([
    'Long context from documentation...',
    CachePoint(),  # Cache everything up to this point
    'First question',
])
print(result.output)

The example above goes through Pydantic AI Gateway. To call the Anthropic API directly, use 'anthropic:claude-sonnet-4-6' as the model name; the rest of the code is unchanged.

Accessing Cache Usage Statistics#
Access cache usage statistics via result.usage():

from pydantic_ai import Agent
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'gateway/anthropic:claude-sonnet-4-6',
    instructions='Instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # Default 5m TTL
    ),
)

result = agent.run_sync('Your question')
usage = result.usage()
print(f'Cache write tokens: {usage.cache_write_tokens}')
print(f'Cache read tokens: {usage.cache_read_tokens}')

The example above goes through Pydantic AI Gateway. To call the Anthropic API directly, use 'anthropic:claude-sonnet-4-6' as the model name; the rest of the code is unchanged.

Cache Point Limits#
Anthropic enforces a maximum of 4 cache points per request. Pydantic AI automatically manages this limit to ensure your requests always comply without errors.
How Cache Points Are Allocated#
Cache points can be placed in three locations:
- System Prompt: Via the anthropic_cache_instructions setting (adds a cache point to the last system prompt block)
- Tool Definitions: Via the anthropic_cache_tool_definitions setting (adds a cache point to the last tool definition)
- Messages: Via CachePoint markers or the anthropic_cache_messages setting (adds cache points to message content)
Each setting uses at most 1 cache point, but you can combine them.
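
The budget arithmetic implied here is simple enough to sketch. The helper below is hypothetical (it is not part of Pydantic AI) and assumes only what is stated above: a limit of 4 cache points per request, with each enabled setting consuming one of them. It returns how many manual CachePoint markers still fit.

```python
from typing import Union

MAX_CACHE_POINTS = 4  # per-request limit stated in this section

CacheSetting = Union[bool, str, None]  # True, '5m', '1h', or disabled


def remaining_marker_budget(
    cache_instructions: CacheSetting = False,
    cache_tool_definitions: CacheSetting = False,
    cache_messages: CacheSetting = False,
) -> int:
    """Return how many CachePoint markers fit after the settings take their share."""
    settings = (cache_instructions, cache_tool_definitions, cache_messages)
    used = sum(1 for value in settings if value)  # each enabled setting uses 1 point
    return MAX_CACHE_POINTS - used


print(remaining_marker_budget(True, '1h', True))
print(remaining_marker_budget(True, True))
```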
Example: Using All 3 Cache Point Sources#
Define an agent with all cache settings enabled:

from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'gateway/anthropic:claude-sonnet-4-6',
    instructions='Detailed instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
        anthropic_cache_messages=True,  # 1 cache point
    ),
)


@agent.tool_plain
def my_tool() -> str:
    return 'result'


# This uses 3 cache points (instructions + tools + last message)
# You can add 1 more CachePoint marker before hitting the limit
result = agent.run_sync([
    'Context', CachePoint(),  # 4th cache point - OK
    'Question',
])
print(result.output)
usage = result.usage()
print(f'Cache write tokens: {usage.cache_write_tokens}')
print(f'Cache read tokens: {usage.cache_read_tokens}')

The example above goes through Pydantic AI Gateway. To call the Anthropic API directly, use 'anthropic:claude-sonnet-4-6' as the model name; the rest of the code is unchanged.

Automatic Cache Point Limiting#
When cache points from all sources (settings + CachePoint markers) exceed 4, Pydantic AI automatically removes excess cache points from older message content (keeping the most recent ones).
Define an agent with 2 cache points from settings:

from pydantic_ai import Agent, CachePoint
from pydantic_ai.models.anthropic import AnthropicModelSettings

agent = Agent(
    'gateway/anthropic:claude-sonnet-4-6',
    instructions='Instructions...',
    model_settings=AnthropicModelSettings(
        anthropic_cache_instructions=True,  # 1 cache point
        anthropic_cache_tool_definitions=True,  # 1 cache point
    ),
)


@agent.tool_plain
def search() -> str:
    return 'data'


# Already using 2 cache points (instructions + tools)
# Can add 2 more CachePoint markers (4 total limit)
result = agent.run_sync([
    'Context 1', CachePoint(),  # Oldest - will be removed
    'Context 2', CachePoint(),  # Will be kept (3rd point)
    'Context 3', CachePoint(),  # Will be kept (4th point)
    'Question',
])
# Final cache points: instructions + tools + Context 2 + Context 3 = 4
print(result.output)
usage = result.usage()
print(f'Cache write tokens: {usage.cache_write_tokens}')
print(f'Cache read tokens: {usage.cache_read_tokens}')

The example above goes through Pydantic AI Gateway. To call the Anthropic API directly, use 'anthropic:claude-sonnet-4-6' as the model name; the rest of the code is unchanged.

Key Points:
- System and tool cache points are always preserved
- The cache point created by anthropic_cache_messages is always preserved (as it’s the newest message cache point)
- Additional CachePoint markers in messages are removed from oldest to newest when the limit is exceeded
- This ensures critical caching (instructions/tools) is maintained while still benefiting from message-level caching
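
As a rough illustration of the trimming rule in these key points, the sketch below keeps only the newest markers that fit in the remaining budget. It is a simplified model under stated assumptions, not the library's actual code: trim_marker_positions is a hypothetical helper, markers are represented by their positions in the message content (oldest first), and reserved counts the cache points already taken by instructions, tools, and the automatic last-message cache point.

```python
from typing import List

MAX_CACHE_POINTS = 4  # per-request limit stated in this section


def trim_marker_positions(marker_positions: List[int], reserved: int) -> List[int]:
    """Drop the oldest markers until the total fits within the limit.

    marker_positions: positions of CachePoint markers, oldest first.
    reserved: cache points that are always preserved (settings-based ones).
    """
    budget = max(MAX_CACHE_POINTS - reserved, 0)
    if budget == 0:
        return []  # guard needed: a slice of [-0:] would keep everything
    return marker_positions[-budget:]  # keep the most recent markers only


# Mirrors the scenario above: 2 reserved points, 3 markers, the oldest is removed.
print(trim_marker_positions([1, 3, 5], reserved=2))
```

Treating the settings-based cache points as reserved is what keeps instructions and tool definitions cached even when many markers are present.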