LlamaIndex and Cohere's Models ↗

cohere guide intermediate models

Summary: Learn how to use Cohere and LlamaIndex together to generate responses based on data.

Original Documentation

title: LlamaIndex and Cohere’s Models slug: docs/llamaindex hidden: false description: >- Learn how to use Cohere and LlamaIndex together to generate responses based on data. image: type: fileId value: ‘https://files.buildwithfern.com/cohere.docs.buildwithfern.com/8ba30b46486ea7bfab24f3e8856d7411d1b745b26e9026abff3ee62af52ce268/assets/images/15aceed-cohere_meta_image.jpg' keywords: ’embeddings, LlamaIndex’ createdAt: ‘Fri Feb 02 2024 14:52:13 GMT+0000 (Coordinated Universal Time)’ updatedAt: ‘Tue May 14 2024 15:10:49 GMT+0000 (Coordinated Universal Time)’#

Prerequisite#

To use LlamaIndex and Cohere, you will need:

LlamaIndex Package. To install it, run:
- pip install llama-index
- pip install llama-index-llms-cohere (to use the Command models)
- pip install llama-index-embeddings-cohere (to use the Embed models)
- pip install llama-index-postprocessor-cohere-rerank (to use the Rerank models)
Cohere’s SDK. To install it, run pip install cohere. If you run into any issues or want more details on Cohere’s SDK, see this wiki.
A Cohere API Key. For more details on pricing see this page. When you create an account with Cohere, we automatically create a trial API key for you. This key will be available on the dashboard where you can copy it, and it’s in the dashboard section called “API Keys” as well.

Cohere Chat with LlamaIndex#

To use Cohere’s chat functionality with LlamaIndex create a Cohere model object and call the chat function.

from llama_index.llms.cohere import Cohere
from llama_index.core.llms import ChatMessage

cohere_model = Cohere(
    api_key="COHERE_API_KEY", model="command-a-03-2025"
)

message = ChatMessage(role="user", content="What is 2 + 3?")

response = cohere_model.chat([message])
print(response)

Cohere Embeddings with LlamaIndex#

To use Cohere’s embeddings with LlamaIndex create a Cohere Embeddings object with an embedding model from this list and call get_text_embedding.

from llama_index.embeddings.cohere import CohereEmbedding

embed_model = CohereEmbedding(
    api_key="COHERE_API_KEY",
    model_name="embed-english-v3.0",
    input_type="search_document",  # Use search_query for queries, search_document for documents
    max_tokens=8000,
    embedding_types=["float"],
)

embeddings = embed_model.get_text_embedding("Welcome to Cohere!")

# Print embeddings
print(len(embeddings))
print(embeddings[:5])

Cohere Rerank with LlamaIndex#

To use Cohere’s rerank functionality with LlamaIndex create a Cohere Rerank object and use as a node post processor.

from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.readers.web import (
    SimpleWebPageReader,
)  # first, run `pip install llama-index-readers-web`

# create index (we are using an example page from Cohere's docs)
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://docs.cohere.com/v2/docs/cohere-embed"]
)  # you can replace this with any other reader or documents
index = VectorStoreIndex.from_documents(documents=documents)

# create reranker
cohere_rerank = CohereRerank(
    api_key="COHERE_API_KEY", model="rerank-english-v3.0", top_n=2
)

# query the index
query_engine = index.as_query_engine(
    similarity_top_k=10,
    node_postprocessors=[cohere_rerank],
)

print(query_engine)

# generate a response
response = query_engine.query(
    "What is Cohere's Embed Model?",
)

print(response)

# To view the source documents
from llama_index.core.response.pprint_utils import pprint_response

pprint_response(response, show_source=True)

Cohere RAG with LlamaIndex#

The following example uses Cohere’s chat model, embeddings and rerank functionality to generate a response based on your data.

from llama_index.llms.cohere import Cohere
from llama_index.embeddings.cohere import CohereEmbedding
from llama_index.postprocessor.cohere_rerank import CohereRerank
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex
from llama_index.readers.web import (
    SimpleWebPageReader,
)  # first, run `pip install llama-index-readers-web`

# Create the embedding model
embed_model = CohereEmbedding(
    api_key="COHERE_API_KEY",
    model="embed-english-v3.0",
    input_type="search_document",
    max_tokens=8000,
    embedding_types=["float"],
)

# Create the service context with the cohere model for generation and embedding model
Settings.llm = Cohere(
    api_key="COHERE_API_KEY", model="command-a-03-2025"
)
Settings.embed_model = embed_model

# create index (we are using an example page from Cohere's docs)
documents = SimpleWebPageReader(html_to_text=True).load_data(
    ["https://docs.cohere.com/v2/docs/cohere-embed"]
)  # you can replace this with any other reader or documents
index = VectorStoreIndex.from_documents(documents=documents)

# Create a cohere reranker
cohere_rerank = CohereRerank(
    api_key="COHERE_API_KEY", model="rerank-english-v3.0", top_n=2
)

# Create the query engine
query_engine = index.as_query_engine(
    node_postprocessors=[cohere_rerank]
)

# Generate the response
response = query_engine.query("What is Cohere's Embed model?")
print(response)

Cohere Tool Use (Function Calling) with LlamaIndex#

To use Cohere’s tool use functionality with LlamaIndex, you can use the FunctionTool class to create a tool that uses Cohere’s API.

from llama_index.llms.cohere import Cohere
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import FunctionCallingAgent


# Define tools
def multiply(a: int, b: int) -> int:
    """Multiple two integers and returns the result integer"""
    return a * b


multiply_tool = FunctionTool.from_defaults(fn=multiply)


def add(a: int, b: int) -> int:
    """Add two integers and returns the result integer"""
    return a + b


add_tool = FunctionTool.from_defaults(fn=add)

# Define LLM
llm = Cohere(api_key="COHERE_API_KEY", model="command-a-03-2025")

# Create agent
agent = FunctionCallingAgent.from_tools(
    [multiply_tool, add_tool],
    llm=llm,
    verbose=True,
    allow_parallel_tool_calls=True,
)

# Run agent
response = await agent.achat("What is (121 * 3) + (5 * 8)?")

Link last verified June 7, 2026. View original ↗

Source: Cohere Docs

Link last verified: 2026-02-26