Chroma BM25
Chroma provides a built-in BM25 sparse embedding function. BM25 (Best Matching 25) is a ranking function used to estimate the relevance of documents to a given search query. This embedding function runs locally and does not require any external API keys.
Sparse embeddings are useful for retrieval tasks where you want to match on specific keywords or terms, rather than semantic similarity.
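To make the keyword-matching behavior concrete, here is a minimal sketch of how a sparse vector can be represented and compared. It assumes a plain `{index: weight}` dictionary; the `SparseVector` alias, the `sparse_dot` helper, and the index values are hypothetical illustrations, not Chroma's internal representation.

```python
from typing import Dict

# A sparse vector stored as {dimension index: weight}.
# Any index that is absent is implicitly zero.
SparseVector = Dict[int, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Dot product over the indices that both vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[index] for index, weight in a.items() if index in b)


# Hypothetical term indices: 3 and 17 stand for the two query terms.
query = {3: 1.0, 17: 1.0}
doc_a = {3: 0.8, 42: 0.5}  # shares term 3 with the query
doc_b = {5: 0.9, 42: 0.7}  # shares no term with the query

score_a = sparse_dot(query, doc_a)
score_b = sparse_dot(query, doc_b)
```

Only shared indices contribute to the score, so a document that contains none of the query terms scores zero no matter how close it is in meaning.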
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="Python"></span>
```python
from chromadb.utils.embedding_functions import ChromaBm25EmbeddingFunction

bm25_ef = ChromaBm25EmbeddingFunction(
    k=1.2,
    b=0.75,
    avg_doc_length=256.0,
    token_max_length=40,
)

texts = ["Hello, world!", "How are you?"]
sparse_embeddings = bm25_ef(texts)
```
You can customize the BM25 parameters:
* `k`: Controls term frequency saturation (default: 1.2)
* `b`: Controls document length normalization (default: 0.75)
* `avg_doc_length`: Average document length in tokens (default: 256.0)
* `token_max_length`: Maximum token length (default: 40)
* `stopwords`: Optional list of stopwords to exclude
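The sketch below illustrates how these parameters interact in the textbook BM25 term-frequency formula, `tf * (k + 1) / (tf + k * (1 - b + b * doc_length / avg_doc_length))`. It is not Chroma's implementation: the `bm25_term_weights` function and its regex tokenizer are hypothetical, it omits stemming, hashing, and IDF, and it assumes `token_max_length` is measured in characters per token.

```python
import re
from collections import Counter
from typing import Dict, Iterable, Optional


def bm25_term_weights(
    text: str,
    k: float = 1.2,
    b: float = 0.75,
    avg_doc_length: float = 256.0,
    token_max_length: int = 40,
    stopwords: Optional[Iterable[str]] = None,
) -> Dict[str, float]:
    """Return the textbook BM25 term-frequency weight for each term in `text`."""
    stop = set(stopwords or ())
    # Hypothetical tokenizer: lowercase alphanumeric runs, minus stopwords
    # and minus tokens longer than token_max_length characters (assumption).
    tokens = [
        t
        for t in re.findall(r"[a-z0-9]+", text.lower())
        if t not in stop and len(t) <= token_max_length
    ]
    doc_length = len(tokens)
    # Length normalization: b=0 ignores document length, b=1 applies it fully.
    norm = k * (1.0 - b + b * doc_length / avg_doc_length)
    # Saturating term frequency: each weight stays below k + 1.
    return {
        term: tf * (k + 1.0) / (tf + norm)
        for term, tf in Counter(tokens).items()
    }


weights = bm25_term_weights("the cat sat on the cat mat", stopwords=["the", "on"])
```

In this example `cat` occurs twice and `sat` once, yet the weight for `cat` is less than double the weight for `sat`. That is the saturation `k` controls: a larger `k` lets repeated terms keep adding weight for longer, while `b` scales how strongly documents longer than `avg_doc_length` are penalized.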
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="TypeScript"></span>
```typescript
// npm install @chroma-core/chroma-bm25
import { ChromaBm25EmbeddingFunction } from "@chroma-core/chroma-bm25";
const embedder = new ChromaBm25EmbeddingFunction({
  k: 1.2,
  b: 0.75,
  avgDocLength: 256.0,
  tokenMaxLength: 40,
});
// use directly
const sparseEmbeddings = await embedder.generate(["document1", "document2"]);
```
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="Rust"></span>
Use the built-in BM25 sparse embedding helper, then pass embeddings to Chroma.
```rust
use chroma::embed::bm25::BM25SparseEmbeddingFunction;
let bm25 = BM25SparseEmbeddingFunction::default_murmur3_abs();
let sparse_vector = bm25.encode("document text")?;
```
<span class="tab-end"></span>
<span class="tab-group-end"></span>
<span class="callout-start" data-callout-type="note"></span>
BM25 is a classic information retrieval algorithm that works well for keyword-based search. For semantic search, consider using dense embedding functions instead.
<span class="callout-end"></span>
Link last verified June 7, 2026.
Source: Chroma Docs
Link last verified: 2026-03-04