# Full-text search
Keyword-based search over text documents using BM25, phrase matching, and Lucene query syntax.
Full-text search is in early access. APIs may change, and some features are not yet available.
Full-text search (FTS) enables keyword-based search over text documents in Pinecone. It offers two query types: simple phrase matching (`type: "text"`), which matches an exact phrase in a field, and Lucene query syntax (`type: "query_string"`), which supports boolean operators, phrase prefix matching, boosting, and more. Results are ranked by relevance using the BM25 algorithm.
## Schema definition
Full-text search indexes require an explicit schema. The schema you supply at index creation tells Pinecone which fields contain text you want to search. During early access, it may declare only full-text searchable fields, that is, fields with `full_text_searchable: true`.
In an upsert, fields not defined in the schema are stored with the record and are filterable, but they will not appear in the schema.
Here’s an example schema with a single full-text searchable content field (a full create-index request also includes read_capacity and other top-level fields):
```json
{
  "name": "articles",
  "deployment": {
    "deployment_type": "managed",
    "cloud": "aws",
    "region": "us-east-1"
  },
  "schema": {
    "fields": {
      "content": {
        "type": "string",
        "full_text_searchable": true,
        "description": "The main body text of the article"
      }
    }
  }
}
```
**Field types (in schema during early access):**
* `string` with `full_text_searchable: true` - Text data matched by keyword queries. This is the only type of field allowed in the schema during early access.
You can still include other fields (e.g., `category`, `year`) in your documents at upsert time; they are stored and filterable, but they are not part of the index schema. For example, you can search the `content` field for "machine learning" and filter on `category = "technology"` using fields you sent in the document.
Schema migration is not yet supported. Once an index is created, you cannot add, remove, or modify fields. Plan your schema carefully.
## API
Full-text search uses API version `2026-01.alpha`. All requests require the header `X-Pinecone-API-Version: 2026-01.alpha`.
### Control plane operations
Control plane operations are used to manage indexes and their configuration.
During early access, full-text search indexes must be created with dedicated read nodes, using the `b1` node type with a single shard and a single replica.
After creation, wait for both `status.ready: true` and `read_capacity.status.state: "Ready"` before searching. The index status may show ready before the dedicated read nodes finish provisioning. Searches made while `read_capacity.status.state` is still `"Migrating"` return empty results.
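The readiness rule above can be sketched as a small check. This is a minimal sketch, assuming the describe-index response has already been parsed into a dict; the helper name `is_searchable` is hypothetical and not part of any Pinecone SDK.

```python
def is_searchable(index_description: dict) -> bool:
    """Return True only when both readiness signals are set."""
    # Signal 1: the index itself reports ready.
    index_ready = index_description.get("status", {}).get("ready") is True
    # Signal 2: the dedicated read nodes have finished provisioning.
    read_state = (
        index_description.get("read_capacity", {})
        .get("status", {})
        .get("state")
    )
    return index_ready and read_state == "Ready"


# Index reports ready, but read nodes are still provisioning.
provisioning = {
    "status": {"ready": True, "state": "Ready"},
    "read_capacity": {"status": {"state": "Migrating"}},
}
# Both signals agree.
ready = {
    "status": {"ready": True, "state": "Ready"},
    "read_capacity": {"status": {"state": "Ready"}},
}
print(is_searchable(provisioning), is_searchable(ready))
```

In practice you would call a check like this in a polling loop with a delay between describe-index requests, and only start searching once it returns `True`.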
<AccordionGroup>
<Accordion title="Create index (POST /indexes)">
**Example request**
```bash
curl -X POST "https://api.pinecone.io/indexes" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"name": "articles",
"deployment": {
"deployment_type": "managed",
"cloud": "aws",
"region": "us-east-1"
},
"schema": {
"fields": {
"content": {
"type": "string",
"full_text_searchable": true,
"language": "en"
}
}
},
"read_capacity": {
"mode": "Dedicated",
"dedicated": {
"node_type": "b1",
"scaling": "Manual",
"manual": { "shards": 1, "replicas": 1 }
}
},
"deletion_protection": "disabled"
}'
```
Request parameters:
* `name` (string, required) - Unique index name (lowercase alphanumeric and hyphens)
* `deployment` (object, required) - Deployment configuration
  * `deployment_type` (string, required) - Must be `"managed"` for serverless
  * `cloud` (string, required) - Cloud provider: `"aws"` or `"gcp"`
  * `region` (string, required) - Region code (e.g., `"us-east-1"`)
* `schema` (object, required) - Schema definition
  * `fields` (object, required) - Map of field names to field definitions. During early access, the schema may only contain **full-text searchable fields**:
    * **Full-text searchable field** - Enables full-text search
      * `type: "string"` (required)
      * `full_text_searchable: true` (required)
      * `language` (string, optional) - Language for tokenization and, when [`stemming`](#stemming) is enabled, for stemming (default: `"en"`). Accepts short codes or full names (e.g., `"fr"` or `"french"`). See [Language](#language) for the full list of supported languages.
      * `stemming` (boolean, optional) - Whether to enable language-based stemming (default: `false`). See [Stemming](#stemming).
    * Full-text searchable fields support an optional `description` for documenting what the field contains. This is especially useful for agentic workflows where an LLM inspects the schema to understand how to query the index.
* `read_capacity` (object, required) - Read capacity configuration
  * `mode` (string, required) - Must be `"Dedicated"` for full-text search indexes
  * `dedicated` (object, required) - Dedicated read node configuration
    * `node_type` (string, required) - Node type (e.g., `"b1"`)
    * `scaling` (string, required) - Scaling mode: `"Manual"`
    * `manual` (object, required) - Manual scaling configuration
      * `shards` (integer, required) - Number of shards (minimum `1`)
      * `replicas` (integer, required) - Number of replicas (minimum `1`)
* `deletion_protection` (string, optional) - `"enabled"` or `"disabled"` (default: `"disabled"`)
* `tags` (object, optional) - Key-value tags for the index
**Schema constraints:**
* During early access, the schema may only contain fields with `full_text_searchable: true` (full-text searchable fields).
* Field names must be unique within the schema
* Field names must contain only alphanumeric characters and underscores
* The schema must contain at least one field
* Only one field can have `full_text_searchable: true` (multiple text fields not yet supported)
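These constraints can be checked client-side before sending a create-index request. The following is a minimal sketch, assuming the schema is a plain dict shaped like the request body above; `check_schema` is a hypothetical helper, and the server remains the source of truth for validation. Field-name uniqueness is already guaranteed by using a dict.

```python
import re

# Alphanumeric characters and underscores only.
FIELD_NAME = re.compile(r"[A-Za-z0-9_]+")


def check_schema(schema: dict) -> list[str]:
    """Return a list of constraint violations (empty if none were found)."""
    fields = schema.get("fields", {})
    problems = []
    if not fields:
        problems.append("schema must contain at least one field")
    for name, definition in fields.items():
        if not FIELD_NAME.fullmatch(name):
            problems.append(f"invalid field name: {name!r}")
        if definition.get("full_text_searchable") is not True:
            problems.append(f"field {name!r} is not full-text searchable")
    searchable = [
        name
        for name, definition in fields.items()
        if definition.get("full_text_searchable") is True
    ]
    if len(searchable) > 1:
        problems.append("only one full-text searchable field is supported")
    return problems


schema = {"fields": {"content": {"type": "string", "full_text_searchable": True}}}
print(check_schema(schema))
```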
**Example response**
**Status:** 202 Accepted
```json
{
"id": "e51ea4e1-2dda-4607-94dc-9054b1fa8492",
"name": "articles",
"host": "articles-jweaq8m.svc.aped-4627-b74a.pinecone.io",
"status": {
"ready": false,
"state": "Initializing"
},
"deployment": {
"deployment_type": "managed",
"cloud": "aws",
"region": "us-east-1",
"environment": "aped-4627-b74a"
},
"schema": {
"version": "v1",
"fields": {
"content": {
"type": "string",
"description": null,
"full_text_searchable": true,
"language": "en",
"stemming": false,
"lowercase": true,
"max_term_len": 40
}
}
},
"read_capacity": {
"mode": "Dedicated",
"dedicated": {
"node_type": "b1",
"scaling": "Manual",
"manual": { "shards": 1, "replicas": 1 }
},
"status": {
"state": "Migrating",
"current_shards": null,
"current_replicas": null
}
},
"tags": null,
"deletion_protection": "disabled"
}
```
Response fields:
* `id` (string) - Unique index ID
* `name` (string) - Index name
* `host` (string) - Index host URL for data plane operations
* `status` (object) - Index status
  * `ready` (boolean) - Whether the index is ready for operations
  * `state` (string) - Current state: `"Initializing"`, `"Ready"`, etc.
* `deployment` (object) - Deployment configuration
  * `deployment_type` (string) - Deployment type (e.g., `"managed"`)
  * `cloud` (string) - Cloud provider
  * `region` (string) - Region code
  * `environment` (string) - Environment identifier assigned by the system
* `schema` (object) - Schema definition
  * `version` (string) - Schema version (e.g., `"v1"`)
  * `fields` (object) - Field definitions with server-applied defaults. Full-text searchable fields include additional properties: `language`, `stemming` (defaults to `false`; set to `true` at creation to enable), `lowercase`, `max_term_len`. All fields include `description` (null if not set).
* `read_capacity` (object) - Read capacity configuration
  * `mode` (string) - Read capacity mode (e.g., `"Dedicated"`)
  * `dedicated` (object) - Dedicated read node configuration
    * `node_type` (string) - Node type (e.g., `"b1"`)
    * `scaling` (string) - Scaling mode (e.g., `"Manual"`)
    * `manual` (object) - Manual scaling configuration
      * `shards` (integer) - Number of shards
      * `replicas` (integer) - Number of replicas
  * `status` (object) - Current status of read capacity provisioning
    * `state` (string) - Provisioning state (e.g., `"Migrating"`, `"Ready"`)
    * `current_shards` (integer or null) - Current number of shards
    * `current_replicas` (integer or null) - Current number of replicas
* `tags` (object or null) - Key-value tags, or null if none set
* `deletion_protection` (string) - Deletion protection status
Wait for both `status.ready: true` and `read_capacity.status.state: "Ready"` before performing data plane operations.
</Accordion>
<Accordion title="List indexes (GET /indexes)">
Returns all indexes in the project, including their current status and configuration.
**Example request**
```bash
curl -X GET "https://api.pinecone.io/indexes" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "X-Pinecone-API-Version: 2026-01.alpha"
```
**Example response**
**Status:** 200 OK
```json
{
"indexes": [
{
"id": "e51ea4e1-2dda-4607-94dc-9054b1fa8492",
"name": "articles",
"host": "articles-jweaq8m.svc.aped-4627-b74a.pinecone.io",
"status": {
"ready": true,
"state": "Ready"
},
"deployment": {
"deployment_type": "managed",
"region": "us-east-1",
"cloud": "aws",
"environment": "aped-4627-b74a"
},
"read_capacity": {
"mode": "Dedicated",
"dedicated": {
"node_type": "b1",
"scaling": "Manual",
"manual": {
"shards": 1,
"replicas": 1
}
},
"status": {
"state": "Ready",
"current_shards": 1,
"current_replicas": 1
}
},
"schema": {
"version": "v1",
"fields": {
"content": {
"type": "string",
"description": null,
"full_text_searchable": true,
"language": "en",
"stemming": false,
"lowercase": true,
"max_term_len": 40
}
}
},
"tags": null,
"deletion_protection": "disabled"
},
// More indexes...
]
}
```
Returns an array of index objects, each with the same structure as the create index response (see above).
</Accordion>
<Accordion title="Describe index (GET /indexes/{index_name})">
Returns detailed information about a specific index, including its schema, status, and host URL.
**Example request**
```bash
curl -X GET "https://api.pinecone.io/indexes/articles" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "X-Pinecone-API-Version: 2026-01.alpha"
```
Path parameters:
* `index_name` (string, required) - Name of the index
**Example response**
**Status:** 200 OK
Returns the same structure as the create index response.
</Accordion>
<Accordion title="Update index (PATCH /indexes/{index_name})">
Updates index configuration. Currently, only `deletion_protection` can be updated.
**Example request**
```bash
curl -X PATCH "https://api.pinecone.io/indexes/articles" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"deletion_protection": "enabled"
}'
```
Path parameters:
* `index_name` (string, required) - Name of the index
Body parameters:
* `deletion_protection` (string, optional) - `"enabled"` or `"disabled"`
**Example response**
**Status:** 200 OK
Returns the updated index configuration (same structure as create index response).
</Accordion>
<Accordion title="Delete index (DELETE /indexes/{index_name})">
Permanently deletes an index and all its data. This action cannot be undone. If `deletion_protection` is enabled, you must first disable it using the update endpoint.
**Example request**
```bash
curl -X DELETE "https://api.pinecone.io/indexes/articles" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "X-Pinecone-API-Version: 2026-01.alpha"
```
Path parameters:
* `index_name` (string, required) - Name of the index
**Example response**
**Status:** 202 Accepted (empty body)
</Accordion>
</AccordionGroup>
### Data plane operations
<span class="callout-start" data-callout-type="note"></span>
Data plane operations include a namespace in the URL path. Namespaces partition documents within an index: they're auto-created on first upsert and completely isolated from each other. Use `"__default__"` if you don't need partitioning.
<span class="callout-end"></span>
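Because the namespace is part of the path, data plane URLs can be assembled from the index host, the namespace, and the operation. This is a minimal sketch; `documents_url` is a hypothetical helper, and percent-encoding the namespace is an assumption of the sketch rather than documented behavior.

```python
from urllib.parse import quote


def documents_url(host: str, namespace: str = "__default__", operation: str = "search") -> str:
    """Build the URL for a documents operation in a given namespace."""
    if operation not in ("upsert", "search"):
        raise ValueError(f"unsupported operation: {operation!r}")
    # Assumption: namespaces with special characters are percent-encoded.
    encoded_namespace = quote(namespace, safe="")
    return f"https://{host}/namespaces/{encoded_namespace}/documents/{operation}"


print(documents_url("articles-abc123.svc.us-east-1.pinecone.io", operation="upsert"))
```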
<AccordionGroup>
<Accordion title="Upsert documents (POST /namespaces/{namespace}/documents/upsert)">
Inserts or updates documents. If a document with the same `_id` exists, it is completely replaced. Documents become searchable within approximately one minute.
**Example request**
```bash
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/upsert" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"documents": [
{
"_id": "doc1",
"content": "Machine learning models are revolutionizing natural language processing",
"category": "technology",
"year": 2024
},
{
"_id": "doc2",
"content": "Vector databases enable fast similarity search across embeddings",
"category": "technology",
"year": 2023
},
{
"_id": "doc3",
"content": "Quantum computers leverage superposition for faster computation",
"category": "science",
"year": 2024
}
]
}'
```
Path parameters:
* `namespace` (string, required) - Namespace name (use `"__default__"` if not using namespaces)
Body parameters:
* `documents` (array, required) - Array of documents to upsert. Each document is an object with:
  * `_id` (string, required) - Unique document ID. If a document with this `_id` already exists, it is replaced entirely. If multiple documents in the same batch share an `_id`, only the last one is stored.
  * Fields matching your schema (the `full_text_searchable` field must be present)
**Example response**
**Status:** 200 OK
```json
{
"upserted_count": 3
}
```
Response fields:
* `upserted_count` (integer) - Number of documents upserted
#### Schema validation
Each item in the `documents` array is validated against your index schema. If any item fails validation, **the entire request fails** and nothing is upserted.
| Scenario | Result |
| ------------------------------------------- | -------------------------------------------------- |
| Field value doesn't match declared type | **Error** - request fails |
| Field not in schema | Stored and filterable, but not added to the schema |
| Schema field missing from item | OK - fields are optional |
| Text-searchable field is missing | **Error** - request fails |
| Text contains Unicode or special characters | OK - fully supported |
**Example errors:**
* `"Document with id 'doc-1': boolean field 'in_stock' must be a boolean"`
* `"Each document must have at least one indexable field"`
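Since one invalid item fails the whole request, it can help to check a batch locally before sending it. The following is a minimal sketch, assuming a single full-text searchable field named by `text_field`; `preflight` and its messages are hypothetical and do not reproduce the service's own error strings.

```python
def preflight(documents: list[dict], text_field: str) -> list[str]:
    """Return a list of problems that would likely make the upsert fail."""
    problems = []
    for position, doc in enumerate(documents):
        doc_id = doc.get("_id")
        if not isinstance(doc_id, str) or not doc_id:
            problems.append(f"document at position {position}: missing or non-string '_id'")
            continue
        if text_field not in doc:
            problems.append(f"document '{doc_id}': missing text field '{text_field}'")
        elif not isinstance(doc[text_field], str):
            problems.append(f"document '{doc_id}': field '{text_field}' must be a string")
    return problems


batch = [
    {"_id": "doc1", "content": "Vector databases enable fast similarity search", "year": 2023},
    {"_id": "doc2", "category": "science"},  # text field is missing
]
print(preflight(batch, "content"))
```

Sending the batch only when the returned list is empty avoids a round trip that would be rejected as a whole.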
</Accordion>
<Accordion title="Search documents (POST /namespaces/{namespace}/documents/search)">
Searches documents using simple phrase matching (`type: "text"`) or Lucene query syntax (`type: "query_string"`). Optionally filter by field values before scoring. Results are ranked by BM25 relevance score.
**Example request**
```bash
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"include_fields": ["content", "category", "year"],
"score_by": [{
"type": "text",
"field": "content",
"query": "machine learning"
}],
"top_k": 10
}'
```
Path parameters:
* `namespace` (string, required) - Namespace name (use `"__default__"` if not using namespaces)
Body parameters:
* `include_fields` (array, required) - List of field names to return in results
* `score_by` (array, required) - Array of scoring methods. Each item must be one of:
  * **`type: "text"`** - Simple phrase matching on a specific field. Your query is matched as an exact phrase: all terms must appear adjacent and in the same order. Case-insensitive.
    * `field` (string, required) - Name of the searchable field.
    * `query` (string, required) - The phrase to search for. Multiple words are treated as a phrase, not individual terms. Query syntax operators (AND, OR, NOT, etc.) are not supported; they are treated as literal words.
  * **`type: "query_string"`** - Lucene query syntax. Supports boolean operators, phrase prefix matching, boosting, and more. Do not specify `field`; instead, target fields within the query itself.
    * `query` (string, required) - A Lucene query string in the form `<field_name>:(<query clause>)` (see [query syntax reference](#query-syntax-reference)).
* `top_k` (integer, required) - Number of results to return (1-10000)
* `filter` (object, optional) - Filter conditions on filterable fields (applied before search)
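The body parameters above can be assembled and sanity-checked with a small builder. This is a minimal sketch that only produces the request dict and sends nothing; `build_search_body` is a hypothetical helper, and its checks mirror the rules listed above (a `field` for `type: "text"`, no `field` for `type: "query_string"`, `top_k` between 1 and 10000).

```python
from typing import Optional


def build_search_body(
    score_by: dict,
    include_fields: list[str],
    top_k: int = 10,
    filter_: Optional[dict] = None,
) -> dict:
    """Validate one scoring method and return a search request body."""
    kind = score_by.get("type")
    if kind == "text":
        if "field" not in score_by:
            raise ValueError('type "text" requires a field')
    elif kind == "query_string":
        if "field" in score_by:
            raise ValueError('type "query_string" takes the field inside the query')
    else:
        raise ValueError(f"unknown score_by type: {kind!r}")
    if "query" not in score_by:
        raise ValueError("score_by requires a query")
    if not 1 <= top_k <= 10000:
        raise ValueError("top_k must be between 1 and 10000")
    body = {
        "include_fields": list(include_fields),
        "score_by": [score_by],
        "top_k": top_k,
    }
    if filter_ is not None:
        body["filter"] = filter_
    return body


body = build_search_body(
    {"type": "text", "field": "content", "query": "machine learning"},
    include_fields=["content", "category", "year"],
)
print(body)
```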
<span class="callout-start" data-callout-type="note"></span>
**Choosing between `type: "text"` and `type: "query_string"`:**
* Use `type: "text"` when you know the exact phrase to look for. It's the simpler option — just provide the phrase and the field name.
* Use `type: "query_string"` when you need boolean operators, phrase prefix matching, boosting, or OR logic between terms. See the [query syntax reference](#query-syntax-reference) for the full list of operators.
<span class="callout-end"></span>
**Filter operators:**
Filters are applied *before* the search runs. The search only considers documents that match the filter.
| Operator | Example | Description |
| -------- | ------------------------------------ | --------------------- |
| `$eq` | `{"category": {"$eq": "tech"}}` | Equals |
| `$ne` | `{"category": {"$ne": "tech"}}` | Not equals |
| `$gt` | `{"year": {"$gt": 2023}}` | Greater than |
| `$gte` | `{"year": {"$gte": 2023}}` | Greater than or equal |
| `$lt` | `{"year": {"$lt": 2025}}` | Less than |
| `$lte` | `{"year": {"$lte": 2025}}` | Less than or equal |
| `$in` | `{"category": {"$in": ["a", "b"]}}` | In list |
| `$nin` | `{"category": {"$nin": ["a", "b"]}}` | Not in list |
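To make the operator semantics concrete, here is a minimal local sketch that evaluates a filter against a single document. It is an illustration only, not how Pinecone evaluates filters: `matches_filter` is a hypothetical helper, conditions on several fields are combined with AND, and treating a missing field as a non-match is an assumption of the sketch.

```python
import operator

OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches_filter(doc: dict, filter_: dict) -> bool:
    """Return True if the document satisfies every condition in the filter."""
    for field, conditions in filter_.items():
        if field not in doc:
            return False  # assumption: a missing field never matches
        for op, operand in conditions.items():
            if not OPERATORS[op](doc[field], operand):
                return False
    return True


docs = [
    {"_id": "doc1", "category": "technology", "year": 2024},
    {"_id": "doc2", "category": "technology", "year": 2023},
    {"_id": "doc3", "category": "science", "year": 2024},
]
flt = {"category": {"$eq": "technology"}, "year": {"$gte": 2024}}
print([d["_id"] for d in docs if matches_filter(d, flt)])
```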
**More examples:**
*Simple phrase matching with filter:*
```bash
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"include_fields": ["content", "category", "year"],
"filter": {
"category": { "$eq": "technology" },
"year": { "$gte": 2024 }
},
"score_by": [{
"type": "text",
"field": "content",
"query": "machine learning"
}],
"top_k": 10
}'
```
This matches documents containing the exact phrase "machine learning" (adjacent, in order) within the `content` field, filtered to technology articles from 2024 onward.
*Boolean query with query\_string:*
```bash
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"include_fields": ["content", "category"],
"score_by": [{
"type": "query_string",
"query": "content:(machine AND learning)"
}],
"top_k": 10
}'
```
This matches documents where the `content` field contains both "machine" and "learning" (in any order, not necessarily adjacent).
*Boosting with query\_string:*
```bash
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"include_fields": ["content", "category"],
"score_by": [{
"type": "query_string",
"query": "content:(\"natural language processing\"^2 machine learning)"
}],
"top_k": 10
}'
```
This boosts the phrase "natural language processing" by 2x and also matches "machine" or "learning" (with default OR). Note that operators like `^`, `AND`, and `OR` only work with `type: "query_string"`.
*OR query (default behavior) with query\_string:*
```bash
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
-H "Api-Key: {{YOUR_API_KEY}}" \
-H "Content-Type: application/json" \
-H "X-Pinecone-API-Version: 2026-01.alpha" \
-d '{
"include_fields": ["content", "category"],
"score_by": [{
"type": "query_string",
"query": "content:(quick brown fox)"
}],
"top_k": 10
}'
```
With `type: "query_string"`, multiple terms default to OR: this matches documents containing "quick", "brown", or "fox" (or any combination). Documents matching more terms rank higher.
**Example response**
**Status:** 200 OK
```json
{
"matches": [
{
"id": "doc1",
"score": 0.8234,
"content": "Machine learning models are revolutionizing natural language processing",
"category": "technology",
"year": 2024
}
],
"namespace": "__default__",
"usage": { "read_units": 1 }
}
```
Response fields:
* `matches` (array) - Array of matching documents
  * `id` (string) - Document ID
  * `score` (float) - BM25 relevance score (higher is better)
  * `{field}` (any) - Requested fields from the document
* `namespace` (string) - Namespace searched
* `usage` (object) - Usage information
  * `read_units` (integer) - Read units consumed
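For intuition about the `score` field, the following sketch computes textbook BM25 over a tiny in-memory corpus. It is an illustration under stated assumptions, not Pinecone's implementation: the parameters `k1=1.2` and `b=0.75` are common defaults, tokenization is a plain lowercase whitespace split, and the scores it produces will not match values returned by the API.

```python
import math
from collections import Counter


def bm25_scores(query_terms: list[str], docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document against the query terms with textbook BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


corpus = ["the quick brown fox", "the quick dog", "a lazy cat"]
scores = bm25_scores(["quick", "brown", "fox"], corpus)
print(scores)
```

The first document matches all three query terms and scores highest, the second matches one, and the third matches none and scores zero, which is the ranking behavior described for default-OR queries.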
</Accordion>
</AccordionGroup>
## Python SDK
<span class="callout-start" data-callout-type="note"></span>
SDK support for full-text search is currently a **work in progress**. The API and SDK interfaces may change before general availability. The examples below show current SDK usage for the operations described in the API section. For requirements and limitations, see [Early access](#early-access).
<span class="callout-end"></span>
<span class="callout-start" data-callout-type="tip"></span>
For a runnable end-to-end example, see this [Google Colab notebook](https://colab.research.google.com/drive/1lsPeNLCJ2ucbYthHYs9WpybW4nAfB8tG), which demonstrates upserting and searching a sample Wikipedia dataset.
<span class="callout-end"></span>
### Installation
To use full-text search with Pinecone's Python SDK, you need to install the `oakcylinder` package — an early access version of the SDK that is in active development.
<span class="callout-start" data-callout-type="warning"></span>
* The SDK's API may change before these features are merged into the main `pinecone` package.
* Do not install `oakcylinder` alongside `pinecone` in the same Python environment, as namespace conflicts will cause unpredictable behavior.
<span class="callout-end"></span>
```sh
pip install oakcylinder
```
### Control plane
```python
import os

from pinecone import Pinecone

pc = Pinecone(
    api_key=os.environ.get('PINECONE_API_KEY')
)
```
<AccordionGroup>
<Accordion title="Create index">
```python
from pinecone import SchemaBuilder

# The schema builder is an optional util to help with constructing
# your schema dict in the correct shape
schema = (
    SchemaBuilder()
    .add_string_field(
        name="content",
        full_text_searchable=True,
        language="en"
    )
    .build()
)
index_model = pc.indexes.create(
    name="articles",
    schema=schema,
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "b1",
            "scaling": "Manual",
            "manual": {
                "shards": 1,
                "replicas": 1
            },
        },
    },
)
# Use the host from the response for data plane operations
host = index_model.host
```
</Accordion>
<Accordion title="Describe index">
```python
index_model = pc.indexes.describe(name="articles")
print(index_model.status, index_model.schema)
```
</Accordion>
<Accordion title="Delete index">
```python
pc.indexes.delete(name="articles")
```
</Accordion>
</AccordionGroup>
### Data plane
<AccordionGroup>
<Accordion title="Build a data plane client">
Build a data plane client from the index host (or by name):
```python
index = pc.index(host=index_model.host)
```
</Accordion>
<Accordion title="Upsert documents">
```python
NAMESPACE = 'example-namespace'
docs = [
    {"_id": "doc1", "content": "Machine learning models are revolutionizing natural language processing", "category": "technology", "year": 2024},
    {"_id": "doc2", "content": "Vector databases enable fast similarity search across embeddings", "category": "technology", "year": 2023},
    {"_id": "doc3", "content": "Quantum computers leverage superposition for faster computation", "category": "science", "year": 2024},
    # ... more documents
]
index.documents.batch_upsert(
    namespace=NAMESPACE,
    documents=docs,
    batch_size=50,
    max_workers=4,
    show_progress=True,
)
```
</Accordion>
<Accordion title="Search documents — simple phrase (type: text)">
```python
NAMESPACE = 'example-namespace'
response = index.documents.search(
    namespace=NAMESPACE,
    top_k=10,
    score_by=[{"type": "text", "field": "content", "query": "machine learning"}],
    include_fields=["content", "category", "year"],
)
# Print the ID, score, and (if returned) the content of each match
for match in response.matches:
    print(match.id, match.score, getattr(match, "content", ""))
```
</Accordion>
<Accordion title="Search documents — Lucene query string (type: query_string)">
```python
NAMESPACE = 'example-namespace'
response = index.documents.search(
    namespace=NAMESPACE,
    top_k=10,
    score_by=[{"type": "query_string", "query": "content:(machine AND learning)"}],
    include_fields=["content", "category", "year"],
)
```
</Accordion>
</AccordionGroup>
## Query syntax reference
Full-text search supports two query types with different capabilities:
| Feature | `type: "text"` | `type: "query_string"` |
| ----------------------- | ------------------------------------- | ------------------------------------- |
| **Purpose** | Simple phrase matching | Lucene query syntax |
| **`field` parameter** | Required | Not allowed (field names go in query) |
| **Multi-word behavior** | Phrase (adjacent, in order) | OR by default |
| **Boolean operators** | Not supported (treated as words) | `AND`, `OR`, `NOT`, `+`, `-` |
| **Phrase prefix** | Not supported | `"phrase pre"*` (last term as prefix) |
| **Phrase matching** | Automatic (entire query is a phrase) | Wrap in quotes: `"exact phrase"` |
| **Phrase slop** | Not supported | `"phrase"~N` |
| **Boosting** | Not supported | `term^N` |
| **Stemming** | Supported ([when enabled](#stemming)) | Supported ([when enabled](#stemming)) |
| **Case sensitivity** | Case-insensitive | Case-insensitive |
### Simple phrase matching (`type: "text"`)
With `type: "text"`, your entire query is treated as a phrase. All terms must appear **adjacent and in the same order** in the document. Matching is case-insensitive.
| Query | Matches | Does not match |
| ------------------ | ----------------------------------- | ------------------------------------------------------ |
| `machine learning` | "**Machine learning** is great" | "Machine and learning separately" (words not adjacent) |
| `machine learning` | "We use **machine learning** daily" | "Learning machine" (wrong order) |
| `machine` | "**Machine** learning is great" | "Vector databases only" |
**Key behaviors:**
* **Single term** (`machine`): Matches any document containing the term. Case-insensitive.
* **Multiple terms** (`machine learning`): Matched as a **phrase** — all terms must appear adjacent and in order. This is not an OR query.
* **Tokenization**: Text is split into tokens on whitespace and punctuation. This means punctuation between words does not prevent a phrase match: a document containing "state-of-the-art" is tokenized as `["state", "of", "the", "art"]`, and a phrase query for `state of the art` matches it because the tokens are adjacent and in the correct order.
* **No stop words**: Common words like "the", "a", "of", and "is" are not removed during indexing or search. All tokens are indexed and searchable. This means phrase queries are position-sensitive: `"state art"` does not match "state-of-the-art" because "of" and "the" sit between "state" and "art". To exclude specific words or require non-adjacent terms, use `type: "query_string"` with operators like `NOT`, `-`, or `AND` (e.g., `content:(state AND art)`).
* **No operator support**: Characters like `AND`, `OR`, `NOT`, `*`, `~`, `^`, `+`, `-`, and quotes are treated as literal text, not operators. For example, `machine AND learning` searches for the three-word phrase "machine and learning".
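The key behaviors above can be mimicked with a toy matcher. This is a minimal sketch meant only to illustrate the described semantics, not Pinecone's analyzer: `tokenize` and `phrase_matches` are hypothetical helpers, and splitting on every non-alphanumeric character is an approximation of "whitespace and punctuation".

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it on whitespace and punctuation."""
    return re.findall(r"[^\W_]+", text.lower())


def phrase_matches(document: str, query: str) -> bool:
    """Return True if the query tokens appear adjacent and in order."""
    doc_tokens = tokenize(document)
    query_tokens = tokenize(query)
    if not query_tokens:
        return False
    width = len(query_tokens)
    return any(
        doc_tokens[start:start + width] == query_tokens
        for start in range(len(doc_tokens) - width + 1)
    )


print(phrase_matches("A state-of-the-art model", "state of the art"))
print(phrase_matches("A state-of-the-art model", "state art"))
```

The first call matches because the hyphens only separate tokens; the second does not, because "of" and "the" sit between "state" and "art" and no stop words are removed.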
<span class="callout-start" data-callout-type="note"></span>
If you need boolean logic, phrase prefix matching, boosting, or any other query operators, use `type: "query_string"` instead.
<span class="callout-end"></span>
### Lucene query syntax (`type: "query_string"`)
With `type: "query_string"`, you write Lucene query syntax, with operator support. Field names are embedded in the query itself (e.g., `content:(term)`).
| Operator | Syntax | Example | Description |
| -------------- | ----------------- | -------------------------------- | ---------------------------------------- |
| Term | `field:(word)` | `content:(computers)` | Match documents containing term |
| Multiple terms | `field:(a b)` | `content:(machine learning)` | OR by default — matches either term |
| Phrase | `field:("words")` | `content:("machine learning")` | Exact phrase match (adjacent, in order) |
| AND | `AND` | `content:(a AND b)` | Both terms required |
| OR | `OR` | `content:(a OR b)` | Either term matches (same as default) |
| NOT | `NOT` | `content:(a NOT b)` | Exclude second term |
| Required | `+term` | `content:(+database search)` | Term must be present |
| Excluded | `-term` | `content:(database -deprecated)` | Term must not be present |
| Grouping | `(expr)` | `content:((a OR b) AND c)` | Control precedence |
| Phrase slop | `"phrase"~N` | `content:("fast search"~2)` | Allow up to N words between phrase terms |
| Boost | `term^N` | `content:(machine^3 learning)` | Multiply term's relevance score by N |
| Phrase prefix | `"phrase pre"*` | `content:("james w"*)` | Last term in phrase matched as prefix |
<AccordionGroup>
<Accordion title="Terms and default OR behavior">
A **term** is a single word. Multiple space-separated terms use **OR logic** by default.
```text
content:(machine learning)
```
Matches documents containing "machine" OR "learning" (or both). Documents with both terms rank higher.
```text
content:("machine learning")
```
Matches only documents containing the exact phrase "machine learning" with the words adjacent. This is equivalent to what `type: "text"` does with `query: "machine learning"`.
</Accordion>
<Accordion title="Boolean operators (AND, OR, NOT)">
Use `AND`, `OR`, and `NOT` for explicit boolean logic.
```text
content:(machine AND learning)   # Both terms required (any order)
content:(machine OR learning)    # Either term (same as default)
content:(machine NOT learning)   # "machine" but not "learning"
```
Precedence: AND binds tighter than OR. Use parentheses to control order:
```text
content:((database OR storage) AND distributed)
```
</Accordion>
<Accordion title="Required and excluded terms (+, -)">
Use `+` to require a term and `-` to exclude a term.
```text
content:(+database distributed)      # MUST contain "database", "distributed" optional
content:(database -deprecated)       # Contains "database", must NOT contain "deprecated"
content:(+vector +search -legacy)    # MUST have "vector" AND "search", must NOT have "legacy"
```
This is useful when you want some terms to be mandatory filters while others boost relevance.
</Accordion>
<Accordion title="Phrase slop">
Use `"phrase"~N` to allow up to N words between the terms of a phrase.
```text
content:("machine learning"~3)
```
Matches "machine learning", "machine deep learning", or "machine-assisted learning" (words within 3 positions).
</Accordion>
<Accordion title="Term boosting">
Increase the importance of specific terms in ranking using `^N`.
```text
content:(machine^3 learning)         # "machine" weighted 3x more than "learning"
content:("neural network"^2 deep)    # Phrase boosted 2x
```
Documents with boosted terms rank higher when those terms appear.
</Accordion>
<Accordion title="Phrase prefix matching">
Append `*` to a quoted phrase to match its last term as a prefix.
```text
content:("james w"*)        # Matches "james webb", "james watson", "james wilde"
content:("machine lea"*)    # Matches "machine learning", "machine learns"
```
All terms before the last must match exactly and adjacently; only the final term is treated as a prefix. This is useful for autocomplete or search-as-you-type scenarios.
<span class="callout-start" data-callout-type="note"></span>
Single-term prefix wildcards like `auto*` are not supported. The phrase must contain at least two terms.
<span class="callout-end"></span>
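For search-as-you-type, the query string can be built from whatever the user has typed so far. This is a minimal sketch; `phrase_prefix_query` is a hypothetical helper that returns `None` for fewer than two terms, since single-term prefixes are not supported. Stripping double quotes from the input is a simplification, and other Lucene special characters are not escaped here.

```python
from typing import Optional


def phrase_prefix_query(field: str, typed: str) -> Optional[str]:
    """Build a phrase prefix query, or return None if it cannot be expressed."""
    # Drop double quotes so the input cannot break out of the quoted phrase.
    terms = typed.replace('"', " ").split()
    if len(terms) < 2:
        return None
    phrase = " ".join(terms)
    return f'{field}:("{phrase}"*)'


print(phrase_prefix_query("content", "james w"))
print(phrase_prefix_query("content", "auto"))
```

When the helper returns `None`, a caller could wait for more input or fall back to an exact single-term query.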
</Accordion>
<Accordion title="Combining operators">
Operators can be combined for complex queries:
```text
content:(+database (distributed OR replicated) -deprecated)
```
Requires "database", boosts results containing "distributed" or "replicated", and excludes "deprecated".
```text
content:("machine learning"^2 AND (tensorflow OR pytorch) -keras)
```
Boosts the exact phrase "machine learning", requires a framework ("tensorflow" or "pytorch"), and excludes "keras".
</Accordion>
</AccordionGroup>
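The `+` and `-` operators described above lend themselves to building queries programmatically. The following is a minimal sketch, not part of any Pinecone SDK: `build_query_string` is a hypothetical helper, and it assumes every term is a single word with no characters that need escaping.

```python
def build_query_string(field, required=(), optional=(), excluded=()):
    """Compose a query_string expression for one field (hypothetical helper).

    Assumes each term is a single word with no special characters.
    """
    parts = [f"+{term}" for term in required]    # terms that must appear
    parts += list(optional)                      # terms that only affect ranking
    parts += [f"-{term}" for term in excluded]   # terms that must not appear
    if not parts:
        raise ValueError("provide at least one search term")
    return f"{field}:({' '.join(parts)})"


query = build_query_string(
    "content",
    required=["vector", "search"],
    excluded=["legacy"],
)
print(query)  # content:(+vector +search -legacy)
```

The resulting string would be passed as the query text of a `type: "query_string"` query.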
## Stemming
Stemming reduces words to their root form so that morphological variants match each other. For example, with stemming enabled, a query for "run" also matches documents containing "running" or "runs".
Stemming is **opt-in** and disabled by default. To enable it, set `stemming: true` on a full-text searchable field when creating the index. The stemming algorithm is determined by the field's [`language`](#language) setting.
**Example: enabling stemming with French**
```json
{
  "schema": {
    "fields": {
      "content": {
        "type": "string",
        "full_text_searchable": true,
        "stemming": true,
        "language": "french"
      }
    }
  }
}
```
With this configuration, French stemming rules are applied during both indexing and search. A query for "manger" would match documents containing "mangeons", "mangé", or "mangeait".
Stemming applies to both `type: "text"` and `type: "query_string"` queries on the field.
With stemming disabled (default):

| Query | Matches | Does not match |
|---|---|---|
| run | "run" | "running", "runs", "ran" |
| machines | "machines" | "machine" |

With stemming enabled:

| Query | Matches | Does not match |
|---|---|---|
| run | "run", "running", "runs" | "ran" (irregular form) |
| machines | "machines", "machine" | "database" |

Stemming uses algorithmic suffix analysis, so irregular forms (e.g., “ran” for “run”) may not match. Only regular morphological variants (e.g., “running”, “runs”) are reliably stemmed.
Stemming is set at index creation and cannot be changed afterward. If you need to enable or disable stemming, you must create a new index.
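To illustrate why suffix-based stemming handles "running" but not "ran", here is a deliberately tiny toy stemmer. It is an assumption-laden illustration only: `toy_stem` is hypothetical, covers just two English suffixes, and is not the algorithm Pinecone uses.

```python
def toy_stem(word):
    """Strip two regular English suffixes. Illustration only, not Pinecone's stemmer."""
    word = word.lower()
    if word.endswith("ing") and len(word) > 5:
        stem = word[:-3]
        # Undo consonant doubling: "runn" (from "running") -> "run".
        if stem[-1] == stem[-2] and stem[-1] not in "aeiou":
            stem = stem[:-1]
        return stem
    if word.endswith("s") and len(word) > 3:
        return word[:-1]
    return word


def stems_match(query_term, document_term):
    """Two terms match when they reduce to the same stem."""
    return toy_stem(query_term) == toy_stem(document_term)


print(stems_match("run", "running"))       # True
print(stems_match("run", "runs"))          # True
print(stems_match("run", "ran"))           # False: no suffix rule maps "ran" to "run"
print(stems_match("machines", "machine"))  # True
```

Because the rules only look at word endings, an irregular form such as "ran" never reduces to the same stem as "run".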
## Language
The `language` parameter controls tokenization and stemming behavior for a full-text searchable field. It determines how text is analyzed during indexing and search: how words are split into tokens and, when stemming is enabled, which language-specific rules are used to reduce words to their root forms.
The default language is `"en"` (English). You can specify a language using either its short code or full name (e.g., `"fr"` or `"french"`). See the Stemming section for an example using `"language": "french"`.
Supported languages:

| Code | Full name |
|---|---|
| ar | arabic |
| da | danish |
| de | german |
| el | greek |
| en | english |
| es | spanish |
| fi | finnish |
| fr | french |
| hu | hungarian |
| it | italian |
| nl | dutch |
| no | norwegian |
| pt | portuguese |
| ro | romanian |
| ru | russian |
| sv | swedish |
| ta | tamil |
| tr | turkish |

Language is set at index creation and cannot be changed afterward.
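Because language and stemming cannot be changed after index creation, it can help to validate the value before sending the create-index request. The sketch below builds a schema entry from the table above; `text_field_schema` is a hypothetical helper, and the check is purely client-side.

```python
# Codes and full names copied from the supported-languages table.
SUPPORTED_LANGUAGES = {
    "ar": "arabic", "da": "danish", "de": "german", "el": "greek",
    "en": "english", "es": "spanish", "fi": "finnish", "fr": "french",
    "hu": "hungarian", "it": "italian", "nl": "dutch", "no": "norwegian",
    "pt": "portuguese", "ro": "romanian", "ru": "russian", "sv": "swedish",
    "ta": "tamil", "tr": "turkish",
}


def text_field_schema(language="en", stemming=False):
    """Return a schema entry for a full-text searchable field (hypothetical helper)."""
    key = language.lower()
    if key not in SUPPORTED_LANGUAGES and key not in SUPPORTED_LANGUAGES.values():
        raise ValueError(f"unsupported language: {language!r}")
    field = {"type": "string", "full_text_searchable": True, "language": key}
    if stemming:
        # Stemming is opt-in, so the key is only added when requested.
        field["stemming"] = True
    return field


schema = {"fields": {"content": text_field_schema("french", stemming=True)}}
print(schema)
```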
## Troubleshooting
Query syntax errors:
- Unmatched quotes (`"machine learning`): Close all quotes.
- Empty query: Provide at least one search term.
- Invalid boolean syntax (`AND machine`): Operators need terms on both sides.
- Unbalanced parentheses: Match all opening and closing parentheses.
- Unknown field name: Field names in the query must match `full_text_searchable` fields in the schema.

API errors:
- 401 Unauthorized: Check the `Api-Key` header.
- 400 Bad Request: Check JSON syntax and required fields.
- 404 Not Found: Verify the index name and host URL.
- Missing API version: Add `X-Pinecone-API-Version: 2026-01.alpha`.

Upsert errors:
- Type mismatch: Ensure values match declared schema types.
- Missing text content: The text-searchable field must be present in the document.
- Invalid `_id`: Every document must have a non-empty `_id` string.

Slow queries:
- Reduce query complexity: Boolean operators are more expensive than simple term queries.
- Simplify filters: Filters are applied before search, so broad filters increase the search space.
- Check document count and size: Larger datasets may have higher latency.
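Several of the query syntax problems in this section can be caught on the client before a request is sent. The following is a rough sketch under stated assumptions: `check_query_string` is a hypothetical helper, it does not parse the full Lucene grammar, and the server remains the authority on what is valid.

```python
import re


def check_query_string(query):
    """Return a list of problems found by simple client-side checks."""
    if not query.strip():
        return ["empty query"]
    if query.count('"') % 2:
        return ["unmatched quotes"]

    problems = []
    # Drop quoted phrases so their contents do not affect the checks below.
    bare = re.sub(r'"[^"]*"', " ", query)

    depth = 0
    for ch in bare:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its "("
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # A binary operator right after the start or "(" has no left-hand term;
    # an operator right before ")" or the end has no right-hand term.
    leading = re.search(r"(?:^|[(:])\s*(?:AND|OR)\b", bare)
    trailing = re.search(r"\b(?:AND|OR|NOT)\s*(?:\)|$)", bare)
    if leading or trailing:
        problems.append("invalid boolean syntax")
    return problems


print(check_query_string("AND machine"))  # ['invalid boolean syntax']
```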
## Early access
Full-text search is in early access under API version 2026-01.alpha. The feature is functional and ready for evaluation, but APIs may evolve based on feedback before general availability.
Requirements & limitations
- All requests require `X-Pinecone-API-Version: 2026-01.alpha`
- REST API & Python SDK only
- FTS requires a dedicated index created with the `2026-01.alpha` API
- During early access, full-text search indexes must be created using dedicated read nodes (`read_capacity.mode: "Dedicated"`), using a single `b1` node
- Max document size: ~500 KB
- Insert-to-searchable latency: < 1 minute
- One text-searchable field per index
- No document fetch or delete endpoints yet
- No partial updates (upsert replaces the entire document)
- Text search only in this API version (`2026-01.alpha`)
- Hybrid search and text pre-filtering not yet available
- Indexes cannot be created in CMEK-enabled projects
- Backup and restore not supported
- Fuzzy matching and regex search not yet supported
- Single-term prefix wildcards (`auto*`) not supported; use phrase prefix (`"word auto"*`) instead
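The document size limit, together with the document checks listed under Troubleshooting, can be verified before an upsert. This is a minimal sketch: `check_document` is a hypothetical helper, and it assumes "~500 KB" means roughly 500,000 bytes of UTF-8 encoded JSON, which may not match how the service measures size.

```python
import json

# Assumption: "~500 KB" is read as 500,000 bytes of UTF-8 encoded JSON.
MAX_DOCUMENT_BYTES = 500_000


def check_document(doc, text_field):
    """Return a list of problems found in a document before upserting it."""
    problems = []
    doc_id = doc.get("_id")
    if not isinstance(doc_id, str) or not doc_id:
        problems.append("invalid _id: must be a non-empty string")
    if not isinstance(doc.get(text_field), str):
        problems.append(f"missing text content in field {text_field!r}")
    size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
    if size > MAX_DOCUMENT_BYTES:
        problems.append(f"document is {size} bytes, above the assumed limit")
    return problems


doc = {"_id": "article-1", "content": "Distributed databases at scale", "year": 2024}
print(check_document(doc, "content"))  # []
```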
Using text and vector search together
Until hybrid search is available, you can use both by maintaining separate indexes: create an FTS index for keyword search, keep your vector index for semantic search, and merge results in your application.
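One common way to merge two ranked result lists in application code is reciprocal rank fusion (RRF). The sketch below is a generic technique, not something this documentation prescribes; it assumes both indexes use the same document IDs, and `k=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of document IDs; each hit contributes 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result IDs, best match first, from each index.
keyword_hits = ["doc-3", "doc-1", "doc-7"]   # from the FTS index
semantic_hits = ["doc-1", "doc-9", "doc-3"]  # from the vector index

merged = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(merged)  # ['doc-1', 'doc-3', 'doc-9', 'doc-7']
```

RRF uses only rank positions, so BM25 scores and vector similarity scores never need to be put on a common scale.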
## Pricing
Pricing will be announced before general availability.