Decrease latency

Summary: Learn techniques to decrease latency for search and upsert operations.

Use namespaces#

When you divide records into namespaces in a logical way, you speed up queries by ensuring only relevant records are scanned. The same applies to fetching records, listing record IDs, and other data operations.
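As a rough sketch of this idea, the helper below restricts a query to a single namespace. The helper name `query_namespace` and the example namespace are illustrative assumptions; `index` is assumed to be any object that exposes the Python SDK's `query(...)` operation, such as the object returned by `pc.Index(host="INDEX_HOST")`.

```python
from typing import Any, Sequence


def query_namespace(index: Any, vector: Sequence[float], namespace: str, top_k: int = 3) -> Any:
    """Query only the records stored in `namespace` (hypothetical helper)."""
    return index.query(
        namespace=namespace,  # restricts the search to this namespace
        vector=list(vector),
        top_k=top_k,
    )
```

With a real index, this might be called as `query_namespace(index, query_vector, "example-namespace")`, where `query_vector` has the same dimension as the index.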

Filter by metadata#

In addition to increasing search accuracy and relevance, searching with metadata filters can also help decrease latency by retrieving only records that match the filter.
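A filtered query could look roughly like the sketch below. The `genre` field and the helper name `query_with_filter` are hypothetical; the filter uses the `$eq` operator from Pinecone's metadata filter syntax, and `index` is assumed to expose the Python SDK's `query(...)` operation.

```python
from typing import Any, Dict, Sequence

# Hypothetical filter: match only records whose `genre` metadata equals "documentary".
GENRE_FILTER: Dict[str, Any] = {"genre": {"$eq": "documentary"}}


def query_with_filter(index: Any, vector: Sequence[float],
                      metadata_filter: Dict[str, Any], top_k: int = 3) -> Any:
    """Run a query limited to records that match `metadata_filter`."""
    return index.query(
        vector=list(vector),
        filter=metadata_filter,  # only matching records are retrieved
        top_k=top_k,
        include_metadata=True,
    )
```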

Target indexes by host#

When you target an index by name for data operations such as upsert and query, the SDK gets the unique DNS host for the index using the describe_index operation. This is convenient for testing but should be avoided in production because describe_index uses a different API than data operations and therefore adds an additional network call and point of failure. Instead, you should get an index host once and cache it for reuse or specify the host directly.

You can get index hosts in the Pinecone console or using the describe_index operation.
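One way to get the host once and cache it is sketched below. The helper name `make_host_lookup` is an assumption for illustration; `pc` is assumed to be a client whose `describe_index(name)` result has a `host` attribute, as the Python SDK's client does.

```python
from functools import lru_cache
from typing import Any, Callable


def make_host_lookup(pc: Any) -> Callable[[str], str]:
    """Return a function that resolves an index name to its host at most once."""

    @lru_cache(maxsize=None)
    def host_for(index_name: str) -> str:
        # Only the first call per index name reaches describe_index;
        # later calls are served from the in-process cache.
        return pc.describe_index(index_name).host

    return host_for
```

In an application, the cached host could then be passed along as `pc.Index(host=host_for("INDEX_NAME"))`. The cache lives only for the lifetime of the process, so storing the host in configuration is an equally valid choice.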

When using Private Endpoints for private connectivity between your application and Pinecone, you must target the index using the Private Endpoint URL for the host.

The following example shows how to target an index by host directly:

Python:

from pinecone.grpc import PineconeGRPC as Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

index = pc.Index(host="INDEX_HOST")

JavaScript:

import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: 'YOUR_API_KEY' });

// For the Node.js SDK, you must specify both the index host and name.
const index = pc.index("INDEX_NAME", "INDEX_HOST");

Java:

import io.pinecone.clients.Index;
import io.pinecone.configs.PineconeConfig;
import io.pinecone.configs.PineconeConnection;

public class TargetIndexByHostExample {
    public static void main(String[] args) {
        PineconeConfig config = new PineconeConfig("YOUR_API_KEY");
        config.setHost("INDEX_HOST");
        PineconeConnection connection = new PineconeConnection(config);
        // For the Java SDK, you must specify both the index host and name.
        Index index = new Index(connection, "INDEX_NAME");
    }
}

Go:

package main

import (
    "log"

    "github.com/pinecone-io/go-pinecone/v4/pinecone"
)

func main() {
    pc, err := pinecone.NewClient(pinecone.NewClientParams{
        ApiKey: "YOUR_API_KEY",
    })
    if err != nil {
        log.Fatalf("Failed to create Client: %v", err)
    }

    // Target the index directly by its host.
    host := "INDEX_HOST"
    idxConnection, err := pc.Index(pinecone.NewIndexConnParams{Host: host, Namespace: "example-namespace"})
    if err != nil {
        log.Fatalf("Failed to create IndexConnection for Host %v: %v", host, err)
    }
    // Release the underlying connection when finished.
    defer idxConnection.Close()
}

C#:

using Pinecone;

var pinecone = new PineconeClient("YOUR_API_KEY");

var index = pinecone.Index(host: "INDEX_HOST");

Reuse connections#

When you target an index for upserting or querying, the client establishes a TCP connection, which requires a three-way handshake. To avoid repeating this handshake on every request and to reduce average request latency, cache and reuse the index connection object whenever possible.
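A minimal sketch of this pattern follows: the index object is created lazily on first use and then shared by all later requests. The class name `IndexHolder` and the factory argument are assumptions for illustration, not part of any SDK.

```python
import threading
from typing import Any, Callable, Optional


class IndexHolder:
    """Create an index connection object once and reuse it (hypothetical helper)."""

    def __init__(self, factory: Callable[[], Any]) -> None:
        self._factory = factory
        self._index: Optional[Any] = None
        self._lock = threading.Lock()

    def get(self) -> Any:
        # Double-checked locking: concurrent first calls create only one object.
        if self._index is None:
            with self._lock:
                if self._index is None:
                    self._index = self._factory()
        return self._index
```

In an application this might be set up once at startup, for example `holder = IndexHolder(lambda: pc.Index(host="INDEX_HOST"))`, with request handlers calling `holder.get()` instead of creating a new index object each time.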

Use a cloud environment#

If you experience slow uploads or high query latencies, it might be because you are accessing Pinecone from your home network. To decrease latency, deploy your application in a cloud environment and access Pinecone from there, ideally from the same cloud and region as your index.

Avoid including vector values when not needed#

For on-demand indexes, since vector values are retrieved from object storage, including vector values in query responses (include_values=true) adds latency, especially with higher top_k values. If you don’t need the vector values in your response, set include_values=false to improve query performance. This applies to query and fetch operations.

This optimization applies only to on-demand indexes. Indexes built on Dedicated Read Nodes (DRN) cache vector values locally and are not affected.
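To check how much this setting matters for a particular workload, a small timing helper like the sketch below could be used. The helper name `time_query` is hypothetical, and `index` is assumed to expose the Python SDK's `query(...)` operation with its `include_values` parameter.

```python
import time
from typing import Any, Sequence


def time_query(index: Any, vector: Sequence[float], include_values: bool, top_k: int = 100) -> float:
    """Return the wall-clock seconds taken by a single query (hypothetical helper)."""
    start = time.perf_counter()
    index.query(vector=list(vector), top_k=top_k, include_values=include_values)
    return time.perf_counter() - start
```

Calling it once with `include_values=True` and once with `include_values=False` against the same index gives a rough comparison. A single measurement is noisy, so averaging several runs is advisable.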

Work with database limits#

Pinecone has rate limits to protect your applications and maintain infrastructure health. Rate limits vary based on pricing plan and apply to serverless indexes only.

Indexes built on Dedicated Read Nodes are not subject to read unit limits for query, fetch, and list operations. For sizing and capacity planning guidance, see the Dedicated Read Nodes guide.

To handle rate limits effectively:

Implement retry logic with exponential backoff.
If you need higher limits for your use case, contact Support. Most limits can be adjusted to accommodate your scaling needs.
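Retry logic with exponential backoff can be sketched as follows. The function name `with_backoff`, its default values, and the added jitter are assumptions for illustration. Which exception type signals a rate limit depends on the SDK and version in use, so the caller supplies it through `retryable`.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, retrying on `retryable` errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # The delay doubles on each attempt, capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    raise ValueError("max_attempts must be at least 1")
```

A query could then be wrapped as `with_backoff(lambda: index.query(vector=query_vector, top_k=3))`, ideally with `retryable` narrowed to the exception your SDK raises for rate-limit responses.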

Link last verified June 7, 2026.

Source: Pinecone Docs

Link last verified: 2026-02-26