Minimize latencies


Several factors affect latency. The sections below cover the most common causes:

Slow uploads or high latencies

To minimize latency when accessing Pinecone:

Switch to a cloud environment, such as EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. Slow uploads or high query latencies might occur because you are accessing Pinecone from your home network.
Consider deploying your application in the same environment as your Pinecone service.
See Decrease latency for more tips.
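
To check whether moving to a cloud environment helps, it can be useful to time the same call from each location. The following is a minimal sketch, not part of the Pinecone client: `measure_latency` is a hypothetical helper that times any zero-argument callable and reports a few summary statistics in milliseconds.

```python
import math
import statistics
import time
from typing import Callable, Dict


def measure_latency(call: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Time `call` several times and summarize the latency in milliseconds.

    `measure_latency` is a hypothetical helper written for this example.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Nearest-rank 95th percentile over the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

Assuming `index` is an already-initialized Pinecone index handle and `v` is a query vector, a call such as `measure_latency(lambda: index.query(vector=v, top_k=10))` can be run from a home network and from a cloud instance to compare the two.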

High query latencies with batching

If you’re batching queries, try sending a single query vector per call instead. You can make these calls in parallel and expect roughly the same performance as with batching.
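
One way to issue single-vector queries in parallel is with a thread pool. The sketch below assumes `index` is any object exposing a `query(vector=..., top_k=...)` method, as the Pinecone Python client's index object does; `query_in_parallel` and the `max_workers` default are illustrative choices, not values from the Pinecone documentation.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Sequence


def query_in_parallel(
    index: Any,
    vectors: Sequence[Sequence[float]],
    top_k: int = 10,
    max_workers: int = 8,  # assumed value; tune for your workload
) -> List[Any]:
    """Send one query per vector concurrently and return results in input order."""

    def query_one(vector: Sequence[float]) -> Any:
        # One query vector per call, as recommended above.
        return index.query(vector=list(vector), top_k=top_k)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `vectors`.
        return list(pool.map(query_one, vectors))
```

Threads suit this case because each call spends most of its time waiting on the network rather than computing.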

High latencies with fetch or include_values

For on-demand indexes, since vector values are retrieved from object storage, operations that return vector values (fetch operations or queries with include_values=true) may have increased latency. If you don’t need the vector values, set include_values=false when querying, or use the query operation instead of fetch if you only need metadata or IDs. See Decrease latency for more details.
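
As a rough illustration of the advice above, the following sketch queries for IDs and metadata only. It assumes `index` exposes a `query` method that accepts `include_values` and `include_metadata` keyword arguments, as the Pinecone Python client does; `query_without_values` is a hypothetical wrapper name.

```python
from typing import Any, Sequence


def query_without_values(index: Any, vector: Sequence[float], top_k: int = 10) -> Any:
    """Query for IDs and metadata only, skipping retrieval of vector values."""
    return index.query(
        vector=list(vector),
        top_k=top_k,
        include_values=False,   # do not return the stored vector values
        include_metadata=True,  # metadata and IDs are still returned
    )
```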

Link last verified June 7, 2026.

Source: Pinecone Docs

Link last verified: 2026-03-04