Test Pinecone at scale

no
Summary: Test Pinecone with a real-world dataset and semantic search workload.

Original Documentation

Documentation Index#

Fetch the complete documentation index at: https://docs.pinecone.io/llms.txt Use this file to discover all available pages before exploring further.

Test Pinecone with a real-world dataset and semantic search workload.

This guide walks you through testing Pinecone at production scale. You’ll import 10 million vectors, run a benchmark, and analyze the results to verify Pinecone meets production requirements for semantic search applications.

This test requires a Pinecone account on the Standard or Enterprise plan. New users can sign up for the Standard trial for 21 days and $300 in credits, more than enough to cover the costs of this test. Existing users on the Starter plan can upgrade.

About this test#

Semantic search enables finding relevant content based on meaning rather than exact keyword matches, making it ideal for applications like product search, content recommendation, and question-answering systems. This test simulates a production-scale semantic search workload, measuring import time, query throughput, query latency, and associated costs.

The test uses the following configuration:

  • Records: 10 million records from the Amazon Reviews 2023 dataset
  • Embedding model: llama-text-embed-v2 (1024 dimensions)
  • Similarity metric: cosine
  • Total size: 48.8 GB
  • Query load: 10 queries per second total (across all users)
  • Concurrent users: 10 users querying simultaneously
  • Test queries: 100,000 queries
  • Import time target: < 30 minutes
  • Query latency target: p90 latency < 100ms

Estimated cost: ~$127 (import: $48.80, queries: $78.08, storage: $0.09) — see detailed cost breakdown

1. Get an API key#

To follow the steps in this guide, you’ll need an API key. Create a new API key in the Pinecone console, or use this widget:

Your generated API key:

"{{YOUR_API_KEY}}"

2. Create an index#

This test requires you to use AWS-based indexes and infrastructure. The sample dataset is only available from Amazon S3, and you can only import from Amazon S3 to Pinecone indexes hosted on AWS. To run the benchmark, you’ll need to provision an AWS EC2 instance in the same region as your index.

Create an on-demand index that matches the dimensions and similarity metric of the dataset you’ll import in later steps.

  1. In the Pinecone console, go to the Indexes page.
  2. Click Create index.
  3. Check Custom settings.
  4. Configure the index with the following settings:
    • Name: search-10m
    • Vector type: Dense
    • Dimensions: 1024
    • Metric: cosine
    • Capacity mode: Serverless (on-demand)
    • Cloud: AWS (required for this test)
    • Region: Use an AWS region appropriate for your use case (for example, us-east-1)
  5. Click Create index.

If using code to create an index, first install the Python SDK:

    pip install pinecone
    ```

Then, create the index:


  ```python
      from pinecone import Pinecone, ServerlessSpec

      pc = Pinecone(api_key="{{YOUR_API_KEY}}")

      index_name = "search-10m"

      if not pc.has_index(index_name):
          pc.create_index(
              name=index_name,
              vector_type="dense",
              dimension=1024,
              metric="cosine",
              spec=ServerlessSpec(
                  # AWS is required for this test
                  cloud="aws",   
                  # Use an AWS region appropriate for your use case
                  region="us-east-1" 
              )
          )
      ```

  <span class="tab-end"></span>
<span class="tab-group-end"></span>

## 3. Import the dataset

Pinecone's [import feature](/guides/index-data/import-data) enables you to load millions of vectors from object storage in parallel. In this step, you'll import 10 million records into a single namespace (`ns_2`) in your index.

### Choose an import source

To import the dataset, you'll need to use the following Amazon S3 import URL:

s3://fe-customer-pocs/search/search_10M/dense/


### Start and monitor the import

For this dataset, the import should take less than 30 minutes.

<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="Console"></span>
1. In the Pinecone console, go to the [Indexes](https://app.pinecone.io/organizations/-/projects/-/indexes) page.
2. Find your `search-10m` index and click **... > Import data**.
3. For **Storage integration**, select **No integration (public bucket)**.
4. Enter the import URL: `s3://fe-customer-pocs/search/search_10M/dense/`.
5. For **Error handling**, select **Abort on error (default)**.
6. Click **Start import**.

To monitor progress, open your index in the Pinecone console and navigate to the **Imports** tab. After the import completes, compare the **Started time** and **End time** timestamps to see the total time required.

<span class="callout-start" data-callout-type="note"></span>
  For this dataset, the import should take around 30 minutes. While the import is running, you can continue with the next step to provision a VM and install VSB. However, wait for the import to complete before running the benchmark.
<span class="callout-end"></span>
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Code"></span>
Start the import:


  ```python
      from pinecone import Pinecone, ImportErrorMode

      pc = Pinecone(api_key="{{YOUR_API_KEY}}")
      index = pc.Index("search-10m")

      import_response = index.start_import(
          uri="s3://fe-customer-pocs/search/search_10M/dense/",
          error_mode=ImportErrorMode.ABORT 
      )
      print(f"Import started: {import_response['id']}")
      ```


Monitor import progress:


  ```python
      from pinecone import Pinecone
      import time 

      pc = Pinecone(api_key="{{YOUR_API_KEY}}")
      index = pc.Index("search-10m")

      while True:
          status = index.describe_import(id="IMPORT_ID")
          print(f"Status: {status['status']}, Progress: {status['percent_complete']:.1f}%")
          if status['status'] == "Completed":
              print("Import completed successfully!")
              break
          elif status['status'] == "Failed":
              print("Import failed. Check error details.")
              break
          time.sleep(15) # Check every 15 seconds
      ```


<span class="callout-start" data-callout-type="note"></span>
  There are three ways to find the import ID:

  * It's returned when the import is started
  * In the Pinecone console, on the **Imports** tab for your index
  * By calling [List imports](/reference/api/latest/data-plane/list_imports)
<span class="callout-end"></span>
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

## 4. Run the benchmark

To simulate realistic query patterns and measure latency and throughput for your Pinecone index, use [Vector Search Bench (VSB)](https://github.com/pinecone-io/VSB). The benchmark runs 100,000 queries at 10 queries per second, which should take just under three hours to complete.

<span class="steps-start"></span>
  <span class="step-marker" data-step-title="Provision a VM"></span>
VSB reports latency as the time from when the tool issues a query to when the query is returned by Pinecone.

To minimize the client-side latency between the tool and Pinecone, run the benchmark on a dedicated AWS EC2 instance that's hosted in the same AWS region as your Pinecone index. This reduces the client-side latency to sub-millisecond range.

<span class="callout-start" data-callout-type="note"></span>
  As noted in [section 2](#2-create-an-index), this test requires an AWS EC2 instance in the same region as your index.
<span class="callout-end"></span>

For instructions on how to provision an EC2 instance, see the [AWS documentation](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/LaunchingAndUsingInstances.html).

<span class="callout-start" data-callout-type="note"></span>
  Create a VM that comes with Python 3.11 or higher.
<span class="callout-end"></span>
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Connect to the VM"></span>
Connect to the VM using SSH or the cloud provider's console.
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Install Vector Search Bench (VSB)"></span>
[VSB (Vector Search Bench)](https://github.com/pinecone-io/VSB) is a benchmarking suite for testing vector database search performance across different workloads and databases. To install it, you'll first need to install various dependencies.

1. **Verify Python version**

   VSB requires Python 3.11 or higher to run. Verify your Python version:

   ```bash
       python3 --version
       ```

   If your version is below 3.11, install Python 3.11+ using your distribution's package manager.

2. **Install git**

   Git is required to clone the VSB repository. Check if git is installed:

   ```bash
       git --version
       ```

   If git is not installed, install it using your system's package manager:

   ```bash
       # Adapt for your VM's package manager (apt/yum/dnf)
       sudo apt-get update && sudo apt-get install git
       ```

3. **Install pipx**

   pipx is required to install Poetry. First, check if pip3 is installed:

   ```bash
       pip3 --version
       ```

   If pip is not installed, install it using your system's package manager:

   ```bash
       # Adapt for your VM's package manager (apt/yum/dnf)
       sudo apt-get update && sudo apt-get install python3-pip
       ```

   Then check if pipx is installed:

   ```bash
       pipx --version
       ```

   If pipx is not installed, install it via your system's package manager:

   ```bash
       # Adapt for your VM's package manager (apt/yum/dnf)
       sudo apt-get update && sudo apt-get install pipx
       pipx ensurepath
       ```

   After installation, run this command to update the PATH in your current terminal session:

   ```bash
       source ~/.bashrc
       ```

4. **Install Poetry**

   [Poetry](https://python-poetry.org/) is required to manage [VSB's](https://github.com/pinecone-io/VSB) Python dependencies and virtual environment. If Poetry is not [installed](https://python-poetry.org/docs/), use pipx to install it:

   ```bash
       pipx install poetry
       ```

   Alternatively, use the [official Poetry installer](https://python-poetry.org/docs/#installing-with-the-official-installer).

5. **Clone the VSB repository**

   To run the benchmark, you'll first need to clone the VSB repository and navigate to it:

   ```bash
       git clone https://github.com/pinecone-io/VSB.git
       cd VSB
       ```

6. **Configure Poetry**

   Since your VM has Python 3.11 or higher installed (as specified in the VM provisioning step), tell Poetry to use it:

   ```bash
       poetry env use python3
       ```

7. **Install dependencies**

   VSB requires several Python packages to run. Install all dependencies:

   ```bash
       poetry install
       ```
  <span class="step-end"></span>

  <span class="step-marker" data-step-title="Benchmark your Pinecone index"></span>
To test the performance of your Pinecone index, run the following command from within the `VSB` directory. For more information about VSB, see its [GitHub repository](https://github.com/pinecone-io/VSB).

The following command simulates 10 concurrent users issuing a total of 100,000 queries at 10 queries per second (QPS). Each query performs a vector search for the top 10 most similar 1024-dimensional vectors, using cosine similarity, with query vectors selected uniformly at random. The `--skip_populate` flag skips the data population phase, since you've already imported data into your index.

```bash
    poetry run vsb \
        --database="pinecone" \
        --workload=synthetic-proportional \
        --pinecone_api_key="{{YOUR_API_KEY}}" \
        --pinecone_index_name="search-10m" \
        --pinecone_namespace_name="ns_2" \
        --synthetic_dimensions=1024 \
        --synthetic_metric=cosine \
        --synthetic_top_k=10 \
        --synthetic_requests=100000 \
        --users=10 \
        --requests_per_sec=10 \
        --synthetic_query_distribution=uniform \
        --synthetic_query_ratio=1 \
        --synthetic_insert_ratio=0 \
        --synthetic_delete_ratio=0 \
        --synthetic_update_ratio=0 \
        --skip_populate
    ```
  <span class="step-end"></span>
<span class="steps-end"></span>

## 5. Analyze performance

At the end of the run, VSB prints an operation summary including the requests per second achieved and latencies at different percentiles. Here's an example output:

```shell
2025-12-23T00:34:37 INFO     Completed Run phase, took 9940.14s

                      Operation Summary

  Operation  Requests  Failures  Requests/sec  Failures/sec
 ───────────────────────────────────────────────────────────
  Search        99000     0(0%)            10           0.0

                                                    Metrics Summary

  Operation  Metric         Min  0.1%    1%    5%   10%   25%   50%   75%   90%   95%   99%  99.9%  99.99%   Max  Mean
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Search     Latency (ms)    23    25    25    26    27    27    29    34    44    81   350    430    1300  4602    43

Confirm that the requests per second achieved is around 10 QPS and the p90 latency is less than 100ms.

To see more detailed statistics, you can analyze the stats.json file identified in the output.

6. Check costs#

You can check the costs for the import, queries, and storage in the Pinecone console at Settings > Usage. Cost data is delayed up to three days, but once it’s available, compare the actual costs to the estimated costs below.

For the latest pricing details, see Pricing.

Cost typeAmountPricingEstimated cost
Import48.8 GB$1/GB$48.80
Queries100,000 queries$16 per 1M read units$78.08
Storage4 hours$0.33/GB/month$0.09
Total$126.97

The current price for import is $1/GB. The dataset size for this test is 48.8 GB, so the import cost should be $48.80.

A query uses 1 read unit (RU) for every 1 GB of namespace size. The current price for queries in the us-east-1 region of AWS is $16 per 1 million read units (pricing varies by region).

This test ran 100,000 queries against a namespace size of 48.8 GB. Each query uses 48.8 RUs (1 RU per GB), so the total is 4,880,000 RUs. At $16 per 1 million RUs, the cost is (4,880,000 / 1,000,000) × $16 = $78.08.

The current price for storage is $0.33 per GB per month. The dataset size for this test is 48.8 GB. Assuming a total storage time of 4 hours (including import, benchmark runtime, and cleanup), the storage cost is: $0.33/GB/month * 48.8 GB / 730 hours * 4 hours = $0.09.

The total cost for the test is the sum of the import cost, query cost, and storage cost: $48.80 + $78.08 + $0.09 = $126.97.

7. Clean up#

When you no longer need your test index, delete it to avoid incurring unnecessary costs.

Link last verified June 7, 2026. View original ↗
Source: Pinecone Docs
Link last verified: 2026-02-26