Completions API ↗

fireworks guide intermediate prompts ide

Summary: Use the completions API for raw text generation with custom prompt templates

Original Documentation

Documentation Index#
Fetch the complete documentation index at: https://docs.fireworks.ai/llms.txt Use this file to discover all available pages before exploring further.

Use the completions API for raw text generation with custom prompt templates

The completions API provides raw text generation without automatic message formatting. Use this when you need full control over prompt formatting or when working with base models.

When to use completions#

Use the completions API for:

Custom prompt templates with specific formatting requirements
Base models (non-instruct/non-chat variants)
Fine-grained control over token-level formatting
Legacy applications that depend on raw completion format

For most use cases, use chat completions instead. Chat completions handles message formatting automatically and works better with instruct-tuned models.

Basic usage#

    from fireworks import Fireworks

    client = Fireworks()

    response = client.completions.create(
      model="accounts/fireworks/models/deepseek-v3p1",
      prompt="Once upon a time"
    )

    print(response.choices[0].text)
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Python (OpenAI SDK)"></span>
```python
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ.get("FIREWORKS_API_KEY"),
        base_url="https://api.fireworks.ai/inference/v1"
    )

    response = client.completions.create(
        model="accounts/fireworks/models/deepseek-v3p1",
        prompt="Once upon a time"
    )

    print(response.choices[0].text)
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="JavaScript"></span>
```javascript
    import OpenAI from "openai";

    const client = new OpenAI({
      apiKey: process.env.FIREWORKS_API_KEY,
      baseURL: "https://api.fireworks.ai/inference/v1",
    });

    const response = await client.completions.create({
      model: "accounts/fireworks/models/deepseek-v3p1",
      prompt: "Once upon a time",
    });

    console.log(response.choices[0].text);
    ```
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="curl"></span>
```bash
    curl https://api.fireworks.ai/inference/v1/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $FIREWORKS_API_KEY" \
      -d '{
        "model": "accounts/fireworks/models/deepseek-v3p1",
        "prompt": "Once upon a time"
      }'
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

<span class="callout-start" data-callout-type="note"></span>
  Most models automatically prepend the beginning-of-sequence (BOS) token (e.g., `<s>`) to your prompt. Verify this with the `raw_output` parameter if needed.
<span class="callout-end"></span>

## Custom prompt templates

The completions API is useful when you need to implement custom prompt formats:

```python
# Custom few-shot prompt template
prompt = """Task: Classify the sentiment of the following text.

Text: I love this product!
Sentiment: Positive

Text: This is terrible.
Sentiment: Negative

Text: The weather is nice today.
Sentiment:"""

response = client.completions.create(
    model="accounts/fireworks/models/deepseek-v3p1",
    prompt=prompt,
    max_tokens=10,
    temperature=0
)

print(response.choices[0].text)  # Output: " Positive"

Common parameters#

All chat completions parameters work with completions:

temperature - Control randomness (0-2)
max_tokens - Limit output length
top_p, top_k, min_p - Sampling parameters
stream - Stream responses token-by-token
frequency_penalty, presence_penalty - Reduce repetition

See the API reference for complete parameter documentation.

Querying deployments#

Use completions with on-demand deployments by specifying the deployment identifier:

response = client.completions.create(
    model="accounts/<ACCOUNT_ID>/deployments/<DEPLOYMENT_ID>",
    prompt="Your prompt here"
)

Next steps#

Use chat completions for most use cases

Stream responses for real-time UX

Complete API documentation

Link last verified June 7, 2026. View original ↗

Source: Fireworks AI Docs

Link last verified: 2026-06-07