Custom Models

Summary: Upload, verify, and deploy your own models from Hugging Face or elsewhere

Upload your own models from Hugging Face or elsewhere to deploy fine-tuned or custom-trained models optimized for your use case.

- Multiple upload options – Upload from local files or directly from S3 buckets or Azure Blob Storage
- Secure uploads – All uploads are encrypted and models remain private to your account by default

Requirements#

Supported architectures#

Fireworks supports most popular model architectures, including:

- [DBRX](https://huggingface.co/docs/transformers/en/model_doc/dbrx)
- [DeepSeek V1, V2 & V3](https://huggingface.co/deepseek-ai)
- [Falcon](https://huggingface.co/docs/transformers/en/model_doc/falcon)
- [Gemma](https://huggingface.co/docs/transformers/en/model_doc/gemma)
- [GPT NeoX](https://huggingface.co/docs/transformers/en/model_doc/gpt_neox)
- [Idefics3](https://huggingface.co/docs/transformers/en/model_doc/idefics3)
- [Llama 1, 2, 3, 3.1, 4](https://huggingface.co/docs/transformers/en/model_doc/llama2)
- [LLaVA](https://huggingface.co/docs/transformers/main/en/model_doc/llava)
- [Mistral](https://huggingface.co/docs/transformers/en/model_doc/mistral) & [Mixtral](https://huggingface.co/docs/transformers/en/model_doc/mixtral)
- [Phi, Phi-3, Phi-3V, Phi-4](https://huggingface.co/docs/transformers/en/model_doc/phi)
- [Pythia](https://huggingface.co/docs/transformers/en/model_doc/gpt_neox)
- [Qwen](https://huggingface.co/docs/transformers/en/model_doc/qwen), [Qwen2](https://huggingface.co/docs/transformers/en/model_doc/qwen2), [Qwen2.5](https://huggingface.co/collections/Qwen/qwen25-66e81a666513e518adb90d9e), [Qwen2.5-VL](https://huggingface.co/collections/Qwen/qwen25-vl-6795ffac22b334a837c0f9a5), [Qwen3](https://huggingface.co/Qwen)
- [Solar](https://huggingface.co/upstage/SOLAR-10.7B-v1.0)
- [StableLM](https://huggingface.co/docs/transformers/main/en/model_doc/stablelm)
- [Starcoder (GPTBigCode)](https://huggingface.co/docs/transformers/en/model_doc/gpt_bigcode) & [Starcoder2](https://huggingface.co/docs/transformers/main/en/model_doc/starcoder2)
- [Vision Llama](https://huggingface.co/docs/transformers/en/model_doc/llama2)

Required files#

You’ll need standard Hugging Face model files: config.json, model weights (.safetensors or .bin), and tokenizer files.

The model files you will need to provide depend on the model architecture. In general, you will need:

- Model configuration: `config.json`
  - Fireworks does not support the `quantization_config` option in `config.json`.
- Model weights in one of the following formats:
  - `*.safetensors`
  - `*.bin`
- Weights index: `*.index.json`
- Tokenizer file(s), e.g.:
  - `tokenizer.model`
  - `tokenizer.json`
  - `tokenizer_config.json`

If the requisite files are not present, model deployment may fail.
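Before uploading, it can help to sanity-check the model directory locally. The sketch below is a hypothetical pre-flight check, not part of `firectl`: the helper name `check_model_dir` and its rules are assumptions drawn only from the file list above, and Fireworks may validate more, or differently, on its side.

```python
import json
from pathlib import Path

# Tokenizer files named in the list above; at least one is expected.
TOKENIZER_FILES = ("tokenizer.model", "tokenizer.json", "tokenizer_config.json")


def check_model_dir(model_dir):
    """Return a list of human-readable problems found in a local model directory.

    Hypothetical helper: the rules mirror the file list in this guide, not
    Fireworks' actual server-side validation.
    """
    root = Path(model_dir)
    problems = []

    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # The guide states that quantization_config is not supported.
        if "quantization_config" in config:
            problems.append("config.json contains unsupported quantization_config")

    # Weights may be stored as *.safetensors or *.bin.
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    if not has_weights:
        problems.append("no *.safetensors or *.bin weight files")

    if not any((root / name).is_file() for name in TOKENIZER_FILES):
        problems.append("no tokenizer files")

    return problems
```

An empty list from `check_model_dir("/path/to/files/")` means the directory passed these local checks; it does not guarantee that the upload will be accepted.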

Customizing base model configuration#

For base models (not LoRA adapters), you can customize the chat template and generation defaults by modifying the standard Hugging Face configuration files:

Chat template: Add or modify the chat_template field in tokenizer_config.json. See the Hugging Face guide on Templates for Chat Models for details.
Generation defaults: Modify generation_config.json to set default generation parameters like max_new_tokens, temperature, top_p, etc.

You can also use a fireworks.json file with base models. If present, fireworks.json takes priority over generation_config.json. See Customizing generation defaults with fireworks.json for the full fireworks.json schema.

For LoRA adapters, you must use fireworks.json to customize generation defaults. Modifying generation_config.json in the adapter folder won’t work because adapters inherit these settings from their base model.
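As one way to apply the base-model edits described above, the sketch below merges new defaults into `generation_config.json` before upload. The helper name `set_generation_defaults` is hypothetical, and the parameter names are the Hugging Face ones mentioned above; which of them Fireworks honors is not confirmed here.

```python
import json
from pathlib import Path


def set_generation_defaults(model_dir, **overrides):
    """Merge overrides into generation_config.json, creating the file if absent.

    Hypothetical helper for base models only; LoRA adapters need fireworks.json.
    Returns the merged configuration that was written to disk.
    """
    path = Path(model_dir) / "generation_config.json"
    config = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    config.update(overrides)  # existing keys are kept unless overridden
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, `set_generation_defaults("/path/to/files/", max_new_tokens=512, temperature=0.7, top_p=0.9)` would leave any other keys in the file untouched.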

Uploading your model#

For larger models, you can upload directly from cloud storage (S3 or Azure Blob Storage) for faster transfer instead of uploading from your local machine.

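<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="Local files (CLI)"></span>
Upload from your local machine:

```bash
firectl model create <MODEL_ID> /path/to/files/
```
  <span class="tab-end"></span>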

  <span class="tab-start" data-tab-title="S3 bucket (CLI)"></span>
Upload directly from an Amazon S3 bucket:

```bash
firectl model create <MODEL_ID> s3://<BUCKET_NAME>/<PATH_TO_MODEL>/ \
  --aws-access-key-id <ACCESS_KEY_ID> \
  --aws-secret-access-key <SECRET_ACCESS_KEY>
```

See the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id-credentials-access-keys-update.html) for how to generate an access key ID and secret access key pair.

<span class="callout-start" data-callout-type="note"></span>
  Ensure the IAM user has read access to the S3 bucket containing the model.
<span class="callout-end"></span>
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Azure Blob Storage (CLI)"></span>
Upload directly from Azure Blob Storage using either a SAS token or federated identity authentication.

<span class="tab-group-start"></span>
  <span class="tab-start" data-tab-title="SAS token"></span>
    First, create a Fireworks secret containing your Azure SAS token:

    ```bash
    firectl secret create --name <SECRET_NAME> --value <SAS_TOKEN>
    ```

    Then, upload the model using the secret:

    ```bash
    firectl model create <MODEL_ID> https://<STORAGE_ACCOUNT>.blob.core.windows.net/<CONTAINER>/<PATH> \
      --azure-sas-token-secret accounts/<ACCOUNT_ID>/secrets/<SECRET_NAME>
    ```

    See the [Azure documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview) for how to generate a SAS token.

    <span class="callout-start" data-callout-type="note"></span>
      Ensure the SAS token has read access to the Azure Blob Storage container containing the model.
    <span class="callout-end"></span>
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="Federated identity"></span>
    Use Azure AD federated identity for passwordless authentication.

    **Setup:** Register an Azure AD application and add a federated credential with:

    * **Issuer:** `https://accounts.google.com`
    * **Subject:** `114308823136673488563`
    * **Audience:** `api://AzureADTokenExchange`

    Then grant the **Storage Blob Data Reader** role to the application on your storage account.

    ```bash
    firectl model create <MODEL_ID> https://<STORAGE_ACCOUNT>.blob.core.windows.net/<CONTAINER>/<PATH> \
      --azure-client-id <CLIENT_ID> \
      --azure-tenant-id <TENANT_ID>
    ```
  <span class="tab-end"></span>
<span class="tab-group-end"></span>
  <span class="tab-end"></span>

  <span class="tab-start" data-tab-title="REST API"></span>
For programmatic uploads (automation, CI/CD pipelines, notebooks), use the Fireworks REST API.

See [Upload via REST API](/models/uploading-custom-models-api) for the full pipeline and a ready-to-use Python example.
  <span class="tab-end"></span>
<span class="tab-group-end"></span>

<span class="callout-start" data-callout-type="note"></span>
  If you're uploading an embedding model, add the `--embedding` flag.
<span class="callout-end"></span>

Verifying your upload#

After uploading, verify your model is ready to deploy:

```bash
firectl model get accounts/<ACCOUNT_ID>/models/<MODEL_NAME>
```

Look for State: READY in the output. Once ready, you can create a deployment.

Model states:

- UPLOADING – Files are being uploaded or awaiting validation (for REST API uploads)
- READY – Model is validated and ready to deploy
- FAILED – Upload or validation failed (check error details)

If using the REST API and your model remains in UPLOADING state, you most likely haven’t called the validation endpoint yet — REST uploads don’t become deployable until they’re validated. See Why validation is a separate step for details.
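The readiness check can also be scripted. The following is a minimal polling sketch: `parse_state` and `wait_until_ready` are hypothetical helpers, and the assumption that the output contains a line of the form `State: READY` comes only from the description above, so adjust the pattern if the real `firectl` output differs.

```python
import re
import time

# Assumed output shape: a line such as "State: READY" somewhere in the text.
_STATE_RE = re.compile(r"^\s*State:\s*(\w+)\s*$", re.MULTILINE)


def parse_state(output):
    """Extract the model state from `firectl model get` output, or None."""
    match = _STATE_RE.search(output)
    return match.group(1) if match else None


def wait_until_ready(fetch_output, interval=10.0, max_attempts=60, sleep=time.sleep):
    """Poll until the model is READY; raise on FAILED or when attempts run out.

    fetch_output: zero-argument callable returning the command output as text.
    """
    for _ in range(max_attempts):
        state = parse_state(fetch_output())
        if state == "READY":
            return state
        if state == "FAILED":
            raise RuntimeError("model upload or validation failed")
        sleep(interval)  # still UPLOADING (or state not found); try again
    raise TimeoutError("model did not reach READY in time")
```

Here `fetch_output` is supplied by the caller, for example a small function that returns `subprocess.run(["firectl", "model", "get", name], capture_output=True, text=True, check=True).stdout`. Keeping the command behind a callable lets the polling logic be tested without `firectl` installed.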

Deploying your model#

Once your model shows State: READY, create a deployment:

firectl deployment create accounts/<ACCOUNT_ID>/models/<MODEL_NAME> --wait

See the On-demand deployments guide for configuration options like GPU types, autoscaling, and quantization.

Publishing your model#

By default, models are private to your account. Publish a model to make it available to other Fireworks users.

When published:

- Listed in the public model catalog
- Deployable by anyone with a Fireworks account
- Still hosted and controlled by your account

Publish a model:

firectl model update <MODEL_ID> --public

Unpublish a model:

firectl model update <MODEL_ID> --public=false

Importing fine-tuned models#

In addition to models you fine-tune on the Fireworks platform, you can also upload your own custom fine-tuned models as LoRA adapters.

Uploaded LoRA adapters can only be deployed to on-demand (dedicated) deployments. Serverless deployment is not supported.

Requirements#

Your custom LoRA addon must contain the following files:

- `adapter_config.json` - The Hugging Face adapter configuration file
- `adapter_model.bin` or `adapter_model.safetensors` - The saved addon file

The adapter_config.json must contain the following fields:

- `r` - The LoRA rank. Must be an integer between 4 and 64, inclusive.
- `target_modules` - A list of target modules. The following target modules are currently supported:
  - `q_proj`
  - `k_proj`
  - `v_proj`
  - `o_proj`
  - `up_proj` or `w1`
  - `down_proj` or `w2`
  - `gate_proj` or `w3`
  - `block_sparse_moe.gate`

Additional fields may be specified but are ignored.
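These requirements can be checked locally before uploading. The sketch below is a hypothetical validator (the names `validate_adapter_config` and `validate_adapter_dir` are not part of any Fireworks tool) that encodes only the rules listed above; the platform's own validation may differ.

```python
import json
from pathlib import Path

# Target modules listed above as supported.
SUPPORTED_TARGET_MODULES = {
    "q_proj", "k_proj", "v_proj", "o_proj",
    "up_proj", "w1", "down_proj", "w2", "gate_proj", "w3",
    "block_sparse_moe.gate",
}


def validate_adapter_config(config):
    """Return a list of problems for a parsed adapter_config.json dict."""
    problems = []

    rank = config.get("r")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(rank, int) or isinstance(rank, bool) or not 4 <= rank <= 64:
        problems.append("r must be an integer between 4 and 64, inclusive")

    modules = config.get("target_modules")
    if not isinstance(modules, list):
        problems.append("target_modules must be a list")
    else:
        unsupported = sorted(set(modules) - SUPPORTED_TARGET_MODULES)
        if unsupported:
            problems.append("unsupported target_modules: " + ", ".join(unsupported))

    return problems


def validate_adapter_dir(adapter_dir):
    """Check the required files, then the config fields, in a local adapter folder."""
    root = Path(adapter_dir)
    problems = []
    weight_names = ("adapter_model.bin", "adapter_model.safetensors")
    if not any((root / name).is_file() for name in weight_names):
        problems.append("missing adapter_model.bin or adapter_model.safetensors")
    config_path = root / "adapter_config.json"
    if not config_path.is_file():
        return problems + ["missing adapter_config.json"]
    config = json.loads(config_path.read_text(encoding="utf-8"))
    return problems + validate_adapter_config(config)
```

An empty list means the adapter folder satisfies the file and field requirements stated in this section.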

Customizing generation defaults with fireworks.json#

For LoRA adapters, use a fireworks.json file to customize generation defaults. This is the recommended approach because adapters inherit configuration from their base model—modifying generation_config.json in the adapter folder won’t work.

Add a fireworks.json file to the directory containing your adapter files:

{
  "defaults": {
    "stop": ["<|im_end|>", "</s>"],
    "max_tokens": 1024,
    "temperature": 0.7,
    "top_k": 50,
    "top_p": 0.9,
    "min_p": 0.0,
    "typical_p": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
    "repetition_penalty": 1.0
  },
  "model_arch": null,
  "model_config_name": null,
  "has_lora": true,
  "has_teft": false
}

These defaults are applied when the user doesn't specify values in their API request:

| Field | Type | Example | Description |
| --- | --- | --- | --- |
| `stop` | array | `["<\|im_end\|>", "</s>"]` | Default stop sequences |
| `max_tokens` | integer | `1024` | Default maximum tokens to generate |
| `temperature` | float | `0.7` | Default sampling temperature |
| `top_k` | integer | `50` | Default top-k sampling |
| `top_p` | float | `0.9` | Default nucleus sampling probability |
| `min_p` | float | `0.0` | Default minimum probability threshold |
| `typical_p` | float | `1.0` | Default typical sampling probability |
| `frequency_penalty` | float | `0.0` | Default frequency penalty |
| `presence_penalty` | float | `0.0` | Default presence penalty |
| `repetition_penalty` | float | `1.0` | Default repetition penalty |

| Field | Default | Description |
| --- | --- | --- |
| `model_arch` | null | Model architecture (e.g., `"qwen2"`, `"llama"`). Usually auto-detected from base model |
| `model_config_name` | null | Model configuration name (e.g., `"4B"`). Usually auto-detected from base model |
| `has_lora` | true | Set to `true` for LoRA adapters |
| `has_teft` | false | Set to `true` if using TEFT (Token-Efficient Fine-Tuning) |

All fields in fireworks.json are optional. Include only the fields you need to override.
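Because every field is optional, a `fireworks.json` can contain just the overrides you care about. The sketch below writes such a file; the helper name `write_fireworks_json` is hypothetical, and the set of accepted keys is taken only from the `defaults` fields documented in this section.

```python
import json
from pathlib import Path

# Keys of the "defaults" object documented in this section.
DEFAULT_KEYS = {
    "stop", "max_tokens", "temperature", "top_k", "top_p", "min_p",
    "typical_p", "frequency_penalty", "presence_penalty", "repetition_penalty",
}


def write_fireworks_json(adapter_dir, **defaults):
    """Write a fireworks.json containing only the given generation defaults.

    Hypothetical helper: rejects keys this guide does not document, to catch typos.
    Returns the path of the file that was written.
    """
    unknown = sorted(set(defaults) - DEFAULT_KEYS)
    if unknown:
        raise ValueError("unknown default(s): " + ", ".join(unknown))
    path = Path(adapter_dir) / "fireworks.json"
    path.write_text(json.dumps({"defaults": defaults}, indent=2) + "\n", encoding="utf-8")
    return path
```

For instance, `write_fireworks_json("/path/to/files/", temperature=0.2, stop=["</s>"])` produces a file with a single `defaults` object holding those two keys and nothing else.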

Uploading the LoRA adapter#

To upload a LoRA addon, run the following command. The MODEL_ID is an arbitrary resource ID to refer to the model within Fireworks.

Only some base models support LoRA addons.

firectl model create <MODEL_ID> /path/to/files/ --base-model "accounts/fireworks/models/<BASE_MODEL_ID>"
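When uploads are driven from a script, the command can be assembled as an argument list rather than a shell string, which avoids quoting problems in paths and IDs. This is a sketch under stated assumptions: `build_model_create_command` is a hypothetical helper, and it uses only the `--base-model` and `--embedding` flags shown in this guide.

```python
def build_model_create_command(model_id, source, base_model=None, embedding=False):
    """Build the argv list for `firectl model create` using flags from this guide.

    source: a local directory or a cloud storage URL, as in the examples above.
    """
    argv = ["firectl", "model", "create", model_id, source]
    if base_model is not None:
        # LoRA adapters name their base model,
        # e.g. accounts/fireworks/models/<BASE_MODEL_ID>.
        argv += ["--base-model", base_model]
    if embedding:
        argv.append("--embedding")
    return argv
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`; cloud storage credentials flags are omitted here and would need to be appended as shown in the upload examples.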

Next steps#

Configure GPU types, autoscaling, and optimization

Reduce serving costs with model quantization

Fine-tune models before deploying them

Link last verified June 7, 2026.

Source: Fireworks AI Docs
