Custom Models
Upload, verify, and deploy your own models from Hugging Face or elsewhere
Upload your own models from Hugging Face or elsewhere to deploy fine-tuned or custom-trained models optimized for your use case.
- Multiple upload options – Upload from local files or directly from S3 buckets or Azure Blob Storage
- Secure uploads – All uploads are encrypted and models remain private to your account by default
Requirements#
Supported architectures#
Fireworks supports most popular model architectures, including:
- DeepSeek V1, V2 & V3
- Qwen, Qwen2, Qwen2.5, Qwen2.5-VL, Qwen3
- Kimi K2 family
- GLM 4.X family
- Llama 1, 2, 3, 3.1, 4
- Mistral & Mixtral
- Gemma
- GPT-OSS 120B and 20B
Required files#
You’ll need standard Hugging Face model files: config.json, model weights (.safetensors or .bin), and tokenizer files.
- Model configuration: `config.json`
  Fireworks does not support the `quantization_config` option in `config.json`.
- Model weights in one of the following formats:
  - `*.safetensors`
  - `*.bin`
- Weights index: `*.index.json`
- Tokenizer file(s), e.g.:
  - `tokenizer.model`
  - `tokenizer.json`
  - `tokenizer_config.json`
If the requisite files are not present, model deployment may fail.
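The file requirements above can be checked locally before starting an upload. The following is a minimal sketch, not part of the Fireworks tooling: `check_model_dir` is a hypothetical helper, and the rule that an index file is only expected when there is more than one weight file is an assumption.

```python
import json
from pathlib import Path


def check_model_dir(model_dir):
    """Return a list of problems found in a Hugging Face model folder."""
    root = Path(model_dir)
    problems = []

    config_path = root / "config.json"
    if not config_path.is_file():
        problems.append("missing config.json")
    else:
        config = json.loads(config_path.read_text(encoding="utf-8"))
        # The documentation states that quantization_config is not supported.
        if "quantization_config" in config:
            problems.append("config.json contains quantization_config")

    # Weights may be stored as *.safetensors or *.bin files.
    weights = list(root.glob("*.safetensors")) + list(root.glob("*.bin"))
    if not weights:
        problems.append("no *.safetensors or *.bin weight files")

    # Assumption: only sharded checkpoints (several weight files) need an index.
    if len(weights) > 1 and not list(root.glob("*.index.json")):
        problems.append("multiple weight shards but no *.index.json")

    tokenizer_files = ("tokenizer.model", "tokenizer.json", "tokenizer_config.json")
    if not any((root / name).is_file() for name in tokenizer_files):
        problems.append("no tokenizer files found")

    return problems
```

An empty list means the folder passed these basic checks. It does not guarantee that the upload or deployment will succeed.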
Customizing base model configuration#
For base models (not LoRA adapters), you can customize the chat template and generation defaults by modifying the standard Hugging Face configuration files:
- Chat template: Add or modify the `chat_template` field in `tokenizer_config.json`. See the Hugging Face guide on Templates for Chat Models for details.
- Generation defaults: Modify `generation_config.json` to set default generation parameters like `max_new_tokens`, `temperature`, `top_p`, etc.
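Both edits described above are changes to JSON files in the model folder, so they can be scripted. The sketch below assumes a local copy of the model files. `update_json_file` and `customize_base_model` are hypothetical helper names, and any template or parameter values passed in are up to you.

```python
import json
from pathlib import Path


def update_json_file(path, updates):
    """Merge `updates` into the JSON object stored at `path` and return the result."""
    path = Path(path)
    # Start from the existing file when there is one, so other keys are preserved.
    data = json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}
    data.update(updates)
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return data


def customize_base_model(model_dir, chat_template=None, generation_defaults=None):
    """Set the chat template and/or generation defaults in a model folder."""
    root = Path(model_dir)
    if chat_template is not None:
        # The chat template is a Jinja string stored in tokenizer_config.json.
        update_json_file(root / "tokenizer_config.json", {"chat_template": chat_template})
    if generation_defaults:
        # Generation defaults such as temperature or top_p go in generation_config.json.
        update_json_file(root / "generation_config.json", generation_defaults)
```

For example, `customize_base_model("/path/to/files/", generation_defaults={"temperature": 0.7, "top_p": 0.9})` would rewrite only `generation_config.json` and leave the tokenizer configuration untouched.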
You can also use a fireworks.json file with base models. If present, fireworks.json takes priority over generation_config.json. See Customizing generation defaults with fireworks.json for the full fireworks.json schema.
For LoRA adapters, you must use fireworks.json to customize generation defaults. Modifying generation_config.json in the adapter folder won’t work because adapters inherit these settings from their base model.
Uploading your model#
For larger models, you can upload directly from cloud storage (S3 or Azure Blob Storage) for faster transfer instead of uploading from your local machine.
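<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="Local files (CLI)"></span>
Upload from your local machine:
```bash
firectl model create <MODEL_ID> /path/to/files/
```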
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="S3 bucket (CLI)"></span>
Upload directly from an Amazon S3 bucket:
```bash
firectl model create <MODEL_ID> s3://<BUCKET_NAME>/<PATH_TO_MODEL>/ \
--aws-access-key-id <ACCESS_KEY_ID> \
--aws-secret-access-key <SECRET_ACCESS_KEY>
```
See the [AWS documentation](https://docs.aws.amazon.com/IAM/latest/UserGuide/id-credentials-access-keys-update.html) for how to generate an access key ID and secret access key pair.
<span class="callout-start" data-callout-type="note"></span>
Ensure the IAM user has read access to the S3 bucket containing the model.
<span class="callout-end"></span>
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="Azure Blob Storage (CLI)"></span>
Upload directly from Azure Blob Storage using either a SAS token or federated identity authentication.
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="SAS token"></span>
First, create a Fireworks secret containing your Azure SAS token:
```bash
firectl secret create --name <SECRET_NAME> --value <SAS_TOKEN>
```
Then, upload the model using the secret:
```bash
firectl model create <MODEL_ID> https://<STORAGE_ACCOUNT>.blob.core.windows.net/<CONTAINER>/<PATH> \
--azure-sas-token-secret accounts/<ACCOUNT_ID>/secrets/<SECRET_NAME>
```
See the [Azure documentation](https://learn.microsoft.com/en-us/azure/storage/common/storage-sas-overview) for how to generate a SAS token.
<span class="callout-start" data-callout-type="note"></span>
Ensure the SAS token has read access to the Azure Blob Storage container containing the model.
<span class="callout-end"></span>
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="Federated identity"></span>
Use Azure AD federated identity for passwordless authentication.
**Setup:** Register an Azure AD application and add a federated credential with:
* **Issuer:** `https://accounts.google.com`
* **Subject:** `114308823136673488563`
* **Audience:** `api://AzureADTokenExchange`
Then grant the **Storage Blob Data Reader** role to the application on your storage account.
```bash
firectl model create <MODEL_ID> https://<STORAGE_ACCOUNT>.blob.core.windows.net/<CONTAINER>/<PATH> \
--azure-client-id <CLIENT_ID> \
--azure-tenant-id <TENANT_ID>
```
<span class="tab-end"></span>
<span class="tab-group-end"></span>
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="REST API"></span>
For programmatic uploads (automation, CI/CD pipelines, notebooks), use the Fireworks REST API.
See [Upload via REST API](/models/uploading-custom-models-api) for the full pipeline and a ready-to-use Python example.
<span class="tab-end"></span>
<span class="tab-group-end"></span>
<span class="callout-start" data-callout-type="note"></span>
If you're uploading an embedding model, add the `--embedding` flag.
<span class="callout-end"></span>
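If you drive `firectl` from a script, the upload variants above differ only in the source location and a few flags. The sketch below assembles the argument list using only flags that appear on this page. `build_model_create_cmd` is a hypothetical helper, and it does not check that the flags you combine make sense together.

```python
ALLOWED_OPTIONS = {
    "aws_access_key_id",
    "aws_secret_access_key",
    "azure_sas_token_secret",
    "azure_client_id",
    "azure_tenant_id",
    "base_model",
}


def build_model_create_cmd(model_id, source, embedding=False, **options):
    """Build the argv list for `firectl model create`.

    Keyword names map to flags, e.g. aws_access_key_id -> --aws-access-key-id.
    """
    unknown = sorted(set(options) - ALLOWED_OPTIONS)
    if unknown:
        raise ValueError("unsupported options: " + ", ".join(unknown))

    cmd = ["firectl", "model", "create", model_id, source]
    for name, value in options.items():
        cmd += ["--" + name.replace("_", "-"), value]
    if embedding:
        # Embedding models need the extra --embedding flag.
        cmd.append("--embedding")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with secrets and paths.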
## Verifying your upload
After uploading, verify your model is ready to deploy:
```bash
firectl model get accounts/<ACCOUNT_ID>/models/<MODEL_NAME>
```
Look for `State: READY` in the output. Once ready, you can create a deployment.
Model states:
- `UPLOADING` – Files are being uploaded or awaiting validation (for REST API uploads)
- `READY` – Model is validated and ready to deploy
- `FAILED` – Upload or validation failed (check error details)
If using the REST API and your model remains in UPLOADING state, you most likely haven’t called the validation endpoint yet — REST uploads don’t become deployable until they’re validated. See Why validation is a separate step for details.
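In a script, the state check can be turned into a polling loop. The sketch below assumes that the output of `firectl model get` contains a line of the form `State: <STATE>`; the exact output layout is an assumption. `parse_state`, `firectl_model_get`, and `wait_until_ready` are hypothetical helpers.

```python
import re
import subprocess
import time

# Assumption: the CLI prints the state on its own line, e.g. "State: READY".
STATE_PATTERN = re.compile(r"^\s*State:\s*(\w+)", re.MULTILINE)


def parse_state(output):
    """Extract the model state from CLI output, or None if no state line is found."""
    match = STATE_PATTERN.search(output)
    return match.group(1) if match else None


def firectl_model_get(model_name):
    """Run `firectl model get` and return its standard output."""
    result = subprocess.run(
        ["firectl", "model", "get", model_name],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def wait_until_ready(model_name, fetch=None, interval=30.0, timeout=3600.0):
    """Poll until the model is READY; raise on FAILED or when the timeout expires."""
    if fetch is None:
        fetch = lambda: firectl_model_get(model_name)
    deadline = time.monotonic() + timeout
    while True:
        state = parse_state(fetch())
        if state == "READY":
            return state
        if state == "FAILED":
            raise RuntimeError(f"{model_name}: upload or validation failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{model_name}: still {state} after {timeout} seconds")
        time.sleep(interval)
```

The `fetch` parameter exists so the loop can be exercised without the CLI installed. For REST API uploads, polling alone is not enough, because the model stays in `UPLOADING` until validation has been requested.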
Deploying your model#
Once your model shows State: READY, create a deployment:
firectl deployment create accounts/<ACCOUNT_ID>/models/<MODEL_NAME> --wait
See the On-demand deployments guide for configuration options like GPU types, autoscaling, and quantization.
Publishing your model#
By default, models are private to your account. Publish a model to make it available to other Fireworks users.
When published:
- Listed in the public model catalog
- Deployable by anyone with a Fireworks account
- Still hosted and controlled by your account
Publish a model:
firectl model update <MODEL_ID> --public
Unpublish a model:
firectl model update <MODEL_ID> --public=false
Importing fine-tuned models#
In addition to models you fine-tune on the Fireworks platform, you can also upload your own custom fine-tuned models as LoRA adapters.
Uploaded LoRA adapters can only be deployed to on-demand (dedicated) deployments. Serverless deployment is not supported.
Requirements#
Your custom LoRA addon must contain the following files:
- `adapter_config.json` - The Hugging Face adapter configuration file
- `adapter_model.bin` or `adapter_model.safetensors` - The saved addon file
The `adapter_config.json` must contain the following fields:
- `r` - The LoRA rank. Must be an integer between 4 and 64, inclusive
- `target_modules` - A list of target modules. Currently the following target modules are supported:
  - `q_proj`
  - `k_proj`
  - `v_proj`
  - `o_proj`
  - `up_proj` or `w1`
  - `down_proj` or `w2`
  - `gate_proj` or `w3`
  - `block_sparse_moe.gate`
Additional fields may be specified but are ignored.
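The field requirements above are easy to verify before uploading an adapter. The sketch below applies the documented limits to a parsed `adapter_config.json`. `validate_adapter_config` is a hypothetical helper, and rejecting an empty `target_modules` list is an assumption rather than a documented rule.

```python
SUPPORTED_TARGET_MODULES = {
    "q_proj", "k_proj", "v_proj", "o_proj",
    "up_proj", "w1",
    "down_proj", "w2",
    "gate_proj", "w3",
    "block_sparse_moe.gate",
}


def validate_adapter_config(config):
    """Return a list of error messages for a parsed adapter_config.json dict."""
    errors = []

    r = config.get("r")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(r, int) or isinstance(r, bool) or not 4 <= r <= 64:
        errors.append("r must be an integer between 4 and 64, inclusive")

    modules = config.get("target_modules")
    if not isinstance(modules, list) or not modules:
        # Assumption: an empty list is treated as invalid.
        errors.append("target_modules must be a non-empty list")
    else:
        unsupported = sorted(set(map(str, modules)) - SUPPORTED_TARGET_MODULES)
        if unsupported:
            errors.append("unsupported target_modules: " + ", ".join(unsupported))

    return errors
```

Load the file with `json.load` and pass the resulting dictionary to the function. An empty list means both documented fields look valid.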
Customizing generation defaults with fireworks.json#
For LoRA adapters, use a fireworks.json file to customize generation defaults. This is the recommended approach because adapters inherit configuration from their base model—modifying generation_config.json in the adapter folder won’t work.
Add a fireworks.json file to the directory containing your adapter files:
{
"defaults": {
"stop": ["<|im_end|>", "</s>"],
"max_tokens": 1024,
"temperature": 0.7,
"top_k": 50,
"top_p": 0.9,
"min_p": 0.0,
"typical_p": 1.0,
"frequency_penalty": 0.0,
"presence_penalty": 0.0,
"repetition_penalty": 1.0
},
"model_arch": null,
"model_config_name": null,
"has_lora": true,
"has_teft": false
}

| Field | Type | Example | Description |
|---|---|---|---|
| `stop` | array | `["<\|im_end\|>", "</s>"]` | Default stop sequences |
| `max_tokens` | integer | `1024` | Default maximum tokens to generate |
| `temperature` | float | `0.7` | Default sampling temperature |
| `top_k` | integer | `50` | Default top-k sampling |
| `top_p` | float | `0.9` | Default nucleus sampling probability |
| `min_p` | float | `0.0` | Default minimum probability threshold |
| `typical_p` | float | `1.0` | Default typical sampling probability |
| `frequency_penalty` | float | `0.0` | Default frequency penalty |
| `presence_penalty` | float | `0.0` | Default presence penalty |
| `repetition_penalty` | float | `1.0` | Default repetition penalty |
All fields in fireworks.json are optional. Include only the fields you need to override.
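Because every field is optional, a `fireworks.json` that carries only the overrides you care about can be generated from a script. The sketch below writes such a file and rejects keys that are not in the table above. `write_fireworks_json` is a hypothetical helper, and treating a file with only a `defaults` object as sufficient is an assumption based on the statement that all fields are optional.

```python
import json
from pathlib import Path

# Keys documented for the "defaults" object.
KNOWN_DEFAULTS = {
    "stop", "max_tokens", "temperature", "top_k", "top_p", "min_p",
    "typical_p", "frequency_penalty", "presence_penalty", "repetition_penalty",
}


def write_fireworks_json(adapter_dir, **defaults):
    """Write fireworks.json containing only the given generation defaults."""
    unknown = sorted(set(defaults) - KNOWN_DEFAULTS)
    if unknown:
        raise ValueError("unknown default(s): " + ", ".join(unknown))

    path = Path(adapter_dir) / "fireworks.json"
    payload = {"defaults": defaults}
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return path
```

For example, `write_fireworks_json("/path/to/files/", temperature=0.2, max_tokens=512)` produces a file with just those two overrides.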
Uploading the LoRA adapter#
To upload a LoRA addon, run the following command. The MODEL_ID is an arbitrary resource ID to refer to the model within Fireworks.
Only some base models support LoRA addons.
firectl model create <MODEL_ID> /path/to/files/ --base-model "accounts/fireworks/models/<BASE_MODEL_ID>"
Next steps#
- Configure GPU types, autoscaling, and optimization
- Reduce serving costs with model quantization
- Fine-tune models before deploying them