Enable streaming responses
How to use streaming output with W&B Inference
Models can take a while to generate a response.
Setting the `stream` option to `true` returns the response as a stream of chunks,
so you can display results incrementally instead of waiting for the entire
response to be generated.
Streaming output is supported for all hosted models. We especially encourage its use with reasoning models, as non-streaming requests may time out if the model thinks for too long before output starts.
<span class="tab-group-start"></span>
<span class="tab-start" data-tab-title="Python"></span>
```python
import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",  # Create an API key at https://wandb.ai/settings
)

# stream=True returns an iterator of chunks instead of one complete response
stream = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[
        {"role": "user", "content": "Tell me a rambling joke"}
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices:
        # Print each text fragment as soon as it arrives
        print(chunk.choices[0].delta.content or "", end="", flush=True)
    else:
        print(chunk)  # Show CompletionUsage object
```
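
If you also need the complete text once streaming finishes, the chunks can be accumulated while they are displayed. The following is a minimal sketch, not part of the W&B API: `collect_stream` and `fake_chunk` are hypothetical helpers, and the sketch assumes that usage information, when present, is exposed as a `usage` attribute on a chunk. The stand-in chunks only mimic the attribute shape used in the loop above so the helper can be tried without a network call.

```python
from types import SimpleNamespace


def collect_stream(chunks, on_text=None):
    """Accumulate streamed deltas into one string.

    Returns (full_text, usage); usage is None if no chunk reported it.
    """
    parts = []
    usage = None
    for chunk in chunks:
        if chunk.choices:
            text = chunk.choices[0].delta.content or ""
            if text:
                parts.append(text)
                if on_text is not None:
                    on_text(text)  # e.g. print the fragment immediately
        # Assumption: a chunk may carry a `usage` attribute (often the last chunk).
        reported = getattr(chunk, "usage", None)
        if reported is not None:
            usage = reported
    return "".join(parts), usage


def fake_chunk(text=None, usage=None):
    """Hypothetical stand-in for a streamed chunk, for offline experiments."""
    if usage is None:
        choices = [SimpleNamespace(delta=SimpleNamespace(content=text))]
    else:
        choices = []
    return SimpleNamespace(choices=choices, usage=usage)


demo = [
    fake_chunk("Hel"),
    fake_chunk("lo"),
    fake_chunk(None),  # deltas without content are skipped
    fake_chunk(usage={"total_tokens": 5}),
]
text, usage = collect_stream(demo)
print(text)  # Hello
```

With a real request, the `stream` object from the example above could be passed in directly, for instance `collect_stream(stream, on_text=lambda t: print(t, end="", flush=True))`.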
<span class="tab-end"></span>
<span class="tab-start" data-tab-title="Bash"></span>
```bash
curl https://api.inference.wandb.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <your-api-key>" \
-d '{
"model": "openai/gpt-oss-120b",
"messages": [
{ "role": "user", "content": "Tell me a rambling joke" }
],
"stream": true
}'
```
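
The curl command prints the raw stream rather than assembled text. The sketch below shows one way to decode it in Python, assuming the endpoint uses OpenAI-style server-sent events: each event on a line starting with `data:`, followed by a JSON chunk, and a final `data: [DONE]` line. That framing is an assumption and is not stated on this page. `iter_sse_text` is a hypothetical helper, and the sample lines are hand-written for illustration, not captured output.

```python
import json


def iter_sse_text(lines):
    """Yield text fragments from OpenAI-style server-sent event lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # assumed end-of-stream sentinel
        event = json.loads(payload)
        for choice in event.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


# Hand-written sample lines in the assumed format (not real model output).
sample = [
    'data: {"choices": [{"delta": {"role": "assistant", "content": ""}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "Why did"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": " the chicken"}}]}',
    "",
    'data: {"choices": [], "usage": {"total_tokens": 12}}',
    "",
    "data: [DONE]",
]
print("".join(iter_sse_text(sample)))  # Why did the chicken
```

The helper accepts any iterable of lines, so it could read from `sys.stdin` when the curl output is piped into a script.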
<span class="tab-end"></span>
<span class="tab-group-end"></span>
Link last verified June 7, 2026.
Source: Weights & Biases Docs
Link last verified: 2026-03-04