# Streaming Responses
Streaming allows you to receive responses in real time as they're generated, providing a better user experience for chat applications.
## Enable Streaming

Set `stream: true` in your request body:
```json
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}
```
## Response Format

Streaming responses use the Server-Sent Events (SSE) format. Each chunk looks like:
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}
data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]Examples
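If you're consuming the stream without an SDK, you can parse these events with any HTTP client that supports incremental reads. Below is a minimal sketch using Python's `requests` library; it assumes the standard OpenAI-compatible `/chat/completions` path under the base URL shown elsewhere in these docs, and omits error handling:

```python
import json

import requests

url = "https://api.fizzlyapi.com/v1/chat/completions"  # assumed OpenAI-compatible path
headers = {"Authorization": "Bearer your-api-key"}
payload = {
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": True,
}

with requests.post(url, json=payload, headers=headers, stream=True) as resp:
    resp.raise_for_status()
    for raw in resp.iter_lines():
        if not raw:
            continue  # SSE events are separated by blank lines
        line = raw.decode("utf-8")
        if not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            print(delta["content"], end="", flush=True)
```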
## Examples

Using the OpenAI Python SDK:

```python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.fizzlyapi.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

for chunk in stream:
    # Guard against empty choices and the final chunk, whose delta has no content
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
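Because each chunk carries only a delta, accumulate the pieces as you print them if you also need the complete reply afterwards:

```python
# Collect streamed deltas into the full reply while displaying them.
parts = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        text = chunk.choices[0].delta.content
        parts.append(text)
        print(text, end="", flush=True)

full_reply = "".join(parts)
```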
### React/Next.js Example

Using the Vercel AI SDK:
```tsx
import { useChat } from 'ai/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}
```
API route (`/api/chat`):

```ts
import OpenAI from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';

const openai = new OpenAI({
  apiKey: process.env.FIZZLY_API_KEY,
  baseURL: 'https://api.fizzlyapi.com/v1',
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}
```
## Handling Errors in Streams

Errors during streaming are returned as SSE events:
data: {"error":{"message":"Rate limit exceeded","type":"rate_limit_error"}}Handle them in your code:
```python
try:
    for chunk in stream:
        if hasattr(chunk, 'error'):
            print(f"Error: {chunk.error.message}")
            break
        # Process chunk...
except Exception as e:
    print(f"Stream error: {e}")
```
## Best Practices

Performance Tips:
- Use streaming for any response > 100 tokens
- Display a typing indicator while waiting for first chunk
- Buffer chunks for smoother display
- Handle connection interruptions gracefully (see the retry sketch below)
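For that last point, a common pattern is to retry the whole request with exponential backoff when a stream fails partway. Here is a minimal sketch, reusing the `client` from the Python example above; note that it re-sends the full request, so track what you've already displayed if you want to avoid repeated output:

```python
import time

def stream_with_retry(client, messages, max_retries=3):
    """Retry the streaming request with exponential backoff on failure."""
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True,
            )
            for chunk in stream:
                if chunk.choices and chunk.choices[0].delta.content is not None:
                    print(chunk.choices[0].delta.content, end="", flush=True)
            return  # stream completed successfully
        except Exception as e:
            if attempt == max_retries - 1:
                raise
            wait = 2 ** attempt  # back off: 1s, 2s, 4s, ...
            print(f"\nStream failed ({e}); retrying in {wait}s...")
            time.sleep(wait)
```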
### Connection Timeout

For long-running streams, ensure your client is configured with a long enough timeout:
```python
# Python
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.fizzlyapi.com/v1",
    timeout=300.0  # 5 minutes
)
```
```ts
// Node.js
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'https://api.fizzlyapi.com/v1',
  timeout: 300000, // 5 minutes
});
```
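If you'd rather not raise the timeout client-wide, the OpenAI Python SDK also supports a per-request override via `with_options`; a sketch using the client above:

```python
# Ten-minute timeout for this request only; the client default is unchanged.
stream = client.with_options(timeout=600.0).chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a long essay"}],
    stream=True,
)
```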