Streaming Responses

Streaming allows you to receive responses in real time as they're generated, rather than waiting for the full completion, providing a better user experience for chat applications.

Enable Streaming

Set stream: true in your request:

{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello!"}],
  "stream": true
}

Response Format

Streaming responses use the Server-Sent Events (SSE) format: each event is prefixed with data: and carries a JSON chunk whose delta field holds the next piece of content, and the stream ends with a data: [DONE] sentinel. Each chunk looks like:

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{"content":"!"},"finish_reason":null}]}

data: {"id":"chatcmpl-xxx","object":"chat.completion.chunk","created":1234567890,"model":"gpt-4o","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
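
If you're consuming the stream without an SDK, you can parse the SSE events yourself. Here is a minimal sketch using Python's requests library; it assumes Bearer auth and the OpenAI-compatible /chat/completions path on the base URL used throughout this page:

import json
import requests

response = requests.post(
    "https://api.fizzlyapi.com/v1/chat/completions",  # assumed OpenAI-compatible path
    headers={"Authorization": "Bearer your-api-key"},
    json={
        "model": "gpt-4o",
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": True,
    },
    stream=True,  # keep the connection open and read incrementally
)

for line in response.iter_lines():
    if not line:
        continue  # SSE events are separated by blank lines
    payload = line.decode("utf-8").removeprefix("data: ")
    if payload == "[DONE]":
        break  # end-of-stream sentinel
    chunk = json.loads(payload)
    for choice in chunk.get("choices", []):
        content = choice["delta"].get("content")
        if content:
            print(content, end="", flush=True)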

Python Example

from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.fizzlyapi.com/v1"
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a short story"}],
    stream=True
)

# Print each token as it arrives; the final chunk carries an empty
# delta, so guard before reading its content
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
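
If you also need the complete message afterwards, for example to log it or append it to the conversation history, collect the deltas as you print them. A small variation on the loop above:

# Variation: print tokens as they arrive and keep the full reply
parts = []
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
        parts.append(chunk.choices[0].delta.content)
full_reply = "".join(parts)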

React/Next.js Example

Using the Vercel AI SDK:

import { useChat } from 'ai/react';

export default function Chat() {
  // useChat streams tokens from the API route and re-renders as they arrive
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/chat',
  });

  return (
    <div>
      {messages.map(m => (
        <div key={m.id}>
          {m.role}: {m.content}
        </div>
      ))}
      <form onSubmit={handleSubmit}>
        <input value={input} onChange={handleInputChange} />
        <button type="submit">Send</button>
      </form>
    </div>
  );
}

API route (/api/chat):

import OpenAI from 'openai';
import { OpenAIStream, StreamingTextResponse } from 'ai';

// Point the OpenAI client at the Fizzly endpoint
const openai = new OpenAI({
  apiKey: process.env.FIZZLY_API_KEY,
  baseURL: 'https://api.fizzlyapi.com/v1',
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  // Request a streamed completion from the API
  const response = await openai.chat.completions.create({
    model: 'gpt-4o',
    messages,
    stream: true,
  });

  // Adapt the SDK stream and pipe the tokens back to the client
  const stream = OpenAIStream(response);
  return new StreamingTextResponse(stream);
}

Handling Errors in Streams

Errors during streaming are returned as SSE events:

data: {"error":{"message":"Rate limit exceeded","type":"rate_limit_error"}}

Handle them in your code:

try:
    for chunk in stream:
        if hasattr(chunk, 'error'):
            print(f"Error: {chunk.error.message}")
            break
        # Process chunk...
except Exception as e:
    print(f"Stream error: {e}")

Best Practices

Performance Tips:

  1. Use streaming for any response expected to exceed roughly 100 tokens
  2. Display a typing indicator while waiting for the first chunk
  3. Buffer chunks for smoother display
  4. Handle connection interruptions gracefully (see the retry sketch below)
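
For tip 4, here is a minimal retry sketch. The stream_with_retry helper and its backoff policy are illustrative, not part of the API; a dropped connection surfaces as an exception while iterating the stream, so the whole request is retried from scratch:

import time
from openai import OpenAI

client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.fizzlyapi.com/v1"
)

def stream_with_retry(messages, max_retries=3):
    # Hypothetical helper: restart the stream if the connection drops
    for attempt in range(max_retries):
        try:
            stream = client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
                stream=True,
            )
            for chunk in stream:
                if chunk.choices and chunk.choices[0].delta.content:
                    yield chunk.choices[0].delta.content
            return  # stream finished cleanly
        except Exception:
            if attempt == max_retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential backoff

# Usage
for token in stream_with_retry([{"role": "user", "content": "Hello!"}]):
    print(token, end="", flush=True)

Note that a retry regenerates the response from the beginning, so discard any partial output from the failed attempt before displaying the new stream.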

Connection Timeout

For long-running streams, ensure your client handles timeouts:

# Python
client = OpenAI(
    api_key="your-api-key",
    base_url="https://api.fizzlyapi.com/v1",
    timeout=300.0  # 5 minutes
)

// Node.js
const openai = new OpenAI({
  apiKey: 'your-api-key',
  baseURL: 'https://api.fizzlyapi.com/v1',
  timeout: 300000, // 5 minutes
});