Overview

Streaming allows you to receive responses in real-time as they’re generated, instead of waiting for the complete response.

Enable Streaming

Set stream: true in your request (stream=True in the Python SDK):
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
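If you also need the complete text once streaming finishes, accumulate the deltas as they arrive. A minimal sketch of the pattern, using a stand-in generator (fake_stream is hypothetical) so it runs without an API key — with a real stream you would iterate the object returned by the SDK instead:

```python
from types import SimpleNamespace

def fake_stream():
    # Stand-in for the real stream above: yields chunk-like objects whose
    # .choices[0].delta.content mirrors the SDK's shape. The final None
    # mimics a chunk with no content delta (e.g. the finish chunk).
    for piece in ["Hel", "lo", None]:
        delta = SimpleNamespace(content=piece)
        yield SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

parts = []
for chunk in fake_stream():
    content = chunk.choices[0].delta.content
    if content:  # skip chunks that carry no text
        parts.append(content)

full_text = "".join(parts)
print(full_text)  # Hello
```

The same loop body works unchanged against the real stream object.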

Stream Format

Responses are sent as Server-Sent Events (SSE):
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
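The SDK parses these events for you, but if you consume the HTTP stream directly you must handle the framing yourself. A minimal sketch of how each data: line maps to a content delta (parse_sse_line is a hypothetical helper, not part of any SDK):

```python
import json

def parse_sse_line(line: str):
    """Return the content delta carried by one SSE line, or None."""
    if not line.startswith("data: "):
        return None  # comments, blank keep-alive lines, etc.
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None  # sentinel marking the end of the stream
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

# Reassemble the example stream shown above:
lines = [
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" world"}}]}',
    "data: [DONE]",
]
text = "".join(d for d in (parse_sse_line(l) for l in lines) if d)
print(text)  # Hello world
```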

Error Handling

try:
    stream = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[...],
        stream=True
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="", flush=True)
except Exception as e:
    print(f"Stream error: {e}")
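Because a stream can drop mid-response, it is often worth retrying the whole request. A sketch of a retry wrapper with exponential backoff (with_retries and make_stream are illustrative names, not SDK APIs); make_stream would wrap the create(..., stream=True) call above. The usage example uses a flaky stand-in so it runs without an API key:

```python
import time

def with_retries(make_stream, max_attempts=3, backoff=1.0):
    """Call make_stream(), retrying with exponential backoff on failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return make_stream()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * 2 ** (attempt - 1))

# Usage with a stand-in that fails twice, then yields one chunk:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("dropped")
    return iter(["ok"])

stream = with_retries(flaky, max_attempts=3, backoff=0)
result = next(stream)
print(result)  # ok
```

Note that retrying re-sends the full request, so the model regenerates the response from the start; deduplicate on the client side if you have already displayed partial output.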

Use Cases

  • Chatbots: Display responses as they’re typed
  • Code generation: Show code as it’s written
  • Long-form content: Stream articles/essays
  • Better UX: Reduce perceived latency

All 218+ models support streaming, including Phala confidential models.