Streaming Responses

Overview

Streaming allows you to receive responses in real-time as they’re generated, instead of waiting for the complete response.

Enable Streaming

Set stream: true in your request:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

Stream Format

Responses are sent as Server-Sent Events (SSE):

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" world"}}]}

data: [DONE]

Error Handling

try:
    stream = client.chat.completions.create(
        model="openai/gpt-5",
        messages=[...],
        stream=True
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")
except Exception as e:
    print(f"Stream error: {e}")

Use Cases

Chatbots: Display responses as they’re typed
Code generation: Show code as it’s written
Long-form content: Stream articles/essays
Better UX: Reduce perceived latency

All 50+ models support streaming, including Phala confidential models.

Get Started

Core Concepts

Integrations

Use Cases

Guides

Overview

Enable Streaming

Stream Format

Error Handling

Use Cases

Get Started

Core Concepts

Integrations

Use Cases

Guides

​Overview

​Enable Streaming

​Stream Format

​Error Handling

​Use Cases

Overview

Enable Streaming

Stream Format

Error Handling

Use Cases