Overview
Streaming lets you receive responses in real time as they are generated, instead of waiting for the complete response.
Enable Streaming
Set `stream: true` in your request:
```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

stream = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Write a long story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```
Responses are sent as Server-Sent Events (SSE):
```
data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" world"}}]}

data: [DONE]
```
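If you consume the raw SSE stream without the SDK, each `data:` line carries one JSON chunk, and the stream ends with the literal `[DONE]` sentinel. Here is a minimal parser sketch in pure Python; `parse_sse_lines` is an illustrative helper, not part of any SDK:

```python
import json

def parse_sse_lines(lines):
    """Yield text deltas from raw SSE 'data:' lines.

    Stops at the '[DONE]' sentinel that terminates the stream.
    """
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and SSE comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"]
        # The delta may omit "content" (e.g. role-only first chunk)
        if delta.get("content"):
            yield delta["content"]

# Example using the events shown above:
raw = [
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"}}]}',
    'data: {"id":"chatcmpl-123","choices":[{"delta":{"content":" world"}}]}',
    'data: [DONE]',
]
print("".join(parse_sse_lines(raw)))  # Hello world
```

In practice the SDK handles this parsing for you; the sketch is only useful when calling the endpoint directly over HTTP.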
Error Handling
```python
try:
    stream = client.chat.completions.create(
        model="openai/gpt-4o",
        messages=[...],
        stream=True
    )
    for chunk in stream:
        print(chunk.choices[0].delta.content or "", end="")
except Exception as e:
    print(f"Stream error: {e}")
```
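When you also need the full text after streaming finishes, accumulate the deltas as they arrive. A minimal sketch, where `collect_stream` is a hypothetical helper (not an SDK function) and the stand-in chunks mirror the shape of the SDK's streaming objects:

```python
from types import SimpleNamespace

def collect_stream(chunks):
    """Print deltas as they arrive and return the assembled text."""
    parts = []
    for chunk in chunks:
        text = chunk.choices[0].delta.content or ""  # delta may be None/empty
        print(text, end="", flush=True)
        parts.append(text)
    return "".join(parts)

# Stand-in chunks mimicking the SDK's streaming response objects:
def fake_chunk(text):
    return SimpleNamespace(
        choices=[SimpleNamespace(delta=SimpleNamespace(content=text))]
    )

full = collect_stream([fake_chunk("Hello"), fake_chunk(" world"), fake_chunk(None)])
# full == "Hello world"
```

With a real stream you would pass the `stream` object from the SDK call in place of the fake chunks.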
Use Cases
- Chatbots: Display responses as they’re typed
- Code generation: Show code as it’s written
- Long-form content: Stream articles/essays
- Better UX: Reduce perceived latency
All 218+ models support streaming, including Phala confidential models.