Chat Completions - RedPill AI

Create Chat Completion

Creates a model response for the given chat conversation. All requests are TEE-protected.

POST https://api.redpill.ai/v1/chat/completions


Request Body

model

string

default:"z-ai/glm-5.1"

required

Model ID to use for the completion. Examples: z-ai/glm-5.1, z-ai/glm-5, phala/qwen3.5-27b, openai/gpt-5, anthropic/claude-sonnet-4.5

messages

array

required

Array of message objects. Each message needs a role and content. Example:

[
  {"role": "user", "content": "What is RedPill AI?"}
]

With system message:

[
  {"role": "system", "content": "You are a helpful assistant"},
  {"role": "user", "content": "Hello!"}
]

temperature

number

Sampling temperature (0-2), default 1

max_tokens

integer

Maximum number of tokens to generate. Note: Newer models (GPT-5, O3, O4) use max_completion_tokens instead. See note below.

max_completion_tokens

integer

Maximum number of completion tokens (for GPT-5, O3, O4 models). Use this parameter instead of max_tokens for newer OpenAI models:

openai/gpt-5, openai/gpt-5-mini, openai/gpt-5-nano
openai/o3, openai/o4-mini

stream

boolean

Stream responses, default false

top_p

number

Nucleus sampling (0-1)

n

integer

Number of completions, default 1

presence_penalty

number

Presence penalty (-2 to 2)

frequency_penalty

number

Frequency penalty (-2 to 2)
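
The ranges listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the RedPill API: the function name validate_sampling_params and the RANGES table are hypothetical, and the server remains the authority on what it accepts.

```python
# Allowed ranges copied from the parameter descriptions above.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def validate_sampling_params(body: dict) -> list[str]:
    """Return a list of problems found in a request body; empty means none found.

    Hypothetical client-side helper, not an official validator.
    """
    problems = []
    # model and messages are the two fields marked as required above.
    for required in ("model", "messages"):
        if not body.get(required):
            problems.append(f"missing required field: {required}")
    for name, (low, high) in RANGES.items():
        value = body.get(name)
        if value is None:
            continue  # optional parameter was not set
        if not low <= value <= high:
            problems.append(f"{name}={value} is outside [{low}, {high}]")
    return problems
```

An empty list means the body passed these local checks only; it does not guarantee that a given model supports every parameter.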

Message Object

{
  "role": "user" | "assistant" | "system",
  "content": "string" | array
}

Example Requests

curl https://api.redpill.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "z-ai/glm-5.1",
    "messages": [
      {"role": "user", "content": "What is RedPill AI?"}
    ]
  }'
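
For environments without curl, the same request can be sketched with the Python standard library. The helper names build_chat_request and send are hypothetical; the URL and headers are the ones from the curl example, and nothing is sent until send is called.

```python
import json
import urllib.request

API_URL = "https://api.redpill.ai/v1/chat/completions"


def build_chat_request(api_key: str, model: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body. This performs a network call."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping construction separate from sending makes the request easy to inspect or log before any network traffic occurs.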

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "z-ai/glm-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "RedPill AI is a privacy-first AI platform..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 25,
    "total_tokens": 38
  }
}
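
A short sketch of reading the fields most callers need from a response shaped like the one above. The helper summarize_response is hypothetical, and treating a finish_reason of "length" as truncation follows the OpenAI convention, which is assumed rather than confirmed here.

```python
def summarize_response(response: dict) -> dict:
    """Extract the reply text, finish reason, and token usage from a response dict."""
    choice = response["choices"][0]  # first (and by default only) completion
    return {
        "text": choice["message"]["content"],
        "finish_reason": choice["finish_reason"],
        "total_tokens": response["usage"]["total_tokens"],
        # Assumption: "length" means the token limit cut the reply short.
        "truncated": choice["finish_reason"] == "length",
    }


# Trimmed copy of the example response above.
example = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "RedPill AI is a privacy-first AI platform...",
            },
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 25, "total_tokens": 38},
}

summary = summarize_response(example)
```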

Important: Parameter Difference for Newer Models

GPT-5, O3, and O4 models require max_completion_tokens instead of max_tokens:

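from openai import OpenAI

# Assumes the OpenAI Python SDK pointed at the RedPill endpoint
client = OpenAI(base_url="https://api.redpill.ai/v1", api_key="YOUR_API_KEY")
messages = [{"role": "user", "content": "Hello"}]

# ❌ Doesn't work for GPT-5/O3/O4
client.chat.completions.create(
    model="openai/gpt-5",
    messages=messages,
    max_tokens=100  # Error: unsupported parameter
)

# ✅ Works for GPT-5/O3/O4
client.chat.completions.create(
    model="openai/gpt-5",
    messages=messages,
    max_completion_tokens=100  # Correct parameter
)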

# ℹ️ Older models (GPT-4.1, Claude, etc.) still use max_tokens

Affected models:

openai/gpt-5, openai/gpt-5-mini, openai/gpt-5-nano
openai/o3, openai/o4-mini

Other models (use max_tokens):

All GPT-4.1 models, Claude models, Gemini, DeepSeek, and GPU TEE models
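
Code that targets several models can choose the parameter name from the model ID. This is a minimal sketch: the helper token_limit_kwargs is hypothetical, and the set contains only the model IDs listed above, so it would need updating as the affected list changes.

```python
# Model IDs listed above as requiring max_completion_tokens.
MAX_COMPLETION_TOKENS_MODELS = {
    "openai/gpt-5",
    "openai/gpt-5-mini",
    "openai/gpt-5-nano",
    "openai/o3",
    "openai/o4-mini",
}


def token_limit_kwargs(model: str, limit: int) -> dict:
    """Return the token-limit keyword argument appropriate for the given model."""
    if model in MAX_COMPLETION_TOKENS_MODELS:
        return {"max_completion_tokens": limit}
    return {"max_tokens": limit}
```

The result can be spread into a call, for example client.chat.completions.create(model=model, messages=messages, **token_limit_kwargs(model, 100)).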

Streaming

Set stream to true to receive the response incrementally as it is generated:

stream = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
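
When the full text is needed after streaming finishes, the pieces can be accumulated rather than printed. The sketch below assumes chunks have the same shape as in the loop above; collect_stream and fake_chunk are hypothetical names, and skipping chunks with an empty choices list is a defensive assumption, not documented behavior.

```python
from types import SimpleNamespace


def collect_stream(chunks) -> str:
    """Join the content deltas of a streamed completion into one string."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # assumption: some chunks may carry no choices
        piece = chunk.choices[0].delta.content
        if piece:  # the delta content can be None or empty
            parts.append(piece)
    return "".join(parts)


def fake_chunk(text):
    """Build a stand-in chunk for offline testing (not a real SDK object)."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


story = collect_stream([fake_chunk("Once"), fake_chunk(None), fake_chunk(" upon a time")])
```

With a live client, the stream object returned by client.chat.completions.create(..., stream=True) would be passed in place of the fake chunks.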

Vision (Multimodal)

Send images to a vision-capable model by passing content as an array of text and image_url parts:

response = client.chat.completions.create(
    model="phala/qwen3-vl-30b-a3b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
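
For a local image, a common approach with OpenAI-compatible APIs is to embed the bytes as a base64 data URL. Whether a given RedPill model accepts data URLs is an assumption to verify, and image_message is a hypothetical helper name.

```python
import base64


def image_message(prompt: str, image_bytes: bytes, mime_type: str = "image/png") -> dict:
    """Build a user message with a text part and an inline base64 image part.

    Assumption: the target model accepts data URLs in image_url.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {
                "type": "image_url",
                "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
            },
        ],
    }
```

The returned dict can be used as the single entry of the messages list, with image_bytes read from a file opened in binary mode.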

Function Calling

Define tools/functions for the model to call:

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools
)
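
If the model decides to call a tool, the application has to run it and return the result. The sketch below assumes the OpenAI convention: the assistant message carries a tool_calls list with JSON-encoded arguments, and each result goes back as a message with role "tool". That shape is not documented on this page, and get_weather, LOCAL_TOOLS, and run_tool_calls are hypothetical.

```python
import json


def get_weather(location: str, unit: str = "celsius") -> dict:
    """Hypothetical local tool; returns canned data for illustration only."""
    return {"location": location, "unit": unit, "temperature": 18}


# Maps tool names declared in `tools` to local Python callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and build the follow-up messages.

    Assumption: tool calls follow the OpenAI Chat Completions format.
    """
    results = []
    for call in assistant_message.get("tool_calls") or []:
        func = LOCAL_TOOLS[call["function"]["name"]]
        arguments = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(func(**arguments)),
        })
    return results
```

The follow-up messages would be appended to the conversation after the assistant message, and the request sent again so the model can produce its final answer.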

Function Calling Guide

Learn more about function calling →

Error Handling

import openai

try:
    response = client.chat.completions.create(...)
except openai.AuthenticationError:
    print("Invalid API key")
except openai.RateLimitError:
    print("Rate limit exceeded")
except openai.BadRequestError as e:
    print(f"Bad request: {e}")
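
Rate-limit errors are usually transient, so retrying with exponential backoff is a common pattern. This is a generic sketch, not a RedPill recommendation: call_with_retries and its default attempt count and delays are hypothetical.

```python
import time


def call_with_retries(call, retry_on, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(), retrying on the given exception type with exponential backoff.

    Delays are base_delay, 2*base_delay, 4*base_delay, ...; the last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)
```

With the SDK used above, a call might look like call_with_retries(lambda: client.chat.completions.create(model=model, messages=messages), openai.RateLimitError). Authentication and bad-request errors are not worth retrying, since the same request will fail again.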

Supported Models

GPU TEE: z-ai/glm-5.1, z-ai/glm-5, phala/qwen3.5-27b
OpenAI: openai/gpt-5, openai/gpt-5-mini, openai/o4-mini
Anthropic: anthropic/claude-sonnet-4.5, anthropic/claude-opus-4.1
Google: google/gemini-1.5-pro
Meta: meta-llama/llama-3.3-70b-instruct
50+ total models

All Models

View all supported models →
