Chat Completions
curl --request POST \
  --url https://api.redpill.ai/v1/chat/completions \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: application/json' \
  --data '{
  "model": "openai/gpt-5",
  "messages": [
    {"role": "user", "content": "Hello"}
  ],
  "temperature": 0.7,
  "max_completion_tokens": 256,
  "stream": false,
  "top_p": 1,
  "n": 1,
  "presence_penalty": 0,
  "frequency_penalty": 0
}'

Create Chat Completion

Creates a model response for the given chat conversation. All requests are TEE-protected.
POST https://api.redpill.ai/v1/chat/completions
Try it now! Click the “Try it” button above to test the API in the playground. You’ll need to:
  1. Add your API key when prompted
  2. Fill in messages, e.g. [{"role":"user","content":"Hello"}]

Request Body

model
string
default:"openai/gpt-5"
required
Model ID to use for completion. Examples: openai/gpt-5, anthropic/claude-sonnet-4.5, phala/qwen-2.5-7b-instruct
messages
array
required
Array of message objects. Each message needs role and content. Example:
[
  {"role": "user", "content": "What is RedPill AI?"}
]
With system message:
[
  {"role": "system", "content": "You are a helpful assistant"},
  {"role": "user", "content": "Hello!"}
]
temperature
number
Sampling temperature (0-2), default 1
max_tokens
integer
Maximum tokens to generate. Note: newer models (GPT-5, O3, O4) use max_completion_tokens instead; see note below.
max_completion_tokens
integer
Maximum completion tokens (for GPT-5, O3, O4 models). Use this parameter instead of max_tokens for newer OpenAI models:
  • openai/gpt-5, openai/gpt-5-mini, openai/gpt-5-nano
  • openai/o3, openai/o4-mini
stream
boolean
Stream responses, default false
top_p
number
Nucleus sampling (0-1)
n
integer
Number of completions, default 1
presence_penalty
number
Presence penalty (-2 to 2)
frequency_penalty
number
Frequency penalty (-2 to 2)
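The sampling parameters above all have documented ranges. A small hypothetical helper (not part of the API or SDK) can validate them client-side before sending a request:

```python
def build_payload(model, messages, temperature=1.0, top_p=1.0,
                  presence_penalty=0.0, frequency_penalty=0.0,
                  n=1, stream=False):
    """Build a /v1/chat/completions request body, checking documented ranges."""
    if not 0 <= temperature <= 2:
        raise ValueError("temperature must be in [0, 2]")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be in [0, 1]")
    for name, value in (("presence_penalty", presence_penalty),
                        ("frequency_penalty", frequency_penalty)):
        if not -2 <= value <= 2:
            raise ValueError(f"{name} must be in [-2, 2]")
    return {
        "model": model,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
        "presence_penalty": presence_penalty,
        "frequency_penalty": frequency_penalty,
        "n": n,
        "stream": stream,
    }

payload = build_payload("openai/gpt-5",
                        [{"role": "user", "content": "Hello"}],
                        temperature=0.7)
```

Note that the server also validates these ranges; the helper just surfaces mistakes before the network round trip.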

Message Object

{
  "role": "user" | "assistant" | "system",
  "content": "string" | array
}

Example Requests

curl https://api.redpill.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "openai/gpt-5",
    "messages": [
      {"role": "user", "content": "What is RedPill AI?"}
    ]
  }'

Response

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "openai/gpt-5",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "RedPill AI is a privacy-first AI platform..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 13,
    "completion_tokens": 25,
    "total_tokens": 38
  }
}
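The fields of the response can be read directly. Using the sample response above as a plain Python dict:

```python
# Sample response body, as returned by the endpoint (from the example above).
response = {
    "id": "chatcmpl-123",
    "object": "chat.completion",
    "created": 1677652288,
    "model": "openai/gpt-5",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant",
                        "content": "RedPill AI is a privacy-first AI platform..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 13, "completion_tokens": 25, "total_tokens": 38},
}

# The generated text lives in choices[0].message.content.
text = response["choices"][0]["message"]["content"]
# total_tokens is the sum of prompt and completion tokens.
total = response["usage"]["total_tokens"]
```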
Important: Parameter Difference for Newer Models
GPT-5, O3, and O4 models require max_completion_tokens instead of max_tokens:
# ❌ Doesn't work for GPT-5/O3/O4
client.chat.completions.create(
    model="openai/gpt-5",
    max_tokens=100  # Error: unsupported parameter
)

# ✅ Works for GPT-5/O3/O4
client.chat.completions.create(
    model="openai/gpt-5",
    max_completion_tokens=100  # Correct parameter
)

# ℹ️ Older models (GPT-4.1, Claude, etc.) still use max_tokens
Affected models:
  • openai/gpt-5, openai/gpt-5-mini, openai/gpt-5-nano
  • openai/o3, openai/o4-mini
Other models (use max_tokens):
  • All GPT-4.1 models, Claude models, Gemini, DeepSeek, Phala models
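If you target several models, the split above can be encoded once in a small hypothetical helper (not part of the SDK) that returns the right token-limit field for a given model ID:

```python
# Models that require max_completion_tokens (the lists above).
NEWER_MODELS = {
    "openai/gpt-5", "openai/gpt-5-mini", "openai/gpt-5-nano",
    "openai/o3", "openai/o4-mini",
}

def token_limit_param(model: str, limit: int) -> dict:
    """Return the correct token-limit field for the given model ID."""
    key = "max_completion_tokens" if model in NEWER_MODELS else "max_tokens"
    return {key: limit}
```

You can then splat the result into a request, e.g. `client.chat.completions.create(model=m, messages=msgs, **token_limit_param(m, 100))`.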

Streaming

Enable stream: true for real-time responses:
stream = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "Write a story"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
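Each streamed chunk carries an incremental delta, so the full reply is the concatenation of the non-empty content deltas. A stdlib-only sketch of that assembly, using fake chunk dicts (the real SDK yields objects with attribute access, not dicts):

```python
def join_deltas(chunks):
    """Concatenate the content deltas of a streamed completion."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # The first chunk typically carries only the role; later ones carry text.
        if delta.get("content"):
            parts.append(delta["content"])
    return "".join(parts)

# Fake chunks shaped like streaming deltas (illustrative only).
fake = [
    {"choices": [{"delta": {"role": "assistant"}}]},
    {"choices": [{"delta": {"content": "Once upon"}}]},
    {"choices": [{"delta": {" content"[1:]: " a time"}}]},
    {"choices": [{"delta": {}}]},
]
```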

Vision (Multimodal)

Use vision models with images:
response = client.chat.completions.create(
    model="phala/qwen2.5-vl-72b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://..."}}
        ]
    }]
)
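The mixed text/image content array shown above can be built by a small hypothetical helper (the URL is a placeholder):

```python
def image_message(text: str, image_url: str) -> dict:
    """Build a user message mixing text and an image, as in the example above."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = image_message("What's in this image?", "https://example.com/cat.png")
```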

Function Calling

Define tools/functions for the model to call:
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
}]

response = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[{"role": "user", "content": "What's the weather in SF?"}],
    tools=tools
)
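When the model decides to use a tool, the response's message carries tool_calls with JSON-encoded arguments; your code runs the function and sends the result back as a "tool" message. A sketch of that dispatch step, using a fake tool-call dict in place of a live response and a stand-in get_weather implementation:

```python
import json

def get_weather(location, unit="celsius"):
    """Stand-in implementation; a real app would call a weather API here."""
    return {"location": location, "temp": 18, "unit": unit}

def run_tool_call(tool_call):
    """Dispatch one tool call and build the 'tool' message to send back."""
    name = tool_call["function"]["name"]
    args = json.loads(tool_call["function"]["arguments"])
    result = {"get_weather": get_weather}[name](**args)
    return {"role": "tool",
            "tool_call_id": tool_call["id"],
            "content": json.dumps(result)}

# Fake tool call shaped like one entry of message.tool_calls (illustrative only).
fake_call = {"id": "call_1",
             "function": {"name": "get_weather",
                          "arguments": '{"location": "San Francisco"}'}}
tool_msg = run_tool_call(fake_call)
```

Appending tool_msg to the conversation and calling the endpoint again lets the model compose its final answer from the tool result.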

Function Calling Guide

Learn more about function calling →

Error Handling

try:
    response = client.chat.completions.create(...)
except openai.AuthenticationError:
    print("Invalid API key")
except openai.RateLimitError:
    print("Rate limit exceeded")
except openai.BadRequestError as e:
    print(f"Bad request: {e}")
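Rate-limit errors are usually transient, so a retry loop with exponential backoff is a common pattern around the call above. A sketch using ConnectionError so it runs without the openai package; with the SDK you would pass retryable=(openai.RateLimitError,) instead:

```python
import time

def with_retries(call, retries=3, base_delay=1.0,
                 retryable=(ConnectionError,)):
    """Retry `call` with exponential backoff on retryable errors."""
    for attempt in range(retries):
        try:
            return call()
        except retryable:
            if attempt == retries - 1:
                raise  # out of attempts, let the error propagate
            time.sleep(base_delay * 2 ** attempt)

# A flaky stand-in that succeeds on the third attempt (illustrative only).
attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return "ok"
```

Usage: `with_retries(lambda: client.chat.completions.create(...), retryable=(openai.RateLimitError,))`.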

Supported Models

  • OpenAI: openai/gpt-5, openai/gpt-5-mini, openai/o4-mini
  • Anthropic: anthropic/claude-sonnet-4.5, anthropic/claude-opus-4.1
  • Google: google/gemini-1.5-pro
  • Meta: meta-llama/llama-3.3-70b-instruct
  • Phala TEE: phala/deepseek-chat-v3-0324
  • +200 more models

All Models

View all supported models →