Overview

RedPill supports 50+ active AI models accessible through a single TEE-protected API. GPU TEE models add end-to-end confidential inference through Phala Network, Near AI, Tinfoil, and Chutes.
Currently Active: OpenAI, Anthropic, Google, xAI, DeepSeek, Qwen, ZhipuAI, Meta, MoonshotAI, NousResearch, and more

List Models via API

Get the latest models programmatically ->

Model Categories

Chat Models

Conversational AI for chatbots and assistants

Instruction Models

Task completion and code generation

Vision Models

Image understanding and analysis

Embedding Models

Text embeddings for search and similarity

GPU TEE Confidential Models

RedPill offers 26 priced GPU TEE model entries running entirely in confidential GPU infrastructure across 4 providers. Some aliases are also accepted for compatibility; call /v1/models for the live list.

Chutes

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| z-ai/glm-5.1 | 203K | $1.21/M | $4.20/M |
| moonshotai/kimi-k2.6 | 262K | $1.09/M | $4.60/M |
| qwen/qwen3.5-397b-a17b | 262K | $0.55/M | $3.50/M |
| qwen/qwen3-coder-next | 262K | $0.18/M | $1.20/M |
| minimax/minimax-m2.5 | 197K | $0.20/M | $1.38/M |
| xiaomi/mimo-v2-flash | 262K | $0.10/M | $0.30/M |
| deepseek/deepseek-v3.2 | 164K | $0.32/M | $0.48/M |
| moonshotai/kimi-k2.5 | 262K | $0.6/M | $3/M |

Near AI

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| z-ai/glm-5 | 203K | $1.20/M | $3.50/M |
| deepseek/deepseek-chat-v3.1 | 164K | $1.05/M | $3.10/M |
| openai/gpt-oss-120b | 131K | $0.10/M | $0.49/M |
| qwen/qwen3-30b-a3b-instruct-2507 | 262K | $0.15/M | $0.55/M |
| z-ai/glm-4.7 | 131K | $0.85/M | $3.3/M |

Phala Network

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| phala/qwen3.5-27b | 262K | $0.30/M | $2.40/M |
| phala/qwen3-vl-30b-a3b-instruct | 128K | $0.2/M | $0.7/M |
| qwen/qwen3-embedding-8b | 32K | $0.01/M | $0/M |
| phala/gemma-3-27b-it | 53K | $0.11/M | $0.4/M |
| phala/glm-4.7-flash | 202K | $0.1/M | $0.43/M |
| phala/gpt-oss-20b | 131K | $0.04/M | $0.15/M |
| phala/qwen-2.5-7b-instruct | 32K | $0.04/M | $0.1/M |
| phala/uncensored-24b | 32K | $0.2/M | $0.9/M |
| sentence-transformers/all-minilm-l6-v2 | 512 | $0.005/M | $0/M |

Tinfoil

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| qwen/qwen3-coder-480b-a35b-instruct | 262K | $2/M | $2/M |
| moonshotai/kimi-k2-thinking | 262K | $2/M | $2/M |
| deepseek/deepseek-r1-0528 | 163K | $2/M | $2/M |
| meta-llama/llama-3.3-70b-instruct | 131K | $2/M | $2/M |

Learn About Confidential AI

Explore GPU TEE models in detail ->

OpenAI Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| openai/gpt-5.2 | 400K | $1.75/M | $14/M |
| openai/gpt-5.1 | 400K | $1.25/M | $10/M |
| openai/gpt-5 | 400K | $1.25/M | $10/M |
| openai/gpt-5-mini | 400K | $0.25/M | $2/M |
| openai/gpt-5-nano | 400K | $0.05/M | $0.4/M |
| openai/o3 | 200K | $2/M | $8/M |
| openai/o4-mini | 200K | $1.1/M | $4.4/M |
| openai/gpt-4.1 | 1M | $2/M | $8/M |
| openai/gpt-4.1-mini | 1M | $0.4/M | $1.6/M |
| openai/gpt-4.1-nano | 1M | $0.1/M | $0.4/M |
| openai/gpt-4o | 128K | $2.5/M | $10/M |
| openai/gpt-4o-mini | 128K | $0.15/M | $0.6/M |
| openai/gpt-4 | 8K | $30/M | $60/M |
| openai/gpt-3.5-turbo | 16K | $0.5/M | $1.5/M |

New: GPT-5.2 is OpenAI’s latest flagship model with enhanced reasoning and improved performance across all benchmarks.

Anthropic Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| anthropic/claude-opus-4.6 | 1M | $10/M | $37.5/M |
| anthropic/claude-sonnet-4.6 | 1M | $3/M | $15/M |
| anthropic/claude-opus-4.5 | 200K | $5/M | $25/M |
| anthropic/claude-sonnet-4.5 | 1M | $3/M | $15/M |
| anthropic/claude-opus-4.1 | 200K | $15/M | $75/M |
| anthropic/claude-opus-4 | 200K | $15/M | $75/M |
| anthropic/claude-sonnet-4 | 1M | $3/M | $15/M |
| anthropic/claude-haiku-4.5 | 200K | $1/M | $5/M |
| anthropic/claude-3.7-sonnet | 200K | $3/M | $15/M |
| anthropic/claude-3.5-haiku | 200K | $0.8/M | $4/M |

New: Claude Opus 4.6 and Sonnet 4.6 are the latest Anthropic models. Claude Opus 4.5 and Haiku 4.5 are also now available.

Google Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| google/gemini-3-pro-preview | 1M | $4/M | $18/M |
| google/gemini-2.5-pro | 1M | $2.5/M | $15/M |
| google/gemini-2.5-flash | 1M | $0.3/M | $2.5/M |
| google/gemini-2.5-flash-lite | 1M | $0.1/M | $0.4/M |

New: Gemini 3 Pro Preview is Google’s next-generation model with advanced reasoning capabilities. Gemini 2.5 Pro pricing updated to $2.5/M prompt.

xAI Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| x-ai/grok-4 | 256K | $3/M | $15/M |
| x-ai/grok-4.1-fast | 2M | $0.2/M | $0.5/M |
| x-ai/grok-code-fast-1 | 256K | $0.2/M | $1.5/M |

New: Grok 4.1 Fast now supports a 2M token context window at $0.2/M prompt.

DeepSeek Models

| Model ID | Context | Prompt | Completion | Provider |
|---|---|---|---|---|
| deepseek/deepseek-v3.2 | 164K | $0.32/M | $0.48/M | Chutes |
| deepseek/deepseek-chat-v3.1 | 164K | $1.05/M | $3.10/M | Near AI |
| deepseek/deepseek-r1-0528 | 163K | $2/M | $2/M | Tinfoil |

New: DeepSeek V3.2 is the latest version available through Chutes. DeepSeek R1 is a reasoning model available with Tinfoil TEE protection.

MoonshotAI Models

| Model ID | Context | Prompt | Completion | Provider |
|---|---|---|---|---|
| moonshotai/kimi-k2.6 | 262K | $1.09/M | $4.60/M | Chutes |
| moonshotai/kimi-k2.5 | 262K | $0.6/M | $3/M | Chutes |
| moonshotai/kimi-k2-thinking | 262K | $2/M | $2/M | Tinfoil |

New: Kimi K2.5 is a native multimodal model with state-of-the-art visual coding. Kimi K2 Thinking is an advanced reasoning model optimized for long-horizon agentic tasks.

Qwen Models

| Model ID | Context | Prompt | Completion | Provider |
|---|---|---|---|---|
| qwen/qwen3-coder-480b-a35b-instruct | 262K | $2/M | $2/M | Tinfoil |
| qwen/qwen3-coder-next | 262K | $0.18/M | $1.20/M | Chutes |
| qwen/qwen3.5-397b-a17b | 262K | $0.55/M | $3.50/M | Chutes |
| qwen/qwen3.5-27b | 262K | $0.30/M | $2.40/M | Phala |
| qwen/qwen3-vl-30b-a3b-instruct | 128K | $0.2/M | $0.7/M | Phala |
| qwen/qwen3-30b-a3b-instruct-2507 | 262K | $0.15/M | $0.55/M | Near AI |
| qwen/qwen-2.5-7b-instruct | 32K | $0.04/M | $0.1/M | Phala |

ZhipuAI Models

| Model ID | Context | Prompt | Completion | Provider |
|---|---|---|---|---|
| z-ai/glm-5.1 | 203K | $1.21/M | $4.20/M | Chutes |
| z-ai/glm-5 | 203K | $1.2/M | $3.5/M | Near AI |
| z-ai/glm-4.7 | 131K | $0.85/M | $3.3/M | Near AI |
| z-ai/glm-4.7-flash | 202K | $0.1/M | $0.43/M | Phala |

New: GLM-5 is ZhipuAI’s latest flagship model for systems engineering. GLM 4.7 Flash offers excellent speed-to-quality ratio with 202K context.

Other Models

| Model ID | Context | Prompt | Completion | Provider |
|---|---|---|---|---|
| openai/gpt-oss-120b | 131K | $0.1/M | $0.49/M | Near AI |
| openai/gpt-oss-20b | 131K | $0.04/M | $0.15/M | Phala |
| google/gemma-3-27b-it | 53K | $0.11/M | $0.4/M | Phala |
| minimax/minimax-m2.5 | 197K | $0.20/M | $1.38/M | Chutes |
| xiaomi/mimo-v2-flash | 262K | $0.10/M | $0.30/M | Chutes |
| meta-llama/llama-3.3-70b-instruct | 131K | $2/M | $2/M | Tinfoil |
| nousresearch/hermes-3-llama-3.1-405b | 131K | $1/M | $1/M | OpenRouter |
| phala/uncensored-24b | 32K | $0.2/M | $0.9/M | Phala |

Vision Models

Models that understand images:

| Model ID | Context | Features | Provider |
|---|---|---|---|
| moonshotai/kimi-k2.6 | 262K | Vision + Text | Chutes |
| moonshotai/kimi-k2.5 | 262K | Vision + Text | Chutes |
| phala/qwen3-vl-30b-a3b-instruct | 128K | Vision + Text | Phala |
| phala/gemma-3-27b-it | 53K | Vision + Text | Phala |
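
A vision request usually combines a text part and an image part in one user message. The sketch below builds such a message in the OpenAI chat format. It assumes RedPill accepts OpenAI-style `image_url` content parts for the vision models listed above, which this page does not state explicitly. `image_data_url` and `build_vision_messages` are hypothetical helper names.

```python
import base64


def image_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a base64 data URL (for local files)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_vision_messages(prompt: str, image_url: str) -> list[dict]:
    """Build one user message with a text part and an image part.

    Assumption: the gateway accepts OpenAI-style content parts.
    """
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }
    ]


if __name__ == "__main__":
    # Placeholder URL; replace with a real image or a data URL.
    messages = build_vision_messages(
        "Describe this image.", "https://example.com/photo.png"
    )
    print(messages[0]["content"][1]["image_url"]["url"])
```

The resulting list can be passed as `messages` to `client.chat.completions.create` with a vision model ID such as `phala/qwen3-vl-30b-a3b-instruct`.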

Embedding Models

Generate vector embeddings for semantic search:

| Model ID | Dimensions | Max Tokens |
|---|---|---|
| qwen/qwen3-embedding-8b | varies | 32768 |
| sentence-transformers/all-minilm-l6-v2 | 384 | 512 |
| openai/text-embedding-3-large | 3072 | 8191 |
| openai/text-embedding-3-small | 1536 | 8191 |
| openai/text-embedding-ada-002 | 1536 | 8191 |
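
Semantic search with embeddings comes down to comparing vectors, most often by cosine similarity. The sketch below ranks documents against a query using toy 3-dimensional vectors. In practice the vectors would come from an embeddings call such as the OpenAI SDK's `client.embeddings.create(model=..., input=[...])`, assuming RedPill serves these models on the OpenAI-compatible embeddings route, which this page does not confirm. `cosine_similarity` and `rank_by_similarity` are hypothetical helpers.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: list[float], docs: dict[str, list[float]]
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors for illustration only; real embeddings have hundreds
    # or thousands of dimensions (see the Dimensions column above).
    docs = {"doc-a": [1.0, 0.0, 0.0], "doc-b": [0.0, 1.0, 0.0]}
    for doc_id, score in rank_by_similarity([0.9, 0.1, 0.0], docs):
        print(doc_id, round(score, 3))
```

Vectors from different embedding models are not comparable, so the query and the documents should be embedded with the same model.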

Provider Coverage

Supported Providers

  • ZhipuAI - GLM-5.1, GLM-5, GLM-4.7, GLM-4.7 Flash
  • MoonshotAI - Kimi K2.6, Kimi K2.5, Kimi K2 Thinking
  • Qwen - Qwen3.5, Qwen3 Coder, Qwen3 VL, Qwen3 Embedding, Qwen 2.5
  • DeepSeek - DeepSeek V3.2, DeepSeek V3.1, DeepSeek R1
  • MiniMax - MiniMax M2.5
  • Xiaomi - MiMo V2 Flash
  • Meta - Llama 3.3 70B
  • OpenAI - GPT-5.2, GPT-5.1, GPT-5, GPT-4.1, o3, o4-mini, GPT-OSS, embeddings
  • Anthropic - Claude Opus 4.6, Claude Sonnet 4.6, Claude Opus 4.5, Claude Sonnet 4.5, Claude Opus 4.1, Claude Haiku 4.5, Claude 3.7 Sonnet, Claude 3.5 Haiku
  • Google - Gemini 3 Pro Preview, Gemini 2.5 Pro/Flash/Flash-Lite
  • xAI - Grok 4, Grok 4.1 Fast, Grok Code Fast
  • NousResearch - Hermes 3 Llama 3.1 405B

Pricing Tiers

Budget Models

Balance of cost and quality:
  • phala/qwen-2.5-7b-instruct - $0.04/M prompt (TEE)
  • phala/gpt-oss-20b - $0.04/M prompt (TEE)
  • phala/glm-4.7-flash - $0.1/M prompt (TEE)
  • xiaomi/mimo-v2-flash - $0.10/M prompt (TEE)
  • openai/gpt-5-nano - $0.05/M prompt
  • google/gemini-2.5-flash-lite - $0.1/M prompt
  • openai/gpt-4o-mini - $0.15/M prompt

Mid-Range Models

Production-ready with strong quality:
  • openai/gpt-5-mini - $0.25/M prompt
  • google/gemini-2.5-flash - $0.3/M prompt
  • deepseek/deepseek-v3.2 - $0.32/M prompt (TEE)
  • moonshotai/kimi-k2.5 - $0.6/M prompt (TEE)
  • anthropic/claude-3.5-haiku - $0.8/M prompt

Premium Models

Best quality for production:
  • openai/gpt-5.2 - $1.75/M prompt
  • anthropic/claude-sonnet-4.5 - $3/M prompt
  • google/gemini-2.5-pro - $2.5/M prompt
  • anthropic/claude-opus-4.6 - $10/M prompt
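
Prices on this page are quoted per million tokens ("/M"), so the cost of one request is the token count divided by 1,000,000, times the price, summed over prompt and completion. The sketch below works this through with prices copied from the tables on this page. `estimate_cost` is a hypothetical helper, and the prices may be out of date; /v1/models is the live source.

```python
# Prices in USD per million tokens as (prompt, completion),
# copied from the tables on this page. They may change.
PRICES_PER_M = {
    "openai/gpt-5-mini": (0.25, 2.00),
    "deepseek/deepseek-v3.2": (0.32, 0.48),
    "anthropic/claude-sonnet-4.5": (3.00, 15.00),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    prompt_price, completion_price = PRICES_PER_M[model]
    total = prompt_tokens * prompt_price + completion_tokens * completion_price
    return total / 1_000_000


if __name__ == "__main__":
    # Same workload priced on one model from each tier
    for model_id in PRICES_PER_M:
        cost = estimate_cost(model_id, prompt_tokens=10_000, completion_tokens=2_000)
        print(f"{model_id}: ${cost:.5f}")
```

Completion tokens are often several times more expensive than prompt tokens, so output-heavy workloads can change which tier is cheapest.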

Model Selection Guide

By Use Case

  • TEE Best: z-ai/glm-5.1, z-ai/glm-5
  • Non-TEE Best: anthropic/claude-sonnet-4.5, openai/gpt-5
  • Budget: phala/qwen3.5-27b, openai/gpt-5-mini

  • TEE Best: qwen/qwen3-coder-next, qwen/qwen3-coder-480b-a35b-instruct, moonshotai/kimi-k2.6
  • Non-TEE Best: openai/gpt-5, anthropic/claude-opus-4.6
  • Budget: phala/glm-4.7-flash, x-ai/grok-code-fast-1

  • TEE Best: z-ai/glm-5.1, z-ai/glm-5, deepseek/deepseek-chat-v3.1
  • Non-TEE Best: anthropic/claude-sonnet-4.5, google/gemini-2.5-pro
  • Budget: phala/qwen3.5-27b, google/gemini-2.5-flash

  • Best: phala/qwen3-vl-30b-a3b-instruct, moonshotai/kimi-k2.5
  • Budget: phala/gemma-3-27b-it

TEE-Protected: Chutes, Near AI, Phala, and Tinfoil models
  • z-ai/glm-5.1 - Best quality
  • z-ai/glm-5 - Systems engineering
  • phala/qwen3.5-27b - Budget option

  • TEE: moonshotai/kimi-k2.6 (262K tokens), z-ai/glm-5 (203K tokens), phala/glm-4.7-flash (202K tokens)
  • Non-TEE: x-ai/grok-4.1-fast (2M tokens), google/gemini-2.5-pro (1M tokens), anthropic/claude-opus-4.6 (1M tokens)

  • TEE Best: z-ai/glm-5.1, moonshotai/kimi-k2-thinking, deepseek/deepseek-r1-0528
  • Non-TEE Best: openai/o3, anthropic/claude-sonnet-4.5
  • Budget: phala/glm-4.7-flash, openai/o4-mini

Get Latest Models

Via API

# All models
curl https://api.redpill.ai/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Phala confidential models only
curl https://api.redpill.ai/v1/models/phala \
  -H "Authorization: Bearer YOUR_API_KEY"

Via SDK

import requests
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# List all models
models = client.models.list()
for model in models.data:
    print(model.id)

# Filter for TEE models using the providers field of the raw JSON
resp = requests.get(
    "https://api.redpill.ai/v1/models",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=30,
)
resp.raise_for_status()  # Fail early on an invalid key or server error
tee_providers = {"chutes", "near-ai", "phala", "tinfoil"}
for model in resp.json()["data"]:
    # A model counts as TEE if any of its providers is a GPU TEE provider
    if tee_providers.intersection(model.get("providers") or []):
        print(f"TEE: {model['id']}")

Model Properties

Each model includes:

| Property | Description |
|---|---|
| id | Model identifier for API calls |
| name | Human-readable name |
| context_length | Maximum tokens in context window |
| pricing.prompt | Cost per token (prompt) |
| pricing.completion | Cost per token (completion) |
| input_modalities | Input types (text, image, file, audio) |
| output_modalities | Output types (text) |
| providers | Array of infrastructure providers (phala, tinfoil, near-ai, etc.) |
| metadata.appid | Phala TEE application ID (for attestation) |
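
These properties are enough to pick a model programmatically, for example the cheapest one whose context window fits a workload. The sketch below runs on hypothetical sample records shaped like the properties above. The exact context lengths and whether prices arrive as numbers or numeric strings are assumptions, and `price_per_million` and `cheapest_with_context` are illustrative helper names.

```python
from typing import Optional

# Hypothetical sample records shaped like the model properties above.
# Real values, and the price encoding, may differ from the live API.
SAMPLE_MODELS = [
    {"id": "phala/gpt-oss-20b", "context_length": 131072,
     "pricing": {"prompt": "0.00000004", "completion": "0.00000015"}},
    {"id": "phala/glm-4.7-flash", "context_length": 202752,
     "pricing": {"prompt": "0.0000001", "completion": "0.00000043"}},
    {"id": "moonshotai/kimi-k2.6", "context_length": 262144,
     "pricing": {"prompt": "0.00000109", "completion": "0.0000046"}},
]


def price_per_million(per_token_price) -> float:
    """Convert a per-token price (number or numeric string) to USD per 1M tokens."""
    return float(per_token_price) * 1_000_000


def cheapest_with_context(models: list[dict], min_context: int) -> Optional[dict]:
    """Return the lowest-prompt-price model whose context window is large enough."""
    candidates = [m for m in models if m.get("context_length", 0) >= min_context]
    if not candidates:
        return None
    return min(candidates, key=lambda m: float(m["pricing"]["prompt"]))


if __name__ == "__main__":
    pick = cheapest_with_context(SAMPLE_MODELS, min_context=200_000)
    if pick is not None:
        prompt_price = price_per_million(pick["pricing"]["prompt"])
        print(pick["id"], f"${prompt_price:.2f}/M prompt")
```

To run this against live data, replace `SAMPLE_MODELS` with the `data` array returned by /v1/models.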

Model Compatibility

OpenAI SDK Compatible

All models work with the OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# Use any chat model ID listed on this page
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}]
)
print(response.choices[0].message.content)

Streaming Support

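All chat models support streaming:
stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
# Print text as it arrives; some chunks carry no content
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)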

Function Calling

Supported models:
  • GPU TEE confidential models
  • All OpenAI GPT models
  • Anthropic Claude 3+ models
  • Google Gemini models
  • Meta Llama 3.2+ models
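
Function calling in the OpenAI chat format has two halves: a JSON Schema description of each tool sent with the request, and local code that runs the tool when the model asks for it. The sketch below shows both, assuming the models listed above accept the standard OpenAI `tools` parameter through RedPill. The `get_weather` tool and the `run_tool_call` dispatcher are hypothetical.

```python
import json


def get_weather(city: str) -> str:
    """Hypothetical local tool; a real one would call a weather service."""
    return f"(stub) weather for {city}"


# Tool description in the OpenAI "tools" format (JSON Schema parameters)
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

# Maps tool names the model may request to local Python callables
LOCAL_FUNCTIONS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Run one requested tool; arguments arrive as a JSON-encoded string."""
    if name not in LOCAL_FUNCTIONS:
        raise KeyError(f"unknown tool: {name}")
    return LOCAL_FUNCTIONS[name](**json.loads(arguments_json))


if __name__ == "__main__":
    # Simulates the name and arguments a model would return in a tool call
    print(run_tool_call("get_weather", '{"city": "Paris"}'))
```

With the OpenAI SDK, `TOOLS` is passed as `tools=TOOLS` to `client.chat.completions.create`, and each entry in `response.choices[0].message.tool_calls` supplies `call.function.name` and `call.function.arguments` for `run_tool_call`.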

FAQs

All requests flow through the TEE-protected gateway, regardless of model. For end-to-end TEE protection (including model inference), use Phala, Tinfoil, Near AI, or Chutes confidential models.

To list Phala models, use the /v1/models/phala endpoint or check the providers field for "phala". To find all GPU TEE models, filter providers for phala, near-ai, tinfoil, or chutes.

Protection differs by model type:
  • Regular models: TEE-protected gateway only (your request is protected in transit)
  • TEE models (Phala/Tinfoil/Near AI/Chutes): Full end-to-end TEE (gateway + inference in GPU TEE)

New models are added regularly. Check the API or docs for the latest additions.

To request a model, email support@redpill.ai.
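
The difference between gateway-only and end-to-end protection can be read off a model's providers field. A minimal sketch, assuming the provider slugs named on this page (phala, tinfoil, near-ai, chutes) are the complete set of GPU TEE providers; `protection_level` is a hypothetical helper.

```python
# Provider slugs named on this page as GPU TEE providers (assumed complete)
GPU_TEE_PROVIDERS = {"phala", "tinfoil", "near-ai", "chutes"}


def protection_level(providers: list[str]) -> str:
    """Classify a model by where TEE protection applies."""
    if GPU_TEE_PROVIDERS.intersection(providers):
        return "end-to-end TEE (gateway + GPU TEE inference)"
    return "TEE-protected gateway only"


if __name__ == "__main__":
    print(protection_level(["tinfoil"]))
    print(protection_level(["openai"]))
```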

Next Steps

Start Using Models

Make your first request

Confidential AI Models

Explore all TEE models in detail

API Reference

Models API endpoint

Pricing Details

Understand pricing