
Overview

RedPill supports 60+ active AI models accessible through a single TEE-protected API, with 66+ provider integrations in the codebase enabling easy expansion. Every request is hardware-protected regardless of which model you choose.
Currently Active: OpenAI, Anthropic, Google, xAI, DeepSeek, Qwen, ZhipuAI, Meta, NousResearch, and more
Available Integrations: 66+ providers including Mistral, Groq, Together AI, Fireworks, Replicate, Cohere, Cerebras, Lambda, and others

List Models via API

Get the latest models programmatically →

Model Categories

Chat Models

Conversational AI for chatbots and assistants

Instruction Models

Task completion and code generation

Vision Models

Image understanding and analysis

Embedding Models

Text embeddings for search and similarity
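These four categories map onto two OpenAI-compatible endpoints: chat, instruction, and vision models are called via /v1/chat/completions, while embedding models use /v1/embeddings. A minimal sketch of the two request bodies, using model IDs from the tables below — the payload shapes are the standard OpenAI request bodies, and the helper functions are illustrative, not part of any SDK:

```python
# Sketch of the two request shapes used by the OpenAI-compatible API.
# Model IDs come from the tables on this page; the helpers below are
# illustrative, not a RedPill-specific SDK.

def chat_request(model: str, prompt: str) -> dict:
    """Body for POST /v1/chat/completions (chat, instruction, vision models)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def embedding_request(model: str, text: str) -> dict:
    """Body for POST /v1/embeddings (embedding models)."""
    return {"model": model, "input": text}

chat = chat_request("anthropic/claude-sonnet-4.5", "Hello")
emb = embedding_request("openai/text-embedding-3-small", "Hello")
```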

OpenAI Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| openai/gpt-5.2 | 400K | $1.25/M | $10/M |
| openai/gpt-5.1 | 400K | $1.25/M | $10/M |
| openai/gpt-5 | 400K | $1.25/M | $10/M |
| openai/gpt-5-mini | 400K | $0.25/M | $2/M |
| openai/gpt-5-nano | 400K | $0.05/M | $0.4/M |
| openai/o4-mini | 200K | $1.1/M | $4.4/M |
| openai/o3 | 200K | $2/M | $8/M |
| openai/gpt-4.1 | 1M | $2/M | $8/M |
| openai/gpt-4.1-mini | 1M | $0.4/M | $1.6/M |
New: GPT-5.2 is OpenAI’s latest flagship model with enhanced reasoning and improved performance across all benchmarks.

Anthropic Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| anthropic/claude-sonnet-4.5 | 1M | $3/M | $15/M |
| anthropic/claude-opus-4.1 | 200K | $15/M | $75/M |
| anthropic/claude-opus-4 | 200K | $15/M | $75/M |
| anthropic/claude-sonnet-4 | 1M | $3/M | $15/M |
| anthropic/claude-3.7-sonnet | 200K | $3/M | $15/M |
| anthropic/claude-3.5-haiku | 200K | $0.8/M | $4/M |

Google Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| google/gemini-3-pro-preview | 1M | $1.25/M | $10/M |
| google/gemini-2.5-pro | 1M | $1.25/M | $10/M |
| google/gemini-2.5-flash | 1M | $0.3/M | $2.5/M |
| google/gemini-2.5-flash-lite | 1M | $0.1/M | $0.4/M |
| google/gemma-3-27b-it | 53K | $0.11/M | $0.4/M |
New: Gemini 3 Pro Preview is Google’s next-generation model with advanced reasoning capabilities.

xAI Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| x-ai/grok-4 | 128K | $3/M | $15/M |
| x-ai/grok-4.1-fast | 128K | $1/M | $5/M |
| x-ai/grok-code-fast-1 | 128K | $1/M | $5/M |
New: xAI’s Grok models are now available! Grok 4 is the flagship model, while Grok 4.1 Fast and Grok Code Fast offer optimized performance for speed and code generation.

DeepSeek Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| deepseek/deepseek-v3.2 | 128K | $0.28/M | $1.14/M |
| deepseek/deepseek-chat-v3.1 | 128K | $0.28/M | $1.14/M |
| deepseek/deepseek-chat-v3-0324 | 128K | $0.28/M | $1.14/M |
| deepseek/deepseek-r1-0528 | 128K | $0.55/M | $2.19/M |
| deepseek/deepseek-chat | 128K | $0.14/M | $0.28/M |
New: DeepSeek V3.2 is the latest version with improved performance. DeepSeek R1 is a reasoning model optimized for complex multi-step tasks.

Qwen Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| qwen/qwen3-coder-480b-a35b-instruct | 131K | $0.3/M | $1.49/M |
| qwen/qwen3-vl-30b-a3b-instruct | 131K | $0.15/M | $0.6/M |
| qwen/qwen3-30b-a3b-instruct-2507 | 131K | $0.15/M | $0.6/M |
| qwen/qwen2.5-vl-72b-instruct | 64K | $0.59/M | $0.59/M |
| qwen/qwen-2.5-7b-instruct | 32K | $0.04/M | $0.1/M |
New: Qwen3 Coder 480B is the largest coding model on the platform, a 480B-parameter mixture-of-experts with 35B active parameters. Qwen3 VL and Qwen3 30B offer excellent multimodal and general capabilities.

ZhipuAI Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| z-ai/glm-4.6 | 128K | $0.5/M | $2/M |
New: GLM-4.6 is ZhipuAI’s latest large language model with strong Chinese and English bilingual capabilities.

GPU TEE Confidential Models

RedPill offers 14 confidential AI models running entirely in GPU TEE across 3 providers:

Phala Network (7 models)

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| deepseek/deepseek-v3.2 | 128K | $0.28/M | $1.14/M |
| deepseek/deepseek-chat-v3-0324 | 163K | $0.28/M | $1.14/M |
| openai/gpt-oss-120b | 131K | $0.1/M | $0.49/M |
| openai/gpt-oss-20b | 131K | $0.04/M | $0.15/M |
| qwen/qwen2.5-vl-72b-instruct | 128K | $0.59/M | $0.59/M |
| qwen/qwen-2.5-7b-instruct | 32K | $0.04/M | $0.1/M |
| google/gemma-3-27b-it | 53K | $0.11/M | $0.4/M |

Tinfoil (4 models)

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| deepseek/deepseek-r1-0528 | 128K | $0.55/M | $2.19/M |
| qwen/qwen3-coder-480b-a35b-instruct | 131K | $0.3/M | $1.49/M |
| qwen/qwen3-vl-30b-a3b-instruct | 131K | $0.15/M | $0.6/M |
| meta-llama/llama-3.3-70b-instruct | 128K | $0.1/M | $0.4/M |

Near AI (3 models)

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| deepseek/deepseek-chat-v3.1 | 128K | $0.28/M | $1.14/M |
| qwen/qwen3-30b-a3b-instruct-2507 | 131K | $0.15/M | $0.6/M |
| z-ai/glm-4.6 | 128K | $0.5/M | $2/M |

Learn About Confidential AI

Explore all 14 GPU TEE models in detail →

Vision Models

Models that understand images:
| Model ID | Context | Features | TEE Provider |
|---|---|---|---|
| qwen/qwen2.5-vl-72b-instruct | 128K | Vision + Text | Phala |
| qwen/qwen3-vl-30b-a3b-instruct | 131K | Vision + Text | Tinfoil |
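Vision models accept images through the standard OpenAI multimodal message format. A minimal sketch of building such a message — the helper function and the image URL are illustrative placeholders:

```python
# Build a multimodal message in the standard OpenAI vision format.
# vision_message is an illustrative helper; the URL is a placeholder.

def vision_message(prompt: str, image_url: str) -> dict:
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

msg = vision_message("What is in this image?", "https://example.com/photo.png")
# Then: client.chat.completions.create(
#     model="qwen/qwen2.5-vl-72b-instruct", messages=[msg])
```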

Embedding Models

Generate vector embeddings for semantic search:
| Model ID | Dimensions | Max Tokens |
|---|---|---|
| openai/text-embedding-3-large | 3072 | 8191 |
| openai/text-embedding-3-small | 1536 | 8191 |
| openai/text-embedding-ada-002 | 1536 | 8191 |
| cohere/embed-english-v3.0 | 1024 | 512 |
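Vectors returned by the embeddings endpoint are typically compared with cosine similarity for semantic search and ranking. A self-contained sketch — cosine_similarity is an illustrative helper, and the commented call shows how the vectors would be obtained:

```python
import math

# Compare two embedding vectors with cosine similarity (illustrative helper).
def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# How the vectors would be fetched in practice:
# resp = client.embeddings.create(model="openai/text-embedding-3-small",
#                                 input=["query text", "document text"])
# vecs = [d.embedding for d in resp.data]

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 1.0
```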

Provider Coverage

Supported Providers

  • OpenAI - GPT-5.2, GPT-5.1, GPT-5, GPT-4.1, o3, o4-mini, embeddings
  • Anthropic - Claude Sonnet 4.5, Claude Opus 4.1, Claude 3.7, Claude 3.5 Haiku
  • Google - Gemini 3 Pro Preview, Gemini 2.5 Pro/Flash/Flash-Lite, Gemma 3
  • xAI - Grok 4, Grok 4.1 Fast, Grok Code Fast
  • DeepSeek - DeepSeek V3.2, DeepSeek V3.1, DeepSeek R1
  • Qwen - Qwen3 Coder 480B, Qwen3 VL, Qwen 2.5
  • ZhipuAI - GLM-4.6 bilingual model
  • Meta - Llama 3.3
  • NousResearch - Hermes 3 Llama 405B
  • And more providers

Pricing Tiers

Free & Low-Cost Models

Great for testing and low-volume use:
  • qwen/qwen-2.5-7b-instruct - $0.04/M prompt
  • liquid/lfm-40b - Free

Budget Models

Balance of cost and quality:
  • openai/gpt-3.5-turbo - $0.5/M prompt
  • openai/gpt-4o-mini - $0.15/M prompt
  • google/gemini-2.5-flash-lite - $0.1/M prompt
  • qwen/qwen-2.5-7b-instruct - $0.04/M prompt

Premium Models

Best quality for production:
  • openai/gpt-5 - $1.25/M prompt
  • anthropic/claude-sonnet-4.5 - $3/M prompt
  • google/gemini-2.5-pro - $1.25/M prompt
  • phala/deepseek-chat-v3-0324 - $0.28/M prompt (TEE)
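The per-million-token prices above translate directly into per-request costs. A small sketch — request_cost is an illustrative helper, not part of any API:

```python
# Estimate the dollar cost of one request from per-million-token prices,
# the same $/M units used in the pricing tables above.

def request_cost(prompt_tokens: int, completion_tokens: int,
                 prompt_price_per_m: float,
                 completion_price_per_m: float) -> float:
    return (prompt_tokens * prompt_price_per_m
            + completion_tokens * completion_price_per_m) / 1_000_000

# openai/gpt-5: $1.25/M prompt, $10/M completion
print(request_cost(1000, 500, 1.25, 10.0))  # -> 0.00625
```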

Model Selection Guide

By Use Case

  • Best: anthropic/claude-sonnet-4.5, openai/gpt-5; Budget: openai/gpt-5-mini, qwen/qwen-2.5-7b-instruct
  • Best: openai/gpt-5, anthropic/claude-opus-4.1; Budget: qwen/qwen2.5-vl-72b-instruct, google/gemini-2.5-flash
  • Best: anthropic/claude-sonnet-4.5, google/gemini-2.5-pro; Budget: google/gemini-2.5-flash, qwen/qwen-2.5-7b-instruct
  • Best: qwen/qwen3-vl-235b-a22b-instruct, qwen/qwen2.5-vl-72b-instruct; Budget: phala/qwen2.5-vl-72b-instruct
  • TEE-Protected: all Phala models
      • phala/deepseek-chat-v3-0324 - Best quality
      • phala/gpt-oss-120b - OpenAI architecture
      • phala/qwen-2.5-7b-instruct - Budget option
  • Best: google/gemini-2.5-pro (1M tokens), anthropic/claude-sonnet-4.5 (1M tokens); Others: openai/gpt-4.1 (1M tokens), qwen/qwen3-vl-235b-a22b-instruct (131K tokens)

Get Latest Models

Via API

# All models
curl https://api.redpill.ai/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Phala confidential models only
curl https://api.redpill.ai/v1/models/phala \
  -H "Authorization: Bearer YOUR_API_KEY"

Via SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# List all models
models = client.models.list()
for model in models.data:
    print(model.id)

Model Properties

Each model includes:
| Property | Description |
|---|---|
| id | Model identifier for API calls |
| name | Human-readable name |
| context_length | Maximum tokens in the context window |
| pricing.prompt | Prompt token price (shown as $/M in the tables above) |
| pricing.completion | Completion token price (shown as $/M in the tables above) |
| quantization | Model quantization (e.g., FP8, FP16) |
| modality | Input/output types (text, image) |
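A sketch of working with these properties on a model entry — the values below are illustrative examples, not live API output:

```python
# Illustrative model entry carrying the properties listed above.
# Values are examples, not live API output.
model = {
    "id": "anthropic/claude-sonnet-4.5",
    "name": "Claude Sonnet 4.5",
    "context_length": 1_000_000,
    "modality": "text+image->text",
}

def fits_context(entry: dict, token_count: int) -> bool:
    """Check whether a prompt of token_count tokens fits the context window."""
    return token_count <= entry["context_length"]

print(fits_context(model, 200_000))    # a 200K-token prompt fits -> True
print(fits_context(model, 2_000_000))  # 2M tokens does not -> False
```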

Model Compatibility

OpenAI SDK Compatible

All models work with OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# Use any model
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",  # ✅ Works
    messages=[{"role": "user", "content": "Hello"}]
)

Streaming Support

All chat models support streaming:
stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Function Calling

Supported models:
  • All OpenAI GPT models
  • Anthropic Claude 3+ models
  • Google Gemini models
  • Meta Llama 3.2+ models
  • Mistral models
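Function calling uses the standard OpenAI tools parameter. A sketch of defining a tool — tool_def is an illustrative helper and get_weather is a hypothetical function used only for illustration:

```python
# Build a function-calling tool definition in the standard OpenAI "tools"
# format. get_weather is hypothetical, for illustration only.

def tool_def(name: str, description: str, parameters: dict) -> dict:
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
    }

weather_tool = tool_def(
    "get_weather",
    "Look up current weather for a city",
    {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
)
# Then: client.chat.completions.create(model="openai/gpt-5",
#     messages=[...], tools=[weather_tool])
```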

FAQs

Are all requests TEE-protected?
Yes! All requests flow through the TEE-protected gateway, regardless of model. For end-to-end TEE protection (including model inference), use Phala confidential models.

Can I use a specific model?
If it’s one of the 60+ active models (66+ provider integrations available), yes! If not, request it and we’ll consider adding it.

What level of TEE protection does each model get?
Regular models: TEE-protected gateway only
Phala models: full end-to-end TEE (gateway + inference in GPU TEE)

How often are new models added?
We add new models weekly. Check the API or docs for the latest additions.

Can I request a new model?
Yes! Email [email protected] with your model request.

Next Steps