Supported Models - RedPill AI

Overview

RedPill supports 50+ AI models from leading providers, all accessible through a single TEE-protected API. Every request is hardware-protected regardless of which model you choose.

List Models via API

Get the latest models programmatically →

Model Categories

Chat Models

Conversational AI for chatbots and assistants

Instruction Models

Task completion and code generation

Vision Models

Image understanding and analysis

Embedding Models

Text embeddings for search and similarity

Featured Models

OpenAI Models

Model ID	Context	Prompt	Completion
`openai/gpt-5`	400K	$1.25/M	$10/M
`openai/gpt-5-mini`	400K	$0.25/M	$2/M
`openai/gpt-5-nano`	400K	$0.05/M	$0.4/M
`openai/o4-mini`	200K	$1.1/M	$4.4/M
`openai/o3`	200K	$2/M	$8/M
`openai/gpt-4.1`	1M	$2/M	$8/M
`openai/gpt-4.1-mini`	1M	$0.4/M	$1.6/M
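
The Prompt and Completion columns are quoted per million tokens ("/M"), so the cost of a single request can be estimated with simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost_usd` is hypothetical, and it assumes billing is linear in token count with no minimums or other fees.

```python
def estimate_cost_usd(prompt_tokens, completion_tokens,
                      prompt_price_per_m, completion_price_per_m):
    """Estimate the cost of one request from per-million-token prices."""
    prompt_cost = prompt_tokens / 1_000_000 * prompt_price_per_m
    completion_cost = completion_tokens / 1_000_000 * completion_price_per_m
    return prompt_cost + completion_cost


# Prices from the openai/gpt-5 row above: $1.25/M prompt, $10/M completion.
cost = estimate_cost_usd(2_000, 500, 1.25, 10.0)
print(f"${cost:.4f}")  # prints $0.0075
```

The same function applies to any row in the pricing tables on this page; only the two price arguments change.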

Anthropic Models

Model ID	Context	Prompt	Completion
`anthropic/claude-sonnet-4.5`	1M	$3/M	$15/M
`anthropic/claude-opus-4.1`	200K	$15/M	$75/M
`anthropic/claude-opus-4`	200K	$15/M	$75/M
`anthropic/claude-sonnet-4`	1M	$3/M	$15/M
`anthropic/claude-3.7-sonnet`	200K	$3/M	$15/M
`anthropic/claude-3.5-haiku`	200K	$0.8/M	$4/M

Google Models

Model ID	Context	Prompt	Completion
`google/gemini-2.5-pro`	1M	$1.25/M	$10/M
`google/gemini-2.5-flash`	1M	$0.3/M	$2.5/M
`google/gemini-2.5-flash-lite`	1M	$0.1/M	$0.4/M
`google/gemma-3-27b-it`	53K	$0.11/M	$0.4/M

Qwen Models

Model ID	Context	Prompt	Completion
`qwen/qwen2.5-vl-72b-instruct`	64K	$0.59/M	$0.59/M
`qwen/qwen-2.5-7b-instruct`	32K	$0.04/M	$0.1/M
`qwen/qwen3-vl-235b-a22b-instruct`	131K	$0.3/M	$1.49/M

Phala Confidential AI Models

Native TEE models running entirely in GPU secure enclaves:

Model ID	Context	Prompt	Completion	Quantization
`phala/deepseek-chat-v3-0324`	163K	$0.28/M	$1.14/M	FP8
`phala/gemma-3-27b-it`	53K	$0.11/M	$0.4/M	FP8
`phala/gpt-oss-120b`	131K	$0.1/M	$0.49/M	FP8
`phala/gpt-oss-20b`	131K	$0.04/M	$0.15/M	FP8
`phala/qwen-2.5-7b-instruct`	32K	$0.04/M	$0.1/M	FP8
`phala/qwen2.5-vl-72b-instruct`	128K	$0.59/M	$0.59/M	FP8
`phala/qwen3-vl-235b-a22b-instruct`	131K	$0.3/M	$1.49/M	FP8

Learn About Confidential AI

Explore Phala TEE models in detail →

Vision Models

Models that understand images:

Model ID	Context	Features
`phala/qwen2.5-vl-72b-instruct`	128K	TEE-protected vision
`phala/qwen3-vl-235b-a22b-instruct`	131K	TEE-protected vision
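
As a rough sketch of how an image might be sent to one of these models, the helper below builds a chat message in the OpenAI-style multi-part content format. It assumes RedPill accepts `image_url` content parts the same way the OpenAI Chat Completions API does; the helper names `build_vision_messages` and `to_data_url` are hypothetical.

```python
import base64


def build_vision_messages(question, image_url):
    """Build a single user message containing text plus one image reference."""
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }]


def to_data_url(image_bytes, mime_type="image/png"):
    """Encode raw image bytes as a data URL, for images that are not hosted."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder URL; replace with a real image location.
messages = build_vision_messages("What is in this image?",
                                 "https://example.com/cat.png")
```

The resulting `messages` list would then be passed to `client.chat.completions.create(...)` with a vision model ID such as `phala/qwen2.5-vl-72b-instruct`.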

Embedding Models

Generate vector embeddings for semantic search:

Model ID	Dimensions	Max Tokens
`openai/text-embedding-3-large`	3072	8191
`openai/text-embedding-3-small`	1536	8191
`openai/text-embedding-ada-002`	1536	8191
`cohere/embed-english-v3.0`	1024	512
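
Once embeddings are generated, semantic search usually comes down to comparing vectors. The following is a minimal sketch using cosine similarity on toy 3-dimensional vectors; real embeddings from the models above have the dimensions listed in the table. The function names are hypothetical, and cosine similarity is one common choice rather than a RedPill requirement.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs):
    """Return document keys ordered from most to least similar to the query."""
    scores = {key: cosine_similarity(query_vec, vec)
              for key, vec in doc_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy vectors standing in for real embeddings.
docs = {"pricing": [0.9, 0.1, 0.0], "privacy": [0.1, 0.9, 0.2]}
print(rank_documents([0.8, 0.2, 0.1], docs))  # prints ['pricing', 'privacy']
```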

Provider Coverage

Supported Providers

OpenAI - GPT-5, GPT-4.1, o3, o4-mini, embeddings
Anthropic - Claude Sonnet 4.5, Claude Opus 4.1, Claude 3.7, Claude 3.5 Haiku
Google - Gemini 2.5 Pro/Flash/Flash-Lite
Qwen - Qwen 2.5, Qwen-VL, Qwen 3
Phala - Confidential AI models in TEE
Cohere - Command, Embed
DeepSeek - Chat, Code models
And more providers

Pricing Tiers

Free Models

Free or near-free options for testing and low-volume use:

qwen/qwen-2.5-7b-instruct - $0.04/M prompt
liquid/lfm-40b - Free

Budget Models

Balance of cost and quality:

openai/gpt-3.5-turbo - $0.5/M prompt
openai/gpt-4o-mini - $0.15/M prompt
google/gemini-2.5-flash-lite - $0.1/M prompt
qwen/qwen-2.5-7b-instruct - $0.04/M prompt

Premium Models

Best quality for production:

openai/gpt-5 - $1.25/M prompt
anthropic/claude-sonnet-4.5 - $3/M prompt
google/gemini-2.5-pro - $1.25/M prompt
phala/deepseek-chat-v3-0324 - $0.28/M prompt (TEE)

Model Selection Guide

By Use Case

Chatbots & Assistants

Best: anthropic/claude-sonnet-4.5, openai/gpt-5
Budget: openai/gpt-5-mini, qwen/qwen-2.5-7b-instruct

Code Generation

Best: openai/gpt-5, anthropic/claude-opus-4.1
Budget: qwen/qwen2.5-vl-72b-instruct, google/gemini-2.5-flash

Text Analysis

Best: anthropic/claude-sonnet-4.5, google/gemini-2.5-pro
Budget: google/gemini-2.5-flash, qwen/qwen-2.5-7b-instruct

Image Understanding

Best: qwen/qwen3-vl-235b-a22b-instruct, qwen/qwen2.5-vl-72b-instruct
Budget: phala/qwen2.5-vl-72b-instruct

High-Privacy Workloads

TEE-Protected: All Phala models

phala/deepseek-chat-v3-0324 - Best quality
phala/gpt-oss-120b - OpenAI architecture
phala/qwen-2.5-7b-instruct - Budget option

Long Context

Best: google/gemini-2.5-pro (1M tokens), anthropic/claude-sonnet-4.5 (1M tokens)
Others: openai/gpt-4.1 (1M tokens), qwen/qwen3-vl-235b-a22b-instruct (131K tokens)
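
When choosing a long-context model, it can help to check the expected request size against each context window before sending anything. The sketch below hard-codes a few windows from the tables on this page, reading "K" as 1,000 and "M" as 1,000,000 tokens. It also assumes the context window has to hold the prompt and the completion together. The function name `models_that_fit` is hypothetical.

```python
# Context windows copied from the tables on this page (K = 1,000; M = 1,000,000).
CONTEXT_WINDOWS = {
    "google/gemini-2.5-pro": 1_000_000,
    "anthropic/claude-sonnet-4.5": 1_000_000,
    "openai/gpt-4.1": 1_000_000,
    "openai/gpt-5": 400_000,
    "qwen/qwen3-vl-235b-a22b-instruct": 131_000,
}


def models_that_fit(prompt_tokens, max_output_tokens, windows=CONTEXT_WINDOWS):
    """Return model IDs whose window can hold the prompt plus the reserved output."""
    needed = prompt_tokens + max_output_tokens
    return sorted(model for model, window in windows.items() if window >= needed)


print(models_that_fit(500_000, 4_000))
```

In practice the window sizes are better read from the models endpoint than hard-coded, since the list changes as models are added.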

Get Latest Models

Via API

# All models
curl https://api.redpill.ai/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Phala confidential models only
curl https://api.redpill.ai/v1/models/phala \
  -H "Authorization: Bearer YOUR_API_KEY"

Via SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# List all models
models = client.models.list()
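for model in models.data:
    # `name` is not a standard OpenAI SDK field; fall back to the ID if it is absent
    print(f"{model.id}: {getattr(model, 'name', model.id)}")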
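
Because model IDs on this page follow a `provider/model` pattern, the listing can be grouped by provider on the client side, for example to pick out only the Phala confidential models. This is a small sketch under that assumption; `group_by_provider` is a hypothetical helper, not part of any SDK.

```python
from collections import defaultdict


def group_by_provider(model_ids):
    """Group model IDs by the prefix before the first slash."""
    groups = defaultdict(list)
    for model_id in model_ids:
        provider, _, _ = model_id.partition("/")
        groups[provider].append(model_id)
    return dict(groups)


# Sample IDs taken from the tables on this page.
sample_ids = [
    "openai/gpt-5",
    "phala/gpt-oss-120b",
    "phala/gpt-oss-20b",
    "google/gemini-2.5-pro",
]
groups = group_by_provider(sample_ids)
print(groups["phala"])  # prints ['phala/gpt-oss-120b', 'phala/gpt-oss-20b']
```

With the SDK client above, the input would be the IDs from the listing, for example `group_by_provider(m.id for m in client.models.list().data)`.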

Model Properties

Each model includes:

Property	Description
`id`	Model identifier for API calls
`name`	Human-readable name
`context_length`	Maximum tokens in context window
`pricing.prompt`	Cost per 1K prompt tokens
`pricing.completion`	Cost per 1K completion tokens
`quantization`	Model quantization (e.g., FP8, FP16)
`modality`	Input/output types (text, image)
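
To work with these properties in typed code, a model record can be mapped onto a small data class. The sketch below assumes that the dotted names `pricing.prompt` and `pricing.completion` correspond to a nested `pricing` object in the JSON response, and that optional fields may be missing. The sample record is a made-up placeholder, not real API output, and the prices are kept as returned without interpreting their units.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ModelInfo:
    id: str
    name: str
    context_length: int
    prompt_price: Any        # kept as returned; units are not interpreted here
    completion_price: Any
    quantization: Optional[str]
    modality: Optional[str]


def parse_model(record):
    """Map one model record (a dict) onto ModelInfo, tolerating missing fields."""
    pricing = record.get("pricing") or {}
    return ModelInfo(
        id=record["id"],
        name=record.get("name", record["id"]),
        context_length=int(record.get("context_length", 0)),
        prompt_price=pricing.get("prompt"),
        completion_price=pricing.get("completion"),
        quantization=record.get("quantization"),
        modality=record.get("modality"),
    )


# Hypothetical record for illustration only.
example = parse_model({
    "id": "example/model",
    "name": "Example Model",
    "context_length": 32000,
    "pricing": {"prompt": "0", "completion": "0"},
    "quantization": "FP8",
    "modality": "text",
})
print(example.id, example.context_length, example.quantization)
```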

Model Compatibility

OpenAI SDK Compatible

All models work with the OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# Use any model
response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",  # ✅ Works
    messages=[{"role": "user", "content": "Hello"}]
)

Streaming Support

All chat models support streaming:

stream = client.chat.completions.create(
    model="anthropic/claude-sonnet-4.5",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

Function Calling

Supported models:

All OpenAI GPT models
Anthropic Claude 3+ models
Google Gemini models
Meta Llama 3.2+ models
Mistral models
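
For models that support function calling, a tool is typically described with a JSON Schema and passed in the `tools` parameter, using the OpenAI-style format. The sketch below defines one hypothetical tool, `get_weather`, and a small dispatcher that runs a tool call by name. The tool, its canned response, and the registry are illustrative assumptions, and this page does not confirm that every listed model accepts this exact schema.

```python
import json


def get_weather(city):
    """Hypothetical local tool; returns a canned string instead of real data."""
    return f"No live weather data for {city} in this sketch."


# Tool description in the OpenAI-style `tools` format.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

TOOL_REGISTRY = {"get_weather": get_weather}


def dispatch_tool_call(name, arguments_json):
    """Run the named tool with JSON-encoded arguments, as sent by the model."""
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        raise KeyError(f"Unknown tool: {name}")
    return handler(**json.loads(arguments_json))


print(dispatch_tool_call("get_weather", '{"city": "Paris"}'))
```

In a real exchange, `TOOLS` would be passed as `tools=TOOLS` to `client.chat.completions.create(...)`, and the name and arguments would come from the tool calls in the model's response.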

FAQs

Are all 50+ models protected by TEE?

Yes! All requests flow through the TEE-protected gateway, regardless of model. For end-to-end TEE protection (including model inference), use Phala confidential models.

Can I use my favorite model?

If it’s one of the 50+ models, yes! If not, request it and we’ll consider adding it.

What's the difference between regular and Phala models?

Regular models: TEE-protected gateway only
Phala models: Full end-to-end TEE (gateway + inference in GPU TEE)

How often are new models added?

We add new models weekly. Check the API or docs for the latest additions.

Can I request a specific model?

Yes! Email support@redpill.ai with your model request.

Next Steps

Start Using Models

Make your first request

Confidential AI Models

Explore Phala TEE models

API Reference

Models API endpoint

Pricing Details

Understand pricing
