Overview

RedPill supports 218+ AI models from leading providers, all accessible through a single TEE-protected API. Every request is hardware-protected regardless of which model you choose.

List Models via API

Get the latest models programmatically →

Model Categories

  • Chat Models - Conversational AI for chatbots and assistants
  • Instruction Models - Task completion and code generation
  • Vision Models - Image understanding and analysis
  • Embedding Models - Text embeddings for search and similarity

OpenAI Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| openai/gpt-4-turbo | 128K | $0.01/1K | $0.03/1K |
| openai/gpt-4 | 8K | $0.03/1K | $0.06/1K |
| openai/gpt-3.5-turbo | 16K | $0.0005/1K | $0.0015/1K |
| openai/o1-preview | 128K | $0.015/1K | $0.06/1K |
| openai/o1-mini | 128K | $0.003/1K | $0.012/1K |

Anthropic Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| anthropic/claude-3.5-sonnet | 200K | $0.003/1K | $0.015/1K |
| anthropic/claude-3-opus | 200K | $0.015/1K | $0.075/1K |
| anthropic/claude-3-haiku | 200K | $0.00025/1K | $0.00125/1K |

Google Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| google/gemini-1.5-pro | 2M | $0.00125/1K | $0.005/1K |
| google/gemini-1.5-flash | 1M | $0.000075/1K | $0.0003/1K |
| google/gemini-flash-1.5-8b | 1M | $0.0000375/1K | $0.00015/1K |

Meta Llama Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| meta-llama/llama-3.3-70b-instruct | 131K | $0.00035/1K | $0.0004/1K |
| meta-llama/llama-3.2-90b-vision-instruct | 131K | $0.00035/1K | $0.0004/1K |
| meta-llama/llama-3.2-11b-vision-instruct | 131K | $0.000055/1K | $0.000055/1K |
| meta-llama/llama-3.2-3b-instruct | 131K | $0.00003/1K | $0.00005/1K |

Mistral AI Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| mistralai/mistral-large-latest | 128K | $0.002/1K | $0.006/1K |
| mistralai/mixtral-8x22b-instruct | 64K | $0.00065/1K | $0.00065/1K |
| mistralai/ministral-8b | 128K | $0.0000001/1K | $0.0000001/1K |
| mistralai/ministral-3b | 128K | $0.00000004/1K | $0.00000004/1K |

Qwen Models

| Model ID | Context | Prompt | Completion |
|---|---|---|---|
| qwen/qwen-2.5-72b-instruct | 131K | $0.00035/1K | $0.0004/1K |
| qwen/qwen-2.5-7b-instruct | 131K | $0.00027/1K | $0.00027/1K |
| qwen/qwen-2-vl-72b-instruct | 33K | $0.0004/1K | $0.0004/1K |

Phala Confidential AI Models

Native TEE models running entirely in GPU secure enclaves:
| Model ID | Context | Prompt | Completion | Quantization |
|---|---|---|---|---|
| phala/deepseek-chat-v3-0324 | 164K | $0.00049/1K | $0.00114/1K | FP8 |
| phala/gpt-oss-120b | 131K | $0.0001/1K | $0.00049/1K | FP8 |
| phala/gpt-oss-20b | 131K | $0.0001/1K | $0.0004/1K | FP8 |
| phala/qwen2.5-vl-72b-instruct | 128K | $0.00059/1K | $0.00059/1K | FP8 |
| phala/qwen-2.5-7b-instruct | 33K | $0.00004/1K | $0.0001/1K | FP8 |
| phala/gemma-3-27b-it | 54K | $0.00011/1K | $0.0004/1K | FP8 |

Learn About Confidential AI

Explore Phala TEE models in detail →

Vision Models

Models that understand images:
| Model ID | Context | Features |
|---|---|---|
| meta-llama/llama-3.2-90b-vision-instruct | 131K | High-quality vision |
| meta-llama/llama-3.2-11b-vision-instruct | 131K | Efficient vision |
| qwen/qwen-2-vl-72b-instruct | 33K | Chinese + English |
| phala/qwen2.5-vl-72b-instruct | 128K | TEE-protected vision |
| google/gemini-1.5-pro | 2M | Long context vision |

Embedding Models

Generate vector embeddings for semantic search:
| Model ID | Dimensions | Max Tokens |
|---|---|---|
| openai/text-embedding-3-large | 3072 | 8191 |
| openai/text-embedding-3-small | 1536 | 8191 |
| openai/text-embedding-ada-002 | 1536 | 8191 |
| cohere/embed-english-v3.0 | 1024 | 512 |

Provider Coverage

Supported Providers

  • OpenAI - GPT-4, GPT-3.5, o1, o3, embeddings
  • Anthropic - Claude 3.5, Claude 3, Claude 2
  • Google - Gemini 1.5 Pro/Flash, PaLM
  • Meta - Llama 3.3, Llama 3.2, Llama 3.1
  • Mistral - Large, Medium, Small, Mixtral
  • Qwen - Qwen 2.5, Qwen-VL
  • Phala - Confidential AI models in TEE
  • Cohere - Command, Embed
  • Perplexity - Sonar models
  • NVIDIA - Nemotron models
  • DeepSeek - Chat, Code models
  • And 60+ more providers

Pricing Tiers

Free & Near-Free Models

Great for testing and low-volume use:
  • mistralai/ministral-3b - $0.00000004/1K tokens
  • mistralai/ministral-8b - $0.0000001/1K tokens
  • meta-llama/llama-3.2-1b-instruct - $0.00000001/1K prompt
  • liquid/lfm-40b - Free

Budget Models

Balance of cost and quality:
  • openai/gpt-3.5-turbo - $0.0005/1K prompt
  • anthropic/claude-3-haiku - $0.00025/1K prompt
  • google/gemini-flash-1.5-8b - $0.0000375/1K prompt
  • meta-llama/llama-3.2-3b-instruct - $0.00003/1K prompt

Premium Models

Best quality for production:
  • openai/gpt-4-turbo - $0.01/1K prompt
  • anthropic/claude-3.5-sonnet - $0.003/1K prompt
  • google/gemini-1.5-pro - $0.00125/1K prompt
  • phala/deepseek-chat-v3-0324 - $0.00049/1K prompt (TEE)
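Estimating a request's cost from the per-1K prices above is simple arithmetic: tokens divided by 1,000, times the per-1K price, summed over prompt and completion. A minimal sketch (the `estimate_cost` helper is illustrative; always re-check prices against the live API):

```python
# Per-1K-token (prompt, completion) prices in USD, from the tables above.
PRICES = {
    "openai/gpt-3.5-turbo": (0.0005, 0.0015),
    "anthropic/claude-3.5-sonnet": (0.003, 0.015),
    "phala/deepseek-chat-v3-0324": (0.00049, 0.00114),
}

def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated request cost in USD: tokens / 1000 * per-1K price, summed."""
    prompt_price, completion_price = PRICES[model]
    return (prompt_tokens / 1000 * prompt_price
            + completion_tokens / 1000 * completion_price)

# 2,000 prompt tokens + 500 completion tokens on Claude 3.5 Sonnet:
print(estimate_cost("anthropic/claude-3.5-sonnet", 2000, 500))  # ≈ 0.0135
```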

Model Selection Guide

By Use Case

Chatbots & Assistants
  • Best: anthropic/claude-3.5-sonnet, openai/gpt-4-turbo
  • Budget: openai/gpt-3.5-turbo, meta-llama/llama-3.3-70b-instruct

Code Generation
  • Best: openai/gpt-4-turbo, anthropic/claude-3-opus
  • Budget: meta-llama/llama-3.3-70b-instruct, qwen/qwen-2.5-72b-instruct

Writing & Summarization
  • Best: anthropic/claude-3.5-sonnet, google/gemini-1.5-pro
  • Budget: google/gemini-1.5-flash, qwen/qwen-2.5-7b-instruct

Image Analysis
  • Best: meta-llama/llama-3.2-90b-vision-instruct, google/gemini-1.5-pro
  • Budget: meta-llama/llama-3.2-11b-vision-instruct

Privacy-Critical Workloads (TEE-protected: all Phala models)
  • phala/deepseek-chat-v3-0324 - Best quality
  • phala/gpt-oss-120b - OpenAI architecture
  • phala/qwen-2.5-7b-instruct - Budget option

Long Context
  • Best: google/gemini-1.5-pro (2M tokens), google/gemini-1.5-flash (1M tokens)
  • Others: anthropic/claude-3.5-sonnet (200K), qwen/qwen-2.5-72b-instruct (131K)

Get Latest Models

Via API

# All models
curl https://api.redpill.ai/v1/models \
  -H "Authorization: Bearer YOUR_API_KEY"

# Phala confidential models only
curl https://api.redpill.ai/v1/models/phala \
  -H "Authorization: Bearer YOUR_API_KEY"

Via SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# List all models
models = client.models.list()
for model in models.data:
    print(f"{model.id}: {model.name}")

Model Properties

Each model includes:
| Property | Description |
|---|---|
| id | Model identifier for API calls |
| name | Human-readable name |
| context_length | Maximum tokens in context window |
| pricing.prompt | Cost per 1K prompt tokens |
| pricing.completion | Cost per 1K completion tokens |
| quantization | Model quantization (e.g., FP8, FP16) |
| modality | Input/output types (text, image) |
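These properties make it easy to filter the `/v1/models` listing client-side, for example to find models under a price cap. A sketch assuming pricing values are returned as strings (the `cheap_models` helper and sample entries are illustrative):

```python
def cheap_models(models: list[dict], max_prompt_price: float) -> list[str]:
    """Return IDs of models at or under a per-1K prompt price (USD)."""
    return [
        m["id"] for m in models
        if float(m["pricing"]["prompt"]) <= max_prompt_price
    ]

# Sample entries shaped like the property table above (illustrative values):
sample = [
    {"id": "openai/gpt-4-turbo",
     "pricing": {"prompt": "0.01", "completion": "0.03"}},
    {"id": "anthropic/claude-3-haiku",
     "pricing": {"prompt": "0.00025", "completion": "0.00125"}},
]
print(cheap_models(sample, 0.001))  # → ['anthropic/claude-3-haiku']
```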

Model Compatibility

OpenAI SDK Compatible

All models work with the OpenAI SDK:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.redpill.ai/v1"
)

# Use any model
response = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",  # ✅ Works
    messages=[{"role": "user", "content": "Hello"}]
)

Streaming Support

All chat models support streaming:
stream = client.chat.completions.create(
    model="anthropic/claude-3.5-sonnet",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True
)

Function Calling

Supported models:
  • All OpenAI GPT models
  • Anthropic Claude 3+ models
  • Google Gemini models
  • Meta Llama 3.2+ models
  • Mistral models

FAQs

Are all models TEE-protected?
Yes! All requests flow through the TEE-protected gateway, regardless of model. For end-to-end TEE protection (including model inference), use Phala confidential models.

Do you support my model?
If it's one of the 218+ models, yes! If not, request it and we'll consider adding it.

What's the difference between regular models and Phala models?
Regular models: TEE-protected gateway only. Phala models: full end-to-end TEE (gateway + inference in GPU TEE).

How often are new models added?
We add new models weekly. Check the API or docs for the latest additions.

Can I request a new model?
Yes! Email support@redpill.ai with your model request.

Next Steps