Phala Confidential Models

Available Models

RedPill offers 7 confidential AI models from Phala Network, all running entirely in GPU TEE with FP8 quantization for optimal performance.

Model Comparison

Model	Parameters	Context	Modality	Price (Prompt/Completion)
DeepSeek V3	685B (MoE)	163K	Text	$0.28 /$ 1.14 per M
GPT-OSS 120B	117B (MoE)	131K	Text	$0.1 /$ 0.49 per M
GPT-OSS 20B	21B (MoE)	131K	Text	$0.04 /$ 0.15 per M
Qwen2.5 VL 72B	72B	128K	Vision + Text	$0.59 /$ 0.59 per M
Qwen3 VL 235B	235B	131K	Vision + Text	$0.3 /$ 1.49 per M
Qwen 2.5 7B	7B	32K	Text	$0.04 /$ 0.1 per M
Gemma 3 27B	27B	53K	Text	$0.11 /$ 0.4 per M

Model Details

phala/deepseek-chat-v3-0324

Best Overall Quality

Flagship model for complex reasoning and analysis

Specifications:

Parameters: 685 billion (Mixture-of-Experts)
Context Length: 163,840 tokens (163K)
Quantization: FP8
Modality: Text → Text

Description: DeepSeek V3 is a 685B-parameter mixture-of-experts model, the flagship of the DeepSeek family. It excels at:

Complex reasoning and analysis
Mathematical problem solving
Code generation and debugging
Long-form content creation
Multi-turn conversations

Use Cases:

Financial analysis and modeling
Legal document review
Medical diagnosis support
Research paper analysis
Advanced code generation

Example:

response = client.chat.completions.create(
    model="phala/deepseek-chat-v3-0324",
    messages=[{
        "role": "user",
        "content": "Analyze the legal implications of this contract clause: ..."
    }]
)

phala/gpt-oss-120b

OpenAI Architecture

OpenAI’s open-weight model with familiar behavior

Specifications:

Parameters: 117 billion (MoE, 5.1B active)
Context Length: 131,072 tokens
Quantization: FP8
Modality: Text → Text

Description: GPT-OSS-120B is OpenAI’s open-weight model designed for high-reasoning and agentic use cases. Optimized for single H100 GPU with:

Configurable reasoning depth
Full chain-of-thought access
Native function calling
Structured output generation

Use Cases:

AI agents and automation
Complex task planning
Tool use and API integration
Production workloads requiring reasoning

Example:

response = client.chat.completions.create(
    model="phala/gpt-oss-120b",
    messages=[{
        "role": "user",
        "content": "Create a step-by-step plan to migrate our infrastructure to TEE"
    }],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_infrastructure_status",
            "description": "Get current infrastructure state"
        }
    }]
)

phala/gpt-oss-20b

Efficient & Fast

Smaller model for low-latency applications

Specifications:

Parameters: 21 billion (MoE, 3.6B active)
Context Length: 131,072 tokens
Quantization: FP8
Modality: Text → Text

Description: GPT-OSS-20B is optimized for lower-latency inference and consumer/single-GPU deployment. Features:

OpenAI Harmony response format
Reasoning level configuration
Function calling and tool use
Structured outputs
Apache 2.0 license

Use Cases:

Real-time chatbots
Edge deployment
Cost-sensitive applications
High-throughput workloads

Example:

response = client.chat.completions.create(
    model="phala/gpt-oss-20b",
    messages=[{
        "role": "user",
        "content": "Summarize this customer support ticket"
    }],
    max_tokens=150
)

phala/qwen2.5-vl-72b-instruct

Vision + Language

Multimodal model for image understanding

Specifications:

Parameters: 72 billion
Context Length: 128,000 tokens
Quantization: FP8
Modality: Text + Image → Text

Description: Qwen2.5-VL is proficient in:

Recognizing common objects (flowers, birds, fish, insects)
Analyzing texts, charts, icons, graphics
Understanding layouts within images
Document understanding
Visual reasoning

Use Cases:

Medical image analysis
Document OCR and understanding
Chart and graph analysis
Visual quality inspection
Satellite imagery analysis

Example:

response = client.chat.completions.create(
    model="phala/qwen2.5-vl-72b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Analyze this medical X-ray for potential issues"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/xray.jpg"
                }
            }
        ]
    }]
)

phala/qwen3-vl-235b-a22b-instruct

Advanced Vision Model

State-of-the-art multimodal reasoning and vision understanding

Specifications:

Parameters: 235 billion
Context Length: 131,072 tokens (131K)
Quantization: FP8
Modality: Text + Image → Text

Description: Qwen3 VL is a cutting-edge multimodal model with advanced vision capabilities:

Advanced visual reasoning and understanding
Complex scene analysis and interpretation
Technical diagram and blueprint understanding
Scientific paper analysis with figures
Chart, graph and table comprehension
Multi-image understanding and comparison

Use Cases:

Scientific paper analysis with figures
Technical documentation review
Complex diagram interpretation
Advanced medical image analysis
Architectural and design review
Research data visualization analysis

Example:

response = client.chat.completions.create(
    model="phala/qwen3-vl-235b-a22b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {
                "type": "text",
                "text": "Analyze these research paper figures and explain the key findings"
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/figure1.png"
                }
            },
            {
                "type": "image_url",
                "image_url": {
                    "url": "https://example.com/figure2.png"
                }
            }
        ]
    }]
)

phala/qwen-2.5-7b-instruct

Budget-Friendly

Most cost-effective confidential model

Specifications:

Parameters: 7 billion
Context Length: 32,768 tokens (32K)
Quantization: FP8
Modality: Text → Text

Description: Qwen 2.5 7B brings significant improvements:

Enhanced coding and mathematics capabilities
Better instruction following
Improved long text generation (8K+ tokens)
Structured data understanding (tables, JSON)
Multilingual support (29+ languages)

Use Cases:

High-volume applications
Multilingual support
Simple chatbots
Text classification
Data extraction

Supported Languages: Chinese, English, French, Spanish, Portuguese, German, Italian, Russian, Japanese, Korean, Vietnamese, Thai, Arabic, and more. Example:

response = client.chat.completions.create(
    model="phala/qwen-2.5-7b-instruct",
    messages=[{
        "role": "user",
        "content": "Extract key information from this invoice (JSON format)"
    }],
    response_format={"type": "json_object"}
)

phala/gemma-3-27b-it

Google's Latest

Multimodal capabilities with strong multilingual support

Specifications:

Parameters: 27 billion
Context Length: 53,920 tokens (53K)
Quantization: FP8
Modality: Text → Text

Description: Gemma 3 introduces:

Multimodality support
Context windows up to 128K tokens
140+ language understanding
Improved math and reasoning
Structured outputs
Function calling

Use Cases:

Multilingual applications (140+ languages)
Math and reasoning tasks
Structured data generation
Function calling workflows
Chat applications

Example:

response = client.chat.completions.create(
    model="phala/gemma-3-27b-it",
    messages=[{
        "role": "user",
        "content": "Solve this calculus problem step by step"
    }]
)

Feature Comparison

Feature	DeepSeek V3	GPT-OSS 120B	GPT-OSS 20B	Qwen2.5 VL	Qwen3 VL	Qwen 2.5 7B	Gemma 3 27B
TEE Protected	✅	✅	✅	✅	✅	✅	✅
Function Calling	✅	✅	✅	✅	✅	✅	✅
Vision	❌	❌	❌	✅	✅	❌	❌
Structured Output	✅	✅	✅	✅	✅	✅	✅
Streaming	✅	✅	✅	✅	✅	✅	✅
Multilingual	✅	✅	✅	✅	✅	✅	✅ (140+)

Selection Guide

By Quality Requirements

Highest Quality:

phala/deepseek-chat-v3-0324 (685B) - Best overall
phala/qwen3-vl-235b-a22b-instruct (235B) - Advanced vision
phala/gpt-oss-120b (117B) - OpenAI architecture

Vision + Language:

phala/qwen3-vl-235b-a22b-instruct (235B) - Advanced vision
phala/qwen2.5-vl-72b-instruct (72B) - Standard vision

Balanced:

phala/gemma-3-27b-it (27B) - Good quality, reasonable cost
phala/gpt-oss-20b (21B) - Fast and efficient

Budget:

phala/qwen-2.5-7b-instruct (7B) - Most economical

By Use Case

Complex Reasoning:

phala/deepseek-chat-v3-0324 - Best for complex analysis
phala/gpt-oss-120b - OpenAI-style reasoning

Advanced Vision Tasks:

phala/qwen3-vl-235b-a22b-instruct - Scientific/technical documents
phala/qwen2.5-vl-72b-instruct - General vision tasks

Multilingual:

phala/gemma-3-27b-it - 140+ languages
phala/qwen-2.5-7b-instruct - 29+ languages

High Volume:

phala/qwen-2.5-7b-instruct - Lowest cost
phala/gpt-oss-20b - Fast inference

Function Calling:

phala/gpt-oss-120b - Best for agents
phala/gemma-3-27b-it - Good function support

Performance Benchmarks

All models run at ~99% of native performance in TEE mode:

Model	Native Speed	TEE Speed	Overhead
DeepSeek V3	85 tok/s	84 tok/s	~1%
GPT-OSS 120B	95 tok/s	94 tok/s	~1%
GPT-OSS 20B	120 tok/s	118 tok/s	~2%
Qwen3 VL 235B	45 tok/s	44 tok/s	~1%
Qwen2.5 VL 72B	75 tok/s	74 tok/s	~1%
Qwen 2.5 7B	150 tok/s	148 tok/s	~1%
Gemma 3 27B	100 tok/s	99 tok/s	~1%

Attestation Support

All models provide cryptographic attestation:

# Get attestation for any Phala model
curl "https://api.redpill.ai/v1/attestation/report?model=phala/deepseek-chat-v3-0324" \
  -H "Authorization: Bearer YOUR_API_KEY"

Attestation Guide

Learn how to verify TEE execution →

Pricing Comparison

Model	Cost per M Tokens	Quality/$ Ratio
Qwen 2.5 7B	$0.04 (prompt)	⭐⭐⭐⭐ Excellent
GPT-OSS 20B	$0.04 (prompt)	⭐⭐⭐⭐ Excellent
GPT-OSS 120B	$0.1 (prompt)	⭐⭐⭐⭐⭐ Best
Gemma 3 27B	$0.11 (prompt)	⭐⭐⭐ Good
DeepSeek V3	$0.28 (prompt)	⭐⭐⭐⭐⭐ Best
Qwen3 VL 235B	$0.3 (prompt)	⭐⭐⭐⭐⭐ Advanced Vision
Qwen2.5 VL 72B	$0.59 (prompt)	⭐⭐⭐⭐ Vision

Migration Guide

From Regular Models to Phala

Simply change the model name:

# Before (regular model)
response = client.chat.completions.create(
    model="openai/gpt-5",
    messages=[...]
)

# After (Phala confidential model)
response = client.chat.completions.create(
    model="phala/gpt-oss-120b",  # Similar to GPT-4
    messages=[...]  # Same API!
)

No other code changes required!

FAQs

Which model is most similar to GPT-4?

phala/gpt-oss-120b - It’s OpenAI’s architecture and has similar capabilities.

Which model is fastest?

phala/qwen-2.5-7b-instruct (150 tok/s) - Smallest and fastest.

Which model supports images?

phala/qwen3-vl-235b-a22b-instruct - Advanced vision (235B)
phala/qwen2.5-vl-72b-instruct - Standard vision (72B)

Are these models as good as GPT-4?

phala/deepseek-chat-v3-0324 matches or exceeds GPT-4 on many benchmarks, with full TEE protection.

Can I fine-tune these models?

Enterprise customers can fine-tune models in TEE. Contact sales@redpill.ai

What's FP8 quantization?

FP8 reduces model size and increases speed with minimal quality loss (~1%). Enables efficient TEE inference.

Next Steps

Start Using Models

Make your first request

Verify Attestation

Cryptographic proof of TEE

API Reference

Complete API documentation

Pricing Details

Compare model costs

Confidential AI

​Available Models

​Model Comparison

​Model Details

​phala/deepseek-chat-v3-0324

Best Overall Quality

​phala/gpt-oss-120b

OpenAI Architecture

​phala/gpt-oss-20b

Efficient & Fast

​phala/qwen2.5-vl-72b-instruct

Vision + Language

​phala/qwen3-vl-235b-a22b-instruct

Advanced Vision Model

​phala/qwen-2.5-7b-instruct

Budget-Friendly

​phala/gemma-3-27b-it

Google's Latest

​Feature Comparison

​Selection Guide

​By Quality Requirements

​By Use Case

​Performance Benchmarks

​Attestation Support

Attestation Guide

​Pricing Comparison

​Migration Guide

​From Regular Models to Phala

​FAQs

​Next Steps

Start Using Models

Verify Attestation

API Reference

Pricing Details

Available Models

Model Comparison

Model Details

phala/deepseek-chat-v3-0324

phala/gpt-oss-120b

phala/gpt-oss-20b

phala/qwen2.5-vl-72b-instruct

phala/qwen3-vl-235b-a22b-instruct

phala/qwen-2.5-7b-instruct

phala/gemma-3-27b-it

Feature Comparison

Selection Guide

By Quality Requirements

By Use Case

Performance Benchmarks

Attestation Support

Pricing Comparison

Migration Guide

From Regular Models to Phala

FAQs

Next Steps