# Available Models

RedPill offers 7 confidential AI models from Phala Network, all running entirely in GPU TEEs with FP8 quantization for optimal performance.

## Model Comparison
| Model | Parameters | Context | Modality | Price (per M tokens) |
|---|---|---|---|---|
| DeepSeek V3 | 685B (MoE) | 163K | Text | $1.14 |
| GPT-OSS 120B | 117B (MoE) | 131K | Text | $0.49 |
| GPT-OSS 20B | 21B (MoE) | 131K | Text | $0.15 |
| Qwen2.5 VL 72B | 72B | 128K | Vision + Text | $0.59 |
| Qwen3 VL 235B | 235B | 131K | Vision + Text | $1.49 |
| Qwen 2.5 7B | 7B | 32K | Text | $0.10 |
| Gemma 3 27B | 27B | 53K | Text | $0.40 |
## Model Details

### phala/deepseek-chat-v3-0324

**Best Overall Quality.** Flagship model for complex reasoning and analysis.
- Parameters: 685 billion (Mixture-of-Experts)
- Context Length: 163,840 tokens (163K)
- Quantization: FP8
- Modality: Text → Text

**Strengths:**
- Complex reasoning and analysis
- Mathematical problem solving
- Code generation and debugging
- Long-form content creation
- Multi-turn conversations

**Use cases:**
- Financial analysis and modeling
- Legal document review
- Medical diagnosis support
- Research paper analysis
- Advanced code generation
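Multi-turn use looks the same as with any OpenAI-compatible chat API: prior turns ride along in `messages`. A minimal sketch (the conversation content is illustrative, and `temperature` is a suggested setting, not a documented default):

```python
# Multi-turn conversation payload for the flagship model.
history = [
    {"role": "user", "content": "Define a covenant in a loan agreement."},
    {"role": "assistant", "content": "A covenant is a contractual promise by the borrower."},
    {"role": "user", "content": "Draft one limiting the debt-to-equity ratio."},
]

request = {
    "model": "phala/deepseek-chat-v3-0324",
    "messages": history,   # full history fits easily in the 163K context
    "temperature": 0.2,    # lower temperature suits analytical tasks
}
```

Because the context window is 163,840 tokens, long documents and long histories can usually be sent whole rather than chunked.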
### phala/gpt-oss-120b

**OpenAI Architecture.** OpenAI's open-weight model with familiar behavior.
- Parameters: 117 billion (MoE, 5.1B active)
- Context Length: 131,072 tokens
- Quantization: FP8
- Modality: Text → Text

**Strengths:**
- Configurable reasoning depth
- Full chain-of-thought access
- Native function calling
- Structured output generation

**Use cases:**
- AI agents and automation
- Complex task planning
- Tool use and API integration
- Production workloads requiring reasoning
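Native function calling follows the standard OpenAI `tools` schema. A hedged sketch, where `get_stock_price` is a hypothetical example tool, not part of the API:

```python
# Hypothetical tool definition in the OpenAI `tools` schema.
tools = [{
    "type": "function",
    "function": {
        "name": "get_stock_price",  # hypothetical example tool
        "description": "Look up the latest price for a ticker symbol.",
        "parameters": {
            "type": "object",
            "properties": {"ticker": {"type": "string"}},
            "required": ["ticker"],
        },
    },
}]

request = {
    "model": "phala/gpt-oss-120b",
    "messages": [{"role": "user", "content": "What is ACME trading at right now?"}],
    "tools": tools,
    "tool_choice": "auto",  # let the model decide whether to call the tool
}
```

When the model decides to call the tool, the response carries a `tool_calls` entry whose arguments your code executes before sending the result back as a `tool` message.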
### phala/gpt-oss-20b

**Efficient & Fast.** Smaller model for low-latency applications.
- Parameters: 21 billion (MoE, 3.6B active)
- Context Length: 131,072 tokens
- Quantization: FP8
- Modality: Text → Text

**Strengths:**
- OpenAI Harmony response format
- Reasoning level configuration
- Function calling and tool use
- Structured outputs
- Apache 2.0 license

**Use cases:**
- Real-time chatbots
- Edge deployment
- Cost-sensitive applications
- High-throughput workloads
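For real-time chatbots, streaming keeps perceived latency low: set `stream` and consume incremental chunks as they arrive. A sketch of the request and of parsing one streamed chunk in the OpenAI delta format (the sample chunk below is illustrative):

```python
request = {
    "model": "phala/gpt-oss-20b",
    "messages": [{"role": "user", "content": "Give me a one-line status update."}],
    "stream": True,    # server responds with incremental SSE chunks
    "max_tokens": 64,  # keep completions short on latency-sensitive paths
}

def extract_delta(chunk: dict) -> str:
    """Pull the incremental text out of one streamed chunk (OpenAI delta format)."""
    return chunk["choices"][0]["delta"].get("content", "")

# Illustrative example of one streamed chunk, as delivered on a `data:` SSE line:
sample_chunk = {"choices": [{"delta": {"content": "All"}, "index": 0}]}
```

At 118 tok/s in TEE mode, first tokens typically arrive fast enough for interactive UIs.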
### phala/qwen2.5-vl-72b-instruct

**Vision + Language.** Multimodal model for image understanding.
- Parameters: 72 billion
- Context Length: 128,000 tokens
- Quantization: FP8
- Modality: Text + Image → Text

**Strengths:**
- Recognizing common objects (flowers, birds, fish, insects)
- Analyzing texts, charts, icons, graphics
- Understanding layouts within images
- Document understanding
- Visual reasoning

**Use cases:**
- Medical image analysis
- Document OCR and understanding
- Chart and graph analysis
- Visual quality inspection
- Satellite imagery analysis
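Images are passed as `image_url` content parts alongside text, following the OpenAI vision message format. A minimal sketch (the chart URL is a placeholder):

```python
def vision_message(prompt: str, image_url: str) -> dict:
    """One user turn combining text and an image (OpenAI vision format)."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

request = {
    "model": "phala/qwen2.5-vl-72b-instruct",
    "messages": [vision_message(
        "What trend does this chart show?",
        "https://example.com/chart.png",  # placeholder image URL
    )],
}
```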
### phala/qwen3-vl-235b-a22b-instruct

**Advanced Vision Model.** State-of-the-art multimodal reasoning and vision understanding.
- Parameters: 235 billion
- Context Length: 131,072 tokens (131K)
- Quantization: FP8
- Modality: Text + Image → Text

**Strengths:**
- Advanced visual reasoning and understanding
- Complex scene analysis and interpretation
- Technical diagram and blueprint understanding
- Chart, graph and table comprehension
- Multi-image understanding and comparison

**Use cases:**
- Scientific paper analysis with figures
- Technical documentation review
- Complex diagram interpretation
- Advanced medical image analysis
- Architectural and design review
- Research data visualization analysis
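Multi-image comparison is just a single user turn with several `image_url` parts. A sketch under the same OpenAI vision format, with placeholder URLs:

```python
def multi_image_message(prompt: str, image_urls: list) -> dict:
    """One user turn with several images for side-by-side comparison."""
    parts = [{"type": "text", "text": prompt}]
    parts += [{"type": "image_url", "image_url": {"url": u}} for u in image_urls]
    return {"role": "user", "content": parts}

request = {
    "model": "phala/qwen3-vl-235b-a22b-instruct",
    "messages": [multi_image_message(
        "Which of these two blueprints has the larger floor area?",
        # Placeholder URLs for illustration only:
        ["https://example.com/plan-a.png", "https://example.com/plan-b.png"],
    )],
}
```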
### phala/qwen-2.5-7b-instruct

**Budget-Friendly.** The most cost-effective confidential model.
- Parameters: 7 billion
- Context Length: 32,768 tokens (32K)
- Quantization: FP8
- Modality: Text → Text

**Strengths:**
- Enhanced coding and mathematics capabilities
- Better instruction following
- Improved long text generation (8K+ tokens)
- Structured data understanding (tables, JSON)
- Multilingual support (29+ languages)

**Use cases:**
- High-volume applications
- Multilingual applications
- Simple chatbots
- Text classification
- Data extraction
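For data extraction, OpenAI-style JSON mode pairs well with the model's structured-data strengths. A sketch; whether RedPill exposes `response_format` for this model is an assumption to verify against the API reference:

```python
request = {
    "model": "phala/qwen-2.5-7b-instruct",
    "messages": [
        {"role": "system",
         "content": "Extract the sender, date, and subject from the email as JSON."},
        {"role": "user",
         "content": "From: ana@example.com\nDate: 2024-05-01\nSubject: Invoice #42"},
    ],
    # OpenAI-style JSON mode; confirm RedPill supports it for this model.
    "response_format": {"type": "json_object"},
}
```

With `json_object` mode the completion is guaranteed to be parseable JSON, which makes downstream extraction pipelines simpler and cheaper to validate.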
### phala/gemma-3-27b-it

**Google's Latest.** Gemma 3 model with strong multilingual support.
- Parameters: 27 billion
- Context Length: 53,920 tokens (53K)
- Quantization: FP8
- Modality: Text → Text

**Strengths:**
- Natively multimodal architecture (served here as text-only)
- Native context window up to 128K tokens (served here at 53K)
- 140+ language understanding
- Improved math and reasoning
- Structured outputs
- Function calling

**Use cases:**
- Multilingual applications (140+ languages)
- Math and reasoning tasks
- Structured data generation
- Function calling workflows
- Chat applications
## Feature Comparison
| Feature | DeepSeek V3 | GPT-OSS 120B | GPT-OSS 20B | Qwen2.5 VL | Qwen3 VL | Qwen 2.5 7B | Gemma 3 27B |
|---|---|---|---|---|---|---|---|
| TEE Protected | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Function Calling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Vision | ❌ | ❌ | ❌ | ✅ | ✅ | ❌ | ❌ |
| Structured Output | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Streaming | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multilingual | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (140+) |
## Selection Guide

### By Quality Requirements

**Highest quality:**
- phala/deepseek-chat-v3-0324 (685B) - best overall
- phala/qwen3-vl-235b-a22b-instruct (235B) - advanced vision
- phala/gpt-oss-120b (117B) - OpenAI architecture

**Vision:**
- phala/qwen3-vl-235b-a22b-instruct (235B) - advanced vision
- phala/qwen2.5-vl-72b-instruct (72B) - standard vision

**Balanced:**
- phala/gemma-3-27b-it (27B) - good quality, reasonable cost
- phala/gpt-oss-20b (21B) - fast and efficient

**Budget:**
- phala/qwen-2.5-7b-instruct (7B) - most economical
### By Use Case

**Complex reasoning:**
- phala/deepseek-chat-v3-0324 - best for complex analysis
- phala/gpt-oss-120b - OpenAI-style reasoning

**Vision and documents:**
- phala/qwen3-vl-235b-a22b-instruct - scientific/technical documents
- phala/qwen2.5-vl-72b-instruct - general vision tasks

**Multilingual:**
- phala/gemma-3-27b-it - 140+ languages
- phala/qwen-2.5-7b-instruct - 29+ languages

**Cost-sensitive:**
- phala/qwen-2.5-7b-instruct - lowest cost
- phala/gpt-oss-20b - fast inference

**Agents and function calling:**
- phala/gpt-oss-120b - best for agents
- phala/gemma-3-27b-it - good function support
## Performance Benchmarks

All models run at roughly 99% of native throughput in TEE mode:

| Model | Native Speed | TEE Speed | Overhead |
|---|---|---|---|
| DeepSeek V3 | 85 tok/s | 84 tok/s | ~1% |
| GPT-OSS 120B | 95 tok/s | 94 tok/s | ~1% |
| GPT-OSS 20B | 120 tok/s | 118 tok/s | ~2% |
| Qwen3 VL 235B | 45 tok/s | 44 tok/s | ~1% |
| Qwen2.5 VL 72B | 75 tok/s | 74 tok/s | ~1% |
| Qwen 2.5 7B | 150 tok/s | 148 tok/s | ~1% |
| Gemma 3 27B | 100 tok/s | 99 tok/s | ~1% |
## Attestation Support

All models provide cryptographic attestation. See the Attestation Guide to learn how to verify TEE execution.
## Pricing Comparison

| Model | Prompt Cost per M Tokens | Quality/$ Ratio |
|---|---|---|
| Qwen 2.5 7B | $0.04 | ⭐⭐⭐⭐ Excellent |
| GPT-OSS 20B | $0.04 | ⭐⭐⭐⭐ Excellent |
| GPT-OSS 120B | $0.10 | ⭐⭐⭐⭐⭐ Best |
| Gemma 3 27B | $0.11 | ⭐⭐⭐ Good |
| DeepSeek V3 | $0.28 | ⭐⭐⭐⭐⭐ Best |
| Qwen3 VL 235B | $0.30 | ⭐⭐⭐⭐⭐ Advanced Vision |
| Qwen2.5 VL 72B | $0.59 | ⭐⭐⭐⭐ Vision |
## Migration Guide

### From Regular Models to Phala

Simply change the model name in your existing client configuration.
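In an OpenAI-compatible client, that is a one-line change. A minimal sketch; the RedPill base URL shown is an assumption, so confirm it against the official API reference:

```python
import json

# Assumed RedPill endpoint; confirm against the official API reference.
REDPILL_BASE_URL = "https://api.redpill.ai/v1"

def chat_request(model: str, text: str) -> dict:
    """OpenAI-compatible chat payload; only `model` differs between providers."""
    return {"model": model, "messages": [{"role": "user", "content": text}]}

# Before: a regular hosted model
before = chat_request("gpt-4o", "Summarize this contract.")
# After: the confidential equivalent -- only the model name changes
after = chat_request("phala/deepseek-chat-v3-0324", "Summarize this contract.")

assert before["messages"] == after["messages"]  # everything else stays identical
body = json.dumps(after)  # POST this to f"{REDPILL_BASE_URL}/chat/completions"
```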
## FAQs

### Which model is most similar to GPT-4?

phala/gpt-oss-120b - it uses OpenAI's architecture and has similar capabilities.
### Which model is fastest?

phala/qwen-2.5-7b-instruct (150 tok/s) - smallest and fastest.
### Which model supports images?

- phala/qwen3-vl-235b-a22b-instruct - advanced vision (235B)
- phala/qwen2.5-vl-72b-instruct - standard vision (72B)
### Are these models as good as GPT-4?

phala/deepseek-chat-v3-0324 matches or exceeds GPT-4 on many benchmarks, with full TEE protection.
### Can I fine-tune these models?

Enterprise customers can fine-tune models in TEE. Contact sales@redpill.ai.
### What's FP8 quantization?

FP8 reduces model size and increases inference speed with minimal quality loss (~1%), which is what makes efficient TEE inference practical.