Available Models
RedPill offers six confidential AI models from Phala Network, all running entirely in GPU TEEs with FP8 quantization for optimal performance. All models are served through an OpenAI-compatible API; a minimal usage sketch follows the comparison table below.

Model Comparison
Model | Parameters | Context | Modality | Price per 1K Tokens |
---|---|---|---|---|
DeepSeek V3 | 685B (MoE) | 164K | Text | $0.00114 |
GPT-OSS 120B | 117B (MoE) | 131K | Text | $0.00049 |
GPT-OSS 20B | 21B (MoE) | 131K | Text | $0.0004 |
Qwen2.5 VL 72B | 72B | 128K | Vision + Text | $0.00059 |
Qwen 2.5 7B | 7B | 33K | Text | $0.0001 |
Gemma 3 27B | 27B | 54K | Text | $0.0004 |
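Every model is reachable through the standard chat completions interface. A minimal sketch using the OpenAI Python SDK; the base URL and API key placeholder are assumptions, so check your RedPill dashboard for the actual values:

```python
from openai import OpenAI

# Assumed RedPill endpoint and placeholder key; substitute your own values.
client = OpenAI(
    base_url="https://api.redpill.ai/v1",
    api_key="YOUR_REDPILL_API_KEY",
)

# Any model ID from the table above works here.
resp = client.chat.completions.create(
    model="phala/deepseek-chat-v3-0324",
    messages=[{"role": "user", "content": "Summarize FP8 quantization in two sentences."}],
)
print(resp.choices[0].message.content)
```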
Model Details
phala/deepseek-chat-v3-0324
Best Overall Quality
Flagship model for complex reasoning and analysis
- Parameters: 685 billion (Mixture-of-Experts)
- Context Length: 163,840 tokens (~123K words)
- Quantization: FP8
- Modality: Text → Text

Key strengths:
- Complex reasoning and analysis
- Mathematical problem solving
- Code generation and debugging
- Long-form content creation
- Multi-turn conversations

Use cases:
- Financial analysis and modeling
- Legal document review
- Medical diagnosis support
- Research paper analysis
- Advanced code generation
phala/gpt-oss-120b
OpenAI Architecture
OpenAI’s open-weight model with familiar behavior
- Parameters: 117 billion (MoE, 5.1B active)
- Context Length: 131,072 tokens
- Quantization: FP8
- Modality: Text → Text

Key features:
- Configurable reasoning depth
- Full chain-of-thought access
- Native function calling
- Structured output generation

Use cases:
- AI agents and automation
- Complex task planning
- Tool use and API integration
- Production workloads requiring reasoning
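Since this model advertises native function calling, a sketch using the OpenAI-style tools parameter may help; the get_weather tool is hypothetical and the endpoint is assumed as before:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.redpill.ai/v1",  # assumed endpoint
                api_key="YOUR_REDPILL_API_KEY")

# Hypothetical tool definition, for illustration only.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="phala/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# The model should return a tool call rather than prose.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```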
phala/gpt-oss-20b
Efficient & Fast
Smaller model for low-latency applications
- Parameters: 21 billion (MoE, 3.6B active)
- Context Length: 131,072 tokens
- Quantization: FP8
- Modality: Text → Text

Key features:
- OpenAI Harmony response format
- Reasoning level configuration
- Function calling and tool use
- Structured outputs
- Apache 2.0 license

Use cases:
- Real-time chatbots
- Edge deployment
- Cost-sensitive applications
- High-throughput workloads
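For the real-time chatbot use case, streaming keeps perceived latency low. A minimal sketch, with the same assumed client setup as above:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.redpill.ai/v1",  # assumed endpoint
                api_key="YOUR_REDPILL_API_KEY")

stream = client.chat.completions.create(
    model="phala/gpt-oss-20b",
    messages=[{"role": "user", "content": "Write a haiku about secure enclaves."}],
    stream=True,
)

# Print tokens as they arrive instead of waiting for the full response.
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```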
phala/qwen2.5-vl-72b-instruct
Vision + Language
Multimodal model for image understanding
- Parameters: 72 billion
- Context Length: 128,000 tokens
- Quantization: FP8
- Modality: Text + Image → Text

Key capabilities:
- Recognizing common objects (flowers, birds, fish, insects)
- Analyzing text, charts, icons, and graphics
- Understanding layouts within images
- Document understanding
- Visual reasoning

Use cases:
- Medical image analysis
- Document OCR and understanding
- Chart and graph analysis
- Visual quality inspection
- Satellite imagery analysis
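Images are passed with the standard OpenAI-style multimodal message format. A sketch, assuming the same endpoint as above and a publicly reachable image URL:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.redpill.ai/v1",  # assumed endpoint
                api_key="YOUR_REDPILL_API_KEY")

resp = client.chat.completions.create(
    model="phala/qwen2.5-vl-72b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What trend does this chart show?"},
            # Placeholder URL; base64 data URLs typically work as well.
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```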
phala/qwen-2.5-7b-instruct
Budget-Friendly
Most cost-effective confidential model
- Parameters: 7 billion
- Context Length: 32,768 tokens
- Quantization: FP8
- Modality: Text → Text

Key features:
- Enhanced coding and mathematics capabilities
- Better instruction following
- Improved long text generation (8K+ tokens)
- Structured data understanding (tables, JSON)
- Multilingual support (29+ languages)

Use cases:
- High-volume applications
- Multilingual support
- Simple chatbots
- Text classification
- Data extraction
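For the data extraction use case, JSON mode keeps output machine-parsable. A sketch assuming the OpenAI-style response_format parameter is honored (the feature table below lists structured output support for this model):

```python
import json

from openai import OpenAI

client = OpenAI(base_url="https://api.redpill.ai/v1",  # assumed endpoint
                api_key="YOUR_REDPILL_API_KEY")

resp = client.chat.completions.create(
    model="phala/qwen-2.5-7b-instruct",
    messages=[
        {"role": "system",
         "content": "Extract the contact as JSON with keys 'name' and 'email'."},
        {"role": "user",
         "content": "Reach out to Jane Doe at jane@example.com about the invoice."},
    ],
    response_format={"type": "json_object"},  # assumed to be supported
)
print(json.loads(resp.choices[0].message.content))
```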
phala/gemma-3-27b-it
Google's Latest
Strong multilingual support with improved math and reasoning
- Parameters: 27 billion
- Context Length: 53,920 tokens
- Quantization: FP8
- Modality: Text → Text

Key features:
- Multimodal architecture (served as text → text on RedPill)
- Native context window up to 128K tokens (53,920 available here)
- 140+ language understanding
- Improved math and reasoning
- Structured outputs
- Function calling

Use cases:
- Multilingual applications (140+ languages)
- Math and reasoning tasks
- Structured data generation
- Function calling workflows
- Chat applications
Feature Comparison
Feature | DeepSeek V3 | GPT-OSS 120B | GPT-OSS 20B | Qwen2.5 VL | Qwen 2.5 7B | Gemma 3 27B |
---|---|---|---|---|---|---|
TEE Protected | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Function Calling | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Vision | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ |
Structured Output | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Streaming | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
Multilingual | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ (140+) |
Selection Guide
By Quality Requirements
Highest Quality:
- phala/deepseek-chat-v3-0324 (685B) - Best overall
- phala/gpt-oss-120b (117B) - OpenAI architecture
- phala/qwen2.5-vl-72b-instruct (72B) - Vision tasks

Balanced Quality and Cost:
- phala/gemma-3-27b-it (27B) - Good quality, reasonable cost
- phala/gpt-oss-20b (21B) - Fast and efficient

Budget:
- phala/qwen-2.5-7b-instruct (7B) - Most economical

By Use Case

Complex Reasoning:
- phala/deepseek-chat-v3-0324 - Best for complex analysis
- phala/gpt-oss-120b - OpenAI-style reasoning

Vision:
- phala/qwen2.5-vl-72b-instruct - Only vision model

Multilingual:
- phala/gemma-3-27b-it - 140+ languages
- phala/qwen-2.5-7b-instruct - 29+ languages

Cost-Sensitive:
- phala/qwen-2.5-7b-instruct - Lowest cost
- phala/gpt-oss-20b - Fast inference

Agents and Tool Use:
- phala/gpt-oss-120b - Best for agents
- phala/gemma-3-27b-it - Good function support
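The guide above condenses to a simple lookup. A hypothetical routing helper (the use-case keys are made up; adapt them to your application):

```python
# Hypothetical mapping derived from the selection guide above.
MODEL_BY_USE_CASE = {
    "complex_reasoning": "phala/deepseek-chat-v3-0324",
    "agents": "phala/gpt-oss-120b",
    "vision": "phala/qwen2.5-vl-72b-instruct",
    "multilingual": "phala/gemma-3-27b-it",
    "low_latency": "phala/gpt-oss-20b",
    "budget": "phala/qwen-2.5-7b-instruct",
}

def pick_model(use_case: str) -> str:
    """Return a model ID for a use case, falling back to the cheapest option."""
    return MODEL_BY_USE_CASE.get(use_case, "phala/qwen-2.5-7b-instruct")
```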
Performance Benchmarks
All models run at ~99% of native performance in TEE mode:

Model | Native Speed | TEE Speed | Overhead |
---|---|---|---|
DeepSeek V3 | 85 tok/s | 84 tok/s | ~1% |
GPT-OSS 120B | 95 tok/s | 94 tok/s | ~1% |
GPT-OSS 20B | 120 tok/s | 118 tok/s | ~2% |
Qwen2.5 VL 72B | 75 tok/s | 74 tok/s | ~1% |
Qwen 2.5 7B | 150 tok/s | 148 tok/s | ~1% |
Gemma 3 27B | 100 tok/s | 99 tok/s | ~1% |
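You can sanity-check these figures yourself by timing a streamed response; counting content deltas is only a rough proxy for tokens:

```python
import time

from openai import OpenAI

client = OpenAI(base_url="https://api.redpill.ai/v1",  # assumed endpoint
                api_key="YOUR_REDPILL_API_KEY")

start = time.monotonic()
chunks = 0
stream = client.chat.completions.create(
    model="phala/gpt-oss-20b",
    messages=[{"role": "user", "content": "Count from 1 to 50, one number per line."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        chunks += 1  # each content delta is roughly one token
elapsed = time.monotonic() - start
print(f"~{chunks / elapsed:.0f} tok/s")
```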
Attestation Support
All models provide cryptographic attestation. See the Attestation Guide to learn how to verify TEE execution.
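As a rough sketch of what verification might look like, the request below fetches an attestation report over HTTP. The endpoint path, parameters, and response shape are assumptions rather than the documented API; follow the Attestation Guide for the real flow:

```python
import requests

# Hypothetical endpoint and parameters; see the Attestation Guide for
# the actual API surface.
resp = requests.get(
    "https://api.redpill.ai/v1/attestation/report",  # assumed path
    params={"model": "phala/gpt-oss-120b"},
    headers={"Authorization": "Bearer YOUR_REDPILL_API_KEY"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # signed TEE evidence to verify against vendor root certs
```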
Pricing Comparison
Model | Cost per 1M Prompt Tokens | Quality/$ Ratio |
---|---|---|
Qwen 2.5 7B | $0.10 | ⭐⭐⭐⭐ Excellent |
GPT-OSS 20B | $0.40 | ⭐⭐⭐⭐ Excellent |
Gemma 3 27B | $0.40 | ⭐⭐⭐ Good |
GPT-OSS 120B | $0.49 | ⭐⭐⭐⭐⭐ Excellent |
Qwen2.5 VL 72B | $0.59 | ⭐⭐⭐⭐ Vision |
DeepSeek V3 | $1.14 | ⭐⭐⭐⭐⭐ Best |
Migration Guide
From Regular Models to Phala
Simply change the model name in your existing API calls.
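A minimal before/after sketch with the OpenAI Python SDK; the upstream model name and endpoint are placeholders:

```python
from openai import OpenAI

client = OpenAI(base_url="https://api.redpill.ai/v1",  # assumed endpoint
                api_key="YOUR_REDPILL_API_KEY")
messages = [{"role": "user", "content": "Hello!"}]

# Before: a regular, non-confidential model (placeholder name)
resp = client.chat.completions.create(model="gpt-4o", messages=messages)

# After: the confidential, TEE-protected equivalent
resp = client.chat.completions.create(model="phala/gpt-oss-120b", messages=messages)
print(resp.choices[0].message.content)
```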
FAQs

Which model is most similar to GPT-4?
phala/gpt-oss-120b - it uses OpenAI's open-weight architecture and has similar capabilities.

Which model is fastest?
phala/qwen-2.5-7b-instruct (150 tok/s) - the smallest and fastest.

Which model supports images?
phala/qwen2.5-vl-72b-instruct - currently the only vision model.

Are these models as good as GPT-4?
phala/deepseek-chat-v3-0324 matches or exceeds GPT-4 on many benchmarks, with full TEE protection.

Can I fine-tune these models?
Enterprise customers can fine-tune models in TEE. Contact sales@redpill.ai.

What's FP8 quantization?
FP8 is an 8-bit floating-point format that reduces model size and increases inference speed with minimal quality loss (~1%), enabling efficient TEE inference.