What is Confidential AI?
Confidential AI refers to AI models that run entirely inside Trusted Execution Environments (TEEs), providing end-to-end privacy from input to output. Unlike regular models, where only the gateway is TEE-protected, confidential AI models run the entire inference process inside secure enclaves.
RedPill’s Two-Layer TEE Protection
RedPill offers dual privacy protection:

Layer 1: TEE-Protected Gateway (All Models)
- ✅ Applies to all 60+ models (66+ provider integrations available)
- ✅ Request processing in TEE
- ✅ Response handling in TEE
- ✅ No additional cost
Layer 2: TEE-Protected Inference (Phala Models)
- ✅ Model weights in GPU TEE
- ✅ Inference computation in TEE
- ✅ Complete end-to-end protection
- ✅ Cryptographic attestation
- 15 TEE Models - from 3 confidential providers
- GPU TEE - NVIDIA H100/H200 secure enclaves
- 3 Providers - Phala, Tinfoil, Near AI
- Verifiable - cryptographic attestation
GPU TEE Providers
RedPill offers 15 confidential AI models across 3 GPU TEE providers:

Phala Network (8 models)
| Model | Parameters | Context | Use Case |
|---|---|---|---|
| deepseek/deepseek-v3.2 | 671B (MoE) | 128K | Latest DeepSeek |
| deepseek/deepseek-chat-v3-0324 | 685B (MoE) | 163K | Advanced reasoning |
| openai/gpt-oss-120b | 117B (MoE) | 131K | OpenAI-compatible |
| openai/gpt-oss-20b | 21B (MoE) | 131K | Efficient inference |
| qwen/qwen2.5-vl-72b-instruct | 72B | 128K | Vision + language |
| qwen/qwen-2.5-7b-instruct | 7B | 32K | Budget-friendly |
| google/gemma-3-27b-it | 27B | 53K | Multilingual |
Tinfoil (4 models)
| Model | Parameters | Context | Use Case |
|---|---|---|---|
| deepseek/deepseek-r1-0528 | 685B (MoE) | 128K | Reasoning model |
| qwen/qwen3-coder-480b-a35b-instruct | 480B (MoE) | 131K | Code generation |
| qwen/qwen3-vl-30b-a3b-instruct | 30B (MoE) | 131K | Vision + language |
| meta-llama/llama-3.3-70b-instruct | 70B | 128K | General purpose |
Near AI (3 models)
| Model | Parameters | Context | Use Case |
|---|---|---|---|
| deepseek/deepseek-chat-v3.1 | 671B (MoE) | 128K | Hybrid reasoning |
| qwen/qwen3-30b-a3b-instruct-2507 | 30B (MoE) | 131K | General purpose |
| z-ai/glm-4.6 | 130B | 128K | Bilingual (CN/EN) |
All TEE Model Details
Explore all 15 confidential models →
How It Works
1. Model Loading in TEE
Model weights are decrypted only inside the GPU TEE.

2. Request Processing
Every request-handling step runs inside the TEE - your data never leaves hardware security.

3. Cryptographic Attestation
Every request generates verifiable proof:
- GPU TEE measurements - Proves genuine NVIDIA H100 TEE
- Model hash - Verifies exact model version
- Code hash - Confirms inference code integrity
- Cryptographic signature - Signed by TEE hardware
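The model-hash check above can be sketched in a few lines. This assumes the attestation reports a SHA-256 digest of the model weights; the actual digest algorithm and report format are defined by the attestation API, so treat this as an illustration of the comparison, not the exact verification flow.

```python
import hashlib


def sha256_file(path: str) -> str:
    """Stream a file through SHA-256 without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_attested_hash(weights_path: str, attested_hash: str) -> bool:
    # Compare the locally computed digest against the value reported
    # in the (assumed SHA-256) attestation document.
    return sha256_file(weights_path) == attested_hash
```

The same idea applies to the code hash: recompute the digest of the published open-source inference code and compare it to the attested value.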
Verify Attestation
Learn how to verify TEE proofs →
Privacy Guarantees
What CANNOT Be Accessed
Even with full system access, nobody can see:

| Data Type | Accessible? | Protection |
|---|---|---|
| Your prompts | ❌ No | GPU TEE encrypted |
| Model responses | ❌ No | GPU TEE encrypted |
| Model weights | ❌ No | Encrypted at rest & in-use |
| Intermediate activations | ❌ No | GPU TEE memory isolation |
| Gradients (fine-tuning) | ❌ No | TEE-protected |
Trust Model
You must trust:
- NVIDIA GPU vendor - H100/H200 TEE correctness
- Phala Network - Model deployment integrity
- Open source code - Auditable on GitHub

You do NOT need to trust:
- ❌ RedPill operators
- ❌ Cloud provider (AWS, GCP, Azure)
- ❌ System administrators
- ❌ Other users on same hardware
Performance
Near-Native Speed
GPU TEE adds minimal overhead:

| Metric | Native | TEE Mode | Overhead |
|---|---|---|---|
| Throughput | 100 tok/s | 99 tok/s | ~1% |
| Latency | 50ms | 51ms | ~2% |
| TFLOPS | 1979 | 1959 | ~1% |
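The Overhead column follows directly from the other two columns. Note the direction differs per metric: throughput and TFLOPS drop under TEE, while latency rises, so all three are reported as a relative change against the native figure.

```python
def relative_change_pct(native: float, tee: float) -> float:
    """Relative change between native and TEE-mode measurements, in percent."""
    return abs(tee - native) / native * 100


# Recomputing the table's Overhead column:
throughput_overhead = relative_change_pct(100, 99)     # ~1%
latency_overhead = relative_change_pct(50, 51)         # ~2%
tflops_overhead = relative_change_pct(1979, 1959)      # ~1%
```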
Benchmark Results
See detailed performance benchmarks →
Use Cases
Healthcare
Process patient data with HIPAA compliance
Financial Services
Analyze confidential financial data
Legal
Handle privileged communications
Enterprise AI
Protect trade secrets and IP
Government
Classified data processing
Research
Sensitive research data analysis
Example Usage
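A minimal sketch of calling a confidential model. RedPill's API is OpenAI-compatible, so the request below follows the standard chat-completions shape; the endpoint URL and the `phala/`-prefixed model ID are assumptions taken from this page's FAQ - check the API reference for the exact values.

```python
import json

# Assumed OpenAI-compatible endpoint -- verify against the API reference.
API_URL = "https://api.redpill.ai/v1/chat/completions"


def build_request(api_key: str, prompt: str) -> tuple[dict, bytes]:
    """Build headers and body for a chat-completions call to a TEE model."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        # Any confidential model ID from the tables above runs
        # end-to-end inside GPU TEE; no extra flags are needed.
        "model": "phala/deepseek-chat-v3-0324",
        "messages": [{"role": "user", "content": prompt}],
    }
    return headers, json.dumps(payload).encode()
```

Send it with any HTTP client (e.g. `requests.post(API_URL, headers=headers, data=body)`). From the client's perspective the call is identical to one against a regular model - the TEE protection is transparent.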
vs Regular Models
| Feature | Regular Models | Phala Confidential Models |
|---|---|---|
| Gateway TEE | ✅ Yes | ✅ Yes |
| Inference TEE | ❌ No | ✅ Yes |
| Model in TEE | ❌ No | ✅ Yes |
| End-to-end TEE | ❌ No | ✅ Yes |
| Attestation | ✅ Gateway only | ✅ Full stack |
| Model count | 50+ | 7 |
| Price | Provider pricing | Competitive |
Integration with Phala Network
RedPill’s confidential AI is powered by Phala Network, pioneers in:
- GPU TEE - First GPU-based confidential computing
- Verifiable AI - Cryptographic proof of execution
- dstack - Open source TEE infrastructure
- Decentralized - Distributed trust model
Phala Documentation
Learn more about Phala’s TEE technology →
Compliance
Confidential AI helps meet regulatory requirements:
- HIPAA - Healthcare data protection
- GDPR - European data privacy
- CCPA - California privacy law
- SOC 2 - Security controls
- ISO 27001 - Information security
- FedRAMP - US government (in progress)
FAQs
What's the difference between gateway TEE and confidential AI?
- Gateway TEE: Protects request routing for all 60+ models (66+ provider integrations available)
- Confidential AI: Protects entire inference (Phala models only)
Are Phala models slower?
No! TEE mode runs at 99% of native speed. Performance impact is minimal.
Can I fine-tune Phala models?
Custom fine-tuning is available for enterprise customers. Contact [email protected]
How do I verify TEE execution?
Use the attestation API to get cryptographic proof. See Attestation Guide.
Which model should I choose?
- Best quality: phala/deepseek-chat-v3-0324 (685B)
- OpenAI-compatible: phala/gpt-oss-120b (117B)
- Advanced vision: phala/qwen3-vl-235b-a22b-instruct (235B)
- Vision + language: phala/qwen2.5-vl-72b-instruct (72B)
- Budget-friendly: phala/qwen-2.5-7b-instruct (7B)
Can I add custom Phala models?
Yes! Enterprise customers can deploy custom models in GPU TEE. Contact [email protected]