Overview
RedPill’s TEE-protected gateway delivers near-native performance with hardware-enforced privacy. Our benchmarks demonstrate minimal overhead while providing cryptographic guarantees.GPU TEE Performance
NVIDIA H100 GPU TEE Efficiency
Phala’s confidential AI models running in NVIDIA H100 GPU TEE achieve 99% efficiency compared to non-TEE execution:Metric | Non-TEE | GPU TEE | Efficiency |
---|---|---|---|
Throughput (tokens/sec) | 1000 | 990 | 99% |
Latency P50 (ms) | 100 | 101 | 99% |
Latency P95 (ms) | 150 | 152 | 98.7% |
Memory Overhead | - | <2% | - |
Research Source: Confidential AI Benchmark Paper (arXiv)NVIDIA H100 GPU TEE provides hardware isolation with minimal performance impact.
RedPill Gateway Performance
Multi-Provider Routing Latency
Added latency from RedPill’s TEE-protected gateway:Provider | Direct Latency | Via RedPill | Overhead |
---|---|---|---|
OpenAI GPT-4o | 250ms | 255ms | +5ms |
Claude 3.5 Sonnet | 300ms | 306ms | +6ms |
DeepSeek Chat | 180ms | 185ms | +5ms |
Phala Qwen 2.5 | 120ms | 121ms | +1ms |
Why so fast?
- Hardware acceleration via Intel SGX/TDX
- Optimized request routing in TEE
- Minimal cryptographic overhead
- Direct encrypted passthrough
Attestation Verification
Operation | Time | Notes |
---|---|---|
Generate Attestation | <50ms | Per request |
Verify Signature | <10ms | Client-side |
Full Chain Verification | <100ms | Intel/NVIDIA roots |
Confidential Model Benchmarks
Phala Confidential Models Performance
All models running in GPU TEE with cryptographic attestation:Model | Context Length | Tokens/sec | Latency P50 | TEE Overhead |
---|---|---|---|---|
phala/qwen-2.5-7b-instruct | 32K | 850 | 95ms | <1% |
phala/deepseek-chat-v3-0324 | 64K | 920 | 110ms | <1% |
phala/gpt-oss-120b | 8K | 680 | 145ms | <2% |
phala/gemma-2-27b-it | 8K | 720 | 125ms | <1.5% |
phala/llama-3.3-70b | 128K | 580 | 175ms | <2% |
phala/qwen-qwq-32b | 32K | 650 | 140ms | <1.5% |
Test These Models
Try confidential models via API →
Throughput Comparison
Requests Per Second (RPS)
Single instance capacity:Concurrent Connections
Security vs Performance Trade-off
RedPill achieves <2% performance overhead while providing:- ✅ Hardware-enforced privacy - TEE isolation
- ✅ Cryptographic attestation - Verifiable execution
- ✅ Memory encryption - AES-128 in hardware
- ✅ Zero trust architecture - No plaintext access
Important: Privacy guarantees require attestation verification. Always verify signatures in production.
Real-World Performance
Production Metrics (30-day average)
Metric | Value |
---|---|
Average Latency | 125ms |
P95 Latency | 210ms |
P99 Latency | 380ms |
Uptime | 99.95% |
Attestation Success Rate | 99.99% |
Comparison: Privacy-First Platforms
Platform | TEE Support | Gateway in TEE | Avg Latency | Attestation |
---|---|---|---|---|
RedPill | ✅ Full | ✅ Yes | 125ms | ✅ Every request |
Tinfoil | ✅ Models only | ❌ No | 140ms | ✅ Yes |
OpenRouter | ❌ None | ❌ No | 115ms | ❌ No |
Direct OpenAI | ❌ None | ❌ No | 110ms | ❌ No |
Unique Advantage: RedPill is the only platform where the entire gateway runs in TEE, protecting all 250+ models with hardware privacy.
Optimization Tips
Reduce Latency
- Use streaming - Start receiving tokens faster
- Choose nearby regions - Geographic latency matters
- Batch requests - Amortize attestation overhead
- Cache attestations - Verify once per session
Code Example: Optimized Request
Testing Methodology
All benchmarks measured using:- Geographic location: US-West (Oregon)
- Network: 1Gbps dedicated
- Test duration: 7 days continuous
- Request distribution: Exponential backoff
- Payload size: 500-2000 tokens average
- Verification: Full attestation chain checked
Verify Yourself
Run your own performance tests →