Overview
RedPill’s TEE-protected gateway delivers near-native performance with hardware-enforced privacy. Our benchmarks demonstrate minimal overhead while providing cryptographic guarantees.GPU TEE Performance
NVIDIA H100 GPU TEE Efficiency
Phala’s confidential AI models running in NVIDIA H100 GPU TEE achieve 99% efficiency compared to non-TEE execution:| Metric | Non-TEE | GPU TEE | Efficiency |
|---|---|---|---|
| Throughput (tokens/sec) | 1000 | 990 | 99% |
| Latency P50 (ms) | 100 | 101 | 99% |
| Latency P95 (ms) | 150 | 152 | 98.7% |
| Memory Overhead | - | <2% | - |
Research Source: Confidential AI Benchmark Paper (arXiv)NVIDIA H100 GPU TEE provides hardware isolation with minimal performance impact.
RedPill Gateway Performance
Multi-Provider Routing Latency
Added latency from RedPill’s TEE-protected gateway:| Provider | Direct Latency | Via RedPill | Overhead |
|---|---|---|---|
| OpenAI GPT-4o | 250ms | 255ms | +5ms |
| Claude 3.5 Sonnet | 300ms | 306ms | +6ms |
| DeepSeek Chat | 180ms | 185ms | +5ms |
| Phala Qwen 2.5 | 120ms | 121ms | +1ms |
Attestation Verification
| Operation | Time | Notes |
|---|---|---|
| Generate Attestation | <50ms | Per request |
| Verify Signature | <10ms | Client-side |
| Full Chain Verification | <100ms | Intel/NVIDIA roots |
Confidential Model Benchmarks
Phala Confidential Models Performance
All models running in GPU TEE with cryptographic attestation:| Model | Context Length | Tokens/sec | Latency P50 | TEE Overhead |
|---|---|---|---|---|
| phala/qwen-2.5-7b-instruct | 32K | 850 | 95ms | <1% |
| phala/glm-5 | 64K | 920 | 110ms | <1% |
| phala/gpt-oss-120b | 8K | 680 | 145ms | <2% |
| phala/gemma-2-27b-it | 8K | 720 | 125ms | <1.5% |
| phala/llama-3.3-70b | 128K | 580 | 175ms | <2% |
| phala/qwen-qwq-32b | 32K | 650 | 140ms | <1.5% |
Test These Models
Try confidential models via API →
Throughput Comparison
Requests Per Second (RPS)
Single instance capacity:Concurrent Connections
Security vs Performance Trade-off
RedPill achieves <2% performance overhead while providing:- ✅ Hardware-enforced privacy - TEE isolation
- ✅ Cryptographic attestation - Verifiable execution
- ✅ Memory encryption - AES-128 in hardware
- ✅ Zero trust architecture - No plaintext access
Real-World Performance
Production Metrics (30-day average)
| Metric | Value |
|---|---|
| Average Latency | 125ms |
| P95 Latency | 210ms |
| P99 Latency | 380ms |
| Uptime | 99.95% |
| Attestation Success Rate | 99.99% |
Comparison: Privacy-First Platforms
| Platform | TEE Support | Gateway in TEE | Avg Latency | Attestation |
|---|---|---|---|---|
| RedPill | ✅ Full | ✅ Yes | 125ms | ✅ Every request |
| Tinfoil | ✅ Models only | ❌ No | 140ms | ✅ Yes |
| OpenRouter | ❌ None | ❌ No | 115ms | ❌ No |
| Direct OpenAI | ❌ None | ❌ No | 110ms | ❌ No |
Unique Advantage: RedPill is the only platform where the entire gateway runs in TEE, protecting all 260+ models (66+ provider integrations available) with hardware privacy.
Optimization Tips
Reduce Latency
- Use streaming - Start receiving tokens faster
- Choose nearby regions - Geographic latency matters
- Batch requests - Amortize attestation overhead
- Cache attestations - Verify once per session
Code Example: Optimized Request
Testing Methodology
All benchmarks measured using:- Geographic location: US-West (Oregon)
- Network: 1Gbps dedicated
- Test duration: 7 days continuous
- Request distribution: Exponential backoff
- Payload size: 500-2000 tokens average
- Verification: Full attestation chain checked
Verify Yourself
Run your own performance tests →
Next Steps
Try Confidential Models
Explore 6 TEE-protected models
Attestation Guide
Verify TEE execution
Streaming Guide
Optimize for speed
API Reference
Full API documentation