Overview
RedPill’s TEE-protected gateway delivers near-native performance with hardware-enforced privacy. Our benchmarks show less than 2% performance overhead compared with non-TEE execution, in exchange for cryptographic privacy guarantees.
NVIDIA H100 GPU TEE Efficiency
Phala’s confidential AI models running in an NVIDIA H100 GPU TEE achieve 99% efficiency compared to non-TEE execution:
| Metric | Non-TEE | GPU TEE | Efficiency |
|---|---|---|---|
| Throughput (tokens/sec) | 1000 | 990 | 99% |
| Latency P50 (ms) | 100 | 101 | 99% |
| Latency P95 (ms) | 150 | 152 | 98.7% |
| Memory Overhead | - | <2% | - |
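The efficiency column appears to use whichever ratio makes "higher is better": TEE / non-TEE for throughput, and non-TEE / TEE for latency. The minimal sketch below reproduces the table's figures under that assumption. The function names are illustrative only.

```python
def throughput_efficiency(non_tee: float, tee: float) -> float:
    """Higher throughput is better, so efficiency = TEE / non-TEE."""
    return tee / non_tee


def latency_efficiency(non_tee_ms: float, tee_ms: float) -> float:
    """Lower latency is better, so efficiency = non-TEE / TEE."""
    return non_tee_ms / tee_ms


# Values copied from the table above.
rows = {
    "Throughput (tokens/sec)": throughput_efficiency(1000, 990),
    "Latency P50 (ms)": latency_efficiency(100, 101),
    "Latency P95 (ms)": latency_efficiency(150, 152),
}

for metric, eff in rows.items():
    print(f"{metric}: {eff:.1%}")
```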
Multi-Provider Routing Latency
Added latency from RedPill’s TEE-protected gateway:
| Provider | Direct Latency | Via RedPill | Overhead |
|---|---|---|---|
| OpenAI GPT-4o | 250 ms | 255 ms | +5 ms |
| Claude 3.5 Sonnet | 300 ms | 306 ms | +6 ms |
| DeepSeek Chat | 180 ms | 185 ms | +5 ms |
| Phala Qwen 2.5 | 120 ms | 121 ms | +1 ms |
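The overhead column is the difference between the two latency columns. Expressing that difference relative to the direct latency makes providers with different baselines easier to compare. The small sketch below performs that arithmetic and uses only the values in the table. `overhead` is an illustrative helper.

```python
from typing import Tuple

# Direct vs. via-RedPill latencies in ms, copied from the table above.
measurements = {
    "OpenAI GPT-4o": (250, 255),
    "Claude 3.5 Sonnet": (300, 306),
    "DeepSeek Chat": (180, 185),
    "Phala Qwen 2.5": (120, 121),
}


def overhead(direct_ms: float, via_ms: float) -> Tuple[float, float]:
    """Return (added latency in ms, added latency as a fraction of direct latency)."""
    added = via_ms - direct_ms
    return added, added / direct_ms


for provider, (direct, via) in measurements.items():
    added, rel = overhead(direct, via)
    print(f"{provider}: +{added} ms ({rel:.1%})")
```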
Why so fast?
Hardware-enforced isolation and memory encryption via Intel SGX/TDX
Optimized request routing in TEE
Minimal cryptographic overhead
Direct encrypted passthrough
Attestation Verification
| Operation | Time | Notes |
|---|---|---|
| Generate Attestation | <50 ms | Per request |
| Verify Signature | <10 ms | Client-side |
| Full Chain Verification | <100 ms | Intel/NVIDIA roots |
Confidential Model Benchmarks
All models run in a GPU TEE with cryptographic attestation:
| Model | Context Length | Tokens/sec | Latency P50 | TEE Overhead |
|---|---|---|---|---|
| phala/qwen-2.5-7b-instruct | 32K | 850 | 95 ms | <1% |
| phala/deepseek-chat-v3-0324 | 64K | 920 | 110 ms | <1% |
| phala/gpt-oss-120b | 8K | 680 | 145 ms | <2% |
| phala/gemma-2-27b-it | 8K | 720 | 125 ms | <1.5% |
| phala/llama-3.3-70b | 128K | 580 | 175 ms | <2% |
| phala/qwen-qwq-32b | 32K | 650 | 140 ms | <1.5% |
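The table can also serve as a simple lookup when choosing a model. First filter by the context length a workload needs, then pick the highest throughput. The minimal sketch below follows that approach, with numbers copied from the table. `ModelBenchmark` and `fastest_model` are hypothetical helpers and are not part of the RedPill API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelBenchmark:
    model: str
    context_k: int  # context length in thousands of tokens, as listed
    tokens_per_sec: int
    p50_ms: int


BENCHMARKS = [
    ModelBenchmark("phala/qwen-2.5-7b-instruct", 32, 850, 95),
    ModelBenchmark("phala/deepseek-chat-v3-0324", 64, 920, 110),
    ModelBenchmark("phala/gpt-oss-120b", 8, 680, 145),
    ModelBenchmark("phala/gemma-2-27b-it", 8, 720, 125),
    ModelBenchmark("phala/llama-3.3-70b", 128, 580, 175),
    ModelBenchmark("phala/qwen-qwq-32b", 32, 650, 140),
]


def fastest_model(min_context_k: int) -> Optional[ModelBenchmark]:
    """Return the highest-throughput model whose context length fits, or None."""
    candidates = [b for b in BENCHMARKS if b.context_k >= min_context_k]
    return max(candidates, key=lambda b: b.tokens_per_sec, default=None)


print(fastest_model(32).model)   # workload needs at least 32K context
print(fastest_model(100).model)  # workload needs at least 100K context
```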
Throughput Comparison
Requests Per Second (RPS)
Single-instance capacity:
Standard API Gateway: 10,000 RPS
RedPill TEE Gateway: 9,800 RPS
Efficiency: 98%
Concurrent Connections
Max Concurrent Requests: 5,000
Average Response Time: 105ms
P95 Response Time: 180ms
P99 Response Time: 250ms
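These single-instance figures support rough capacity planning. To get an instance count, divide the target load by per-instance RPS. Little's law (requests in flight ≈ request rate × average response time) then checks whether that load stays within the concurrency limit. The back-of-the-envelope sketch below assumes the published figures hold for your workload and that load spreads evenly across instances. Both helper names are illustrative.

```python
import math

SINGLE_INSTANCE_RPS = 9_800  # RedPill TEE Gateway, single instance
MAX_CONCURRENT = 5_000       # max concurrent requests per instance
AVG_RESPONSE_S = 0.105       # average response time (105 ms)


def instances_needed(target_rps: float) -> int:
    """Instances required so that aggregate capacity covers the target RPS."""
    return math.ceil(target_rps / SINGLE_INSTANCE_RPS)


def in_flight(rps: float, avg_response_s: float = AVG_RESPONSE_S) -> float:
    """Little's law: average requests in flight = arrival rate x time in system."""
    return rps * avg_response_s


print(instances_needed(25_000))                # 25,000 / 9,800 -> 3 instances
print(round(in_flight(SINGLE_INSTANCE_RPS)))   # about 1,029 requests in flight
print(in_flight(SINGLE_INSTANCE_RPS) <= MAX_CONCURRENT)
```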
RedPill achieves <2% performance overhead while providing:
✅ Hardware-enforced privacy - TEE isolation
✅ Cryptographic attestation - Verifiable execution
✅ Memory encryption - AES-128 in hardware
✅ Zero trust architecture - No plaintext access
Important: Privacy guarantees require attestation verification. Always verify signatures in production.
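One way to enforce "always verify" is to fail closed. Under this approach, a response counts as untrusted unless its signature checks out and the signed payload refers to this exact request and response. The sketch below assumes a hypothetical signed-payload format: two SHA-256 hex digests joined by a colon. The cryptographic signature check is left to a caller-supplied `signature_ok` function. Check RedPill's attestation documentation for the actual payload format and signing algorithm. `require_verified` and `AttestationError` are illustrative names.

```python
import hashlib
from typing import Callable


class AttestationError(Exception):
    """Raised when a response cannot be tied to a verified signature."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def require_verified(
    request_body: bytes,
    response_body: bytes,
    signed_text: str,
    signature_ok: Callable[[str], bool],
) -> None:
    """Fail closed: raise unless the signed text binds to this exact exchange.

    Hypothetical format assumed: "<sha256(request)>:<sha256(response)>".
    `signature_ok` must perform the real cryptographic signature check.
    """
    try:
        req_hash, resp_hash = signed_text.split(":")
    except ValueError as exc:
        raise AttestationError("unexpected signed-text format") from exc

    if req_hash != sha256_hex(request_body):
        raise AttestationError("request hash mismatch")
    if resp_hash != sha256_hex(response_body):
        raise AttestationError("response hash mismatch")
    if not signature_ok(signed_text):
        raise AttestationError("signature verification failed")
```

Because every failure raises instead of returning a flag, calling code cannot accidentally use an unverified response by ignoring a return value.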
Production Metrics (30-day average)
| Metric | Value |
|---|---|
| Average Latency | 125 ms |
| P95 Latency | 210 ms |
| P99 Latency | 380 ms |
| Uptime | 99.95% |
| Attestation Success Rate | 99.99% |
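These percentages are easier to reason about in absolute terms. Over a 30-day window, 99.95% uptime corresponds to about 21.6 minutes of downtime. A 99.99% attestation success rate means roughly 100 failed attestations per million requests. The small sketch below shows the arithmetic. The helper names are illustrative.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200 minutes


def downtime_budget_minutes(uptime: float, period_min: int = MINUTES_PER_30_DAYS) -> float:
    """Maximum downtime over the period implied by an uptime fraction."""
    return (1 - uptime) * period_min


def expected_failures(success_rate: float, requests: int) -> float:
    """Expected number of failed attestations for a given request volume."""
    return (1 - success_rate) * requests


print(f"{downtime_budget_minutes(0.9995):.1f} min")            # 99.95% uptime, 30 days
print(f"{expected_failures(0.9999, 1_000_000):.0f} failures")  # per million requests
```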
| Platform | TEE Support | Gateway in TEE | Avg Latency | Attestation |
|---|---|---|---|---|
| RedPill | ✅ Full | ✅ Yes | 125 ms | ✅ Every request |
| Tinfoil | ✅ Models only | ❌ No | 140 ms | ✅ Yes |
| OpenRouter | ❌ None | ❌ No | 115 ms | ❌ No |
| Direct OpenAI | ❌ None | ❌ No | 110 ms | ❌ No |
Unique Advantage: RedPill is the only platform where the entire gateway runs in a TEE, protecting all 260+ models (66+ provider integrations available) with hardware-enforced privacy.
Optimization Tips
Reduce Latency
Use streaming - Start receiving tokens faster
Choose nearby regions - Geographic latency matters
Batch requests - Amortize attestation overhead
Cache attestations - Verify once per session
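"Verify once per session" amounts to caching a successful verification for a bounded time. The minimal sketch below implements that idea. You supply your own `verify` function, for example one that fetches and checks the attestation report for a given model. `AttestationCache` and its one-hour default TTL are illustrative choices, not RedPill settings. Only successes are cached, so a failed check is retried on the next call rather than remembered.

```python
import time
from typing import Callable, Dict


class AttestationCache:
    """Cache successful attestation checks per key (e.g. model name) for a TTL."""

    def __init__(
        self,
        verify: Callable[[str], bool],
        ttl_s: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._verify = verify  # caller-supplied full verification
        self._ttl_s = ttl_s
        self._clock = clock
        self._verified_at: Dict[str, float] = {}

    def is_verified(self, key: str) -> bool:
        now = self._clock()
        ts = self._verified_at.get(key)
        if ts is not None and now - ts < self._ttl_s:
            return True  # cache hit: skip the slower full verification
        if self._verify(key):
            self._verified_at[key] = now
            return True
        self._verified_at.pop(key, None)  # never keep a stale success
        return False
```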
Code Example: Optimized Request
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.redpill.ai/v1",
    api_key="your-api-key",
)

# Streaming reduces time-to-first-token
stream = client.chat.completions.create(
    model="phala/qwen-2.5-7b-instruct",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,  # ⚡ Faster perceived latency
    max_tokens=500,
)

for chunk in stream:
    # Skip chunks that carry no choices (e.g. a trailing usage chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
print()  # end the streamed output with a newline
```
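To see what streaming buys on your own requests, time the wait for the first content token separately from the full response. The rough sketch below does this. `measure_stream` is a hypothetical helper. It assumes OpenAI-style chunks where text arrives in `choices[0].delta.content`. It takes a zero-argument function that creates the stream, so the clock starts before the request is sent.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_stream(create_stream: Callable[[], Iterable]) -> Tuple[str, float, float]:
    """Return (full_text, time_to_first_token_s, total_time_s) for one streamed call."""
    start = time.perf_counter()
    stream = create_stream()  # the request is sent here
    ttft = float("nan")       # stays NaN if no content ever arrives
    parts = []
    for chunk in stream:
        if not chunk.choices:  # e.g. a trailing usage chunk
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if not parts:
                ttft = time.perf_counter() - start
            parts.append(delta)
    return "".join(parts), ttft, time.perf_counter() - start
```

With the client from the example above, call it as `measure_stream(lambda: client.chat.completions.create(model=..., messages=..., stream=True))`.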
Testing Methodology
All benchmarks measured using:
Geographic location: US-West (Oregon)
Network: 1 Gbps dedicated
Test duration: 7 days continuous
Request distribution: Exponential backoff
Payload size: 500-2000 tokens average
Verification: Full attestation chain checked
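The minimal harness below is one way to run your own measurements. It times each request and reports nearest-rank percentiles. It is a sketch, not the methodology behind the figures above. `send` is any zero-argument function that performs one request, for example a non-streaming call through the client shown earlier. Requests run sequentially, so the harness measures latency rather than maximum throughput. `run_benchmark` and `percentile` are illustrative names.

```python
import math
import time
from typing import Callable, Dict, List


def percentile(sorted_ms: List[float], p: float) -> float:
    """Nearest-rank percentile of a non-empty ascending list (0 < p <= 100)."""
    rank = math.ceil(p * len(sorted_ms) / 100)
    return sorted_ms[max(rank, 1) - 1]


def run_benchmark(send: Callable[[], None], n: int = 100) -> Dict[str, float]:
    """Call `send` n times (n >= 1) and summarize latency in milliseconds."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        send()
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "avg": sum(latencies) / n,
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }
```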
Next Steps