Overview

RedPill’s TEE-protected gateway delivers near-native performance with hardware-enforced privacy. In the benchmarks below, TEE execution costs under 2% of throughput and the gateway adds 1 to 6 ms of routing latency, while every request carries a cryptographic attestation.

GPU TEE Performance

NVIDIA H100 GPU TEE Efficiency

Phala’s confidential AI models running in NVIDIA H100 GPU TEE achieve 99% efficiency compared to non-TEE execution:
| Metric | Non-TEE | GPU TEE | Efficiency |
|---|---|---|---|
| Throughput (tokens/sec) | 1000 | 990 | 99% |
| Latency P50 (ms) | 100 | 101 | 99% |
| Latency P95 (ms) | 150 | 152 | 98.7% |
| Memory Overhead | - | <2% | - |

Research Source: Confidential AI Benchmark Paper (arXiv)

NVIDIA H100 GPU TEE provides hardware isolation with minimal performance impact.
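The efficiency column is consistent with a simple ratio: TEE over non-TEE for throughput, and non-TEE over TEE for latency (where lower is better). That definition is an inference from the table values, not something the benchmark states. The sketch below reproduces the three percentages under that assumption; the function names are illustrative.

```python
def throughput_efficiency(non_tee_tps: float, tee_tps: float) -> float:
    """Higher throughput is better, so efficiency is TEE / non-TEE."""
    return tee_tps / non_tee_tps


def latency_efficiency(non_tee_ms: float, tee_ms: float) -> float:
    """Lower latency is better, so efficiency is non-TEE / TEE."""
    return non_tee_ms / tee_ms


# Values copied from the H100 table above.
print(f"Throughput:  {throughput_efficiency(1000, 990):.1%}")  # 99.0%
print(f"Latency P50: {latency_efficiency(100, 101):.1%}")      # 99.0%
print(f"Latency P95: {latency_efficiency(150, 152):.1%}")      # 98.7%
```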

RedPill Gateway Performance

Multi-Provider Routing Latency

Added latency from RedPill’s TEE-protected gateway:
| Provider | Direct Latency | Via RedPill | Overhead |
|---|---|---|---|
| OpenAI GPT-4o | 250ms | 255ms | +5ms |
| Claude 3.5 Sonnet | 300ms | 306ms | +6ms |
| DeepSeek Chat | 180ms | 185ms | +5ms |
| Phala Qwen 2.5 | 120ms | 121ms | +1ms |

Why so fast?
  • Hardware-enforced isolation via Intel SGX/TDX, with memory encryption handled in hardware rather than software
  • Optimized request routing in TEE
  • Minimal cryptographic overhead
  • Direct encrypted passthrough
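To put the absolute overheads in proportion, here is a minimal sketch that recomputes each row of the routing table as a relative figure. The latencies are copied from the table; the helper name `routing_overhead` is illustrative.

```python
# (direct_ms, via_redpill_ms) per provider, copied from the routing table.
ROUTING_LATENCY_MS = {
    "OpenAI GPT-4o": (250, 255),
    "Claude 3.5 Sonnet": (300, 306),
    "DeepSeek Chat": (180, 185),
    "Phala Qwen 2.5": (120, 121),
}


def routing_overhead(direct_ms: float, via_ms: float) -> tuple[float, float]:
    """Return (added milliseconds, added latency as a fraction of direct)."""
    added = via_ms - direct_ms
    return added, added / direct_ms


for provider, (direct_ms, via_ms) in ROUTING_LATENCY_MS.items():
    added, relative = routing_overhead(direct_ms, via_ms)
    print(f"{provider}: +{added} ms ({relative:.1%})")
```

On these figures the relative overhead works out to 2.0%, 2.0%, 2.8%, and 0.8% respectively, so the fixed per-request cost weighs most on providers that are already fast.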

Attestation Verification

| Operation | Time | Notes |
|---|---|---|
| Generate Attestation | <50ms | Per request |
| Verify Signature | <10ms | Client-side |
| Full Chain Verification | <100ms | Intel/NVIDIA roots |
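These figures are upper bounds, so they can serve as a rough worst-case budget for client-side verification. The sketch below assumes one particular policy that the table does not prescribe: the signature is checked on every request, and the full chain is verified once and shared across a session of several requests.

```python
def client_verification_ms(
    requests_per_session: int,
    signature_ms: float = 10.0,   # upper bound from the table
    full_chain_ms: float = 100.0,  # upper bound from the table
) -> float:
    """Worst-case client-side verification time per request, in ms,
    when one full chain verification is shared by a whole session."""
    if requests_per_session < 1:
        raise ValueError("requests_per_session must be at least 1")
    return signature_ms + full_chain_ms / requests_per_session


for n in (1, 10, 100):
    print(f"{n:>3} requests/session: {client_verification_ms(n):.1f} ms per request")
```

Under this assumption the per-request cost falls from 110 ms for a single-request session toward the 10 ms signature check as sessions grow longer.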

Confidential Model Benchmarks

Phala Confidential Models Performance

All models run in a GPU TEE with cryptographic attestation:
| Model | Context Length | Tokens/sec | Latency P50 | TEE Overhead |
|---|---|---|---|---|
| phala/qwen-2.5-7b-instruct | 32K | 850 | 95ms | <1% |
| phala/deepseek-chat-v3-0324 | 64K | 920 | 110ms | <1% |
| phala/gpt-oss-120b | 8K | 680 | 145ms | <2% |
| phala/gemma-2-27b-it | 8K | 720 | 125ms | <1.5% |
| phala/llama-3.3-70b | 128K | 580 | 175ms | <2% |
| phala/qwen-qwq-32b | 32K | 650 | 140ms | <1.5% |

Test These Models

Try confidential models via API →

Throughput Comparison

Requests Per Second (RPS)

Single instance capacity:
Standard API Gateway:     10,000 RPS
RedPill TEE Gateway:       9,800 RPS
Efficiency:                  98%

Concurrent Connections

Max Concurrent Requests:  5,000
Average Response Time:    105ms
P95 Response Time:        180ms
P99 Response Time:        250ms
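Request rate, response time, and concurrency are tied together by Little's law (requests in flight = arrival rate × average time in system). The sketch below applies it to the figures above, assuming steady state and assuming the RPS and response-time numbers were measured under the same conditions, which the benchmark does not state.

```python
def requests_in_flight(rps: float, avg_response_s: float) -> float:
    """Little's law: L = lambda * W."""
    return rps * avg_response_s


MAX_CONCURRENT = 5_000   # from "Max Concurrent Requests"
GATEWAY_RPS = 9_800      # from "RedPill TEE Gateway"
AVG_RESPONSE_S = 0.105   # 105 ms average response time

in_flight = requests_in_flight(GATEWAY_RPS, AVG_RESPONSE_S)
print(f"In flight at {GATEWAY_RPS} RPS: {in_flight:.0f}")                  # 1029
print(f"Share of concurrency limit used: {in_flight / MAX_CONCURRENT:.0%}")  # 21%
```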

Security vs Performance Trade-off

RedPill achieves <2% performance overhead while providing:
  • Hardware-enforced privacy - TEE isolation
  • Cryptographic attestation - Verifiable execution
  • Memory encryption - AES-128 in hardware
  • Zero trust architecture - No plaintext access
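Important: Privacy guarantees hold only if the attestation is verified. Always verify attestation signatures in production; a response whose attestation has not been checked carries no proof that it came from a TEE.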

Real-World Performance

Production Metrics (30-day average)

| Metric | Value |
|---|---|
| Average Latency | 125ms |
| P95 Latency | 210ms |
| P99 Latency | 380ms |
| Uptime | 99.95% |
| Attestation Success Rate | 99.99% |
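The two percentage rows are easier to reason about as absolute numbers. A minimal sketch of the arithmetic, using the 30-day window from the heading; the per-million request count is just a convenient illustrative scale.

```python
WINDOW_DAYS = 30
UPTIME = 0.9995               # 99.95%
ATTESTATION_SUCCESS = 0.9999  # 99.99%

window_minutes = WINDOW_DAYS * 24 * 60           # 43,200 minutes
downtime_minutes = window_minutes * (1 - UPTIME)
failures_per_million = 1_000_000 * (1 - ATTESTATION_SUCCESS)

print(f"Downtime over {WINDOW_DAYS} days: {downtime_minutes:.1f} minutes")           # 21.6
print(f"Attestation failures per million requests: {failures_per_million:.0f}")    # 100
```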

Comparison: Privacy-First Platforms

| Platform | TEE Support | Gateway in TEE | Avg Latency | Attestation |
|---|---|---|---|---|
| RedPill | ✅ Full | ✅ Yes | 125ms | ✅ Every request |
| Tinfoil | ✅ Models only | ❌ No | 140ms | ✅ Yes |
| OpenRouter | ❌ None | ❌ No | 115ms | ❌ No |
| Direct OpenAI | ❌ None | ❌ No | 110ms | ❌ No |

Unique Advantage: Among the platforms compared here, RedPill is the only one where the entire gateway runs inside a TEE, protecting all 250+ models with hardware privacy.

Optimization Tips

Reduce Latency

  1. Use streaming - Start receiving tokens faster
  2. Choose nearby regions - Geographic latency matters
  3. Batch requests - Amortize attestation overhead
  4. Cache attestations - Verify once per session

Code Example: Optimized Request

from openai import OpenAI

client = OpenAI(
    base_url="https://api.redpill.ai/v1",
    api_key="your-api-key",  # replace with your RedPill API key
)

# Streaming reduces time-to-first-token
stream = client.chat.completions.create(
    model="phala/qwen-2.5-7b-instruct",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,  # ⚡ Faster perceived latency
    max_tokens=500,
)

for chunk in stream:
    # Skip chunks that carry no choices or no text (e.g. role-only or final chunks)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # end the streamed output with a newline

Testing Methodology

All benchmarks measured using:
  • Geographic location: US-West (Oregon)
  • Network: 1Gbps dedicated
  • Test duration: 7 days continuous
  • Request distribution: Exponential backoff
  • Payload size: 500-2000 tokens average
  • Verification: Full attestation chain checked
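For readers who want to reproduce the latency percentiles, here is a minimal measurement sketch. It is not the harness used for the published numbers: `measure_ms` times any zero-argument callable you supply, and the percentile method (inclusive interpolation from Python's `statistics` module) is one reasonable choice among several.

```python
import statistics
import time
from typing import Callable, Dict, List


def measure_ms(call: Callable[[], object], runs: int) -> List[float]:
    """Time `call` sequentially `runs` times; return latencies in ms."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples


def summarize(latencies_ms: List[float]) -> Dict[str, float]:
    """Average plus P50/P95/P99 from a list of at least two samples."""
    # quantiles(n=100) returns 99 cut points; index k-1 is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "avg": statistics.fmean(latencies_ms),
        "p50": cuts[49],
        "p95": cuts[94],
        "p99": cuts[98],
    }


# Stand-in workload; replace the lambda with a real API request.
samples = measure_ms(lambda: time.sleep(0.002), runs=50)
for name, value in summarize(samples).items():
    print(f"{name}: {value:.2f} ms")
```

Sequential timing like this measures single-request latency only; reproducing the RPS and concurrency figures would need a concurrent load generator.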

Verify Yourself

Run your own performance tests →

Next Steps