Performance Benchmarks

Overview

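RedPill’s TEE-protected gateway delivers near-native performance with hardware-enforced privacy. In our benchmarks the gateway adds roughly 1-6ms per request, and GPU TEE inference runs at about 99% of non-TEE speed, while every response remains backed by cryptographic attestation.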

GPU TEE Performance

NVIDIA H100 GPU TEE Efficiency

Phala’s confidential AI models running in an NVIDIA H100 GPU TEE achieve about 99% of non-TEE performance:

Metric	Non-TEE	GPU TEE	Efficiency
Throughput (tokens/sec)	1000	990	99%
Latency P50 (ms)	100	101	99%
Latency P95 (ms)	150	152	98.7%
Memory Overhead	-	<2%	-

Research Source: Confidential AI Benchmark Paper (arXiv)

NVIDIA H100 GPU TEE provides hardware isolation with minimal performance impact.
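The efficiency column in the table above can be reproduced from the raw figures. This is a minimal sketch that assumes throughput efficiency is the TEE/non-TEE ratio and latency efficiency is the inverse ratio (because lower latency is better); the function names are illustrative and not part of any RedPill or Phala tooling.

```python
def throughput_efficiency(non_tee: float, tee: float) -> float:
    """Higher is better, so efficiency is TEE / non-TEE, as a percentage."""
    return 100.0 * tee / non_tee


def latency_efficiency(non_tee_ms: float, tee_ms: float) -> float:
    """Lower is better, so efficiency is non-TEE / TEE, as a percentage."""
    return 100.0 * non_tee_ms / tee_ms


if __name__ == "__main__":
    # Figures copied from the table above.
    print(f"Throughput:  {throughput_efficiency(1000, 990):.1f}%")
    print(f"Latency P50: {latency_efficiency(100, 101):.1f}%")
    print(f"Latency P95: {latency_efficiency(150, 152):.1f}%")
```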

RedPill Gateway Performance

Multi-Provider Routing Latency

Added latency from RedPill’s TEE-protected gateway:

Provider	Direct Latency	Via RedPill	Overhead
OpenAI GPT-4o	250ms	255ms	+5ms
Claude 3.5 Sonnet	300ms	306ms	+6ms
DeepSeek Chat	180ms	185ms	+5ms
Phala Qwen 2.5	120ms	121ms	+1ms

Why so fast?

Hardware acceleration via Intel SGX/TDX
Optimized request routing in TEE
Minimal cryptographic overhead
Direct encrypted passthrough
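To compare providers on the same footing, the absolute overhead in the routing table can be expressed as a share of the direct latency. The sketch below uses only the figures from that table; `relative_overhead` is an illustrative helper, not a RedPill API.

```python
# Latencies in milliseconds (direct, via RedPill), copied from the routing table.
ROUTING_LATENCY_MS = {
    "OpenAI GPT-4o": (250, 255),
    "Claude 3.5 Sonnet": (300, 306),
    "DeepSeek Chat": (180, 185),
    "Phala Qwen 2.5": (120, 121),
}


def relative_overhead(direct_ms: float, via_gateway_ms: float) -> float:
    """Added latency as a percentage of the direct latency."""
    return 100.0 * (via_gateway_ms - direct_ms) / direct_ms


if __name__ == "__main__":
    for provider, (direct, via) in ROUTING_LATENCY_MS.items():
        pct = relative_overhead(direct, via)
        print(f"{provider}: +{via - direct}ms ({pct:.1f}%)")
```

Relative overhead depends on the baseline: the same +5ms is a larger share of a 180ms call than of a 250ms one.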

Attestation Verification

Operation	Time	Notes
Generate Attestation	<50ms	Per request
Verify Signature	<10ms	Client-side
Full Chain Verification	<100ms	Intel/NVIDIA roots

Confidential Model Benchmarks

Phala Confidential Models Performance

All models running in GPU TEE with cryptographic attestation:

Model	Context Length	Tokens/sec	Latency P50	TEE Overhead
phala/qwen-2.5-7b-instruct	32K	850	95ms	<1%
phala/deepseek-chat-v3-0324	64K	920	110ms	<1%
phala/gpt-oss-120b	8K	680	145ms	<2%
phala/gemma-2-27b-it	8K	720	125ms	<1.5%
phala/llama-3.3-70b	128K	580	175ms	<2%
phala/qwen-qwq-32b	32K	650	140ms	<1.5%

Test These Models

Try confidential models via API →

Throughput Comparison

Requests Per Second (RPS)

Single instance capacity:

Standard API Gateway:     10,000 RPS
RedPill TEE Gateway:       9,800 RPS
Efficiency:                  98%

Concurrent Connections

Max Concurrent Requests:  5,000
Average Response Time:    105ms
P95 Response Time:        180ms
P99 Response Time:        250ms
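For readers reproducing the P95 and P99 figures from their own latency samples, here is a minimal sketch of a percentile calculation. It assumes the nearest-rank definition; the source does not state which percentile method was used, so results from other methods (for example, linear interpolation) may differ slightly.

```python
import math
from typing import List


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to avoid float drift for whole-number inputs.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


if __name__ == "__main__":
    latencies_ms = [98.0, 101.0, 105.0, 110.0, 180.0, 250.0]  # made-up sample data
    for p in (50, 95, 99):
        print(f"P{p}: {percentile(latencies_ms, p)}ms")
```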

Security vs Performance Trade-off

RedPill achieves <2% performance overhead while providing:

✅ Hardware-enforced privacy - TEE isolation
✅ Cryptographic attestation - Verifiable execution
✅ Memory encryption - AES-128 in hardware
✅ Zero trust architecture - No plaintext access

Important: Privacy guarantees require attestation verification. Always verify signatures in production.

Real-World Performance

Production Metrics (30-day average)

Metric	Value
Average Latency	125ms
P95 Latency	210ms
P99 Latency	380ms
Uptime	99.95%
Attestation Success Rate	99.99%

Comparison: Privacy-First Platforms

Platform	TEE Support	Gateway in TEE	Avg Latency	Attestation
RedPill	✅ Full	✅ Yes	125ms	✅ Every request
Tinfoil	✅ Models only	❌ No	140ms	✅ Yes
OpenRouter	❌ None	❌ No	115ms	❌ No
Direct OpenAI	❌ None	❌ No	110ms	❌ No

Unique Advantage: RedPill is the only platform where the entire gateway runs in a TEE, protecting all 260+ models (across 66+ provider integrations) with hardware-enforced privacy.

Optimization Tips

Reduce Latency

Use streaming - Start receiving tokens faster
Choose nearby regions - Geographic latency matters
Batch requests - Amortize attestation overhead
Cache attestations - Verify once per session
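The "cache attestations" tip can be sketched as a small time-limited cache. This is an illustration under assumptions, not RedPill's client implementation: `verify` stands in for whatever full attestation check you use (hypothetical here), the cache key could be a model name or signing identity, and the 300-second default lifetime is arbitrary.

```python
import time
from typing import Callable, Dict


class AttestationCache:
    """Remembers successful attestation checks for a limited time.

    `verify` is a caller-supplied function (hypothetical here) that runs the
    full attestation check for a key and returns True on success.
    """

    def __init__(
        self,
        verify: Callable[[str], bool],
        ttl_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._verify = verify
        self._ttl = ttl_seconds
        self._clock = clock
        self._verified_at: Dict[str, float] = {}

    def is_trusted(self, key: str) -> bool:
        """Return True if `key` was verified within the TTL, verifying it if needed."""
        now = self._clock()
        verified_at = self._verified_at.get(key)
        if verified_at is not None and now - verified_at < self._ttl:
            return True  # cache hit: skip the expensive check
        if self._verify(key):
            self._verified_at[key] = now
            return True
        self._verified_at.pop(key, None)  # never cache a failed verification
        return False
```

A shorter lifetime trades some latency for fresher guarantees; failed checks are deliberately never cached, so a bad attestation is re-examined on every call.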

Code Example: Optimized Request

from openai import OpenAI

client = OpenAI(
    base_url="https://api.redpill.ai/v1",
    api_key="your-api-key"
)

# Streaming reduces time-to-first-token
stream = client.chat.completions.create(
    model="phala/qwen-2.5-7b-instruct",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,  # ⚡ Faster perceived latency
    max_tokens=500
)

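for chunk in stream:
    # Some chunks may carry no choices or no text; skip them
    if chunk.choices and chunk.choices[0].delta.content:
        # flush=True makes each token appear as soon as it arrives
        print(chunk.choices[0].delta.content, end="", flush=True)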
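To check whether streaming actually improves perceived latency in your setup, you can time the first chunk separately from the full response. The sketch below is generic: `open_stream` is a hypothetical zero-argument callable that starts the request and returns an iterable of chunks, so the timer starts before the request is sent.

```python
import time
from typing import Callable, Iterable


def measure_stream(open_stream: Callable[[], Iterable[object]]) -> dict:
    """Consume a stream and report time to first chunk and total time, in seconds."""
    start = time.perf_counter()
    first_chunk_s = None
    chunks = 0
    for _ in open_stream():
        if first_chunk_s is None:
            first_chunk_s = time.perf_counter() - start
        chunks += 1
    total_s = time.perf_counter() - start
    return {"first_chunk_s": first_chunk_s, "total_s": total_s, "chunks": chunks}


if __name__ == "__main__":
    def fake_stream():
        # Stand-in for a real streaming response: three chunks, 50ms apart.
        for token in ("Hello", ", ", "world"):
            time.sleep(0.05)
            yield token

    print(measure_stream(fake_stream))
```

With the client above, `open_stream` could be a lambda wrapping the `client.chat.completions.create(..., stream=True)` call.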

Testing Methodology

All benchmarks measured using:

Geographic location: US-West (Oregon)
Network: 1Gbps dedicated
Test duration: 7 days continuous
Request distribution: Exponential backoff
Payload size: 500-2000 tokens average
Verification: Full attestation chain checked
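A simple way to approximate this methodology on your own hardware is a timing loop around a single request. This is a minimal sketch, not the harness used for the numbers above: `send_request` is a hypothetical callable you supply (for example, one chat completion call), and the run and warm-up counts are arbitrary defaults.

```python
import statistics
import time
from typing import Callable, List


def time_calls(send_request: Callable[[], object], runs: int = 100, warmup: int = 5) -> List[float]:
    """Return per-call latencies in milliseconds, after discarding warm-up calls."""
    for _ in range(warmup):
        send_request()  # warm connections and caches; not measured
    latencies_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        send_request()
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms: List[float]) -> dict:
    """Summarize a non-empty list of latencies (milliseconds)."""
    return {
        "runs": len(latencies_ms),
        "mean_ms": statistics.fmean(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    # Stand-in workload: sleep 10ms instead of making a network call.
    print(summarize(time_calls(lambda: time.sleep(0.01), runs=20, warmup=2)))
```

Running the same loop once against a provider directly and once through the gateway gives a like-for-like overhead estimate for your own network path.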

Verify Yourself

Run your own performance tests →

Next Steps

Try Confidential Models

Explore 6 TEE-protected models

Attestation Guide

Verify TEE execution

Streaming Guide

Optimize for speed

API Reference

Full API documentation
