Audio

Overview

The Audio API provides text-to-speech synthesis, audio transcription, and audio translation. All audio requests pass through RedPill’s privacy-protected gateway and are processed on infrastructure protected by a trusted execution environment (TEE), so your audio data and transcripts remain confidential.

Supported Models

Model	Provider	Capabilities	Languages
`openai/tts-1`	OpenAI	Text-to-speech	English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Arabic, Chinese, Japanese, Korean, Hindi
`openai/tts-1-hd`	OpenAI	High-quality TTS	Same as tts-1
`openai/whisper-1`	OpenAI	Transcription	99+ languages
`groq/whisper-large-v3`	Groq	Fast transcription	99+ languages
`groq/whisper-large-v3-turbo`	Groq	Ultra-fast transcription	99+ languages

Text-to-Speech

Convert text to natural-sounding speech.

curl https://api.redpill.ai/v1/audio/speech \
  -H "Authorization: Bearer $REDPILL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/tts-1",
    "input": "Welcome to RedPill, the privacy-first AI platform. Your data is protected by hardware-enforced security.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Request Parameters

model

string

required

The TTS model to use:

openai/tts-1 - Standard quality, faster
openai/tts-1-hd - High definition, higher quality

input

string

required

The text to convert to speech. Maximum length: 4096 characters.

voice

string

required

Voice to use for synthesis:

alloy - Neutral, balanced
echo - Warm, expressive
fable - Storytelling, dramatic
onyx - Deep, authoritative
nova - Energetic, youthful
shimmer - Soft, gentle

response_format

string

default:"mp3"

Audio format:

mp3 - MP3 audio (default)
opus - Opus audio (low latency)
aac - AAC audio
flac - FLAC audio (lossless)
wav - WAV audio (uncompressed)
pcm - PCM 16-bit audio

speed

number

default:"1.0"

Playback speed multiplier. Range: 0.25 to 4.0

0.5 - Half speed
1.0 - Normal speed
1.5 - 1.5x speed
2.0 - Double speed

Response

Returns binary audio data in the specified format.
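The same request can be sketched in Python using only the standard library. This version checks the limits listed under Request Parameters before anything is sent. The helper names `build_speech_request` and `save_speech` are illustrative and not part of any RedPill SDK; the endpoint, headers, and `REDPILL_API_KEY` variable are taken from the curl example above.

```python
import json
import os
import urllib.request

API_URL = "https://api.redpill.ai/v1/audio/speech"  # endpoint from the curl example
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096  # documented limit for `input`


def build_speech_request(text, voice="alloy", model="openai/tts-1",
                         response_format="mp3", speed=1.0, api_key=None):
    """Validate the documented limits and return an unsent urllib Request."""
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input must be 1 to {MAX_INPUT_CHARS} characters")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format: {response_format}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    payload = {"model": model, "input": text, "voice": voice,
               "response_format": response_format, "speed": speed}
    key = api_key or os.environ.get("REDPILL_API_KEY", "")
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def save_speech(request, path):
    """Send the request and write the binary audio body to `path`."""
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())
```

Calling `save_speech(build_speech_request("Hello"), "speech.mp3")` would send the request and write the returned bytes to disk.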

Transcribe Audio

Convert audio to text.

curl https://api.redpill.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $REDPILL_API_KEY" \
  -F file="@audio.mp3" \
  -F model="openai/whisper-1" \
  -F language="en" \
  -F response_format="json" \
  -F temperature=0

{
  "text": "Welcome to RedPill, the privacy-first AI platform. Your data is protected by hardware-enforced security."
}

Request Parameters

file

required

Audio file to transcribe. Supported formats:

mp3, mp4, mpeg, mpga
m4a, wav, webm

Maximum file size: 25 MB

model

string

required

Transcription model:

openai/whisper-1 - Standard Whisper
groq/whisper-large-v3 - Fast transcription (Groq)
groq/whisper-large-v3-turbo - Ultra-fast (Groq)

language

string

ISO 639-1 language code of the audio (e.g., en, es, fr, de, zh, ja). Specifying it improves accuracy and latency.

prompt

string

Optional text to guide the model’s style. Should match the audio’s language and context.

response_format

string

default:"json"

Output format:

json - JSON with text only
text - Plain text only
srt - SubRip subtitle format
vtt - WebVTT subtitle format
verbose_json - JSON with timestamps and metadata

temperature

number

default:"0"

Sampling temperature (0 to 1). Higher values increase randomness. Use 0 for deterministic outputs.

timestamp_granularities

array

Timestamp granularity to include when response_format is verbose_json:

["segment"] - Segment-level timestamps
["word"] - Word-level timestamps
["segment", "word"] - Both
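The file constraints listed above can be checked locally before uploading, which avoids a round trip for requests that would be rejected anyway. The sketch below is a hypothetical helper (`check_audio_file` is not part of the API). It judges the format by file extension only, and it assumes that 25 MB means 25 × 1024 × 1024 bytes.

```python
import os

# Formats and size limit as documented for the `file` parameter
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" interpreted as 25 MiB


def check_audio_file(path):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if not os.path.isfile(path):
        problems.append("file not found")
    elif os.path.getsize(path) > MAX_FILE_BYTES:
        problems.append("file exceeds 25 MB")
    return problems
```

An empty result means the file passes these local checks; the server may still reject a file whose contents do not match its extension.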

Translate Audio

Translate audio from any supported language to English.

curl https://api.redpill.ai/v1/audio/translations \
  -H "Authorization: Bearer $REDPILL_API_KEY" \
  -F file="@german_audio.mp3" \
  -F model="openai/whisper-1" \
  -F response_format="json"

Request Parameters

file

required

Audio file in any supported language. Same file format requirements as transcription.

model

string

required

Translation model (e.g., openai/whisper-1)

prompt

string

Optional text to guide translation style

response_format

string

default:"json"

Output format: json, text, srt, vtt, or verbose_json

temperature

number

default:"0"

Sampling temperature (0 to 1)

Translation always outputs English text, regardless of input language.

Supported Languages

Whisper models support 99+ languages including:

European Languages

English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Ukrainian, Swedish, Danish, Norwegian, Finnish, Greek, Czech, Romanian, Hungarian, Croatian, Serbian, Bulgarian

Asian Languages

Chinese (Mandarin), Japanese, Korean, Hindi, Bengali, Tamil, Telugu, Marathi, Urdu, Vietnamese, Thai, Indonesian, Malay, Filipino/Tagalog

Middle Eastern Languages

Arabic, Hebrew, Persian (Farsi), Turkish

Other Languages

Afrikaans, Swahili, Icelandic, Estonian, Latvian, Lithuanian, Slovenian, Slovak, Welsh, and 60+ more

Privacy & Security

TEE-Protected Processing

All audio processing flows through hardware-protected secure enclaves

Confidential Transcripts

Audio files and transcripts are processed in isolated secure environments

No Audio Storage

Audio files are deleted immediately after processing

No Training Data

Your audio is never used to train models

Best Practices

Transcription Tips

Use High-Quality Audio

Clear audio with minimal background noise
Sample rate: 16kHz or higher
Mono or stereo (mono preferred for speech)

Specify Language

Provide the language parameter whenever the language is known; this improves accuracy and reduces latency.

Provide Context with Prompts

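import os
from openai import OpenAI

# Assumes the OpenAI Python SDK, pointed at the base URL used by the curl examples
client = OpenAI(base_url="https://api.redpill.ai/v1", api_key=os.environ["REDPILL_API_KEY"])
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-1",
        file=audio_file,
        language="en",
        prompt="RedPill AI, TEE, attestation, confidential computing"
    )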

Prompts help with technical terms, proper nouns, and domain-specific vocabulary

Choose Appropriate Model

Standard accuracy: openai/whisper-1
Fast processing: groq/whisper-large-v3-turbo
Balanced: groq/whisper-large-v3

Text-to-Speech Tips

Choose the Right Voice

Test different voices to find the best match for your use case:

Professional/Corporate: alloy, onyx
Friendly/Casual: nova, shimmer
Storytelling: fable, echo

Use Appropriate Format

Real-time streaming: opus (lowest latency)
File storage: mp3 (good compression)
High quality: flac (lossless)
Processing: pcm (uncompressed)

Optimize Text

Use punctuation for natural pauses
Spell out abbreviations (e.g., “Doctor” instead of “Dr.”)
Use SSML for advanced control only where the model supports it (support is model-specific, so verify before relying on it)
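Because `input` is limited to 4096 characters, longer text has to be sent as several requests. One way to do that while keeping pauses natural is to split at sentence boundaries. The function below is a minimal sketch under that assumption; `split_for_tts` is an illustrative name, and the sentence detection is a simple punctuation rule rather than a full tokenizer.

```python
import re

MAX_INPUT_CHARS = 4096  # documented limit for the `input` parameter


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after ., ! or ? followed by whitespace; punctuation stays with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that is itself longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `input` in its own speech request, and the resulting audio files concatenated in order.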

Manage Speed

# Faster narration (1.25x)
response = client.audio.speech.create(
    model="openai/tts-1",
    voice="alloy",
    input=text,
    speed=1.25
)

Use Cases

Voice Assistants

# Convert user query to text, process, respond with speech
with open("user_query.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="groq/whisper-large-v3-turbo",  # Fast
        file=audio
    )

# Process with LLM
response = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": transcript.text}]
)

# Convert response to speech
audio = client.audio.speech.create(
    model="openai/tts-1",
    voice="nova",
    input=response.choices[0].message.content
)
audio.stream_to_file("response.mp3")

Meeting Transcription

with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-1",
        file=audio,
        response_format="verbose_json",
        timestamp_granularities=["segment"]
    )

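# Print each segment with its time range (the transcript carries no speaker labels,
# so speaker diarization would need a separate step)
for segment in transcript.segments:
    # Recent OpenAI SDK versions return segment objects; older ones may return dicts
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")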
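For meeting notes, segment timestamps are often easier to read as clock times than as raw seconds. The sketch below renders segments as one line each. It assumes each segment exposes `start` and `text`, as in the loop above, and accepts either dictionaries or attribute-style objects because the shape can differ between client versions. The function names are illustrative.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def _field(segment, name):
    """Read a field from either a dict or an attribute-style object."""
    return segment[name] if isinstance(segment, dict) else getattr(segment, name)


def render_transcript(segments):
    """Return one '[HH:MM:SS] text' line per segment, joined with newlines."""
    lines = []
    for seg in segments:
        start = format_timestamp(_field(seg, "start"))
        text = _field(seg, "text").strip()
        lines.append(f"[{start}] {text}")
    return "\n".join(lines)
```

With the response from the example above, `print(render_transcript(transcript.segments))` would print the meeting as a timestamped list.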

Content Localization

# Translate foreign language audio to English
with open("spanish_content.mp3", "rb") as audio:
    translation = client.audio.translations.create(
        model="openai/whisper-1",
        file=audio,
        response_format="srt"  # Get subtitles
    )

# With response_format="srt" the result is plain subtitle text
with open("english_subtitles.srt", "w", encoding="utf-8") as f:
    f.write(translation)

Accessibility

# Generate audio description for visually impaired users
description = "A serene mountain landscape at sunset..."

audio = client.audio.speech.create(
    model="openai/tts-1-hd",  # High quality
    voice="shimmer",  # Gentle voice
    input=description,
    speed=0.9  # Slightly slower for clarity
)
audio.stream_to_file("description.mp3")

Error Handling

{
  "error": {
    "message": "Invalid audio file format. Supported: mp3, mp4, wav, webm",
    "type": "invalid_request_error",
    "param": "file",
    "code": 400
  }
}
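Client code can read this error object to decide how to react. The sketch below parses a response body shaped like the example above and exposes its fields. `ApiError` and `parse_error` are hypothetical names, and the field set is assumed from this single example, so other errors may carry different or missing fields.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ApiError:
    message: str
    type: str
    param: Optional[str]
    code: Optional[int]

    @property
    def is_client_error(self):
        """True for 4xx codes, which usually point to a problem with the request."""
        return self.code is not None and 400 <= self.code < 500


def parse_error(body):
    """Parse an error body shaped like the example; return None if it is not one."""
    try:
        err = json.loads(body).get("error")
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object
        return None
    if not isinstance(err, dict):
        return None
    code = err.get("code")
    return ApiError(
        message=str(err.get("message", "")),
        type=str(err.get("type", "")),
        param=err.get("param"),
        code=code if isinstance(code, int) else None,
    )
```

For the example above, `parse_error(body).param` is `"file"`, which tells the caller that the uploaded file, rather than the model or another field, caused the rejection.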

Next Steps

Streaming

Learn about streaming responses

Vision Models

Process images and videos

Pricing

View audio processing pricing

Supported Models

Browse all available models
