
Overview

The Audio API provides text-to-speech synthesis, audio transcription, and audio translation. All audio requests pass through RedPill’s privacy-protected gateway and are processed on TEE-protected infrastructure, so your audio data and transcripts remain confidential.

Supported Models

| Model | Provider | Capabilities | Languages |
| --- | --- | --- | --- |
| openai/tts-1 | OpenAI | Text-to-speech | English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Arabic, Chinese, Japanese, Korean, Hindi |
| openai/tts-1-hd | OpenAI | High-quality TTS | Same as tts-1 |
| openai/whisper-1 | OpenAI | Transcription | 99+ languages |
| groq/whisper-large-v3 | Groq | Fast transcription | 99+ languages |
| groq/whisper-large-v3-turbo | Groq | Ultra-fast transcription | 99+ languages |

Text-to-Speech

Convert text to natural-sounding speech.
curl https://api.redpill.ai/v1/audio/speech \
  -H "Authorization: Bearer $REDPILL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/tts-1",
    "input": "Welcome to RedPill, the privacy-first AI platform. Your data is protected by hardware-enforced security.",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Request Parameters

model (string, required)
The TTS model to use:
  • openai/tts-1 - Standard quality, faster
  • openai/tts-1-hd - High definition, higher quality
input (string, required)
The text to convert to speech. Maximum length: 4096 characters.
voice (string, required)
Voice to use for synthesis:
  • alloy - Neutral, balanced
  • echo - Warm, expressive
  • fable - Storytelling, dramatic
  • onyx - Deep, authoritative
  • nova - Energetic, youthful
  • shimmer - Soft, gentle
response_format (string, default: "mp3")
Audio format:
  • mp3 - MP3 audio (default)
  • opus - Opus audio (low latency)
  • aac - AAC audio
  • flac - FLAC audio (lossless)
  • wav - WAV audio (uncompressed)
  • pcm - PCM 16-bit audio
speed (number, default: 1.0)
Playback speed multiplier. Range: 0.25 to 4.0
  • 0.5 - Half speed
  • 1.0 - Normal speed
  • 1.5 - 1.5x speed
  • 2.0 - Double speed
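
The limits in the parameter list above can be checked on the client before a request is sent. The following is a minimal sketch under that assumption; `build_speech_payload` is a hypothetical helper name, not part of the RedPill API, and the server remains the authority on what it accepts.

```python
# Allowed values and limits copied from the parameter list above.
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.25, 4.0


def build_speech_payload(text, voice, model="openai/tts-1",
                         response_format="mp3", speed=1.0):
    """Return the JSON body for a speech request.

    Hypothetical helper: raises ValueError when a value falls outside
    the documented limits, so mistakes surface before any network call.
    """
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; maximum is {MAX_INPUT_CHARS}"
        )
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}; expected one of {sorted(VOICES)}")
    if response_format not in FORMATS:
        raise ValueError(f"unknown response_format {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Calling `build_speech_payload("Hello", "alloy")` yields a dict that mirrors the body of the curl example, with the defaults filled in.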

Response

Returns binary audio data in the specified format.
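
Because the response body is raw audio rather than JSON, it can be written straight to a file. Below is a minimal sketch of the curl request above using only the Python standard library; the function names are hypothetical, and the URL and headers are taken from the curl example.

```python
import json
import os
import urllib.request

SPEECH_URL = "https://api.redpill.ai/v1/audio/speech"  # from the curl example


def make_speech_request(payload, api_key):
    """Build (but do not send) the POST request for the speech endpoint."""
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(payload, path, api_key=None):
    """Send the request and write the returned audio bytes to `path`."""
    api_key = api_key or os.environ["REDPILL_API_KEY"]
    request = make_speech_request(payload, api_key)
    # The body is binary audio in the requested format, so open in "wb" mode.
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

For example, `save_speech({"model": "openai/tts-1", "input": "Hello", "voice": "alloy"}, "speech.mp3")` would be the equivalent of the curl command with `--output speech.mp3`.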

Transcribe Audio

Convert audio to text.
curl https://api.redpill.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $REDPILL_API_KEY" \
  -F file="@audio.mp3" \
  -F model="openai/whisper-1" \
  -F language="en" \
  -F response_format="json" \
  -F temperature=0

Example response:
{
  "text": "Welcome to RedPill, the privacy-first AI platform. Your data is protected by hardware-enforced security."
}

Request Parameters

file (file, required)
Audio file to transcribe. Supported formats:
  • mp3, mp4, mpeg, mpga
  • m4a, wav, webm
Maximum file size: 25 MB
model (string, required)
Transcription model:
  • openai/whisper-1 - Standard Whisper
  • groq/whisper-large-v3 - Fast transcription (Groq)
  • groq/whisper-large-v3-turbo - Ultra-fast (Groq)
language (string, optional)
ISO-639-1 language code (e.g., en, es, fr, de, zh, ja). Improves accuracy and latency when specified.
prompt (string, optional)
Optional text to guide the model’s style. Should match the audio’s language and context.
response_format (string, default: "json")
Output format:
  • json - JSON with text only
  • text - Plain text only
  • srt - SubRip subtitle format
  • vtt - WebVTT subtitle format
  • verbose_json - JSON with timestamps and metadata
temperature (number, default: 0)
Sampling temperature (0 to 1). Higher values increase randomness. Use 0 for deterministic outputs.
timestamp_granularities (array, optional)
Timestamp precision for verbose_json:
  • ["segment"] - Segment-level timestamps
  • ["word"] - Word-level timestamps
  • ["segment", "word"] - Both
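
The file requirements above (supported extensions and the 25 MB limit) can be checked locally before uploading. This is a minimal sketch; `check_audio_file` is a hypothetical helper, it inspects only the file extension rather than the actual encoding, and it assumes "25 MB" means 25 × 1024 × 1024 bytes.

```python
from pathlib import Path

# Extensions copied from the `file` parameter description above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def check_audio_file(path):
    """Return a list of problems found; an empty list means the file passes."""
    path = Path(path)
    problems = []
    extension = path.suffix.lower().lstrip(".")
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {path.suffix or '(none)'}")
    if not path.is_file():
        problems.append(f"file not found: {path}")
    elif path.stat().st_size > MAX_FILE_BYTES:
        problems.append(
            f"file is {path.stat().st_size} bytes; limit is {MAX_FILE_BYTES}"
        )
    return problems
```

Files that exceed the limit would need to be split or re-encoded at a lower bitrate before upload; that step is outside the scope of this sketch.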

Translate Audio

Translate audio from any supported language to English.
curl https://api.redpill.ai/v1/audio/translations \
  -H "Authorization: Bearer $REDPILL_API_KEY" \
  -F file="@german_audio.mp3" \
  -F model="openai/whisper-1" \
  -F response_format="json"

Request Parameters

file (file, required)
Audio file in any supported language. Same file format requirements as transcription.
model (string, required)
Translation model (e.g., openai/whisper-1)
prompt (string, optional)
Optional text to guide translation style
response_format (string, default: "json")
Output format: json, text, srt, vtt, or verbose_json
temperature (number, default: 0)
Sampling temperature (0 to 1)

Note: Translation always outputs English text, regardless of the input language.

Supported Languages

Whisper models support 99+ languages including:
English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Ukrainian, Swedish, Danish, Norwegian, Finnish, Greek, Czech, Romanian, Hungarian, Croatian, Serbian, Bulgarian
Chinese (Mandarin), Japanese, Korean, Hindi, Bengali, Tamil, Telugu, Marathi, Urdu, Vietnamese, Thai, Indonesian, Malay, Filipino/Tagalog
Arabic, Hebrew, Persian (Farsi)
Afrikaans, Swahili, Icelandic, Estonian, Latvian, Lithuanian, Slovenian, Slovak, Welsh, and 60+ more

Privacy & Security

TEE-Protected Processing

All audio processing flows through hardware-protected secure enclaves.

Confidential Transcripts

Audio files and transcripts are processed in isolated secure environments.

No Audio Storage

Audio files are deleted immediately after processing.

No Training Data

Your audio is never used to train models.

Best Practices

Transcription Tips

1. Use High-Quality Audio

  • Clear audio with minimal background noise
  • Sample rate: 16kHz or higher
  • Mono or stereo (mono preferred for speech)
2. Specify Language

Provide the language parameter whenever the language is known; doing so improves accuracy and reduces latency.
3. Provide Context with Prompts

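# Assumes `client` is an OpenAI-SDK-compatible client configured for RedPill.
with open("audio.mp3", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-1",
        file=audio_file,
        language="en",
        # Terms listed in the prompt bias the model toward these spellings.
        prompt="RedPill AI, TEE, attestation, confidential computing",
    )
Prompts help with technical terms, proper nouns, and domain-specific vocabulary.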
4. Choose Appropriate Model

  • Standard accuracy: openai/whisper-1
  • Fast processing: groq/whisper-large-v3-turbo
  • Balanced: groq/whisper-large-v3
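
The Python snippets on this page use a `client` object without showing how it is built. The sketch below assumes the endpoints accept the OpenAI Python SDK pointed at the base URL shared by the curl examples; that compatibility is an assumption, and `make_client`, `pick_transcription_model`, and `transcribe` are hypothetical helper names. The model mapping follows the trade-offs listed above.

```python
import os

BASE_URL = "https://api.redpill.ai/v1"  # prefix shared by the curl examples

# Trade-offs as listed under "Choose Appropriate Model".
TRANSCRIPTION_MODELS = {
    "standard": "openai/whisper-1",
    "balanced": "groq/whisper-large-v3",
    "fast": "groq/whisper-large-v3-turbo",
}


def pick_transcription_model(priority="standard"):
    """Map a priority ("standard", "balanced", "fast") to a model ID."""
    try:
        return TRANSCRIPTION_MODELS[priority]
    except KeyError:
        raise ValueError(
            f"unknown priority {priority!r}; expected one of {sorted(TRANSCRIPTION_MODELS)}"
        ) from None


def make_client():
    """Create an SDK client (assumption: the gateway is OpenAI-SDK compatible)."""
    from openai import OpenAI  # imported lazily; requires `pip install openai`
    return OpenAI(base_url=BASE_URL, api_key=os.environ["REDPILL_API_KEY"])


def transcribe(path, priority="standard", language=None, prompt=None):
    """Transcribe one file, passing only the optional parameters provided."""
    client = make_client()
    options = {
        key: value
        for key, value in {"language": language, "prompt": prompt}.items()
        if value is not None
    }
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model=pick_transcription_model(priority),
            file=audio_file,
            **options,
        )
    return result.text
```

A call such as `transcribe("audio.mp3", priority="fast", language="en")` would then combine tips 2 and 4 in a single request.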

Text-to-Speech Tips

1. Choose the Right Voice

Test different voices to find the best match for your use case:
  • Professional/Corporate: alloy, onyx
  • Friendly/Casual: nova, shimmer
  • Storytelling: fable, echo
2. Use Appropriate Format

  • Real-time streaming: opus (lowest latency)
  • File storage: mp3 (good compression)
  • High quality: flac (lossless)
  • Processing: pcm (uncompressed)
3. Optimize Text

  • Use punctuation for natural pauses
  • Spell out abbreviations (e.g., “Doctor” instead of “Dr.”)
  • SSML support is model-specific; confirm that your chosen model accepts SSML before relying on it for advanced control
4. Manage Speed

# Faster narration (1.25x); `text` holds the string to synthesize
response = client.audio.speech.create(
    model="openai/tts-1",
    voice="alloy",
    input=text,
    speed=1.25
)
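
Text longer than the 4096-character `input` limit has to be sent as several requests. The sketch below shows one possible way to split text at sentence boundaries so that each chunk ends on punctuation and keeps natural pauses; `split_for_tts` is a hypothetical helper, and the sentence-splitting rule is a simple heuristic rather than a full tokenizer.

```python
import re

MAX_INPUT_CHARS = 4096  # limit stated for the `input` parameter


def split_for_tts(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks break after sentence-ending punctuation where possible; a single
    sentence longer than `limit` is hard-wrapped as a last resort.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any sentence that cannot fit in one chunk by itself.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed as `input` to a separate speech request and the resulting audio files played or concatenated in order.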

Use Cases

Voice Assistants

# Convert user query to text, process, respond with speech
with open("user_query.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="groq/whisper-large-v3-turbo",  # Fast
        file=audio
    )

# Process with LLM
response = client.chat.completions.create(
    model="openai/gpt-5-mini",
    messages=[{"role": "user", "content": transcript.text}]
)

# Convert response to speech (named `speech` to avoid reusing `audio` from above)
speech = client.audio.speech.create(
    model="openai/tts-1",
    voice="nova",
    input=response.choices[0].message.content
)
speech.write_to_file("response.mp3")

Meeting Transcription

with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="openai/whisper-1",
        file=audio,
        response_format="verbose_json",
        timestamp_granularities=["segment"]
    )

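# Print each segment with its start and end time.
# Recent OpenAI SDK versions return segment objects (attribute access);
# older versions returned dicts, which need segment["start"] instead.
for segment in transcript.segments:
    print(f"[{segment.start:.2f}s - {segment.end:.2f}s]: {segment.text}")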
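
For long meetings, raw second counts become hard to read. The following sketch formats segment times as `HH:MM:SS.mmm`; the helper names are hypothetical, and the code accepts either dicts or objects with `start`, `end`, and `text` fields because the shape of `segments` can differ between SDK versions.

```python
def format_timestamp(seconds):
    """Render a number of seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def _field(segment, name):
    """Read a field from a dict-style or attribute-style segment."""
    return segment[name] if isinstance(segment, dict) else getattr(segment, name)


def segment_lines(segments):
    """Yield one readable line per transcript segment."""
    for segment in segments:
        start = format_timestamp(_field(segment, "start"))
        end = format_timestamp(_field(segment, "end"))
        yield f"[{start} - {end}] {_field(segment, 'text').strip()}"
```

With a verbose_json transcript, `"\n".join(segment_lines(transcript.segments))` would produce a plain-text log that can be saved alongside the recording.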

Content Localization

# Translate foreign language audio to English
with open("spanish_content.mp3", "rb") as audio:
    translation = client.audio.translations.create(
        model="openai/whisper-1",
        file=audio,
        response_format="srt"  # Get subtitles
    )

with open("english_subtitles.srt", "w") as f:
    f.write(translation)
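
Before publishing the subtitles, it can help to sanity-check the returned SRT text. The sketch below parses standard SubRip blocks (index line, time range line, then text) into dicts; `parse_srt` is a hypothetical helper and skips any block that does not follow that layout.

```python
import re


def parse_srt(srt_text):
    """Parse SubRip text into a list of cue dicts.

    Each cue has "index", "start", "end", and "text" keys. Blocks that do
    not match the index / time range / text layout are skipped.
    """
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().split("\n")
        if len(lines) < 3 or " --> " not in lines[1]:
            continue
        if not lines[0].strip().isdigit():
            continue
        start, _, end = lines[1].partition(" --> ")
        cues.append({
            "index": int(lines[0]),
            "start": start.strip(),
            "end": end.strip(),
            "text": "\n".join(lines[2:]),
        })
    return cues
```

For instance, `len(parse_srt(translation))` gives the number of subtitle cues, and an unexpectedly low count can flag a truncated or empty response.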

Accessibility

# Generate audio description for visually impaired users
description = "A serene mountain landscape at sunset..."

audio = client.audio.speech.create(
    model="openai/tts-1-hd",  # High quality
    voice="shimmer",  # Gentle voice
    input=description,
    speed=0.9  # Slightly slower for clarity
)
audio.write_to_file("description.mp3")

Error Handling

Failed requests return a JSON error object, for example:
{
  "error": {
    "message": "Invalid audio file format. Supported: mp3, mp4, wav, webm",
    "type": "invalid_request_error",
    "param": "file",
    "code": 400
  }
}
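
A client can turn this error body into a Python exception so that callers handle failures in one place. This is a minimal sketch that assumes every error response follows the shape shown above; `AudioApiError` and `parse_error` are hypothetical names.

```python
import json


class AudioApiError(Exception):
    """Carries the fields of the `error` object shown above."""

    def __init__(self, message, error_type=None, param=None, code=None):
        super().__init__(message)
        self.message = message
        self.error_type = error_type
        self.param = param
        self.code = code


def parse_error(body):
    """Return an AudioApiError if `body` is an error response, else None."""
    try:
        data = json.loads(body)
    except (TypeError, ValueError):
        return None  # not JSON, e.g. binary audio from the speech endpoint
    error = data.get("error") if isinstance(data, dict) else None
    if not isinstance(error, dict):
        return None
    return AudioApiError(
        error.get("message", "Unknown error"),
        error_type=error.get("type"),
        param=error.get("param"),
        code=error.get("code"),
    )
```

A caller might check `err.param == "file"` to tell the user to re-encode the audio, and treat any other error as a general request failure.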

Next Steps