Overview
The Audio API provides text-to-speech synthesis, audio transcription, and audio translation. All audio requests flow through RedPill’s privacy-protected gateway and are processed on TEE-protected infrastructure, so your audio data and transcripts remain confidential.
Supported Models
| Model | Provider | Capabilities | Languages |
|---|---|---|---|
| `openai/tts-1` | OpenAI | Text-to-speech | English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Arabic, Chinese, Japanese, Korean, Hindi |
| `openai/tts-1-hd` | OpenAI | High-quality TTS | Same as `tts-1` |
| `openai/whisper-1` | OpenAI | Transcription | 99+ languages |
| `groq/whisper-large-v3` | Groq | Fast transcription | 99+ languages |
| `groq/whisper-large-v3-turbo` | Groq | Ultra-fast transcription | 99+ languages |
Text-to-Speech
Convert text to natural-sounding speech.
Request Parameters
The TTS model to use:
- `openai/tts-1`: Standard quality, faster
- `openai/tts-1-hd`: High definition, higher quality
The text to convert to speech. Maximum length: 4096 characters.
Voice to use for synthesis:
- `alloy`: Neutral, balanced
- `echo`: Warm, expressive
- `fable`: Storytelling, dramatic
- `onyx`: Deep, authoritative
- `nova`: Energetic, youthful
- `shimmer`: Soft, gentle
Audio format:
- `mp3`: MP3 audio (default)
- `opus`: Opus audio (low latency)
- `aac`: AAC audio
- `flac`: FLAC audio (lossless)
- `wav`: WAV audio (uncompressed)
- `pcm`: PCM 16-bit audio
Playback speed multiplier, from 0.25 to 4.0. For example:
- `0.5`: Half speed
- `1.0`: Normal speed
- `1.5`: 1.5x speed
- `2.0`: Double speed
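A minimal sketch of assembling a text-to-speech request that enforces the limits listed above (4096-character input, the six voices, the six formats, speed from 0.25 to 4.0). It assumes an OpenAI-compatible JSON body with the fields `model`, `input`, `voice`, `response_format`, and `speed`, a hypothetical base URL `https://api.redpill.ai/v1`, the path `/audio/speech`, and Bearer-token auth; none of these names are confirmed on this page, so check them against the API reference. The helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions (not confirmed by this page): base URL, path, and field names
# follow the OpenAI-compatible convention.
API_BASE = "https://api.redpill.ai/v1"  # hypothetical base URL

TTS_MODELS = {"openai/tts-1", "openai/tts-1-hd"}
VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
AUDIO_FORMATS = {"mp3", "opus", "aac", "flac", "wav", "pcm"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, model, voice, response_format="mp3", speed=1.0):
    """Validate TTS parameters against the documented limits and return a JSON-ready dict."""
    if model not in TTS_MODELS:
        raise ValueError(f"unknown TTS model: {model!r}")
    if not text:
        raise ValueError("text must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"text is {len(text)} characters; the limit is {MAX_INPUT_CHARS}")
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    if response_format not in AUDIO_FORMATS:
        raise ValueError(f"unsupported format: {response_format!r}")
    if not 0.25 <= speed <= 4.0:
        raise ValueError("speed must be between 0.25 and 4.0")
    return {
        "model": model,
        "input": text,  # assumed field name for the text to synthesize
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


def build_speech_request(payload, api_key):
    """Wrap the payload in a POST request; sending it is left to the caller."""
    return urllib.request.Request(
        f"{API_BASE}/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Because the response is raw audio bytes rather than JSON, a caller would write the body straight to disk, e.g. `with urllib.request.urlopen(req) as resp, open("speech.mp3", "wb") as f: f.write(resp.read())`.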
Response
Returns binary audio data in the specified format.
Transcribe Audio
Convert audio to text.
Request Parameters
Audio file to transcribe. Supported formats: `mp3`, `mp4`, `mpeg`, `mpga`, `m4a`, `wav`, `webm`.
Transcription model:
- `openai/whisper-1`: Standard Whisper
- `groq/whisper-large-v3`: Fast transcription (Groq)
- `groq/whisper-large-v3-turbo`: Ultra-fast (Groq)
ISO-639-1 language code (e.g., `en`, `es`, `fr`, `de`, `zh`, `ja`). Specifying it improves accuracy and latency.
Optional text to guide the model’s style. It should match the audio’s language and context.
Output format:
- `json`: JSON with text only
- `text`: Plain text only
- `srt`: SubRip subtitle format
- `vtt`: WebVTT subtitle format
- `verbose_json`: JSON with timestamps and metadata
Sampling temperature (0 to 1). Higher values increase randomness. Use 0 for deterministic outputs.
Timestamp precision for `verbose_json`:
- `["segment"]`: Segment-level timestamps
- `["word"]`: Word-level timestamps
- `["segment", "word"]`: Both
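A minimal sketch of a transcription request built with only the standard library. Because the audio is a file upload, the parameters travel as `multipart/form-data`. It assumes OpenAI-compatible form field names (`file`, `model`, `language`, `prompt`, `response_format`, `temperature`, and a repeated `timestamp_granularities[]` field), the path `/audio/transcriptions`, and a hypothetical base URL; verify these against the API reference. Following the description above, it rejects timestamp granularities unless `response_format` is `verbose_json`. The helper names are hypothetical.

```python
import urllib.request
import uuid
from pathlib import Path

# Assumptions (not confirmed by this page): base URL, path, and form field names.
API_BASE = "https://api.redpill.ai/v1"  # hypothetical base URL

SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
RESPONSE_FORMATS = {"json", "text", "srt", "vtt", "verbose_json"}
GRANULARITIES = {"segment", "word"}


def encode_multipart(fields, filename, file_bytes, boundary):
    """Encode (name, value) text fields plus one file part as multipart/form-data."""
    parts = []
    for name, value in fields:
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(header.encode() + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode())  # closing boundary
    return b"".join(parts)


def build_transcription_request(filename, audio, api_key, model="openai/whisper-1",
                                language=None, prompt=None, response_format="json",
                                temperature=0.0, timestamp_granularities=()):
    """Validate parameters and return an unsent POST request."""
    ext = Path(filename).suffix.lstrip(".").lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported audio format: {ext!r}")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"unsupported response format: {response_format!r}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0 and 1")
    if timestamp_granularities and response_format != "verbose_json":
        raise ValueError("timestamp granularities require response_format='verbose_json'")

    fields = [("model", model), ("response_format", response_format),
              ("temperature", str(temperature))]
    if language:
        fields.append(("language", language))  # ISO-639-1 code, e.g. "en"
    if prompt:
        fields.append(("prompt", prompt))
    for granularity in timestamp_granularities:
        if granularity not in GRANULARITIES:
            raise ValueError(f"unknown granularity: {granularity!r}")
        fields.append(("timestamp_granularities[]", granularity))  # assumed field name

    boundary = uuid.uuid4().hex  # random boundary unlikely to appear in the audio
    return urllib.request.Request(
        f"{API_BASE}/audio/transcriptions",
        data=encode_multipart(fields, Path(filename).name, audio, boundary),
        headers={"Authorization": f"Bearer {api_key}",
                 "Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Encoding the body by hand keeps the sketch dependency-free; an HTTP client library with built-in multipart support would be simpler in production code.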
Translate Audio
Translate audio from any supported language to English.
Request Parameters
Audio file in any supported language. Same file format requirements as transcription.
Translation model (e.g., `openai/whisper-1`).
Optional text to guide translation style.
Output format: `json`, `text`, `srt`, `vtt`, or `verbose_json`.
Sampling temperature (0 to 1).
Translation always outputs English text, regardless of input language.
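Both the transcription and translation endpoints can return `srt` output. A minimal sketch of turning that subtitle text into timed cues, assuming standard SubRip layout: a numeric index line, an `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, then one or more text lines, with a blank line between cues. `Cue` and `parse_srt` are hypothetical helper names.

```python
import re
from dataclasses import dataclass

_TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(stamp):
    """Convert an SRT timestamp such as 00:01:02,500 to seconds."""
    match = _TIMESTAMP.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"malformed timestamp: {stamp!r}")
    hours, minutes, seconds, millis = map(int, match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(srt_text):
    """Split SRT text into cues; blocks are separated by blank lines."""
    cues = []
    for block in re.split(r"\r?\n[ \t]*\r?\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 2:
            continue  # skip empty or truncated blocks
        start, end = lines[1].split("-->")
        text = "\n".join(lines[2:])  # cue text may span several lines
        cues.append(Cue(int(lines[0]), _to_seconds(start), _to_seconds(end), text))
    return cues
```

Converting timestamps to seconds makes the cues easy to align with audio playback or to re-export as `vtt`.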
Supported Languages
Whisper models support 99+ languages, including:
European Languages
English, Spanish, French, German, Italian, Portuguese, Dutch, Polish, Russian, Turkish, Ukrainian, Swedish, Danish, Norwegian, Finnish, Greek, Czech, Romanian, Hungarian, Croatian, Serbian, Bulgarian
Asian Languages
Chinese (Mandarin), Japanese, Korean, Hindi, Bengali, Tamil, Telugu, Marathi, Urdu, Vietnamese, Thai, Indonesian, Malay, Filipino/Tagalog
Middle Eastern Languages
Arabic, Hebrew, Persian (Farsi), Turkish
Other Languages
Afrikaans, Swahili, Icelandic, Estonian, Latvian, Lithuanian, Slovenian, Slovak, Welsh, and 60+ more
Privacy & Security
TEE-Protected Processing
All audio processing flows through hardware-protected secure enclaves
Confidential Transcripts
Audio files and transcripts are processed in isolated secure environments
No Audio Storage
Audio files are deleted immediately after processing
No Training Data
Your audio is never used to train models
Best Practices
Transcription Tips
1. Use High-Quality Audio
   - Clear audio with minimal background noise
   - Sample rate: 16 kHz or higher
   - Mono or stereo (mono preferred for speech)
2. Specify Language
   Always provide the `language` parameter when known; it improves accuracy and reduces latency.
3. Provide Context with Prompts
   Use the optional prompt text to guide the model’s style; it should match the audio’s language and context.
4. Choose Appropriate Model
   - Standard accuracy: `openai/whisper-1`
   - Fast processing: `groq/whisper-large-v3-turbo`
   - Balanced: `groq/whisper-large-v3`
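The audio-quality tip can be checked locally before uploading. A minimal sketch for WAV input using Python's standard `wave` module: it reads only the header fields (sample rate and channel count), so it cannot judge background noise, and other containers such as `mp3` would need a separate decoder. `check_wav_for_transcription` is a hypothetical helper name.

```python
import wave

MIN_SAMPLE_RATE_HZ = 16_000  # "16 kHz or higher" from the tips above


def check_wav_for_transcription(source):
    """Return warnings for a WAV file (path or binary file object) that misses the tips."""
    with wave.open(source, "rb") as wav:
        sample_rate = wav.getframerate()
        channels = wav.getnchannels()

    warnings = []
    if sample_rate < MIN_SAMPLE_RATE_HZ:
        warnings.append(f"sample rate {sample_rate} Hz is below {MIN_SAMPLE_RATE_HZ} Hz")
    if channels > 1:
        warnings.append(f"{channels} channels; mono is preferred for speech")
    return warnings
```

For example, `for w in check_wav_for_transcription("meeting.wav"): print("warning:", w)` reports issues without blocking the upload, since stereo input is still accepted.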
Text-to-Speech Tips
1. Choose the Right Voice
   Test different voices to find the best match for your use case:
   - Professional/Corporate: `alloy`, `onyx`
   - Friendly/Casual: `nova`, `shimmer`
   - Storytelling: `fable`, `echo`
2. Use Appropriate Format
   - Real-time streaming: `opus` (lowest latency)
   - File storage: `mp3` (good compression)
   - High quality: `flac` (lossless)
   - Processing: `pcm` (uncompressed)
3. Optimize Text
   - Use punctuation for natural pauses
   - Spell out abbreviations (e.g., “Doctor” instead of “Dr.”)
   - Use SSML for advanced control (model-specific)
4. Manage Speed