OpenAI Whisper
Whisper is OpenAI's speech recognition model, known for high accuracy across multiple languages.
Setup
1. Get API Key
Whisper uses the same API key as OpenAI's other services:
- Go to the OpenAI Platform (platform.openai.com)
- Navigate to API keys
- Create a new secret key and copy it right away; the full key is shown only once
2. Configure Environment
STT_PROVIDER=whisper
OPENAI_API_KEY=sk-...your-api-key
Configuration Options
# Required
STT_PROVIDER=whisper
OPENAI_API_KEY=sk-...
# Optional
WHISPER_MODEL=whisper-1 # Default model
WHISPER_DEFAULT_LANGUAGE=en # Language hint
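To illustrate how these variables might be consumed, here is a minimal Python sketch. It assumes the official `openai` Python package (v1 or later) is installed. `WhisperConfig`, `load_whisper_config`, and `transcribe` are hypothetical helper names invented for this example; they are not part of this project or of the OpenAI library.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WhisperConfig:
    """Settings read from the environment variables documented above."""
    api_key: str
    model: str = "whisper-1"
    language: Optional[str] = None


def load_whisper_config(env: Mapping[str, str] = os.environ) -> WhisperConfig:
    """Build a WhisperConfig from an environment-style mapping."""
    provider = env.get("STT_PROVIDER", "")
    if provider != "whisper":
        raise ValueError(f"STT_PROVIDER is {provider!r}, expected 'whisper'")
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")
    return WhisperConfig(
        api_key=api_key,
        model=env.get("WHISPER_MODEL", "whisper-1"),
        # An empty or missing language means "let the model detect it".
        language=env.get("WHISPER_DEFAULT_LANGUAGE") or None,
    )


def transcribe(path: str, config: WhisperConfig) -> str:
    """Upload one audio file for transcription and return the text."""
    # Imported lazily so the config helpers work without the package.
    from openai import OpenAI  # third-party: pip install openai

    client = OpenAI(api_key=config.api_key)
    kwargs = {"model": config.model}
    if config.language:
        kwargs["language"] = config.language  # ISO 639-1 code, e.g. "en"
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(file=audio_file, **kwargs)
    return result.text
```

A call such as `transcribe("meeting.wav", load_whisper_config())` would then return the transcript as a string; the file name is only a placeholder.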
Pricing
| Metric | Cost |
|---|---|
| Per minute | $0.006 |
| Per hour | $0.36 |
| 3-min session (~1 min speech) | ~$0.006 |
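The figures in the table follow from the per-minute rate. The sketch below reproduces that arithmetic in Python. It assumes cost scales linearly with the duration of the audio uploaded and ignores any rounding the provider may apply; `estimate_cost_usd` is a hypothetical helper name.

```python
PRICE_PER_MINUTE_USD = 0.006  # per-minute rate from the pricing table


def estimate_cost_usd(audio_seconds: float) -> float:
    """Estimate transcription cost for a given duration of uploaded audio."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 60 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # One minute, one hour, and a session where only ~1 minute of speech is sent.
    for label, seconds in [("1 minute", 60), ("1 hour", 3600), ("~1 min of speech", 60)]:
        print(f"{label}: ${estimate_cost_usd(seconds):.3f}")
```

The last table row works out to about $0.006 only on the assumption that roughly one minute of speech, rather than the full three-minute session, is actually uploaded.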
Limitations
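- No real-time streaming - Audio must be fully recorded before transcription
- Max file duration - 2 hours per file; OpenAI's API also limits each upload to 25 MB, so long recordings may need to be compressed or split
- Batch processing - The transcript arrives only after the whole file has been uploaded and processed, so expect a short delay after recording ends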
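One way to work within the duration cap is to measure a recording first and plan how to split it. The sketch below uses only the Python standard library and handles WAV input only. `wav_duration_seconds` and `plan_chunks` are hypothetical helper names, and the actual cutting of audio at the planned boundaries is left out.

```python
import wave
from typing import List, Tuple

MAX_FILE_SECONDS = 2 * 60 * 60  # the 2-hour cap listed above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def plan_chunks(
    total_seconds: float, max_seconds: float = MAX_FILE_SECONDS
) -> List[Tuple[float, float]]:
    """Split [0, total_seconds) into consecutive (start, end) ranges,
    each no longer than max_seconds."""
    if total_seconds < 0 or max_seconds <= 0:
        raise ValueError("total_seconds must be >= 0 and max_seconds > 0")
    chunks = []
    start = 0.0
    while start < total_seconds:
        end = min(start + max_seconds, total_seconds)
        chunks.append((start, end))
        start = end
    return chunks
```

Cutting at fixed boundaries can split a word in half, so a real implementation would probably move each boundary to a nearby pause.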
Best Practices
Optimize Audio Quality
- Use a good microphone
- Minimize background noise
- Ensure clear speech
Handle Silence
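- Consider voice activity detection (VAD) to find where speech starts and stops
- Don't send silent audio segments; they are billed like any other audio and add upload and processing time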
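As a rough illustration of the idea, the sketch below filters out quiet frames with a simple energy threshold. This is a crude stand-in for a real voice activity detector, not a production VAD. It assumes 16-bit little-endian mono PCM frames. The threshold of 500 is an arbitrary starting point that would need tuning per microphone, and `rms` and `drop_silent_frames` are hypothetical helper names.

```python
import math
import struct
from typing import Iterable, List


def rms(frame: bytes) -> float:
    """Root-mean-square amplitude of a 16-bit little-endian mono PCM frame."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)


def drop_silent_frames(
    frames: Iterable[bytes], threshold: float = 500.0
) -> List[bytes]:
    """Keep only frames whose RMS energy reaches the threshold."""
    return [frame for frame in frames if rms(frame) >= threshold]
```

Frames of 20 to 30 ms are a common choice for this kind of check. Keeping a few frames of padding around detected speech helps avoid clipping the start and end of words.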
Troubleshooting
"Invalid API key"
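- Check that OPENAI_API_KEY is set and matches the key from the OpenAI Platform, with no extra spaces or quotes
- Ensure billing is set up on your OpenAI account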
Slow transcription
- Whisper processes in batch, not real-time
- Consider Deepgram for lower latency