OpenAI Whisper

Whisper is OpenAI's speech recognition model, known for high accuracy across multiple languages.

Setup

1. Get API Key

Whisper uses the same API key as OpenAI's other services:

  1. Go to OpenAI Platform
  2. Navigate to API keys
  3. Create a new secret key and copy it (OpenAI shows the full key only once)

2. Configure Environment

STT_PROVIDER=whisper
OPENAI_API_KEY=sk-...your-api-key

Configuration Options

# Required
STT_PROVIDER=whisper
OPENAI_API_KEY=sk-...

# Optional
WHISPER_MODEL=whisper-1 # Default model
WHISPER_DEFAULT_LANGUAGE=en # Language hint
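
As a rough illustration of how an application might read the settings above, here is a minimal Python sketch. The variable names come from this page; the `WhisperConfig` class and `load_whisper_config` function are hypothetical names used only for this example, and treating an unset language as "no hint" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class WhisperConfig:
    """Hypothetical container for the settings listed above."""
    api_key: str
    model: str
    language: Optional[str]


def load_whisper_config(env: Mapping[str, str] = os.environ) -> WhisperConfig:
    """Read the Whisper settings from an environment-like mapping."""
    if env.get("STT_PROVIDER") != "whisper":
        raise ValueError("STT_PROVIDER must be set to 'whisper'")

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")

    return WhisperConfig(
        api_key=api_key,
        # whisper-1 is the documented default model.
        model=env.get("WHISPER_MODEL", "whisper-1"),
        # Assumption: no language hint is sent when the variable is unset.
        language=env.get("WHISPER_DEFAULT_LANGUAGE") or None,
    )
```

Failing early on a missing key gives a clearer error than waiting for the first transcription request to be rejected.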

Pricing

Metric                          Cost
Per minute                      $0.006
Per hour                        $0.36
3-min session (~1 min speech)   ~$0.006
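
The figures above follow from the per-minute rate. A small Python sketch of the arithmetic, assuming cost scales linearly with the seconds of audio actually sent for transcription (the function name is hypothetical):

```python
PRICE_PER_MINUTE_USD = 0.006  # per-minute rate from the pricing table


def estimate_cost_usd(audio_seconds: float) -> float:
    """Estimate transcription cost for the given amount of audio.

    Assumption: billing is proportional to audio duration, so only
    the speech that is actually uploaded counts toward the cost.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return round(audio_seconds / 60 * PRICE_PER_MINUTE_USD, 6)


# A 3-minute session with about 1 minute of speech sends ~60 s of audio.
session_cost = estimate_cost_usd(60)
hourly_cost = estimate_cost_usd(3600)
```

This is why a 3-minute session costs about the same as one minute of audio: silence that is never uploaded is not billed.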

Limitations

  • No real-time streaming - Audio must be fully recorded before transcription
  • Max file duration - 2 hours per file
  • Batch processing - Each recording is transcribed as a whole, so results arrive after a short delay
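
Because of the duration limit, it can help to check a recording before uploading it. The sketch below uses Python's standard `wave` module and therefore only handles WAV input; other formats would need a different library. The function names are hypothetical, and the 2-hour default is taken from the limit listed above.

```python
import wave
from typing import BinaryIO, Union

MAX_DURATION_SECONDS = 2 * 60 * 60  # 2 hours per file, as listed above


def wav_duration_seconds(source: Union[str, BinaryIO]) -> float:
    """Return the duration of a WAV file (path or binary file object)."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_duration_limit(
    source: Union[str, BinaryIO],
    max_seconds: float = MAX_DURATION_SECONDS,
) -> bool:
    """Check whether a WAV recording is short enough to send as one file."""
    return wav_duration_seconds(source) <= max_seconds
```

Recordings that exceed the limit would need to be split into shorter files before transcription.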

Best Practices

Optimize Audio Quality

  • Use a good microphone
  • Minimize background noise
  • Ensure clear speech

Handle Silence

  • Consider voice activity detection
  • Don't send silent audio segments
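
One simple form of voice activity detection is an energy threshold: split the audio into short frames and keep only those whose loudness exceeds a cutoff. The Python sketch below assumes 16-bit PCM samples already decoded into a list of integers; the frame size and threshold are illustrative values that would need tuning, and dedicated VAD libraries are generally more robust than this approach.

```python
import math
from typing import List, Sequence


def rms(frame: Sequence[int]) -> float:
    """Root-mean-square amplitude of one frame of PCM samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def drop_silent_frames(
    samples: Sequence[int],
    frame_size: int = 480,     # 30 ms at 16 kHz (illustrative)
    threshold: float = 500.0,  # illustrative cutoff for 16-bit audio
) -> List[int]:
    """Return only the samples from frames at or above the threshold."""
    kept: List[int] = []
    for start in range(0, len(samples), frame_size):
        frame = samples[start:start + frame_size]
        if rms(frame) >= threshold:
            kept.extend(frame)
    return kept
```

Dropping silent frames before upload shortens the audio that is sent, which reduces both processing time and cost.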

Troubleshooting

"Invalid API key"

  • Check that OPENAI_API_KEY contains the full key, with no extra spaces or quotes
  • Ensure billing is set up

Slow transcription

  • Whisper processes in batch, not real-time
  • Consider Deepgram for lower latency