OpenAI Whisper

Whisper is OpenAI's speech recognition model, known for high accuracy across multiple languages.

Setup

1. Get API Key

Whisper uses the same API key as OpenAI's other services:

Go to OpenAI Platform
Navigate to API keys
Create a new secret key and copy it (the full key is shown only once)

2. Configure Environment

STT_PROVIDER=whisper
OPENAI_API_KEY=sk-...your-api-key

Configuration Options

# Required
STT_PROVIDER=whisper
OPENAI_API_KEY=sk-...

# Optional
WHISPER_MODEL=whisper-1           # Default model
WHISPER_DEFAULT_LANGUAGE=en       # Language hint
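
As a rough illustration of how an application might read these settings, the sketch below loads the four variables above into a small config object. The names `WhisperConfig` and `load_whisper_config` are hypothetical, and the fallback values (the `whisper-1` model, no language hint) are assumptions rather than documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WhisperConfig:
    """Hypothetical container for the settings listed above."""
    api_key: str
    model: str = "whisper-1"
    default_language: Optional[str] = None


def load_whisper_config(env: Optional[Mapping[str, str]] = None) -> WhisperConfig:
    """Read the Whisper settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env

    provider = env.get("STT_PROVIDER", "")
    if provider != "whisper":
        raise ValueError(f"STT_PROVIDER must be 'whisper', got {provider!r}")

    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")

    return WhisperConfig(
        api_key=api_key,
        # Assumed fallback: use whisper-1 when WHISPER_MODEL is unset.
        model=env.get("WHISPER_MODEL", "whisper-1"),
        # Assumed fallback: no language hint when the variable is unset or empty.
        default_language=env.get("WHISPER_DEFAULT_LANGUAGE") or None,
    )
```

Failing at startup on a missing key tends to be easier to diagnose than an "Invalid API key" error on the first transcription request.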

Pricing

Metric	Cost
Per minute	$0.006
Per hour	$0.36
3-min session (~1 min speech)	~$0.006
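
The figures in the table follow from the per-minute rate. A minimal sketch of the arithmetic, assuming billing is proportional to the length of the submitted audio (the function name `estimate_cost_usd` is illustrative, and any rounding the provider applies is not modeled):

```python
PRICE_PER_MINUTE_USD = 0.006  # per-minute rate from the table above


def estimate_cost_usd(audio_seconds: float) -> float:
    """Estimate the transcription cost for a given amount of audio."""
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    return audio_seconds / 60.0 * PRICE_PER_MINUTE_USD


if __name__ == "__main__":
    # One minute and one hour of audio, matching the table rows.
    print(round(estimate_cost_usd(60), 6))
    print(round(estimate_cost_usd(3600), 6))
```

Only the audio actually sent is counted here, which is why a 3-minute session with about 1 minute of speech costs roughly the same as 1 minute of audio.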

Limitations

No real-time streaming - Audio must be fully recorded before transcription
Max file duration - 2 hours per file (the OpenAI API also caps each upload at 25 MB)
Batch processing - Transcription starts only after the full recording is uploaded, so results arrive with a short delay
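
Because audio is sent as a complete file, it can help to check a recording against the duration limit before uploading it. The sketch below does this for uncompressed WAV files using only the Python standard library; the helper names are hypothetical, and other formats would need a different way to read the duration.

```python
import wave

MAX_DURATION_SECONDS = 2 * 60 * 60  # the 2-hour limit listed above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def fits_duration_limit(path: str) -> bool:
    """True if the recording is short enough to send as a single file."""
    return wav_duration_seconds(path) <= MAX_DURATION_SECONDS
```

A recording that fails this check would have to be split into shorter files before transcription.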

Best Practices

Optimize Audio Quality

Use a good microphone
Minimize background noise
Ensure clear speech

Handle Silence

Consider voice activity detection
Don't send silent audio segments
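
One simple way to approximate voice activity detection is an energy threshold: split the audio into short frames and drop frames whose loudness falls below a cutoff. The sketch below is a minimal illustration, not a production VAD. It assumes 16-bit mono PCM in the machine's native byte order, and the frame size and threshold are arbitrary values that would need tuning for a real microphone.

```python
import array
import math


def rms(samples) -> float:
    """Root-mean-square amplitude of a sequence of PCM samples."""
    if len(samples) == 0:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def drop_silent_frames(pcm: bytes, frame_samples: int = 480,
                       threshold: float = 500.0) -> bytes:
    """Return the PCM data with frames quieter than the threshold removed.

    Assumes signed 16-bit mono samples in native byte order.
    """
    samples = array.array("h")
    samples.frombytes(pcm)

    kept = array.array("h")
    for start in range(0, len(samples), frame_samples):
        frame = samples[start:start + frame_samples]
        if rms(frame) >= threshold:
            kept.extend(frame)
    return kept.tobytes()
```

With the default of 480 samples per frame, each frame covers 30 ms at a 16 kHz sample rate. Cutting frames this aggressively can clip the start and end of words, so a real setup would likely keep some padding around detected speech.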

Troubleshooting

"Invalid API key"

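Check that OPENAI_API_KEY is set and matches the key from the OpenAI Platform, with no extra spaces or quotes
Ensure billing is set up on your OpenAI account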

Slow transcription

Whisper processes in batch, not real-time
Consider Deepgram for lower latency
