Speech-to-Text Providers

STT (Speech-to-Text) providers convert user voice input to text for processing.

Supported Providers

Provider	Real-time	Accuracy	Pricing
OpenAI Whisper	No	Excellent	$0.006/min
Deepgram	Yes	Very Good	$0.0077/min

Feature Comparison

Feature	Whisper	Deepgram
Real-time streaming	❌	✅
Speaker diarization	❌	✅
Custom vocabulary	❌	✅
Punctuation	✅	✅
Multi-language	✅	✅
Word timestamps	✅	✅

When to Use Each

Choose Whisper if:

You prioritize transcription accuracy
Latency isn't critical (batch mode is fine)
You already have OpenAI API access
Cost is the primary concern

Choose Deepgram if:

You need real-time transcription
You want streaming for live captions
You need speaker identification
You have specialized vocabulary
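
The criteria above can be expressed as a small selection helper. This is only a sketch: `SttRequirements` and `choose_provider` are hypothetical names invented for illustration, not part of any provider SDK, and the rule simply encodes the feature comparison table (any Deepgram-only feature selects Deepgram, otherwise Whisper).

```python
from dataclasses import dataclass


@dataclass
class SttRequirements:
    """Hypothetical description of what an application needs from STT."""
    realtime: bool = False           # live streaming transcription / captions
    diarization: bool = False        # speaker identification
    custom_vocabulary: bool = False  # specialized domain terms


def choose_provider(req: SttRequirements) -> str:
    """Pick a provider name based on the feature comparison table."""
    # Each of these features is marked as Deepgram-only in the table,
    # so requiring any one of them rules out Whisper.
    if req.realtime or req.diarization or req.custom_vocabulary:
        return "deepgram"
    # Otherwise batch transcription is acceptable, so default to Whisper.
    return "whisper"


print(choose_provider(SttRequirements()))
print(choose_provider(SttRequirements(realtime=True)))
```

A real decision would likely also weigh latency budgets and existing API access, which this sketch leaves out.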

Cost Comparison

For a 3-minute conversation with ~1 minute of user speech (only the user's speech is transcribed, so only that minute is billed):

Provider	Cost
Whisper	$0.006
Deepgram	$0.0077

Both are inexpensive: at these rates the difference is $0.0017 per minute of user speech. The choice typically comes down to features rather than cost.
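
The arithmetic behind the cost table can be sketched as below. The rates are copied from the pricing table; `RATES_PER_MINUTE` and `estimate_cost` are hypothetical names, and the sketch assumes simple linear per-minute billing of user speech only, with no rounding or minimum charge (actual provider billing rules may differ).

```python
# USD per minute of audio, taken from the pricing table in this document.
RATES_PER_MINUTE = {"whisper": 0.006, "deepgram": 0.0077}


def estimate_cost(provider: str, speech_minutes: float) -> float:
    """Estimate STT cost in USD for the given minutes of user speech.

    Assumes linear per-minute pricing; real billing may round differently.
    """
    if speech_minutes < 0:
        raise ValueError("speech_minutes must be non-negative")
    return RATES_PER_MINUTE[provider] * speech_minutes


# Compare both providers for ~1 minute of user speech.
for name in RATES_PER_MINUTE:
    print(f"{name}: ${estimate_cost(name, 1.0):.4f}")
```

Scaling `speech_minutes` to expected monthly usage gives a quick way to check whether the price gap matters for a given workload.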
