Speech-to-Text Providers

STT (Speech-to-Text) providers convert user voice input to text for processing.

Supported Providers

Provider | Real-time | Accuracy | Pricing
OpenAI Whisper | No | Excellent | $0.006/min
Deepgram | Yes | Very Good | $0.0077/min

Feature Comparison

Feature | Whisper | Deepgram
Real-time streaming
Speaker diarization
Custom vocabulary
Punctuation
Multi-language
Word timestamps

When to Use Each

Choose Whisper if:

  • You prioritize transcription accuracy
  • Latency isn't critical (batch mode is fine)
  • You already have OpenAI API access
  • Cost is the primary concern

Choose Deepgram if:

  • You need real-time transcription
  • You want streaming for live captions
  • You need speaker identification (diarization)
  • You have specialized vocabulary
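
The criteria above can be sketched as a small selection helper. This is only an illustration: `SttRequirements`, its field names, and `choose_stt_provider` are hypothetical and not part of any provider SDK. It assumes that any Deepgram-specific need (real-time streaming, speaker identification, specialized vocabulary) decides the choice, and that Whisper is the default otherwise.

```python
from dataclasses import dataclass


@dataclass
class SttRequirements:
    """Hypothetical description of what an application needs from STT."""
    needs_realtime: bool = False                # live captions / streaming
    needs_speaker_identification: bool = False  # who said what
    needs_custom_vocabulary: bool = False       # domain-specific terms


def choose_stt_provider(req: SttRequirements) -> str:
    """Return "deepgram" if any Deepgram-specific need applies, else "whisper"."""
    if (
        req.needs_realtime
        or req.needs_speaker_identification
        or req.needs_custom_vocabulary
    ):
        return "deepgram"
    # No streaming or speaker features required: favor accuracy and lower price.
    return "whisper"


if __name__ == "__main__":
    print(choose_stt_provider(SttRequirements()))
    print(choose_stt_provider(SttRequirements(needs_realtime=True)))
```

A real application may weigh these factors differently, for example by preferring Whisper's accuracy even when custom vocabulary would help.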

Cost Comparison

For a 3-minute conversation with ~1 minute of user speech:

Provider | Cost
Whisper | $0.006
Deepgram | $0.0077

Both providers cost less than one cent for this conversation, so the choice typically comes down to features rather than cost.
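
The per-conversation costs follow from multiplying each provider's per-minute price by the minutes of user speech. A minimal sketch of that arithmetic, assuming only user speech is billed and ignoring any rounding or minimum charges a provider may apply; the names `PRICE_PER_MINUTE` and `transcription_cost` are illustrative:

```python
from decimal import Decimal

# Per-minute prices in USD, taken from the Supported Providers table.
PRICE_PER_MINUTE = {
    "whisper": Decimal("0.006"),
    "deepgram": Decimal("0.0077"),
}


def transcription_cost(provider: str, speech_minutes: Decimal) -> Decimal:
    """Cost in USD to transcribe `speech_minutes` of audio with `provider`."""
    return PRICE_PER_MINUTE[provider] * speech_minutes


if __name__ == "__main__":
    user_speech = Decimal("1")  # ~1 minute of user speech in a 3-minute call
    for name in PRICE_PER_MINUTE:
        print(f"{name}: ${transcription_cost(name, user_speech)}")
```

`Decimal` is used here so that small per-minute prices do not pick up binary floating-point error when summed over many conversations.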