Avatar Providers

Avatar providers render realistic video avatars that speak your AI's responses.

Supported Providers

Provider           Type                 Quality  Pricing
HeyGen LiveAvatar  Real-time streaming  High     ~$0.10/min

How It Works

  1. The AI response text is sent to the avatar provider
  2. Audio is synthesized from the text
  3. Video is rendered with lip-sync to the audio
  4. The video is streamed to the client in real time

[AI Response Text] → [TTS] → [Lip Sync] → [Video Stream] → [Client]

Avatar Selection

HeyGen provides various avatar options:

  • Public avatars - Pre-made professional avatars
  • Custom avatars - Create from your own footage
  • Photo avatars - Generate from a still image

Voice Selection

Each avatar supports multiple voices:

  • Default voice - Matched to avatar appearance
  • Custom voice - Clone a specific voice
  • Third-party voice - Use an external provider such as ElevenLabs

Session Lifecycle

  1. Session created - Server-side, secure
  2. Client connects - WebSocket established
  3. Text sent - As AI generates responses
  4. Video streams - Real-time to client
  5. Session ends - On timeout or explicit close
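
The five lifecycle steps can be modeled as a small state machine, which makes it easier to reject out-of-order operations (for example, sending text before the client has connected). The following is a minimal sketch, not HeyGen's API: the `SessionState` and `AvatarSession` names and the allowed-transition table are assumptions made for illustration.

```python
from enum import Enum, auto


class SessionState(Enum):
    """Hypothetical states mirroring the lifecycle steps listed above."""
    CREATED = auto()    # session created server-side
    CONNECTED = auto()  # client connection established
    STREAMING = auto()  # text is being sent and video streamed
    ENDED = auto()      # timeout or explicit close


# Assumed transition rules: a session may end from any live state,
# and STREAMING may repeat as further responses are sent.
ALLOWED_TRANSITIONS = {
    SessionState.CREATED: {SessionState.CONNECTED, SessionState.ENDED},
    SessionState.CONNECTED: {SessionState.STREAMING, SessionState.ENDED},
    SessionState.STREAMING: {SessionState.STREAMING, SessionState.ENDED},
    SessionState.ENDED: set(),
}


class AvatarSession:
    """Tracks lifecycle state only; no network calls are made."""

    def __init__(self) -> None:
        self.state = SessionState.CREATED

    def transition(self, new_state: SessionState) -> None:
        # Reject any move that the table above does not allow.
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(
                f"cannot move from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
```

In an integration, the calls that actually create, connect, and close the provider session would sit next to each `transition` call, so that the local state always reflects the last step that succeeded.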

Cost Considerations

Avatar streaming is typically the largest cost component:

Session Duration  Cost (Scale tier)
30 seconds (min)  $0.05
1 minute          $0.10
3 minutes         $0.30
5 minutes         $0.50
10 minutes        $1.00
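
The figures in the table are consistent with a flat rate of $0.10 per minute. The sketch below reproduces them under two assumptions that the table does not confirm: that "(min)" denotes a 30-second minimum billed duration, and that time beyond the minimum is billed linearly per second. The function and constant names are illustrative.

```python
RATE_PER_MINUTE_USD = 0.10  # Scale tier rate implied by the table
MIN_BILLED_SECONDS = 30     # assumption: "(min)" means a billing minimum


def estimate_session_cost(duration_seconds: float) -> float:
    """Estimate the streaming cost of one session in USD.

    Assumes linear per-second billing above the minimum; the real
    rounding granularity may differ.
    """
    billed_seconds = max(duration_seconds, MIN_BILLED_SECONDS)
    return round(billed_seconds / 60 * RATE_PER_MINUTE_USD, 4)


if __name__ == "__main__":
    for seconds in (30, 60, 180, 300, 600):
        print(f"{seconds:>4} s -> ${estimate_session_cost(seconds):.2f}")
```

Under these assumptions, a 10-second session is billed the same as a 30-second one, which is one reason short, frequent sessions can cost more than expected.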

Best Practices

Optimize Session Length

  • End sessions when inactive
  • Set reasonable timeouts
  • Monitor usage in admin panel
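
One way to end inactive sessions is to track the time of the last user or AI activity and close the session once a timeout elapses. The following is a minimal sketch; the `IdleTimer` class and the 60-second value in the usage note are assumptions, not provider defaults.

```python
import time
from typing import Callable


class IdleTimer:
    """Reports when no activity has been recorded for `timeout_seconds`."""

    def __init__(
        self,
        timeout_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the timer can be tested
        self._last_activity = clock()

    def touch(self) -> None:
        """Record activity, e.g. a user message or an AI response."""
        self._last_activity = self._clock()

    def is_expired(self) -> bool:
        """True once the idle period has reached the timeout."""
        return self._clock() - self._last_activity >= self._timeout
```

A server loop might create `IdleTimer(60)`, call `touch()` on every message, and close the avatar session as soon as `is_expired()` returns true, so that idle time is not billed as streaming time.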

Handle Interruptions

  • Let users interrupt the avatar while it is speaking
  • Cancel any pending speech when an interruption occurs
  • Transition smoothly to the new response
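
The cancel-then-replace pattern behind these points can be sketched with `asyncio` tasks: each new response cancels whatever is still being spoken before it starts. This is an illustration only. `SpeechController` is a hypothetical name, and `speak` stands in for whatever coroutine sends text to the avatar provider in your integration.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SpeechController:
    """Ensures at most one speech task runs at a time."""

    def __init__(self, speak: Callable[[str], Awaitable[None]]) -> None:
        self._speak = speak  # hypothetical coroutine that sends text to the avatar
        self._current: Optional[asyncio.Task] = None

    async def interrupt(self) -> None:
        """Cancel pending speech, if any, and wait for it to stop."""
        task = self._current
        self._current = None
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass  # expected: the task was cancelled on purpose

    async def say(self, text: str) -> None:
        """Interrupt the current speech, then start the new response."""
        await self.interrupt()
        self._current = asyncio.create_task(self._speak(text))

    async def wait(self) -> None:
        """Wait for the current speech task to finish."""
        if self._current is not None:
            await self._current
```

Awaiting the cancelled task before starting the next one keeps the two responses from overlapping, which is what makes the transition appear smooth to the user.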
