
Avatar Providers

Avatar providers render realistic video avatars that speak your AI's responses.

Supported Providers

Provider	Type	Quality	Pricing
HeyGen LiveAvatar	Real-time streaming	High	~$0.10/min

How It Works

1. The AI response text is sent to the avatar provider
2. Audio is synthesized from the text
3. Video is rendered with lip-sync to the audio
4. The video is streamed to the client in real time

[AI Response Text] → [TTS] → [Lip Sync] → [Video Stream] → [Client]
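Because text is sent as the AI generates it, one common approach is to forward it sentence by sentence so that synthesis can begin before the full response is available. The sketch below shows that idea only. `sentence_chunks` is a hypothetical helper, not part of any provider SDK, and the sentence-splitting rule is deliberately naive (it would also split after abbreviations such as "e.g.").

```python
import re
from typing import Iterable, Iterator

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentence_chunks(fragments: Iterable[str]) -> Iterator[str]:
    """Group streamed text fragments into whole sentences.

    Hypothetical helper: each yielded sentence is what you would
    hand to the avatar provider as one piece of text to speak.
    """
    buffer = ""
    for fragment in fragments:
        buffer += fragment
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

For example, passing the fragments `"Hello"`, `" there."`, `" How are"`, `" you?"` yields `"Hello there."` first and `"How are you?"` once the stream ends.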

Avatar Selection

HeyGen provides various avatar options:

Public avatars - Pre-made professional avatars
Custom avatars - Create from your own footage
Photo avatars - Generate from a still image

Voice Selection

Each avatar supports multiple voices:

Default voice - Matched to avatar appearance
Custom voice - Clone a specific voice
Third-party - Use ElevenLabs, etc.

Session Lifecycle

1. Session created - Server-side, secure
2. Client connects - WebSocket established
3. Text sent - As AI generates responses
4. Video streams - Real-time to client
5. Session ends - On timeout or explicit close
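The lifecycle above can be modeled as a small state machine, which makes out-of-order calls (for example, sending text before the client has connected) fail loudly. This is a minimal sketch: `AvatarSession`, `SessionState`, and the method names are hypothetical and do not correspond to the HeyGen API. No network calls are made.

```python
from enum import Enum, auto


class SessionState(Enum):
    CREATED = auto()    # session created server-side
    CONNECTED = auto()  # client connection established
    STREAMING = auto()  # text sent, video streaming
    ENDED = auto()      # timeout or explicit close


# Which states may follow each state.
ALLOWED = {
    SessionState.CREATED: {SessionState.CONNECTED, SessionState.ENDED},
    SessionState.CONNECTED: {SessionState.STREAMING, SessionState.ENDED},
    SessionState.STREAMING: {SessionState.STREAMING, SessionState.ENDED},
    SessionState.ENDED: set(),
}


class AvatarSession:
    """Hypothetical lifecycle tracker; it only records state."""

    def __init__(self) -> None:
        self.state = SessionState.CREATED

    def _move(self, new_state: SessionState) -> None:
        if new_state not in ALLOWED[self.state]:
            raise RuntimeError(
                f"cannot go from {self.state.name} to {new_state.name}"
            )
        self.state = new_state

    def connect(self) -> None:
        self._move(SessionState.CONNECTED)

    def send_text(self, text: str) -> None:
        # A real implementation would forward `text` to the provider here.
        self._move(SessionState.STREAMING)

    def close(self) -> None:
        self._move(SessionState.ENDED)
```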

Cost Considerations

Avatar streaming is typically the largest cost component:

Session Duration	Cost (Scale tier)
30 seconds (minimum)	$0.05
1 minute	$0.10
3 minutes	$0.30
5 minutes	$0.50
10 minutes	$1.00
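The figures in the table follow from the ~$0.10/min rate. The sketch below reproduces them, assuming that cost scales linearly with duration and that the 30-second row is a minimum billable duration. Both are assumptions read off the table, not confirmed billing rules, and `estimate_cost` is a hypothetical helper.

```python
from decimal import Decimal, ROUND_HALF_UP

RATE_PER_MINUTE = Decimal("0.10")  # from the table above (Scale tier)
MIN_BILLABLE_SECONDS = 30          # assumption: 30 s is the minimum charge


def estimate_cost(duration_seconds: int) -> Decimal:
    """Estimate the streaming cost in dollars for one session."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Sessions shorter than the minimum are charged as the minimum.
    billable = max(duration_seconds, MIN_BILLABLE_SECONDS)
    cost = RATE_PER_MINUTE * Decimal(billable) / Decimal(60)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

Under these assumptions, `estimate_cost(180)` gives 0.30 and `estimate_cost(10)` gives 0.05, because a 10-second session is charged as 30 seconds.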

Best Practices

Optimize Session Length

End sessions when they become inactive
Set reasonable idle timeouts
Monitor usage in the admin panel
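One way to end inactive sessions is to track the time since the last user or AI activity and close the session once a threshold is exceeded. The sketch below is a minimal illustration. `IdleTimer` is a hypothetical helper, and choosing the timeout value is left to you.

```python
import time
from typing import Callable


class IdleTimer:
    """Tracks inactivity so the caller can decide when to end a session."""

    def __init__(
        self,
        timeout_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.timeout_seconds = timeout_seconds
        self._clock = clock  # injectable so the timer can be tested
        self._last_activity = clock()

    def touch(self) -> None:
        """Record activity (user speech, AI response, and so on)."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        """Return True once the idle time reaches the timeout."""
        return self._clock() - self._last_activity >= self.timeout_seconds
```

Call `touch()` whenever there is activity, check `expired()` periodically, and close the session when it returns `True`.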

Handle Interruptions

Support user interruptions
Cancel pending speech when the user interrupts
Transition smoothly to the new response
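The interruption pattern above can be sketched with `asyncio` tasks: each new response cancels whatever speech is still pending before starting its own. `SpeechQueue` and the `speak` callable are hypothetical stand-ins for whatever sends text to the avatar. A real integration would likely also need to tell the provider to stop the speech already in progress.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class SpeechQueue:
    """Speaks one response at a time; a new response cancels the old one."""

    def __init__(self, speak: Callable[[str], Awaitable[None]]) -> None:
        self._speak = speak  # hypothetical coroutine that voices the text
        self._current: Optional[asyncio.Task] = None

    async def interrupt(self) -> None:
        """Cancel pending speech, if any, and wait for it to stop."""
        task = self._current
        if task is not None and not task.done():
            task.cancel()
            try:
                await task
            except asyncio.CancelledError:
                pass  # expected: the pending speech was cancelled
        self._current = None

    async def say(self, text: str) -> asyncio.Task:
        """Interrupt any pending speech, then start the new response."""
        await self.interrupt()
        self._current = asyncio.create_task(self._speak(text))
        return self._current
```

Calling `say()` for every new AI response means a user interruption needs no special case: the next response replaces the pending one.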

