Avatar Providers
Avatar providers render realistic video avatars that speak your AI's responses.
Supported Providers
| Provider | Type | Quality | Pricing |
|---|---|---|---|
| HeyGen LiveAvatar | Real-time streaming | High | ~$0.10/min |
How It Works
- The AI response text is sent to the avatar provider
- Audio is synthesized from the text
- Video is rendered with lip-sync matched to the audio
- The video is streamed to the client in real time
[AI Response Text] → [TTS] → [Lip Sync] → [Video Stream] → [Client]
Avatar Selection
HeyGen provides various avatar options:
- Public avatars - Pre-made professional avatars
- Custom avatars - Create from your own footage
- Photo avatars - Generate from a still image
Voice Selection
Each avatar supports multiple voices:
- Default voice - Matched to avatar appearance
- Custom voice - Clone a specific voice
- Third-party voice - Use an external voice provider such as ElevenLabs
Session Lifecycle
- Session created - Done server-side, which keeps session creation secure
- Client connects - A WebSocket connection is established
- Text sent - Sent as the AI generates responses
- Video streams - Delivered to the client in real time
- Session ends - On timeout or explicit close
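The lifecycle above can be modeled as a small state machine. This is a minimal sketch, not HeyGen's API: the `SessionState` and `AvatarSession` names and the set of allowed transitions are assumptions made for illustration.

```python
from enum import Enum, auto


class SessionState(Enum):
    """Hypothetical states mirroring the lifecycle steps listed above."""
    CREATED = auto()    # session created server-side
    CONNECTED = auto()  # client WebSocket established
    STREAMING = auto()  # text is being sent and video is streaming
    ENDED = auto()      # timeout or explicit close


# Assumed transition table: a session can end from any live state.
_ALLOWED = {
    SessionState.CREATED: {SessionState.CONNECTED, SessionState.ENDED},
    SessionState.CONNECTED: {SessionState.STREAMING, SessionState.ENDED},
    SessionState.STREAMING: {SessionState.STREAMING, SessionState.ENDED},
    SessionState.ENDED: set(),
}


class AvatarSession:
    """Tracks which lifecycle step a session is in and rejects invalid moves."""

    def __init__(self) -> None:
        self.state = SessionState.CREATED

    def transition(self, new_state: SessionState) -> None:
        if new_state not in _ALLOWED[self.state]:
            raise ValueError(
                f"cannot move from {self.state.name} to {new_state.name}"
            )
        self.state = new_state
```

Keeping the transitions in one table makes it easy to reject out-of-order events, such as text arriving after the session has ended.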
Cost Considerations
Avatar streaming is typically the largest cost component. At roughly $0.10 per minute, approximate per-session costs are:
| Session Duration | Cost (Scale tier) |
|---|---|
| 30 seconds (minimum) | $0.05 |
| 1 minute | $0.10 |
| 3 minutes | $0.30 |
| 5 minutes | $0.50 |
| 10 minutes | $1.00 |
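The figures in the table follow from the ~$0.10/min rate. The sketch below reproduces them under two assumptions that the table suggests but does not state outright: sessions shorter than 30 seconds are billed as 30 seconds, and longer sessions are billed pro rata per second. The function and constant names are illustrative only.

```python
RATE_PER_MINUTE_USD = 0.10  # approximate rate from the provider table
MIN_BILLABLE_SECONDS = 30   # assumed minimum billed duration


def estimate_session_cost(duration_seconds: float) -> float:
    """Estimate the cost in USD of one avatar session, rounded to cents."""
    billable = max(duration_seconds, MIN_BILLABLE_SECONDS)
    return round(billable / 60 * RATE_PER_MINUTE_USD, 2)


if __name__ == "__main__":
    for seconds in (30, 60, 180, 300, 600):
        print(f"{seconds:>4} s -> ${estimate_session_cost(seconds):.2f}")
```

Actual billing rules may differ, so treat the result as an estimate for budgeting rather than an invoice figure.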
Best Practices
Optimize Session Length
- End sessions as soon as they become inactive
- Set reasonable idle timeouts
- Monitor usage in the admin panel
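One way to end inactive sessions is to track the time of the last activity and close the session once a timeout has passed. This is a minimal sketch: `IdleTimer` is a hypothetical helper, and the 30-second value in the usage note is an arbitrary example, not a recommended setting.

```python
import time
from typing import Callable


class IdleTimer:
    """Reports whether a session has been idle for at least a given timeout."""

    def __init__(
        self,
        timeout_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the timer can be tested
        self._last_activity = clock()

    def touch(self) -> None:
        """Record activity, e.g. a user message or a new AI response."""
        self._last_activity = self._clock()

    def expired(self) -> bool:
        """True once the idle period has reached the timeout."""
        return self._clock() - self._last_activity >= self._timeout
```

A server loop could call `touch()` on every message and periodically check `expired()`, closing the avatar session when it returns `True`, for example with `IdleTimer(timeout_seconds=30)`.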
Handle Interruptions
- Let users interrupt the avatar while it is speaking
- Cancel any pending speech when an interruption occurs
- Transition smoothly to the new response
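Cancelling pending speech can be sketched as a queue of text chunks that is cleared when the user interrupts. `SpeechQueue` is a hypothetical helper, not part of any provider SDK. It only covers text that has not yet been sent; stopping speech that the provider is already rendering depends on the provider's API and is not shown here.

```python
from collections import deque
from typing import Deque, Optional


class SpeechQueue:
    """Holds text chunks waiting to be sent to the avatar provider."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def enqueue(self, text: str) -> None:
        """Add a chunk of AI response text to be spoken."""
        self._pending.append(text)

    def next_chunk(self) -> Optional[str]:
        """Return the next chunk to send, or None if nothing is pending."""
        return self._pending.popleft() if self._pending else None

    def interrupt(self) -> int:
        """Drop all unsent chunks and return how many were cancelled."""
        dropped = len(self._pending)
        self._pending.clear()
        return dropped
```

After `interrupt()`, chunks from the new response can be enqueued immediately, so the avatar moves on to the new answer without speaking stale text.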