HeyGen Avatar V
Generate realistic talking-avatar videos from text using Avatar V, the newest generation of HeyGen’s avatar engine. It uses cross-reference-driven animation, which analyzes the avatar and audio together, to produce more natural motion and lip-sync than Avatar IV.
When to use Avatar V vs Avatar IV
- Avatar V (this model) — Newer engine, better motion quality, more coherent expressions, especially on longer scripts. Only works with avatars trained for Avatar V.
- Avatar IV — Works with any HeyGen avatar, including arbitrary photos. Use this if your avatar isn’t Avatar V-eligible, or if you need motion_prompt/expressiveness controls.
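The decision above can be sketched as a small helper. This is a minimal illustration, not HeyGen's own logic: the function name choose_engine, the return labels, and the "avatar_v" identifier are all hypothetical, since this page does not say what strings appear in supported_api_engines. Check a real look response before relying on any particular value.

```python
def choose_engine(look, needs_motion_controls=False, avatar_v_name="avatar_v"):
    """Pick an engine label for a look, following the guidance above.

    `look` is assumed to be the parsed JSON for one avatar look, carrying a
    `supported_api_engines` list. `avatar_v_name` is a placeholder for
    whatever identifier that list uses for Avatar V (an assumption).
    The returned labels are local to this sketch.
    """
    # motion_prompt / expressiveness controls are only mentioned for Avatar IV.
    if needs_motion_controls:
        return "avatar_iv"

    # Treat a missing or null field as "no engines listed".
    engines = look.get("supported_api_engines") or []

    # Avatar V only works with avatars trained for it; otherwise fall back.
    return "avatar_v" if avatar_v_name in engines else "avatar_iv"
```

Passing the engine identifier as a parameter keeps the unknown value in one place, so it can be corrected without touching the decision logic.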
Inputs
- avatar_id — The avatar to use. Must support Avatar V — not every avatar does. Inspect supported_api_engines on the look via GET /v3/avatars/looks/{id} before requesting it.
- input_text — The script the avatar will speak (up to 5,000 characters).
- voice_id — The voice to use. Get IDs from HeyGen’s List All Voices API.
- resolution — Output resolution: 720p, 1080p, or 4k.
- aspect_ratio — 16:9 (landscape) or 9:16 (portrait).
- voice_speed — Speech speed from 0.5× to 1.5×.
- caption — Whether to burn captions into the video.
- title — Optional video title shown in your HeyGen dashboard.
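The limits listed above can be checked locally before a request is sent. The following is a minimal sketch that only validates and assembles the inputs; it makes no network call. The function name build_avatar_v_inputs and the default values are illustrative assumptions, not documented defaults.

```python
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4k"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
MAX_TEXT_CHARS = 5000          # input_text limit stated above
MIN_SPEED, MAX_SPEED = 0.5, 1.5  # voice_speed range stated above


def build_avatar_v_inputs(avatar_id, input_text, voice_id,
                          resolution="1080p", aspect_ratio="16:9",
                          voice_speed=1.0, caption=False, title=None):
    """Validate inputs against the documented limits and return them as a dict.

    Defaults here are hypothetical choices made for this sketch.
    """
    if not avatar_id:
        raise ValueError("avatar_id is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if not input_text:
        raise ValueError("input_text is required")
    if len(input_text) > MAX_TEXT_CHARS:
        raise ValueError(f"input_text exceeds {MAX_TEXT_CHARS} characters")
    if resolution not in ALLOWED_RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(ALLOWED_RESOLUTIONS)}")
    if aspect_ratio not in ALLOWED_ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ALLOWED_ASPECT_RATIOS)}")
    if not MIN_SPEED <= voice_speed <= MAX_SPEED:
        raise ValueError(f"voice_speed must be between {MIN_SPEED} and {MAX_SPEED}")

    inputs = {
        "avatar_id": avatar_id,
        "input_text": input_text,
        "voice_id": voice_id,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "voice_speed": voice_speed,
        "caption": bool(caption),
    }
    # title is optional, so it is only included when provided.
    if title is not None:
        inputs["title"] = title
    return inputs
```

For example, build_avatar_v_inputs("my_avatar", "Hello there.", "my_voice", resolution="720p", aspect_ratio="9:16") returns a dict ready to pass to whichever client you use, while an out-of-range voice_speed raises ValueError before any request is made.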
Output
Returns an MP4 video of the avatar speaking the provided text.
Use cases
- Marketing — Personalized video ads with high-quality avatars.
- Education — Multilingual course videos with consistent presenters.
- Sales — Personalized outreach videos at scale.
- Social media — Talking-head content without a camera.