heygen/avatar-v

Create realistic talking avatar videos from text with HeyGen's Avatar V engine — the newest, highest-quality avatar engine with cross-reference-driven animation.

HeyGen Avatar V

Generate realistic talking avatar videos from text using Avatar V, the newest generation of HeyGen’s avatar engine. Its cross-reference-driven animation analyzes the avatar and the audio together, which yields more natural motion and lip-sync than Avatar IV.

When to use Avatar V vs Avatar IV

  • Avatar V (this model) — Newer engine, better motion quality, more coherent expressions, especially on longer scripts. Only works with avatars trained for Avatar V.
  • Avatar IV — Works with any HeyGen avatar including arbitrary photos. Use this if your avatar isn’t Avatar V-eligible, or if you need motion_prompt/expressiveness controls.
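
The guidance above can be read as a simple decision rule. The following is a minimal sketch, not part of HeyGen's API: `choose_engine` and its two boolean parameters are hypothetical names introduced here for illustration.

```python
def choose_engine(avatar_supports_v: bool, needs_motion_controls: bool) -> str:
    """Pick an engine following the guidance above (hypothetical helper)."""
    # Avatar IV is the fallback when the avatar is not Avatar V-eligible,
    # or when motion_prompt/expressiveness controls are required.
    if not avatar_supports_v or needs_motion_controls:
        return "Avatar IV"
    # Otherwise prefer the newer engine.
    return "Avatar V"


print(choose_engine(avatar_supports_v=True, needs_motion_controls=False))
```

How `avatar_supports_v` is determined is left to the caller, for example by inspecting supported_api_engines on the avatar look as described under Inputs.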

Inputs

  • avatar_id — The avatar to use. Must support Avatar V — not every avatar does. Inspect supported_api_engines on the look via GET /v3/avatars/looks/{id} before requesting it.
  • input_text — The script the avatar will speak (up to 5,000 characters).
  • voice_id — The voice to use. Get IDs from HeyGen’s List All Voices API.
  • resolution — Output resolution: 720p, 1080p, or 4k.
  • aspect_ratio — 16:9 (landscape) or 9:16 (portrait).
  • voice_speed — Speech speed from 0.5× to 1.5×.
  • caption — Whether to burn captions into the video.
  • title — Optional video title shown in your HeyGen dashboard.
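
As an illustration of the constraints listed above, here is a minimal sketch that assembles and validates an input payload before it is submitted. The helper name `build_avatar_v_input`, the default values, and the assumption that voice_speed is passed as a plain number are all hypothetical; only the field names and limits come from the list above.

```python
from typing import Optional

MAX_SCRIPT_CHARS = 5000                  # input_text limit stated above
RESOLUTIONS = {"720p", "1080p", "4k"}    # allowed resolution values
ASPECT_RATIOS = {"16:9", "9:16"}         # landscape or portrait


def build_avatar_v_input(
    avatar_id: str,
    input_text: str,
    voice_id: str,
    resolution: str = "1080p",      # assumed default, not from the source
    aspect_ratio: str = "16:9",     # assumed default, not from the source
    voice_speed: float = 1.0,       # assumed default and numeric type
    caption: bool = False,          # assumed default
    title: Optional[str] = None,
) -> dict:
    """Validate the documented constraints and return an input dict."""
    if not input_text or len(input_text) > MAX_SCRIPT_CHARS:
        raise ValueError(f"input_text must be 1 to {MAX_SCRIPT_CHARS} characters")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {sorted(RESOLUTIONS)}")
    if aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"aspect_ratio must be one of {sorted(ASPECT_RATIOS)}")
    if not 0.5 <= voice_speed <= 1.5:
        raise ValueError("voice_speed must be between 0.5 and 1.5")

    payload = {
        "avatar_id": avatar_id,
        "input_text": input_text,
        "voice_id": voice_id,
        "resolution": resolution,
        "aspect_ratio": aspect_ratio,
        "voice_speed": voice_speed,
        "caption": caption,
    }
    # title is optional, so it is only included when provided.
    if title is not None:
        payload["title"] = title
    return payload


# Placeholder IDs; real values come from your HeyGen account.
example = build_avatar_v_input("my-avatar-id", "Hello from Avatar V.", "my-voice-id")
print(example)
```

Validating locally catches out-of-range values before a request is made. It does not check that the avatar supports Avatar V, which still requires inspecting supported_api_engines.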

Output

Returns an MP4 video of the avatar speaking the provided text.

Use cases

  • Marketing — Personalized video ads with high-quality avatars.
  • Education — Multilingual course videos with consistent presenters.
  • Sales — Personalized outreach videos at scale.
  • Social media — Talking-head content without a camera.