elevenlabs/v3

The most expressive Text to Speech model


Eleven v3 (alpha) — the most expressive Text to Speech model.

This research preview brings unprecedented control and realism to speech generation with:

- 70+ languages
- Multi-speaker dialogue
- Audio tags like [excited], [whispers], and [sighs]

Eleven v3 (alpha) requires more prompt engineering than previous models, but the generations are breathtaking.
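As a rough illustration of what that prompt engineering can look like, here is a minimal Python sketch that assembles a multi-speaker script with inline audio tags. It only builds the prompt text and does not call any API. The `Line` class, the `render_line` and `render_script` helpers, and the "Speaker: [tag] text" layout are hypothetical conventions chosen for this example, not a format the model is confirmed to require; only the bracketed tag syntax comes from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Line:
    """One line of dialogue (hypothetical helper type for this sketch)."""
    speaker: str
    text: str
    tags: Tuple[str, ...] = ()  # audio tags such as "excited" or "whispers"


def render_line(line: Line) -> str:
    # Assumption: each audio tag is written in square brackets before the text.
    prefix = "".join(f"[{tag}] " for tag in line.tags)
    return f"{line.speaker}: {prefix}{line.text}"


def render_script(lines: Iterable[Line]) -> str:
    # Assumption: one speaker turn per line of the prompt.
    return "\n".join(render_line(line) for line in lines)


if __name__ == "__main__":
    script = render_script([
        Line("Ava", "We got the part!", ("excited",)),
        Line("Ben", "Keep it down, they are still auditioning.", ("whispers",)),
        Line("Ava", "Fine.", ("sighs",)),
    ])
    print(script)
```

Keeping the tags as data rather than hand-typing them into strings makes it easier to try different deliveries of the same line while iterating on a prompt.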

If you’re working on videos, audiobooks, or media tools — this unlocks a new level of expressiveness. For real-time and conversational use cases, we recommend staying with v2.5 Turbo or Flash for now. A real-time version of v3 is in development.