Eleven v3 (alpha) — the most expressive Text to Speech model.
This research preview brings unprecedented control and realism to speech generation with:
- 70+ languages
- Multi-speaker dialogue
- Audio tags like [excited], [whispers], and [sighs]

Eleven v3 (alpha) requires more prompt engineering than previous models, but the generations are breathtaking.
If you’re working on videos, audiobooks, or media tools, v3 unlocks a new level of expressiveness. For real-time and conversational use cases, we recommend staying with v2.5 Turbo or Flash for now. A real-time version of v3 is in development.