elevenlabs/turbo-v2.5

High-quality, low-latency text-to-speech in 32 languages

🎯 Overview

Eleven Turbo v2.5 is ElevenLabs’ high-quality, low-latency text-to-speech (TTS) model, built to balance audio quality against generation speed.

  • Latency: ~250-300ms – ideal for real-time applications.
  • Max Length: 40,000 characters per request (~40 minutes of audio).
  • Performance: 3x faster than Multilingual v2 in most languages; 25% faster in English.
  • New Languages: Adds Vietnamese, Hungarian, and Norwegian for the first time.

Well suited to voice agents, chatbots, interactive apps, and any scenario that needs high-quality speech in languages other than English.
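
Longer texts have to be split before they fit the 40,000-character limit listed above. The following is a minimal sketch of one way to do that, not part of the model's API: `split_text` is a hypothetical helper that prefers to cut at sentence ends or whitespace and falls back to a hard cut.

```python
MAX_CHARS = 40_000  # per-request limit stated in this README


def split_text(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into pieces of at most `limit` characters.

    Prefers the last sentence end (". ", "! ", "? ") inside the limit,
    then the last space, and finally a hard cut at `limit`.
    """
    if limit < 1:
        raise ValueError("limit must be positive")
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        window = rest[:limit]
        # Last sentence boundary inside the window, if any.
        cut = max(window.rfind(". "), window.rfind("! "), window.rfind("? "))
        if cut != -1:
            cut += 1  # keep the punctuation with the chunk
        else:
            cut = window.rfind(" ")
            if cut <= 0:
                cut = limit  # no whitespace at all: hard cut
        chunks.append(rest[:cut].strip())
        rest = rest[cut:].strip()
    if rest:
        chunks.append(rest)
    return chunks
```

Each returned chunk can then be sent as its own request and the resulting audio files concatenated in order.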

🌍 Supported Languages (32 total)

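| Language Code | Language |
|---|---|
| en | English |
| ja | Japanese |
| zh | Mandarin Chinese |
| de | German |
| hi | Hindi |
| fr | French |
| ko | Korean |
| pt | Portuguese |
| it | Italian |
| es | Spanish |
| id | Indonesian |
| nl | Dutch |
| tr | Turkish |
| fil | Filipino |
| pl | Polish |
| sv | Swedish |
| bg | Bulgarian |
| ro | Romanian |
| ar | Arabic |
| cs | Czech |
| el | Greek |
| fi | Finnish |
| hr | Croatian |
| ms | Malay |
| sk | Slovak |
| da | Danish |
| ta | Tamil |
| uk | Ukrainian |
| ru | Russian |
| hu | Hungarian |
| no | Norwegian |
| vi | Vietnamese |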

🔧 Inputs

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| text | string | ✅ Yes | - | The text to convert to speech. Max 40,000 characters. |
| voice | string | ✅ Yes | pNInz6obpgDQGcFmaJgB (Adam) | Voice ID or name (e.g., “Adam”, “Rachel”). See the full voice library. |
| stability | number | No | 0.5 | Controls voice consistency (0.0 = variable, 1.0 = consistent). Range: 0-1. |
| similarity_boost | number | No | 0.75 | Boosts similarity to the original voice. Range: 0-1. |
| style | number | No | 0 | Exaggerates the described style. Range: 0-1. |
| use_speaker_boost | boolean | No | true | Applies speaker boost for better clarity. |
| optimize_streaming_latency | number | No | 0 | Optimizes for streaming. Range: 0-4; higher values give lower latency. |
| output_format | string | No | mp3_44100_128 | Output audio format (e.g., mp3_44100_128, pcm_16000, wav). |
| seed | integer | No | - | Random seed for reproducibility. |
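
As an illustration of the parameters above, here is a small sketch that assembles an input payload and checks the documented ranges before anything is sent. `build_input` is a hypothetical helper, not part of any client library; only the parameter names, ranges, and the 40,000-character limit come from this README.

```python
MAX_CHARS = 40_000  # text limit from the table above

# Option names from the table above; anything else is rejected.
ALLOWED_OPTIONS = {
    "stability", "similarity_boost", "style", "use_speaker_boost",
    "optimize_streaming_latency", "output_format", "seed",
}
UNIT_RANGE_PARAMS = ("stability", "similarity_boost", "style")  # each 0-1


def build_input(text: str, voice: str = "Adam", **options) -> dict:
    """Return a validated input dict (hypothetical helper)."""
    if not text:
        raise ValueError("text is required")
    if len(text) > MAX_CHARS:
        raise ValueError(f"text exceeds {MAX_CHARS} characters")
    if not voice:
        raise ValueError("voice is required")

    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")

    for name in UNIT_RANGE_PARAMS:
        value = options.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    latency = options.get("optimize_streaming_latency")
    if latency is not None and latency not in range(5):
        raise ValueError("optimize_streaming_latency must be 0-4")

    # Only include options that were set, so documented defaults apply otherwise.
    payload = {"text": text, "voice": voice}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, `build_input("Hello", voice="Rachel", stability=0.3)` yields a dict with just those three keys, leaving every other setting at its default.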

🎵 Outputs

  • Audio: Base64-encoded audio; MP3 by default, or the format requested via output_format.
  • Content-Type: audio/mpeg by default, or the type matching the requested format.
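
Since the audio is described as base64-encoded, it needs decoding before it can be played or stored. The sketch below assumes the result arrives as a base64 string; `save_audio` is a hypothetical helper, and the handling of a `data:` URI prefix is an assumption rather than documented behavior.

```python
import base64
from pathlib import Path


def save_audio(b64_audio: str, path: str) -> int:
    """Decode base64 audio, write it to `path`, and return the byte count."""
    # Assumption: tolerate an optional prefix like "data:audio/mpeg;base64,".
    if b64_audio.startswith("data:") and "," in b64_audio:
        b64_audio = b64_audio.split(",", 1)[1]
    audio_bytes = base64.b64decode(b64_audio)
    Path(path).write_bytes(audio_bytes)
    return len(audio_bytes)
```

With the default output_format, a call such as `save_audio(result, "speech.mp3")` would produce an MP3 file; pick a file extension that matches whatever format was requested.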

🚀 Use Cases

  • Conversational AI: Real-time voice responses in apps and agents.
  • Multilingual Content: Narration, dubbing, and localization.
  • Interactive Media: Games, virtual assistants, audiobooks.
  • Enterprise: High-volume, low-latency production.

Pro Tip: Turbo v2.5 sits between Flash v2.5 (ultra-low latency) and Multilingual v2 (highest quality), which makes it the sweet spot when you need both speed and quality.