ElevenLabs Flash v2.5 is the fastest speech synthesis model from ElevenLabs, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms) across 32 languages.
Flash v2.5 balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output.
Key features
- Ultra-low latency: ~75ms, perfect for real-time voice agents and chatbots
- 32 languages: All languages from Multilingual v2 plus Hungarian, Norwegian, and Vietnamese
- 40,000-character limit: Generate up to ~40 minutes of audio per request
- 50% lower price per character compared to Multilingual v2
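Text longer than the 40,000-character limit has to be split across several requests. The following is a minimal sketch of one way to do that, assuming that breaking at sentence boundaries is acceptable. The helper name `split_for_flash` is hypothetical and not part of any ElevenLabs SDK.

```python
MAX_CHARS = 40_000  # per-request character limit listed under "Key features"


def split_for_flash(text: str, limit: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most `limit` characters.

    Prefers to break after sentence-ending punctuation, falls back to the
    last space, and hard-cuts only when the window has no space at all.
    Joining the chunks reproduces the original text exactly.
    """
    if limit <= 0:
        raise ValueError("limit must be positive")

    chunks: list[str] = []
    rest = text
    while len(rest) > limit:
        window = rest[:limit]
        end = 0  # exclusive end index of the next chunk
        for mark in (". ", "! ", "? "):
            pos = window.rfind(mark)
            if pos != -1:
                end = max(end, pos + len(mark))
        if end == 0:
            pos = window.rfind(" ")
            end = pos + 1 if pos != -1 else limit
        chunks.append(rest[:end])
        rest = rest[end:]
    if rest:
        chunks.append(rest)
    return chunks
```

Each chunk can then be sent as its own request. Because the chunks concatenate back to the original text, neighbouring chunks can also serve as surrounding context if the request format supports it.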
Supported languages (32)
| Code | Language | Code | Language |
|---|---|---|---|
| en | English | pl | Polish |
| ja | Japanese | sv | Swedish |
| zh | Mandarin Chinese | bg | Bulgarian |
| de | German | ro | Romanian |
| hi | Hindi | ar | Arabic |
| fr | French | cs | Czech |
| ko | Korean | el | Greek |
| pt | Portuguese | fi | Finnish |
| it | Italian | hr | Croatian |
| es | Spanish | ms | Malay |
| id | Indonesian | sk | Slovak |
| nl | Dutch | da | Danish |
| tr | Turkish | ta | Tamil |
| fil | Filipino | uk | Ukrainian |
| ru | Russian | hu | Hungarian |
| no | Norwegian | vi | Vietnamese |
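A client can check a language code against this list before sending a request. The sketch below copies the 32 codes from the table. The helper `normalize_language_code` is hypothetical, and reducing region-tagged codes such as `pt-BR` to their base code is an assumption, since this page does not say whether the model accepts regional variants.

```python
# The 32 codes from the "Supported languages" table.
SUPPORTED_LANGUAGES = frozenset(
    "en pl ja sv zh bg de ro hi ar fr cs ko el pt fi "
    "it hr es ms id sk nl da tr ta fil uk ru hu no vi".split()
)


def normalize_language_code(code: str) -> str:
    """Return the base language code if it is in the supported set.

    Assumption: region suffixes ("pt-BR", "en_US") are dropped before
    the lookup. Raises ValueError for unsupported codes.
    """
    base = code.strip().lower().replace("_", "-").split("-")[0]
    if base not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return base
```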
Inputs
| Parameter | Type | Default | Description |
|---|---|---|---|
| prompt | string | — | The text to convert to speech |
| voice | string | Rachel | Voice choice for speech generation |
| language_code | string | en | Language code (e.g., en, es, fr) |
| stability | number | 0.5 | Voice consistency (0.0–1.0) |
| similarity_boost | number | 0.75 | Similarity to the original voice (0.0–1.0) |
| style | number | 0 | Style exaggeration (0.0–1.0) |
| speed | number | 1 | Speed of speech (0.7–1.2) |
| previous_text | string | — | Previous text for context |
| next_text | string | — | Next text for context |
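The parameters above can be assembled into a request payload and range-checked on the client before anything is sent. This is a minimal sketch: `build_input` is a hypothetical helper, the defaults and ranges are copied from the table, and the 40,000-character check reflects the limit stated on this page. How the payload is submitted depends on the hosting platform, which this page does not specify.

```python
from typing import Optional

# Allowed ranges copied from the "Inputs" table.
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}
MAX_CHARS = 40_000  # per-request character limit stated on this page


def build_input(
    prompt: str,
    voice: str = "Rachel",
    language_code: str = "en",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    speed: float = 1.0,
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
) -> dict:
    """Build an input payload, raising ValueError on out-of-range values."""
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    if len(prompt) > MAX_CHARS:
        raise ValueError(f"prompt exceeds {MAX_CHARS} characters")

    payload = {
        "prompt": prompt,
        "voice": voice,
        "language_code": language_code,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "speed": speed,
    }
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")

    # Optional context fields are included only when provided.
    if previous_text is not None:
        payload["previous_text"] = previous_text
    if next_text is not None:
        payload["next_text"] = next_text
    return payload
```

For example, `build_input("Hola, ¿qué tal?", language_code="es", speed=1.1)` returns a dictionary with the table defaults for every field that was not overridden.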
Use cases
- Voice agents and chatbots: Ultra-low latency makes it perfect for conversational AI
- Interactive apps: Games and applications that need immediate audio response
- Large-scale processing: Efficient for bulk text-to-speech conversion
- Multilingual content: Narration, dubbing, and localization across 32 languages
Choosing between ElevenLabs models
- Flash v2.5: Fastest (~75ms), best for real-time and cost-sensitive use cases
- Turbo v2.5: Balanced quality and speed (~250ms), with the same 32 languages and 40,000-character limit as Flash v2.5
- Multilingual v2: Highest quality, best for professional content and audiobooks
- v3: Most expressive, with 70+ languages and multi-speaker dialogue support
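The guidance above can be turned into a small decision helper. The sketch below is only illustrative: `pick_model` is a hypothetical function, the returned strings are display names rather than API model identifiers, and the 250ms threshold is taken from the Turbo v2.5 latency figure in the list. Latencies for Multilingual v2 and v3 are not stated here, so the helper falls back to them only when no latency budget is given.

```python
from typing import Optional


def pick_model(
    latency_budget_ms: Optional[float] = None,
    multi_speaker: bool = False,
) -> str:
    """Pick a model display name from the trade-offs listed above."""
    if multi_speaker:
        # Only v3 is described as supporting multi-speaker dialogue.
        return "v3"
    if latency_budget_ms is None:
        # No latency constraint: favour the highest-quality option.
        return "Multilingual v2"
    if latency_budget_ms < 250:
        # Turbo v2.5 (~250ms) would miss this budget; Flash v2.5 (~75ms)
        # is the fastest model listed, even if the budget is below 75ms.
        return "Flash v2.5"
    return "Turbo v2.5"
```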