ElevenLabs Flash v2.5 text-to-speech API | Replicate

ElevenLabs Flash v2.5 is the fastest speech synthesis model from ElevenLabs, designed for real-time applications and conversational AI. It delivers high-quality speech with ultra-low latency (~75ms) across 32 languages.

Flash v2.5 balances speed and quality, making it ideal for interactive applications while maintaining natural-sounding output.

Key features

Ultra-low latency: ~75ms, perfect for real-time voice agents and chatbots
32 languages: All languages from Multilingual v2 plus Hungarian, Norwegian, and Vietnamese
40,000 character limit: Generate up to ~40 minutes of audio per request
Lower cost: 50% lower price per character than Multilingual v2

Supported languages (32)

Code	Language	Code	Language
`en`	English	`pl`	Polish
`ja`	Japanese	`sv`	Swedish
`zh`	Mandarin Chinese	`bg`	Bulgarian
`de`	German	`ro`	Romanian
`hi`	Hindi	`ar`	Arabic
`fr`	French	`cs`	Czech
`ko`	Korean	`el`	Greek
`pt`	Portuguese	`fi`	Finnish
`it`	Italian	`hr`	Croatian
`es`	Spanish	`ms`	Malay
`id`	Indonesian	`sk`	Slovak
`nl`	Dutch	`da`	Danish
`tr`	Turkish	`ta`	Tamil
`fil`	Filipino	`uk`	Ukrainian
`ru`	Russian	`hu`	Hungarian
`no`	Norwegian	`vi`	Vietnamese
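For client-side checks before sending a request, the table above can be captured as a lookup. This is a minimal sketch: the mapping is copied from the table, while the names `SUPPORTED_LANGUAGES` and `validate_language_code` are hypothetical helpers, not part of any Replicate or ElevenLabs API.

```python
# Language codes copied from the "Supported languages" table.
SUPPORTED_LANGUAGES = {
    "en": "English", "pl": "Polish",
    "ja": "Japanese", "sv": "Swedish",
    "zh": "Mandarin Chinese", "bg": "Bulgarian",
    "de": "German", "ro": "Romanian",
    "hi": "Hindi", "ar": "Arabic",
    "fr": "French", "cs": "Czech",
    "ko": "Korean", "el": "Greek",
    "pt": "Portuguese", "fi": "Finnish",
    "it": "Italian", "hr": "Croatian",
    "es": "Spanish", "ms": "Malay",
    "id": "Indonesian", "sk": "Slovak",
    "nl": "Dutch", "da": "Danish",
    "tr": "Turkish", "ta": "Tamil",
    "fil": "Filipino", "uk": "Ukrainian",
    "ru": "Russian", "hu": "Hungarian",
    "no": "Norwegian", "vi": "Vietnamese",
}


def validate_language_code(code: str) -> str:
    """Return the normalized code, or raise ValueError if it is not in the table."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language code: {code!r}")
    return normalized
```

The returned value can be passed as `language_code` in a request.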

Inputs

Parameter	Type	Default	Description
`prompt`	string	—	The text to convert to speech
`voice`	string	`Rachel`	Voice choice for speech generation
`language_code`	string	`en`	Language code (e.g., `en`, `es`, `fr`)
`stability`	number	`0.5`	Voice consistency (0.0–1.0)
`similarity_boost`	number	`0.75`	Similarity to the original voice (0.0–1.0)
`style`	number	`0`	Style exaggeration (0.0–1.0)
`speed`	number	`1`	Speed of speech (0.7–1.2)
`previous_text`	string	—	Text that comes before the prompt, used as context for continuity
`next_text`	string	—	Text that comes after the prompt, used as context for continuity
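The parameters above can be assembled into a request as follows. This is a sketch under stated assumptions: it uses the `replicate` Python client's `replicate.run` call, the model identifier `elevenlabs/flash-v2.5` is assumed rather than taken from this page, and `build_input` and `synthesize` are hypothetical helper names. Defaults and ranges are copied from the table.

```python
from typing import Any, Dict, Optional

# Assumed model identifier; check the model page for the exact name.
MODEL = "elevenlabs/flash-v2.5"
MAX_CHARS = 40_000  # per-request character limit listed under "Key features"

# Allowed ranges copied from the Inputs table.
RANGES = {
    "stability": (0.0, 1.0),
    "similarity_boost": (0.0, 1.0),
    "style": (0.0, 1.0),
    "speed": (0.7, 1.2),
}


def build_input(
    prompt: str,
    voice: str = "Rachel",
    language_code: str = "en",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    speed: float = 1.0,
    previous_text: Optional[str] = None,
    next_text: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the parameters locally and return the input dictionary."""
    if not prompt:
        raise ValueError("prompt must not be empty")
    if len(prompt) > MAX_CHARS:
        raise ValueError(f"prompt exceeds {MAX_CHARS} characters")
    payload: Dict[str, Any] = {
        "prompt": prompt,
        "voice": voice,
        "language_code": language_code,
        "stability": stability,
        "similarity_boost": similarity_boost,
        "style": style,
        "speed": speed,
    }
    for name, (low, high) in RANGES.items():
        if not low <= payload[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}")
    # Optional context fields are only sent when provided.
    if previous_text is not None:
        payload["previous_text"] = previous_text
    if next_text is not None:
        payload["next_text"] = next_text
    return payload


def synthesize(prompt: str, **options: Any) -> Any:
    """Run the model; requires `pip install replicate` and REPLICATE_API_TOKEN."""
    import replicate  # imported here so build_input works without the client

    return replicate.run(MODEL, input=build_input(prompt, **options))
```

Validating locally catches out-of-range values such as `speed=2.0` before a request is made.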

Use cases

Voice agents and chatbots: Ultra-low latency makes it perfect for conversational AI
Interactive apps: Games and applications that need immediate audio response
Large-scale processing: Efficient for bulk text-to-speech conversion
Multilingual content: Narration, dubbing, and localization across 32 languages
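For bulk conversion of text longer than the 40,000 character limit, one possible approach is to split at sentence boundaries and pass neighboring text through `previous_text` and `next_text`. The sketch below is an illustration, not a documented workflow: `split_text` and `build_requests` are hypothetical helpers, the sentence regex is deliberately simple, and the 500-character context window is an arbitrary assumption.

```python
import re
from typing import Dict, List

MAX_CHARS = 40_000  # per-request limit listed under "Key features"


def split_text(text: str, max_chars: int = MAX_CHARS) -> List[str]:
    """Split text into chunks of at most max_chars, preferring sentence breaks."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # A single sentence longer than the limit is cut at the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


def build_requests(
    text: str, max_chars: int = MAX_CHARS, context_chars: int = 500
) -> List[Dict[str, str]]:
    """Build one input dictionary per chunk, with neighboring text as context."""
    chunks = split_text(text, max_chars)
    requests: List[Dict[str, str]] = []
    for i, chunk in enumerate(chunks):
        item = {"prompt": chunk}
        if i > 0:
            item["previous_text"] = chunks[i - 1][-context_chars:]
        if i < len(chunks) - 1:
            item["next_text"] = chunks[i + 1][:context_chars]
        requests.append(item)
    return requests
```

Each dictionary can be merged with the remaining input parameters and sent as a separate request; the resulting audio files would then be concatenated in order.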

Choosing between ElevenLabs models

Flash v2.5: Fastest (~75ms), best for real-time and cost-sensitive use cases
Turbo v2.5: Balanced quality and speed (~250ms), with the same 32 languages and 40,000 character limit as Flash v2.5
Multilingual v2: Highest quality, best for professional content and audiobooks
v3: Most expressive, with 70+ languages and multi-speaker dialogue support

Model created 9 months ago