Generate natural-sounding speech from text. Clone voices, control emotions, and produce audio in dozens of languages.
MiniMax Speech 2.8 HD ranks #1 on TTS benchmarks, outperforming both OpenAI and ElevenLabs in blind evaluations. It offers studio-grade voice synthesis with 17+ preset voices, emotion control, voice cloning from just 5 seconds of audio, and support for 32+ languages, making it the best choice for voiceovers, audiobooks, and polished content.
ElevenLabs v3 delivers unprecedented expressiveness with audio tags like [excited], [whispers], and [sighs]. Supports 70+ languages and 26 voices. Requires more prompt engineering than other models but produces the most emotionally rich output. Great for film, audiobooks, and creative media.
Gemini 3.1 Flash TTS from Google gives you fine-grained control over delivery through inline tags and style prompting. Set a scene, define a character, and direct the performance with instructions such as "you must hear the grin in the audio." It offers 30 voices, 70+ languages, and natural-sounding, highly expressive output.
MiniMax Speech 2.8 Turbo is optimized for low-latency applications like voice agents, chatbots, and interactive experiences. Supports 40+ languages with the same voice cloning and emotion control as the HD version.
Inworld TTS 1.5 Mini achieves ~120ms latency — the fastest in this collection. Supports 15 languages with emotion markups and SSML break tags. Inworld TTS 1.5 Max trades a bit of speed for higher quality at <200ms latency.
Chatterbox from Resemble AI excels at voice cloning with emotional control — generate distinct character voices from just a few seconds of reference audio. Great for games, animations, and storytelling.
ElevenLabs v2 Multilingual generates speech in 29 languages while maintaining consistent voice quality across all of them. Good for localization workflows where the same voice needs to work in multiple languages.
Tortoise TTS is an open-source option that produces high-quality speech. Slower than the commercial models but fully self-hostable.
Featured models

MiniMax Speech 2.8 HD focuses on high-fidelity audio generation, with studio-grade quality, flexible emotion control, multilingual support, and voice cloning.
Updated 1 day, 17 hours ago
69.6K runs

MiniMax Speech 2.8 Turbo: Turn text into natural, expressive speech with voice cloning, emotion control, and support for 40+ languages
Updated 1 day, 17 hours ago
98.1K runs

Google's fast, expressive text-to-speech model with 30 voices and 70+ language support
Updated 4 days, 3 hours ago
5.4K runs

Highest-quality text-to-speech with <200ms latency, emotion control, and 15-language support
Updated 6 days, 1 hour ago
52.8K runs

Ultra-fast, cost-efficient text-to-speech with ~120ms latency and 15-language support
Updated 6 days, 1 hour ago
18.2K runs

The fastest open source TTS model without sacrificing quality.
Updated 4 months, 1 week ago
293.7K runs

The most expressive text-to-speech model
Updated 5 months, 4 weeks ago
35.3K runs

Generate expressive, natural speech. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.
Updated 10 months ago
272.8K runs
Recommended Models
minimax/speech-2.8-hd is the best overall TTS model: it ranks #1 on benchmarks, supports 32+ languages, and includes voice cloning and emotion control. For real-time applications, use the Turbo variant, minimax/speech-2.8-turbo.
inworld/tts-1.5-mini achieves ~120ms latency — the fastest in this collection. minimax/speech-2.8-turbo is also designed for low-latency real-time use. Both are great for chatbots, voice agents, and interactive apps.
elevenlabs/v3 produces the most emotionally rich speech. It supports audio tags like [excited], [whispers], and [sighs] for fine-grained control. Requires more prompt engineering but delivers the best results for audiobooks, film, and creative media.
minimax/speech-2.8-hd and minimax/speech-2.8-turbo both support voice cloning from just 5 seconds of reference audio. resemble-ai/chatterbox is another option with emotional control, especially good for character voices in games and animation.
elevenlabs/v3 supports 70+ languages. minimax/speech-2.8-turbo and minimax/speech-2.8-hd support 40+ and 32+ languages respectively. elevenlabs/v2-multilingual supports 29 languages with consistent voice quality across all of them.
Most modern TTS models support emotion control. MiniMax models support happy, sad, angry, fearful, calm, and other emotions. ElevenLabs v3 uses audio tags for finer control. inworld/tts-1.5-max and inworld/tts-1.5-mini support emotion markups such as [happy] and [sad], plus non-verbal sounds such as [laugh] and [sigh].
afiaka87/tortoise-tts is open-source and produces high-quality speech. It's slower than commercial models but can be self-hosted on your own hardware.
Most models support commercial use. Some may include audio watermarking — check each model's license page for specifics, especially regarding voice cloning and redistribution.
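The guidance above boils down to filtering models by requirements such as language coverage and latency. As a minimal sketch, the snippet below encodes only the figures stated on this page (language counts and the two Inworld latency numbers) in a hypothetical `TTSModel` table and `shortlist` helper. Neither is part of Replicate's API, and the figures should be re-checked against each model page before you rely on them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TTSModel:
    name: str
    languages: int             # count stated on this page; "32+" is stored as 32
    latency_ms: Optional[int]  # only the Inworld models state a latency here


# Hypothetical table built from the figures on this collection page.
MODELS: List[TTSModel] = [
    TTSModel("minimax/speech-2.8-hd", 32, None),
    TTSModel("minimax/speech-2.8-turbo", 40, None),
    TTSModel("elevenlabs/v3", 70, None),
    TTSModel("elevenlabs/v2-multilingual", 29, None),
    TTSModel("inworld/tts-1.5-max", 15, 200),   # stated as "<200ms"
    TTSModel("inworld/tts-1.5-mini", 15, 120),  # stated as "~120ms"
]


def shortlist(min_languages: int, max_latency_ms: Optional[int] = None) -> List[str]:
    """Return names of models meeting the language requirement and, if given,
    the latency budget. Models with no stated latency are excluded whenever a
    latency budget is set."""
    picks = []
    for m in MODELS:
        if m.languages < min_languages:
            continue
        if max_latency_ms is not None and (
            m.latency_ms is None or m.latency_ms > max_latency_ms
        ):
            continue
        picks.append(m.name)
    return picks


print(shortlist(40))                      # ['minimax/speech-2.8-turbo', 'elevenlabs/v3']
print(shortlist(10, max_latency_ms=150))  # ['inworld/tts-1.5-mini']
```

Because models without a stated latency are dropped whenever a budget is given, a latency filter here returns only Inworld models. That reflects gaps in this page's data, not the actual speed of low-latency models such as minimax/speech-2.8-turbo.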

Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications like voiceovers and audiobooks.
Updated 1 day, 12 hours ago
2.2M runs

MiniMax Speech 2.6 HD delivers studio-quality multilingual text-to-audio on Replicate with nuanced prosody, subtitle export, and premium voices
Updated 1 day, 12 hours ago
177.3K runs

Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for real-time applications with low latency
Updated 1 day, 17 hours ago
11.9M runs

Low‑latency MiniMax Speech 2.6 Turbo brings multilingual, emotional text-to-speech to Replicate with 300+ voices and real-time friendly pricing
Updated 1 day, 17 hours ago
634.9K runs

A unified Text-to-Speech demo featuring three powerful modes: Voice, Clone and Design
Updated 2 weeks, 5 days ago
278.4K runs

Clone voices to use with MiniMax's speech-02-hd and speech-02-turbo
Updated 5 months, 2 weeks ago
61.6K runs

High quality, low latency text to speech in 32 languages
Updated 5 months, 4 weeks ago
23.2K runs

Generate multilingual text-to-speech audio in over 30 languages
Updated 5 months, 4 weeks ago
9.2K runs

ElevenLabs's fastest speech synthesis model
Updated 5 months, 4 weeks ago
12.1K runs

Generate expressive, natural speech in 23 languages. Features instant voice cloning from short audio, emotion control, and seamless cross-language voice transfer.
Updated 7 months, 2 weeks ago
62.4K runs

zsxkib/dia
Dia 1.6B by Nari Labs generates realistic dialogue audio from text, including non-verbal cues and voice cloning
Updated 9 months, 1 week ago
14.5K runs

Generate expressive, natural speech with Resemble AI's Chatterbox.
Updated 10 months ago
18.8K runs

lucataco/csm-1b
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs
Updated 1 year, 1 month ago
1.2K runs

lucataco/orpheus-3b-0.1-ft
Orpheus 3B: high-quality, emotive text-to-speech
Updated 1 year, 1 month ago
34.8K runs

cjwbw/voicecraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Updated 1 year, 1 month ago
10.9K runs

jaaari/kokoro-82m
Kokoro v1.0: text-to-speech (82M parameters, based on StyleTTS2)
Updated 1 year, 2 months ago
88.4M runs

An F5-TTS model fine-tuned for Spanish
Updated 1 year, 5 months ago
1.5K runs

F5-TTS, the new state of the art in open-source voice cloning
Updated 1 year, 6 months ago
43.1K runs

platform-kit/mars5-tts
A novel speech model for insane prosody.
Updated 1 year, 9 months ago
546 runs

chenxwh/openvoice
Updated to OpenVoice v2: Versatile Instant Voice Cloning
Updated 1 year, 11 months ago
85.8K runs

cjwbw/parler-tts
Lightweight text-to-speech (TTS) model, trained on 10.5K hours of audio data
Updated 2 years ago
2.8K runs

adirik/styletts2
Generates speech from text
Updated 2 years, 2 months ago
132.5K runs

lucataco/pheme
Pheme generates a variety of conversational voices at 16 kHz for phone-call applications
Updated 2 years, 3 months ago
578 runs

lucataco/xtts-v2
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
Updated 2 years, 4 months ago
5.8M runs

zsxkib/realistic-voice-cloning
Create song covers with any RVC v2 trained AI voice from audio files.
Updated 2 years, 5 months ago
1.7M runs

cjwbw/seamless_communication
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Updated 2 years, 7 months ago
107.7K runs

awerks/neon-tts
NeonAI Coqui AI TTS Plugin.
Updated 2 years, 8 months ago
200.4K runs

suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Updated 2 years, 11 months ago
307.2K runs

afiaka87/tortoise-tts
Generate speech from text and clone voices from MP3 files. From James Betker (aka "neonbjb").
Updated 3 years, 8 months ago
173.5K runs