Generate natural-sounding speech from text with these powerful models. Clone your own voice or pick from a variety of languages and speaking styles.
Featured models

Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for real-time applications with low latency.
Updated 4 weeks, 1 day ago
6.4M runs

Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications like voiceovers and audiobooks.
Updated 4 weeks, 1 day ago
1.1M runs

Generate expressive, natural speech in 23 languages. Features instant voice cloning from short audio, emotion control, and seamless cross-language voice transfer.
Updated 3 months ago
4.3K runs

Generate expressive, natural speech. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.
Updated 5 months, 2 weeks ago
181.8K runs

Generate expressive, natural speech with Resemble AI's Chatterbox.
Updated 5 months, 2 weeks ago
15.6K runs

jaaari/kokoro-82m
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
Updated 10 months, 1 week ago
65.6M runs
Recommended Models
If low latency matters most, minimax/speech-02-turbo is the standout model in the text-to-speech collection. It’s designed for near real-time audio generation, making it ideal for interactive experiences like chatbots, voice assistants, and in-game dialogue.
Higher-fidelity models like afiaka87/tortoise-tts are slower and better suited for offline rendering or projects where speed isn’t critical.
minimax/speech-02-hd is a strong all-around option in the text-to-speech collection. It provides clear, natural voices with expressive control and reasonable generation time.
Open-source options like afiaka87/tortoise-tts may be more cost-efficient to self-host, but they’re slower and less predictable in performance.
For polished audio content like voiceovers, podcasts, audiobooks, or narration, minimax/speech-02-hd is a great fit. It supports expressive delivery, natural pacing, and multiple languages. If you need finer emotional control or unique character voices, resemble-ai/chatterbox also performs well.
For applications where speed is essential, such as voice-enabled apps, live interactions, or game characters, minimax/speech-02-turbo is the best match. It prioritizes fast generation and low latency while maintaining solid audio clarity.
For projects like games, animations, or storytelling, resemble-ai/chatterbox excels. It supports emotion control and fast voice cloning, letting you generate distinct character voices from just a few seconds of reference audio.
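The recommendations above can be condensed into a small lookup table. This is only a sketch: the model slugs come from the text, but the use-case keys and the `recommend` helper are hypothetical names chosen for illustration, and falling back to the all-around model is an assumption.

```python
# Use-case -> model slug, transcribed from the recommendations above.
# The keys are hypothetical labels invented for this sketch.
RECOMMENDED = {
    "low_latency": "minimax/speech-02-turbo",      # chatbots, assistants, in-game dialogue
    "general": "minimax/speech-02-hd",             # all-around option
    "narration": "minimax/speech-02-hd",           # voiceovers, podcasts, audiobooks
    "character_voices": "resemble-ai/chatterbox",  # emotion control, fast voice cloning
    "self_hosted": "afiaka87/tortoise-tts",        # open source, slower
}


def recommend(use_case: str) -> str:
    """Return the suggested model slug, defaulting to the all-around pick."""
    return RECOMMENDED.get(use_case, RECOMMENDED["general"])
```

For example, `recommend("low_latency")` returns the turbo model, while an unrecognized use case falls back to minimax/speech-02-hd.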
Most text-to-speech models return audio files, typically in MP3 format. Some also support WAV. Output voice options and supported languages vary by model, so check the model page for specifics.
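Because the output format varies by model, it can help to check the returned bytes before choosing a file extension. The sketch below is a heuristic, not part of any model's API: `sniff_audio_format` and `save_audio` are hypothetical helpers, and the check only looks at the standard RIFF/WAVE header, an ID3 tag, or an MPEG frame-sync pattern.

```python
from pathlib import Path


def sniff_audio_format(data: bytes) -> str:
    """Guess 'wav' or 'mp3' from the leading bytes; return 'bin' if unknown."""
    # WAV files start with a RIFF chunk whose form type is 'WAVE'.
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # Many MP3 files start with an ID3v2 tag.
    if data[:3] == b"ID3":
        return "mp3"
    # Otherwise look for an MPEG frame sync: 11 set bits at the start.
    # Heuristic only: some other MPEG-style streams share this prefix.
    if len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0:
        return "mp3"
    return "bin"


def save_audio(data: bytes, stem: str, out_dir: str = ".") -> Path:
    """Write audio bytes to out_dir using an extension chosen by sniffing."""
    path = Path(out_dir) / f"{stem}.{sniff_audio_format(data)}"
    path.write_bytes(data)
    return path
```

How the bytes are obtained depends on the client and model. With the Replicate Python client they would typically come from the result of `replicate.run(...)`, but the input fields and the exact output type differ per model and client version, so check the model page.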
Open-source models like afiaka87/tortoise-tts can be self-hosted with standard tooling. If you want to publish your own model on Replicate, package it with the required files and configuration and push it from your account.
Many models in the text-to-speech collection support commercial use, but always check the license. Some models include watermarking or usage restrictions that may affect how you use the audio in commercial projects.
Recommended Models

Clone voices to use with Minimax's speech-02-hd and speech-02-turbo
Updated 4 weeks, 1 day ago
23K runs

zsxkib/dia
Dia 1.6B by Nari Labs. Generates realistic dialogue audio from text, including non-verbal cues and voice cloning.
Updated 4 months, 3 weeks ago
9.9K runs

lucataco/csm-1b
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs
Updated 8 months, 2 weeks ago
1.1K runs

lucataco/orpheus-3b-0.1-ft
Orpheus 3B - high quality, emotive Text to Speech
Updated 8 months, 2 weeks ago
32.3K runs

cjwbw/voicecraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Updated 8 months, 3 weeks ago
10.7K runs

fermatresearch/spanish-f5-tts
An F5-TTS model fine-tuned for Spanish
Updated 1 year ago
1.1K runs

F5-TTS, the new state-of-the-art in open source voice cloning
Updated 1 year, 1 month ago
35.9K runs

platform-kit/mars5-tts
A novel speech model for insane prosody.
Updated 1 year, 5 months ago
526 runs

chenxwh/openvoice
Updated to OpenVoice v2: Versatile Instant Voice Cloning
Updated 1 year, 6 months ago
79.8K runs

cjwbw/parler-tts
Lightweight text-to-speech (TTS) model, trained on 10.5K hours of audio data
Updated 1 year, 7 months ago
2.7K runs

adirik/styletts2
Generates speech from text
Updated 1 year, 10 months ago
132K runs

lucataco/pheme
Pheme generates a variety of conversational voices in 16 kHz for phone-call applications
Updated 1 year, 10 months ago
563 runs

lucataco/xtts-v2
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
Updated 2 years ago
4.6M runs

zsxkib/realistic-voice-cloning
Create song covers with any RVC v2 trained AI voice from audio files.
Updated 2 years ago
1.3M runs

cjwbw/seamless_communication
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
Updated 2 years, 2 months ago
91.1K runs

awerks/neon-tts
NeonAI Coqui AI TTS Plugin.
Updated 2 years, 3 months ago
169.8K runs

suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Updated 2 years, 7 months ago
302.9K runs

afiaka87/tortoise-tts
Generate speech from text, clone voices from mp3 files. From James Betker AKA "neonbjb".
Updated 3 years, 4 months ago
172.9K runs