Generate natural-sounding speech from text with these powerful models. Clone your own voice or pick from a variety of languages and speaking styles.
Featured models

minimax/speech-02-turbo
Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Designed for real-time applications with low latency.
Updated 1 week, 2 days ago
5.5M runs

minimax/speech-02-hd
Text-to-Audio (T2A) that offers voice synthesis, emotional expression, and multilingual capabilities. Optimized for high-fidelity applications like voiceovers and audiobooks.
Updated 1 week, 2 days ago
1M runs

resemble-ai/chatterbox-multilingual
Generate expressive, natural speech in 23 languages. Features instant voice cloning from short audio, emotion control, and seamless cross-language voice transfer.
Updated 2 months, 1 week ago
3.3K runs

resemble-ai/chatterbox
Generate expressive, natural speech. Features unique emotion control, instant voice cloning from short audio, and built-in watermarking.
Updated 4 months, 3 weeks ago
157.8K runs

resemble-ai/chatterbox-pro
Generate expressive, natural speech with Resemble AI's Chatterbox.
Updated 4 months, 4 weeks ago
15.2K runs


jaaari/kokoro-82m
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
Updated 9 months, 2 weeks ago
58.8M runs
Recommended Models
If low latency matters most, minimax/speech-02-turbo is the standout model in the text-to-speech collection. It’s designed for near real-time audio generation, making it ideal for interactive experiences like chatbots, voice assistants, and in-game dialogue.
Higher-fidelity models like afiaka87/tortoise-tts are slower and better suited for offline rendering or projects where speed isn’t critical.
minimax/speech-02-hd is a strong all-around option in the text-to-speech collection. It provides clear, natural voices with expressive control and reasonable generation time.
Open-source options like afiaka87/tortoise-tts may be more cost-efficient to self-host, but they’re slower and less predictable in performance.
For polished audio content like voiceovers, podcasts, audiobooks, or narration, minimax/speech-02-hd is a great fit. It supports expressive delivery, natural pacing, and multiple languages. If you need finer emotional control or unique character voices, resemble-ai/chatterbox also performs well.
For applications where speed is essential, such as voice-enabled apps, live interactions, or game characters, minimax/speech-02-turbo is the best match. It prioritizes fast generation and low latency while maintaining solid audio clarity.
For projects like games, animations, or storytelling, resemble-ai/chatterbox excels. It supports emotion control and fast voice cloning, letting you generate distinct character voices from just a few seconds of reference audio.
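The recommendations above can be condensed into a simple lookup. This is only an illustrative sketch: the use-case keys and the recommend_model helper are hypothetical names chosen for this example, and only the model identifiers come from this page.

```python
# Hypothetical mapping from use case to the model this page recommends for it.
# The keys are made up for illustration; the model names are from the page.
RECOMMENDED = {
    "low_latency": "minimax/speech-02-turbo",       # chatbots, assistants, live apps
    "polished_narration": "minimax/speech-02-hd",   # voiceovers, audiobooks, podcasts
    "character_voices": "resemble-ai/chatterbox",   # games, animation, storytelling
}


def recommend_model(use_case: str) -> str:
    """Return the recommended model identifier for a known use case."""
    try:
        return RECOMMENDED[use_case]
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(recommend_model("low_latency"))
```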
Most text-to-speech models return audio files, typically in MP3 format. Some also support WAV. Output voice options and supported languages vary by model, so check the model page for specifics.
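As a rough sketch of what handling that audio output might look like with the Replicate Python client: the example below assumes a recent client version in which replicate.run returns a file-like object with a read() method, and assumes the model accepts a "text" input. Both are assumptions to verify against the model page. The save_audio and synthesize helpers are hypothetical names for this example.

```python
from pathlib import Path


def save_audio(output, dest: str) -> Path:
    """Write model output to `dest` and return the path.

    Accepts raw bytes or any object with a read() method that returns bytes.
    """
    data = output.read() if hasattr(output, "read") else output
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("expected bytes or a file-like object yielding bytes")
    path = Path(dest)
    path.write_bytes(data)
    return path


def synthesize(text: str, dest: str = "speech.mp3") -> Path:
    """Run a text-to-speech model and save the result (makes a network call)."""
    # Imported lazily so save_audio works without the client installed.
    import replicate  # pip install replicate; requires REPLICATE_API_TOKEN

    # Assumption: the model takes a "text" input. Check the model page for the
    # actual input schema, voice options, and output format.
    output = replicate.run("minimax/speech-02-turbo", input={"text": text})
    return save_audio(output, dest)
```

Calling synthesize("Hello there") would write speech.mp3 to the current directory if the assumptions hold. Older client versions may return a URL string instead of a file-like object, in which case save_audio raises a TypeError and the URL would need to be downloaded separately.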
Open-source models like afiaka87/tortoise-tts can be self-hosted with standard tooling. If you want to publish your own model on Replicate, package it with the required files and configuration and push it from your account.
Many models in the text-to-speech collection support commercial use, but always check the license. Some models include watermarking or usage restrictions that may affect how you use the audio in commercial projects.
Recommended Models

minimax/voice-cloning
Clone voices to use with Minimax's speech-02-hd and speech-02-turbo
Updated 1 week, 2 days ago
20.4K runs


zsxkib/dia
Dia 1.6B by Nari Labs. Generates realistic dialogue audio from text, including non-verbal cues and voice cloning
Updated 4 months ago
9.7K runs


lucataco/csm-1b
CSM (Conversational Speech Model) is a speech generation model from Sesame that generates RVQ audio codes from text and audio inputs
Updated 7 months, 3 weeks ago
991 runs


lucataco/orpheus-3b-0.1-ft
Orpheus 3B - high quality, emotive Text to Speech
Updated 7 months, 3 weeks ago
30.8K runs


cjwbw/voicecraft
Zero-Shot Speech Editing and Text-to-Speech in the Wild
Updated 8 months ago
10.6K runs


fermatresearch/spanish-f5-tts
An F5-TTS model fine-tuned for Spanish
Updated 1 year ago
1K runs

x-lance/f5-tts
F5-TTS, the new state-of-the-art in open source voice cloning
Updated 1 year, 1 month ago
33.8K runs


platform-kit/mars5-tts
A novel speech model for insane prosody.
Updated 1 year, 4 months ago
524 runs


chenxwh/openvoice
Updated to OpenVoice v2: Versatile Instant Voice Cloning
Updated 1 year, 5 months ago
78.5K runs


cjwbw/parler-tts
Lightweight text-to-speech (TTS) model, trained on 10.5K hours of audio data
Updated 1 year, 7 months ago
2.7K runs


adirik/styletts2
Generates speech from text
Updated 1 year, 9 months ago
132K runs


lucataco/pheme
Pheme generates a variety of conversational voices in 16 kHz for phone-call applications
Updated 1 year, 10 months ago
561 runs


lucataco/xtts-v2
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning
Updated 1 year, 11 months ago
4.5M runs


zsxkib/realistic-voice-cloning
Create song covers with any RVC v2 trained AI voice from audio files.
Updated 2 years ago
1.2M runs


cjwbw/seamless_communication
SeamlessM4T—Massively Multilingual & Multimodal Machine Translation
Updated 2 years, 2 months ago
90K runs


awerks/neon-tts
NeonAI Coqui AI TTS Plugin.
Updated 2 years, 3 months ago
165.3K runs


suno-ai/bark
🔊 Text-Prompted Generative Audio Model
Updated 2 years, 6 months ago
302.4K runs


afiaka87/tortoise-tts
Generate speech from text, clone voices from mp3 files. From James Betker AKA "neonbjb".
Updated 3 years, 3 months ago
172.8K runs