Official model
ID
fah37y17x9rme0cpmqt973z6dr
Status
Succeeded
Source
Web
Created
by minimax

Input

text
<#0.7#>An Introduction to Minimax Speech-02 <#0.7#> Minimax's Speech-02 series are text-to-speech models that create natural-sounding voices with emotional expression. These models support more than 30 languages. According to the Artificial Analysis Speech Arena, Speech-02-HD is currently rated as the best text-to-speech model available today, while Speech-02-Turbo ranks third. With Replicate's platform, you can access these powerful models easily. <#0.7#> Model Options <#0.7#> You can choose between two models: Speech-02-HD: Designed for high-quality voiceovers and audiobooks when premium audio quality matters. Speech-02-Turbo: A more affordable option that processes faster, making it ideal for real-time applications. Both models can work with cloned voices. Voice cloning requires at least 10 seconds of audio and takes approximately 30 seconds to train. Each voice can be customized with adjustments to pitch, speed, and volume to achieve a natural sound. These models are available through Replicate's platform, where you can try them in an interactive playground. <#0.7#> Potential Applications <#0.7#> With these text-to-speech models, you can create: Virtual assistants with natural-sounding voices, studio-quality audiobooks and voiceovers, language learning tools featuring native pronunciation, multilingual customer service bots, and audio content that improves accessibility. <#0.7#> Emotion Control Features <#0.7#> Minimax's emotion control system offers two approaches for adding feeling to voices. The auto-detect mode automatically determines the appropriate emotional tone based on your text. Alternatively, manual controls allow you to specify exactly which emotion you want to convey. This flexibility helps your voices sound natural and engaging across various use cases, whether for entertainment, education, or business purposes. <#0.7#> Language Support <#0.7#> The models support more than 30 languages and accents. 
You can work with various English variants including US, UK, Australian, and Indian English. Asian language support includes Mandarin, Cantonese, Japanese, Korean, Vietnamese, and Indonesian. European languages like French, German, Spanish, Portuguese, Turkish, Russian, and Ukrainian are also supported. <#0.7#> Using the API <#0.7#> You can run these models using either JavaScript or Python with Replicate's client libraries. The process involves two main steps: first cloning a voice using an audio sample, then using that cloned voice for text-to-speech generation. To get started, you'll need to obtain an API token from your Replicate account. Once set up, you can clone voices using audio files in MP3, M4A, or WAV format. These files should be between 10 seconds and 5 minutes long and less than 20MB in size. After cloning a voice, you can use the generated voice ID to create text-to-speech with your preferred emotional style. <#0.7#> Pricing Information <#0.7#> The text-to-speech models are priced based on input and output tokens, where one token equals one character. The turbo model costs $30 per million characters, while the HD model costs $50 per million characters. Voice cloning has a separate cost of $3 per voice. <#0.7#> Stay Connected <#0.7#> To keep up with the latest developments, you can follow Replicate on their social media channels and join their Discord community for updates and discussions. Happy creating with these powerful text-to-speech capabilities!
voice_id
Wise_Woman
speed
1.15
volume
1
pitch
0
emotion
happy
english_normalization
true
sample_rate
32000
bitrate
128000
channel
mono
language_boost
English
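
The input parameters listed above can be sent through Replicate's Python client. What follows is a minimal sketch, not the exact call that produced this prediction: the model slug `minimax/speech-02-hd` is an assumption (the record says only that the model is by minimax and does not state whether HD or Turbo was used), and the helper names `build_input` and `synthesize` are hypothetical. The parameter names and values are copied from the record.

```python
from typing import Any, Dict

# Assumed model slug; the record does not say which Speech-02 variant was run.
MODEL = "minimax/speech-02-hd"


def build_input(text: str) -> Dict[str, Any]:
    """Return an input payload using the parameter values shown in this record."""
    return {
        "text": text,
        "voice_id": "Wise_Woman",
        "speed": 1.15,
        "volume": 1,
        "pitch": 0,
        "emotion": "happy",
        "english_normalization": True,
        "sample_rate": 32000,
        "bitrate": 128000,
        "channel": "mono",
        "language_boost": "English",
    }


def synthesize(text: str, model: str = MODEL):
    """Run the model on Replicate; requires REPLICATE_API_TOKEN in the environment."""
    import replicate  # imported lazily so build_input works without the package

    return replicate.run(model, input=build_input(text))


if __name__ == "__main__":
    # Only builds and prints the payload; no network request is made here.
    print(build_input("<#0.7#>An Introduction to Minimax Speech-02 <#0.7#>"))
```

Calling `synthesize(...)` with the full input text would submit the prediction. Keeping the payload builder separate from the network call makes it possible to inspect or adjust the parameters before spending any credits.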

Output

Input tokens
3.4K
Output tokens
1
Tokens per second
0.20 tokens / second
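
The input text quotes per-character pricing ($30 per million characters for Turbo, $50 per million for HD, with one token equal to one character), so the cost of a request can be estimated from its input token count. The sketch below applies those quoted prices to the rounded 3.4K figure shown in this record. The function name `estimate_cost` is hypothetical, and the result is only an approximation because the exact character count is not shown and the record does not say which variant was used.

```python
# Prices quoted in the input text, in US dollars per million characters.
PRICE_PER_MILLION = {"turbo": 30.0, "hd": 50.0}


def estimate_cost(num_characters: int, model: str) -> float:
    """Estimate the text-to-speech cost in dollars for a given character count."""
    if num_characters < 0:
        raise ValueError("num_characters must be non-negative")
    return num_characters * PRICE_PER_MILLION[model] / 1_000_000


if __name__ == "__main__":
    approx_chars = 3_400  # "3.4K" input tokens, rounded in the record
    for name in ("turbo", "hd"):
        print(f"{name}: ${estimate_cost(approx_chars, name):.3f}")
```

At roughly 3,400 characters this comes to about $0.10 on Turbo and about $0.17 on HD, excluding the separate $3 per-voice cloning fee mentioned in the text.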