MiniMax Speech 2.6 Turbo on Replicate
Models
- Speech-2.6-HD: Next-generation high-definition model with improved realism and expressive control
- Speech-2.6-Turbo: Enhanced low-latency model optimized for live and interactive applications
- Speech-02-HD: Optimized for high-fidelity applications like voiceovers and audiobooks
- Speech-02-Turbo: Designed for real-time applications with low latency
- Voice-Cloning: Clone voices for use with speech-02-hd and speech-02-turbo
MiniMax Speech 2.6 Turbo is the newest MiniMax text-to-audio (T2A) model, tuned for real-time applications that need expressive voices, fast turnaround, and worldwide language coverage. Deploy it on Replicate to synthesize natural speech in seconds, experiment in the playground, or orchestrate it inside your own apps with Replicate’s REST API.
Why choose the Turbo variant?
- ⚡ Low latency, production-ready – optimized for chat agents, live voice bots, and interactive UI feedback loops.
- 🌍 40+ languages & dialect boosts – switch seamlessly between English, Chinese, Japanese, Spanish, Portuguese, Korean, and more.
- 🎭 Auto or manual emotions – let the model infer tone (“auto”) or pick from happy, calm, surprised, etc.
- 🗣 300+ curated voices + cloned voices – use MiniMax’s library or plug in your own voice IDs trained via minimax/voice-cloning.
- 💸 Predictable billing – $0.06 per 1,000 input tokens (one MiniMax token ≈ one character), with zero cost for audio outputs.
Migrating from Speech 2.0 Turbo?
The API schema is drop-in compatible. You’ll immediately notice richer prosody and improved multilingual pronunciation. Pricing is 4× higher per character than 2.0 Turbo, so plan your rollout and communications accordingly.
Quick start
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  https://api.replicate.com/v1/predictions \
  -d '{
    "version": "latest",
    "input": {
      "text": "Replicate, meet MiniMax Speech 2.6 Turbo.",
      "voice_id": "English_Insightful_Speaker",
      "speed": 1.0,
      "emotion": "auto",
      "audio_format": "mp3"
    }
  }'
Outputs are downloadable audio files (Replicate hosts them for 24 hours by default).
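For apps that call the API from Python rather than the shell, the same request can be assembled with the standard library. This is a minimal sketch that mirrors the curl example above: the endpoint, the `"version": "latest"` value, and the input fields are copied from that example, while the helper name `build_request` is hypothetical. It only builds the request and does not send it.

```python
import json
import os
import urllib.request

# Same endpoint as the curl example above.
API_URL = "https://api.replicate.com/v1/predictions"


def build_request(text, voice_id="English_Insightful_Speaker", token=None):
    """Build (but do not send) the POST request shown in the curl example.

    `build_request` is an illustrative helper, not part of any SDK.
    """
    # Fall back to the same environment variable the curl example uses.
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    payload = {
        "version": "latest",  # copied from the curl example; verify before relying on it
        "input": {
            "text": text,
            "voice_id": voice_id,
            "speed": 1.0,
            "emotion": "auto",
            "audio_format": "mp3",
        },
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the prediction; that step needs a valid token and network access, so it is left out of the sketch.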
Input parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `text` | string | – | Text to narrate. Up to 10,000 characters. Insert pauses with `<#0.5#>`-style markers. |
| `voice_id` | string | `Wise_Woman` | Any MiniMax system voice or cloned voice ID. |
| `speed` | float | `1.0` | Range 0.5–2.0. |
| `volume` | float | `1.0` | Range 0–10. |
| `pitch` | int | `0` | Semitone shift from −12 to +12. |
| `emotion` | string | `auto` | `auto`, or choose `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`. |
| `english_normalization` | bool | `false` | Improves number/date reading in English text. |
| `sample_rate` | int | `32000` | 8000–44100 Hz. |
| `bitrate` | int | `128000` | `32000`, `64000`, `128000`, or `256000` (MP3 only). |
| `audio_format` | string | `mp3` | `mp3`, `wav`, `flac`, or `pcm`. |
| `channel` | string | `mono` | `mono` or `stereo`. |
| `subtitle_enable` | bool | `false` | Return subtitle metadata (MiniMax provides sentence-level timestamps). |
| `language_boost` | string | `Null` | Boost recognition for any of the 40 supported languages (e.g. English, Thai, Portuguese, Afrikaans) or set `Automatic`. |
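Checking inputs locally before sending them can save a round trip. The sketch below transcribes the ranges and allowed values from the table above into a small validator and adds a helper that joins sentences with pause markers. Both function names are hypothetical, the server remains the source of truth, and whether pause markers count toward the 10,000-character limit is an assumption here (they are counted).

```python
# Allowed values transcribed from the parameter table above.
EMOTIONS = {"auto", "happy", "sad", "angry", "fearful", "disgusted",
            "surprised", "calm", "fluent", "neutral"}
AUDIO_FORMATS = {"mp3", "wav", "flac", "pcm"}
BITRATES = {32000, 64000, 128000, 256000}
CHANNELS = {"mono", "stereo"}


def validate_input(params):
    """Return a list of problems found in `params`; an empty list means none."""
    errors = []
    text = params.get("text")
    if not isinstance(text, str) or not text:
        errors.append("text is required")
    elif len(text) > 10_000:  # assumption: pause markers count as characters
        errors.append("text must be at most 10,000 characters")
    if not 0.5 <= params.get("speed", 1.0) <= 2.0:
        errors.append("speed must be between 0.5 and 2.0")
    if not 0 <= params.get("volume", 1.0) <= 10:
        errors.append("volume must be between 0 and 10")
    pitch = params.get("pitch", 0)
    if not isinstance(pitch, int) or not -12 <= pitch <= 12:
        errors.append("pitch must be an integer from -12 to 12")
    if params.get("emotion", "auto") not in EMOTIONS:
        errors.append("emotion is not a supported value")
    if not 8000 <= params.get("sample_rate", 32000) <= 44100:
        errors.append("sample_rate must be between 8000 and 44100")
    if params.get("bitrate", 128000) not in BITRATES:
        errors.append("bitrate is not a supported value")
    if params.get("audio_format", "mp3") not in AUDIO_FORMATS:
        errors.append("audio_format is not a supported value")
    if params.get("channel", "mono") not in CHANNELS:
        errors.append("channel must be mono or stereo")
    return errors


def join_with_pauses(sentences, seconds=0.5):
    """Join sentences with a <#seconds#> pause marker between each pair."""
    return f"<#{seconds}#>".join(sentences)
```

For example, `validate_input({"text": "Hello", "speed": 3.0})` reports only the speed problem, and `join_with_pauses(["One.", "Two."])` produces text with a half-second pause between the two sentences.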
Output
Replicate returns a hosted audio file (mp3, wav, flac, or pcm) plus metadata including MiniMax’s character count, audio duration, and optional subtitles when enabled.
Pricing on Replicate
- $0.06 per 1,000 input tokens (`token_input_count`)
- $0.00 per output token (audio is billed only on the input side)
One MiniMax “token” is roughly a character, so a 120‑character sentence costs about \$0.0072.
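The arithmetic above is easy to script for budgeting. This is a rough sketch, assuming exactly one token per character (the text only says "roughly") and using the $0.06 per 1,000 tokens rate quoted above; `estimate_cost` is a hypothetical helper, and the billed amount is whatever `token_input_count` reports.

```python
from decimal import Decimal

# Rate quoted above: $0.06 per 1,000 input tokens.
PRICE_PER_1K_TOKENS = Decimal("0.06")


def estimate_cost(text):
    """Estimate the USD cost of narrating `text`.

    Assumes one MiniMax token per character, which is only approximate.
    """
    tokens = len(text)
    return PRICE_PER_1K_TOKENS * tokens / 1000
```

With this assumption, a 120-character string comes out at 0.0072 USD, matching the figure in the text. `Decimal` is used so small per-request amounts add up without floating-point drift.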
Use cases
- Real-time conversational agents with lifelike vocal responses
- Interactive tutorials or customer support handoffs
- Audio UI/UX prompts and brand voices
- Multilingual IVR and in-product voice localization
Helpful links
- MiniMax Speech T2A docs: https://platform.minimax.io/docs/api-reference/speech-t2a-intro
- MiniMax voice list: https://platform.minimax.io/docs/faq/system-voice-id
- MiniMax privacy policy: https://intl.minimaxi.com/protocol/privacy-policy
- MiniMax terms of service: https://intl.minimaxi.com/protocol/terms-of-service
Need higher fidelity narration or post-production headroom? Check out the HD sibling model below.