MiniMax Speech 2.6 Turbo on Replicate

Models

Speech-2.6-HD: Next-generation high-definition model with improved realism and expressive control
Speech-2.6-Turbo: Enhanced low-latency model optimized for live and interactive applications
Speech-02-HD: Optimized for high-fidelity applications like voiceovers and audiobooks
Speech-02-Turbo: Designed for real-time applications with low latency
Voice-Cloning: Clone voices for use with speech-02-hd and speech-02-turbo

MiniMax Speech 2.6 Turbo is the newest MiniMax text-to-audio (T2A) model, tuned for real-time applications that need expressive voices, fast turnaround, and worldwide language coverage. Deploy it on Replicate to synthesize natural speech in seconds, experiment in the playground, or orchestrate it inside your own apps with Replicate’s REST API.

Why choose the Turbo variant?

⚡ Low latency, production-ready – optimized for chat agents, live voice bots, and interactive UI feedback loops.
🌍 40+ languages & dialect boosts – switch seamlessly between English, Chinese, Japanese, Spanish, Portuguese, Korean, and more.
🎭 Auto or manual emotions – let the model infer tone (“auto”) or pick from happy, calm, surprised, etc.
🗣 300+ curated voices + cloned voices – use MiniMax’s library or plug in your own voice IDs trained via minimax/voice-cloning.
💸 Predictable billing – $0.06 per 1,000 input tokens (one MiniMax token ≈ one character), with zero cost for audio outputs.

Migrating from Speech 2.0 Turbo?
The API schema is drop-in compatible. You’ll immediately notice richer prosody and improved multilingual pronunciation. Pricing is 4× higher per character than 2.0 Turbo, so plan your rollout and communications accordingly.

Quick start

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  https://api.replicate.com/v1/models/minimax/speech-2.6-turbo/predictions \
  -d '{
    "input": {
      "text": "Replicate, meet MiniMax Speech 2.6 Turbo.",
      "voice_id": "English_Insightful_Speaker",
      "speed": 1.0,
      "emotion": "auto",
      "audio_format": "mp3"
    }
  }'

Outputs are downloadable audio files. Replicate hosts files from API-created predictions for one hour by default, so download anything you need to keep.
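The same request can be assembled from Python using only the standard library. This is a minimal sketch, not an official client: the model slug `minimax/speech-2.6-turbo`, the model-scoped predictions endpoint, and the helper names `build_request` and `create_prediction` are assumptions made for illustration, so check the model page for the exact path before relying on it.

```python
import json
import os
import urllib.request

# Assumption: the model is published under this slug on Replicate.
MODEL = "minimax/speech-2.6-turbo"
API_URL = f"https://api.replicate.com/v1/models/{MODEL}/predictions"


def build_request(text, voice_id="Wise_Woman", token=None, **options):
    """Build (but do not send) a prediction request.

    Extra keyword arguments (speed, emotion, audio_format, ...) are passed
    through unchanged as model inputs.
    """
    token = token or os.environ.get("REPLICATE_API_TOKEN", "")
    payload = {"input": {"text": text, "voice_id": voice_id, **options}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_prediction(request):
    """Send the request and return the decoded prediction JSON."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

To run it, call `create_prediction(build_request("Replicate, meet MiniMax Speech 2.6 Turbo.", voice_id="English_Insightful_Speaker", emotion="auto", audio_format="mp3"))` with `REPLICATE_API_TOKEN` set in the environment. Keeping request construction separate from sending makes the payload easy to inspect or unit test without network access.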

Input parameters

Name	Type	Default	Description
`text`	string	–	Text to narrate. Up to 10,000 characters. Insert pauses with markers such as `<#0.5#>` (pause length in seconds).
`voice_id`	string	`Wise_Woman`	Any MiniMax system voice or cloned voice ID.
`speed`	float	`1.0`	Range 0.5–2.0.
`volume`	float	`1.0`	Range 0–10.
`pitch`	int	`0`	Semitone shift from −12 to +12.
`emotion`	string	`auto`	`auto`, or choose `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`.
`english_normalization`	bool	`false`	Improves number/date reading in English text.
`sample_rate`	int	`32000`	8000–44100 Hz.
`bitrate`	int	`128000`	32000, 64000, 128000, or 256000 (MP3 only).
`audio_format`	string	`mp3`	`mp3`, `wav`, `flac`, or `pcm`.
`channel`	string	`mono`	`mono` or `stereo`.
`subtitle_enable`	bool	`false`	Return subtitle metadata (MiniMax provides sentence-level timestamps).
`language_boost`	string	`Null`	Boost recognition of a specific language or dialect in the input text. Set it to one of the 40+ supported languages (e.g. `English`, `Thai`, `Portuguese`, `Afrikaans`) or to `Automatic`.
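The ranges in the table can be checked on the client before a request is sent, which avoids paying a round trip for a rejected input. The sketch below encodes only the limits listed above; `validate_input` and `with_pause` are hypothetical helper names, and the server may enforce rules that are stricter or simply different.

```python
MAX_TEXT_CHARS = 10_000
EMOTIONS = {"auto", "happy", "sad", "angry", "fearful", "disgusted",
            "surprised", "calm", "fluent", "neutral"}
AUDIO_FORMATS = {"mp3", "wav", "flac", "pcm"}
BITRATES = {32000, 64000, 128000, 256000}
CHANNELS = {"mono", "stereo"}


def validate_input(params):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    text = params.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text must be a non-empty string")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")

    # (name, default, low, high) for the numeric ranges in the table.
    for name, default, low, high in [
        ("speed", 1.0, 0.5, 2.0),
        ("volume", 1.0, 0, 10),
        ("pitch", 0, -12, 12),
        ("sample_rate", 32000, 8000, 44100),
    ]:
        value = params.get(name, default)
        if not isinstance(value, (int, float)) or not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    # (name, default, allowed values) for the enumerated options.
    for name, default, allowed in [
        ("emotion", "auto", EMOTIONS),
        ("audio_format", "mp3", AUDIO_FORMATS),
        ("bitrate", 128000, BITRATES),
        ("channel", "mono", CHANNELS),
    ]:
        if params.get(name, default) not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    return problems


def with_pause(*segments, seconds=0.5):
    """Join text segments with a pause marker such as <#0.5#>."""
    return f"<#{seconds}#>".join(segments)
```

For example, `validate_input({"text": with_pause("Hello.", "Welcome back."), "speed": 1.2})` returns an empty list, while a `speed` of `3.0` produces one problem message.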

Output

Replicate returns a hosted audio file (mp3, wav, flac, or pcm) plus metadata including MiniMax’s character count, audio duration, and optional subtitles when enabled.
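Because the hosted file is temporary, it is usually worth saving the audio as soon as a prediction finishes. The sketch below assumes the prediction JSON carries a `status` field and an `output` field holding the audio URL (as a string, or a one-item list); the exact layout of the extra metadata is not specified here, so the code does not touch it. `audio_url_from` and `save_audio` are hypothetical helper names.

```python
import urllib.request


def audio_url_from(prediction):
    """Return the hosted audio URL from a finished prediction dict."""
    status = prediction.get("status")
    if status != "succeeded":
        raise RuntimeError(
            f"prediction not ready: status={status!r}, "
            f"error={prediction.get('error')!r}"
        )
    output = prediction.get("output")
    # Assumption: output is a single URL string; tolerate a one-item list too.
    if isinstance(output, list) and output:
        output = output[0]
    if not isinstance(output, str) or not output.startswith("http"):
        raise ValueError(f"unexpected output value: {output!r}")
    return output


def save_audio(prediction, path):
    """Download the audio to `path` before the hosted copy expires."""
    urllib.request.urlretrieve(audio_url_from(prediction), path)
    return path
```

Raising on any status other than `succeeded` keeps a caller from treating a still-running or failed prediction as if it had produced audio.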

Pricing on Replicate

$0.06 per 1,000 input tokens (token_input_count)
$0.00 per output token (audio is billed only on the input side)

One MiniMax “token” is roughly a character, so a 120‑character sentence costs about $0.0072.
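The arithmetic above is easy to script for budgeting. This is a rough estimator that assumes exactly one token per character, which is only an approximation: MiniMax's billed count (`token_input_count`) is authoritative and may differ, for example for non-Latin scripts. `estimate_cost` is a hypothetical helper name.

```python
from decimal import Decimal

# USD per 1,000 input tokens, taken from the pricing listed above.
PRICE_PER_1K_TOKENS = Decimal("0.06")


def estimate_cost(text):
    """Approximate USD cost of narrating `text`, assuming 1 token per character."""
    return PRICE_PER_1K_TOKENS * len(text) / 1000
```

Calling `estimate_cost("a" * 120)` reproduces the 120-character example at about $0.0072. `Decimal` is used instead of floats so that small per-request amounts add up without rounding drift.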

Use cases

Real-time conversational agents with lifelike vocal responses
Interactive tutorials or customer support handoffs
Audio UI/UX prompts and brand voices
Multilingual IVR and in-product voice localization

Helpful links

MiniMax Speech T2A docs: https://platform.minimax.io/docs/api-reference/speech-t2a-intro
MiniMax voice list: https://platform.minimax.io/docs/faq/system-voice-id
MiniMax privacy policy: https://intl.minimaxi.com/protocol/privacy-policy
MiniMax terms of service: https://intl.minimaxi.com/protocol/terms-of-service

Need higher-fidelity narration or post-production headroom? Check out the HD sibling model, Speech-2.6-HD.
