minimax/speech-2.6-turbo

Low‑latency MiniMax Speech 2.6 Turbo brings multilingual, emotional text-to-speech to Replicate, with 300+ voices and pricing suited to real-time use.

MiniMax Speech 2.6 Turbo on Replicate

Models

  • Speech-2.6-HD: Next-generation high-definition model with improved realism and expressive control
  • Speech-2.6-Turbo: Enhanced low-latency model optimized for live and interactive applications
  • Speech-02-HD: Optimized for high-fidelity applications like voiceovers and audiobooks
  • Speech-02-Turbo: Designed for real-time applications with low latency
  • Voice-Cloning: Clone voices for use with speech-02-hd and speech-02-turbo

MiniMax Speech 2.6 Turbo is the newest MiniMax text-to-audio (T2A) model, tuned for real-time applications that need expressive voices, fast turnaround, and worldwide language coverage. Deploy it on Replicate to synthesize natural speech in seconds, experiment in the playground, or orchestrate it inside your own apps with Replicate’s REST API.

Why choose the Turbo variant?

  • Low latency, production-ready – optimized for chat agents, live voice bots, and interactive UI feedback loops.
  • 🌍 40+ languages & dialect boosts – switch seamlessly between English, Chinese, Japanese, Spanish, Portuguese, Korean, and more.
  • 🎭 Auto or manual emotions – let the model infer tone (“auto”) or pick from happy, calm, surprised, etc.
  • 🗣 300+ curated voices + cloned voices – use MiniMax’s library or plug in your own voice IDs trained via minimax/voice-cloning.
  • 💸 Predictable billing – $0.06 per 1,000 input tokens (one MiniMax token ≈ one character), with zero cost for audio outputs.

Migrating from Speech 2.0 Turbo?
The API schema is drop-in compatible. You’ll immediately notice richer prosody and improved multilingual pronunciation. Pricing is 4× higher per character than 2.0 Turbo, so plan your rollout and communications accordingly.

Quick start

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  https://api.replicate.com/v1/models/minimax/speech-2.6-turbo/predictions \
  -d '{
    "input": {
      "text": "Replicate, meet MiniMax Speech 2.6 Turbo.",
      "voice_id": "English_Insightful_Speaker",
      "speed": 1.0,
      "emotion": "auto",
      "audio_format": "mp3"
    }
  }'

Outputs are downloadable audio files. Replicate hosts them only temporarily, so download anything you need to keep.
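
The same request can be assembled in Python with only the standard library. This is a minimal sketch: `build_request` is a hypothetical helper, and the model-scoped endpoint URL is an assumption about how Replicate exposes this model, so check it against the API reference before relying on it. The function only builds the request; nothing is sent until you pass the result to `urllib.request.urlopen`.

```python
import json
import urllib.request

# Assumption: Replicate's model-scoped predictions endpoint for this model.
API_URL = "https://api.replicate.com/v1/models/minimax/speech-2.6-turbo/predictions"


def build_request(
    token: str,
    text: str,
    voice_id: str = "English_Insightful_Speaker",
) -> urllib.request.Request:
    """Build (but do not send) a POST request that creates a prediction."""
    payload = {
        "input": {
            "text": text,
            "voice_id": voice_id,
            "speed": 1.0,
            "emotion": "auto",
            "audio_format": "mp3",
        }
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Keeping construction separate from sending makes the payload easy to inspect or log before any billable call is made.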

Input parameters

| Name | Type | Default | Description |
|---|---|---|---|
| text | string | | Text to narrate. Up to 10,000 characters. Pause with <#0.5#> style markers. |
| voice_id | string | Wise_Woman | Any MiniMax system voice or cloned voice ID. |
| speed | float | 1.0 | Range 0.5–2.0. |
| volume | float | 1.0 | Range 0–10. |
| pitch | int | 0 | Semitone shift from −12 to +12. |
| emotion | string | auto | auto, or choose happy, sad, angry, fearful, disgusted, surprised, calm, fluent, neutral. |
| english_normalization | bool | false | Improves number/date reading in English text. |
| sample_rate | int | 32000 | 8000–44100 Hz. |
| bitrate | int | 128000 | 32000, 64000, 128000, or 256000 (MP3 only). |
| audio_format | string | mp3 | mp3, wav, flac, or pcm. |
| channel | string | mono | mono or stereo. |
| subtitle_enable | bool | false | Return subtitle metadata (MiniMax provides sentence-level timestamps). |
| language_boost | string | Null | Boost recognition for any of the 40 supported languages (e.g. English, Thai, Portuguese, Afrikaans) or set Automatic. |
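
Checking inputs locally can catch out-of-range values before a request is sent. The sketch below encodes the ranges and choices listed above; `validate_input` is a hypothetical helper, and it assumes that the bounds are inclusive and that `text` is required (the table lists no default for it). The service may apply different or additional rules.

```python
# Allowed values and inclusive bounds, copied from the parameter list above.
CHOICES = {
    "emotion": {"auto", "happy", "sad", "angry", "fearful", "disgusted",
                "surprised", "calm", "fluent", "neutral"},
    "bitrate": {32000, 64000, 128000, 256000},
    "audio_format": {"mp3", "wav", "flac", "pcm"},
    "channel": {"mono", "stereo"},
}
RANGES = {
    "speed": (0.5, 2.0),
    "volume": (0, 10),
    "pitch": (-12, 12),
    "sample_rate": (8000, 44100),
}
MAX_TEXT_CHARS = 10_000


def validate_input(params: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    text = params.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text must be a non-empty string")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    # Numeric parameters: only checked when the caller supplies them.
    for name, (low, high) in RANGES.items():
        if name in params and not low <= params[name] <= high:
            problems.append(f"{name} must be between {low} and {high}")
    # Enumerated parameters: value must be one of the listed options.
    for name, allowed in CHOICES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    return problems
```

For example, `validate_input({"text": "hi", "speed": 3.0})` reports that `speed` is out of range, while omitted parameters are left to their defaults.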

Output

Replicate returns a hosted audio file (mp3, wav, flac, or pcm) plus metadata including MiniMax’s character count, audio duration, and optional subtitles when enabled.
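
A small helper can pull the audio link out of a finished prediction. This sketch assumes the standard Replicate prediction object with `status` and `output` fields, and that `output` is a URL string (or a list whose first item is one). `audio_url` is a hypothetical name, and the metadata field names are not given on this page, so they are not parsed here.

```python
def audio_url(prediction: dict):
    """Return the audio URL from a succeeded prediction, otherwise None."""
    # Only a succeeded prediction is expected to carry a usable output.
    if prediction.get("status") != "succeeded":
        return None
    output = prediction.get("output")
    # Assumption: output may be a single URL or a list of URLs.
    if isinstance(output, list):
        output = output[0] if output else None
    return output if isinstance(output, str) else None
```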

Pricing on Replicate

  • $0.06 per 1,000 input tokens (token_input_count)
  • $0.00 per output token (audio is billed only on the input side)

One MiniMax “token” is roughly a character, so a 120‑character sentence costs about $0.0072.
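
The arithmetic above can be wrapped in a small estimator. This is a rough sketch that approximates one token as one character, using the $0.06 rate quoted on this page. Actual billing follows `token_input_count`, which may differ from `len(text)`, and `estimate_cost_usd` is a hypothetical helper name.

```python
PRICE_PER_1K_TOKENS_USD = 0.06  # input rate quoted on this page


def estimate_cost_usd(text: str) -> float:
    """Estimate the cost of narrating `text`, treating 1 character as 1 token."""
    return len(text) * PRICE_PER_1K_TOKENS_USD / 1000
```

A 120-character string gives 120 × 0.06 / 1000, which matches the figure of about $0.0072 above.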

Use cases

  • Real-time conversational agents with lifelike vocal responses
  • Interactive tutorials or customer support handoffs
  • Audio UI/UX prompts and brand voices
  • Multilingual IVR and in-product voice localization

Need higher-fidelity narration or post-production headroom? Check out the HD sibling model, Speech-2.6-HD.