MiniMax Speech 2.6 HD on Replicate

Models

Speech-2.6-HD: Next-generation high-definition model with improved realism and expressive control
Speech-2.6-Turbo: Enhanced low-latency model optimized for live and interactive applications
Speech-02-HD: Optimized for high-fidelity applications like voiceovers and audiobooks
Speech-02-Turbo: Designed for real-time applications with low latency
Voice-Cloning: Clone voices for use with speech-02-hd and speech-02-turbo

MiniMax Speech 2.6 HD is the flagship text-to-audio model from MiniMax, tuned for premium voiceover work, audiobooks, marketing content, and any scenario that demands maximum fidelity and vocal nuance. It ships on Replicate with the same easy REST API as the Turbo model, plus full support for 40+ languages, 300+ voices, and custom voice cloning.

Why use the HD variant?

🎙 Studio-grade prosody – crisper articulation, better breath control, and smoother phrasing than 2.6 Turbo.
🧠 Emotion intelligence – “auto” matches the tone to your script, or pick precise emotions like calm, fluent, or surprised.
🌐 Global language coverage – the same multilingual, dialect-boost, and subtitle support as Turbo.
🧾 Subtitles on tap – enable subtitle_enable for sentence-timestamped .titles files (great for captions or QA).
💼 Predictable billing – $0.10 per 1,000 input tokens (token_input_count), zero cost for outputs.

Upgrading from Speech 2.0 HD?
Expect noticeably richer performances. The API schema is unchanged, but the per-character price is 4× higher. Consider offering both HD generations so customers can pick the fidelity that matches their budget.

Quick start

# "version" takes owner/name, owner/name:version_id, or a bare version ID.
# The identifier minimax/speech-2.6-hd is assumed here; check the model page.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  https://api.replicate.com/v1/predictions \
  -d '{
    "version": "minimax/speech-2.6-hd",
    "input": {
      "text": "Welcome to the MiniMax Speech 2.6 HD voice studio.",
      "voice_id": "English_expressive_narrator",
      "emotion": "calm",
      "audio_format": "flac",
      "subtitle_enable": true
    }
  }'

Outputs include a hosted audio file (e.g., FLAC) plus subtitle metadata when requested.
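
The same request can be sketched in Python with only the standard library. This is a minimal illustration, not an official client: the helper names `build_request` and `create_prediction` are hypothetical, and the model identifier `minimax/speech-2.6-hd` is an assumption that should be checked against the model page.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
# Assumption: illustrative model identifier; confirm it on the model page.
MODEL = "minimax/speech-2.6-hd"


def build_request(token, text, model=MODEL, **inputs):
    """Build (but do not send) a prediction request for the given text."""
    payload = {"version": model, "input": {"text": text, **inputs}}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_prediction(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call. To submit the prediction, pass the result of `build_request(os.environ["REPLICATE_API_TOKEN"], "Hello", voice_id="Wise_Woman")` to `create_prediction`.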

Input parameters

| Name | Type | Default | Description |
| --- | --- | --- | --- |
| `text` | string | – | Up to 10,000 characters. Supports `<#seconds#>` pause markers and multi-paragraph scripts. |
| `voice_id` | string | `Wise_Woman` | Any MiniMax system or cloned voice ID. |
| `speed` | float | `1.0` | Range 0.5–2.0. |
| `volume` | float | `1.0` | Range 0–10. |
| `pitch` | int | `0` | Semitone shift −12 to +12. |
| `emotion` | string | `auto` | `auto`, `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`. |
| `english_normalization` | bool | `false` | Enables advanced number/date handling for English text. |
| `sample_rate` | int | `32000` | 8000–44100 Hz. |
| `bitrate` | int | `128000` | 32000, 64000, 128000, or 256000 (MP3 only). |
| `audio_format` | string | `mp3` | Choose `mp3`, `wav`, `flac`, or `pcm`. FLAC/WAV recommended for post-production. |
| `channel` | string | `mono` | `mono` or `stereo`. |
| `subtitle_enable` | bool | `false` | Return MiniMax subtitle metadata (sentence-level timestamps). |
| `language_boost` | string | `Null` | Boost recognition for any supported language or set `Automatic`. |
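
Checking inputs against the documented ranges before sending a request can catch mistakes early. The sketch below is one way to do that; `validate_input` and its helpers are hypothetical names, the range bounds are assumed to be inclusive, and Python's `len()` is assumed to approximate how the service counts characters.

```python
# Limits copied from the parameter table; inclusive bounds are an assumption.
MAX_TEXT_CHARS = 10_000
EMOTIONS = {"auto", "happy", "sad", "angry", "fearful", "disgusted",
            "surprised", "calm", "fluent", "neutral"}
AUDIO_FORMATS = {"mp3", "wav", "flac", "pcm"}
BITRATES = {32000, 64000, 128000, 256000}
CHANNELS = {"mono", "stereo"}


def _check_range(problems, params, name, low, high):
    """Record a problem if an optional numeric parameter is out of range."""
    if name in params and not low <= params[name] <= high:
        problems.append(f"{name} must be between {low} and {high}")


def _check_choice(problems, params, name, allowed):
    """Record a problem if an optional parameter is not an allowed value."""
    if name in params and params[name] not in allowed:
        problems.append(f"{name} must be one of {sorted(allowed)}")


def validate_input(params):
    """Return a list of problems; an empty list means the input looks valid."""
    problems = []
    text = params.get("text")
    if not isinstance(text, str) or not text:
        problems.append("text is required")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")
    _check_range(problems, params, "speed", 0.5, 2.0)
    _check_range(problems, params, "volume", 0, 10)
    _check_range(problems, params, "pitch", -12, 12)
    _check_range(problems, params, "sample_rate", 8000, 44100)
    _check_choice(problems, params, "emotion", EMOTIONS)
    _check_choice(problems, params, "bitrate", BITRATES)
    _check_choice(problems, params, "audio_format", AUDIO_FORMATS)
    _check_choice(problems, params, "channel", CHANNELS)
    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem in a single pass. The service remains the authority on what it accepts, so treat this as a convenience check only.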

Output

You receive:

- A hosted audio file in the requested format (valid for 24 hours by default).
- Metadata containing character counts, duration, bitrate, etc.
- Optional .titles subtitle JSON when subtitle_enable is true.

Pricing on Replicate

$0.10 per 1,000 input tokens (token_input_count)
$0.00 per output token

Because the metric comes straight from MiniMax’s character counter, you can estimate costs by multiplying character count × $0.0001.
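
As a rough illustration of that arithmetic, the sketch below estimates the cost of a script before it is submitted. The function name `estimate_cost` is hypothetical, and it assumes Python's `len()` matches MiniMax's character counter, which may not hold for pause markers or some multi-byte text.

```python
from decimal import Decimal

# Listed price: $0.10 per 1,000 input tokens (characters).
PRICE_PER_1000_CHARS = Decimal("0.10")


def estimate_cost(text):
    """Estimate the USD cost of synthesizing `text` from its character count.

    Assumption: len(text) approximates the billed token_input_count.
    """
    return PRICE_PER_1000_CHARS * len(text) / 1000
```

`Decimal` avoids floating-point rounding in currency math. Under these assumptions, a maximum-length 10,000-character script comes to about $1.00.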

Ideal use cases

Narrated product demos, audiobooks, podcasts, and marketing assets
Localization pipelines needing multiple languages with consistent delivery
Dialogue tracks for games or animated content
Accessibility overlays (read-aloud, captioned videos, screenreader augmentations)

Additional resources

MiniMax Speech T2A API reference: https://platform.minimax.io/docs/api-reference/speech-t2a-intro
MiniMax voice list: https://platform.minimax.io/docs/faq/system-voice-id
MiniMax privacy policy: https://intl.minimaxi.com/protocol/privacy-policy
MiniMax terms of service: https://intl.minimaxi.com/protocol/terms-of-service

For interactive R&D or low-latency deployments, use the Turbo sibling model. For premier-quality voiceovers that stand up to post-production, Speech 2.6 HD is the better fit.
