MiniMax Speech 2.6 HD on Replicate
Models
- Speech-2.6-HD: Next-generation high-definition model with improved realism and expressive control
- Speech-2.6-Turbo: Enhanced low-latency model optimized for live and interactive applications
- Speech-02-HD: Optimized for high-fidelity applications like voiceovers and audiobooks
- Speech-02-Turbo: Designed for real-time applications with low latency
- Voice-Cloning: Clone voices for use with speech-02-hd and speech-02-turbo
MiniMax Speech 2.6 HD is the flagship text-to-audio model from MiniMax, tuned for premium voiceover work, audiobooks, marketing content, and any scenario that demands maximum fidelity and vocal nuance. It ships on Replicate with the same easy REST API as the Turbo model, plus full support for 40+ languages, 300+ voices, and custom voice cloning.
Why use the HD variant?
- Studio-grade prosody: crisper articulation, better breath control, and smoother phrasing than 2.6 Turbo.
- Emotion intelligence: `auto` matches the tone to your script, or pick a precise emotion such as `calm`, `fluent`, or `surprised`.
- Global language coverage: the same multilingual coverage, dialect boost, and subtitle support as Turbo.
- Subtitles on tap: enable `subtitle_enable` for sentence-timestamped `.titles` files (great for captions or QA).
- Predictable billing: $0.10 per 1,000 input tokens (`token_input_count`), zero cost for outputs.
Upgrading from Speech 2.0 HD?
Expect noticeably richer performances. The API schema is unchanged, but the per-character price is 4× higher. Consider offering both HD generations so customers can pick the fidelity that matches their budget.
Quick start
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  https://api.replicate.com/v1/predictions \
  -d '{
    "version": "latest",
    "input": {
      "text": "Welcome to the MiniMax Speech 2.6 HD voice studio.",
      "voice_id": "English_expressive_narrator",
      "emotion": "calm",
      "audio_format": "flac",
      "subtitle_enable": true
    }
  }'
Outputs include a hosted audio file (e.g., FLAC) plus subtitle metadata when requested.
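The same request can be assembled from Python. The sketch below uses only the standard library and mirrors the curl call above. The helper names `build_prediction_request` and `send` are hypothetical, and the `version` value is copied verbatim from the curl example as a placeholder. Building the request is kept separate from sending it so the payload can be inspected first.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"  # endpoint from the curl example


def build_prediction_request(text, token, **overrides):
    """Build (but do not send) a request equivalent to the curl quick start."""
    model_input = {
        "text": text,
        "voice_id": "English_expressive_narrator",
        "emotion": "calm",
        "audio_format": "flac",
        "subtitle_enable": True,
    }
    model_input.update(overrides)  # e.g. speed=1.2 or audio_format="mp3"
    # "latest" is the placeholder used in the quick start, not a verified version ID.
    body = {"version": "latest", "input": model_input}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `send(build_prediction_request("Hello", os.environ["REPLICATE_API_TOKEN"]))` would submit the job; reading the token from `REPLICATE_API_TOKEN` matches the environment variable used in the curl command.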
Input parameters
| Name | Type | Default | Description |
|---|---|---|---|
| `text` | string | (none) | Up to 10,000 characters. Supports `<#seconds#>` pause markers and multi-paragraph scripts. |
| `voice_id` | string | `Wise_Woman` | Any MiniMax system or cloned voice ID. |
| `speed` | float | `1.0` | Range 0.5 to 2.0. |
| `volume` | float | `1.0` | Range 0 to 10. |
| `pitch` | int | `0` | Semitone shift, -12 to +12. |
| `emotion` | string | `auto` | `auto`, `happy`, `sad`, `angry`, `fearful`, `disgusted`, `surprised`, `calm`, `fluent`, `neutral`. |
| `english_normalization` | bool | `false` | Enables advanced number/date handling for English text. |
| `sample_rate` | int | `32000` | 8000 to 44100 Hz. |
| `bitrate` | int | `128000` | 32000, 64000, 128000, or 256000 (MP3 only). |
| `audio_format` | string | `mp3` | Choose `mp3`, `wav`, `flac`, or `pcm`. FLAC/WAV recommended for post-production. |
| `channel` | string | `mono` | `mono` or `stereo`. |
| `subtitle_enable` | bool | `false` | Return MiniMax subtitle metadata (sentence-level timestamps). |
| `language_boost` | string | Null | Boost recognition for any supported language, or set to Automatic. |
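Checking inputs locally before submitting a job can catch out-of-range values early. The following is a minimal sketch built only from the ranges and choices listed in the table. The function name `validate_input` is hypothetical, and several points are assumptions: that the bounds are inclusive, that `text` is required, and that Python's `len()` counts characters the same way the service does.

```python
MAX_TEXT_CHARS = 10_000

# (name, documented default, lowest, highest); bounds assumed inclusive
NUMERIC_RANGES = [
    ("speed", 1.0, 0.5, 2.0),
    ("volume", 1.0, 0, 10),
    ("pitch", 0, -12, 12),
    ("sample_rate", 32000, 8000, 44100),
]

# (name, documented default, allowed values)
ENUM_CHOICES = [
    ("emotion", "auto", {"auto", "happy", "sad", "angry", "fearful",
                         "disgusted", "surprised", "calm", "fluent", "neutral"}),
    ("audio_format", "mp3", {"mp3", "wav", "flac", "pcm"}),
    ("channel", "mono", {"mono", "stereo"}),
    # Per the table, bitrate only applies to MP3 output.
    ("bitrate", 128000, {32000, 64000, 128000, 256000}),
]


def validate_input(params):
    """Return a list of problems in a model input dict (empty list means OK)."""
    problems = []
    text = params.get("text", "")
    if not text:
        problems.append("text is missing or empty")
    elif len(text) > MAX_TEXT_CHARS:
        problems.append(f"text exceeds {MAX_TEXT_CHARS} characters")

    for name, default, low, high in NUMERIC_RANGES:
        value = params.get(name, default)
        if not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")

    for name, default, allowed in ENUM_CHOICES:
        value = params.get(name, default)
        if value not in allowed:
            problems.append(f"{name} must be one of {sorted(allowed)}")
    return problems
```

Returning a list rather than raising on the first error lets a caller report every problem in a script at once.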
Output
You receive:
- A hosted audio file in the requested format (valid for 24 hours by default).
- Metadata containing character counts, duration, bitrate, etc.
- Optional .titles subtitle JSON when subtitle_enable is true.
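Predictions created through the REST API may not be finished when the first response arrives, so a client usually polls until the job completes and then fetches the audio before the hosted file expires. The sketch below assumes the response carries a `status` field and a `urls.get` polling link, with `succeeded`, `failed`, and `canceled` as terminal states. These details come from Replicate's general prediction API rather than from this page, so treat them as assumptions. `wait_for_prediction` and the `fetch_json` callable are hypothetical names.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}  # assumed status names


def wait_for_prediction(prediction, fetch_json, poll_seconds=1.0, max_polls=60):
    """Poll until the prediction reaches a terminal status, then return it.

    `prediction` is the JSON dict returned when the job was created.
    `fetch_json` is any callable that GETs a URL (sending the Authorization
    header) and returns the parsed JSON body.
    """
    for _ in range(max_polls):
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        time.sleep(poll_seconds)
        prediction = fetch_json(prediction["urls"]["get"])
    # The final fetch may have returned a finished prediction.
    if prediction.get("status") in TERMINAL_STATUSES:
        return prediction
    raise TimeoutError("prediction did not finish within the polling budget")
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.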
Pricing on Replicate
- $0.10 per 1,000 input tokens (`token_input_count`)
- $0.00 per output token
Because the metric comes straight from MiniMax's character counter, you can estimate costs by multiplying character count × $0.0001.
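As a rough illustration of that arithmetic, the sketch below estimates the charge for a script before it is submitted. It assumes one billed token per character as counted by Python's `len()`, which may differ from MiniMax's own counter for some scripts. The function name `estimate_cost_usd` is hypothetical.

```python
from decimal import Decimal

PRICE_PER_1K_INPUT_TOKENS = Decimal("0.10")  # USD, from the pricing list above


def estimate_cost_usd(text):
    """Estimate the charge for one request, assuming one token per character."""
    return PRICE_PER_1K_INPUT_TOKENS * len(text) / 1000
```

Under these assumptions, a script at the 10,000-character limit would cost about $1.00 per request. `Decimal` is used so that summing many small estimates does not accumulate floating-point error.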
Ideal use cases
- Narrated product demos, audiobooks, podcasts, and marketing assets
- Localization pipelines needing multiple languages with consistent delivery
- Dialogue tracks for games or animated content
- Accessibility overlays (read-aloud, captioned videos, screenreader augmentations)
Additional resources
- MiniMax Speech T2A API reference: https://platform.minimax.io/docs/api-reference/speech-t2a-intro
- MiniMax voice list: https://platform.minimax.io/docs/faq/system-voice-id
- MiniMax privacy policy: https://intl.minimaxi.com/protocol/privacy-policy
- MiniMax terms of service: https://intl.minimaxi.com/protocol/terms-of-service
For interactive R&D or low-latency deployments, use the Turbo sibling model. For premier-quality voiceovers that stand up to post-production, Speech 2.6 HD is the better fit.