minimax/speech-02-turbo:e2e8812b
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| emotion | None | auto | Desired delivery style. Use auto to let MiniMax choose, or pick a specific emotion. |
| sample_rate | None | 32000 | Audio sample rate in Hz. |
| bitrate | None | 128000 | MP3 bitrate in bits per second. Only used when audio_format is mp3. |
| audio_format | None | mp3 | File format for the generated audio. Choose mp3 for general use, wav/flac for lossless, or pcm for raw bytes. |
| channel | None | mono | mono for 1 channel (default), stereo for 2 channels. |
| language_boost | None | None | Optional language hint. Choose Automatic to let MiniMax detect the language, or pick a specific locale. |
| text | string | | Text to narrate (max 10,000 characters). Use markers like `<#0.5#>` to insert pauses in seconds. |
| pitch | integer | 0 (min: -12, max: 12) | Semitone offset applied to the voice (−12 to +12). |
| speed | number | 1 (min: 0.5, max: 2) | Speech speed multiplier (0.5–2.0). Lower is slower, higher is faster. |
| volume | number | 1 (max: 10) | Relative loudness. 1.0 is default MiniMax gain. Range 0–10. |
| voice_id | string | Wise_Woman | Voice to synthesize. Pick any MiniMax system voice or a voice_id returned by https://replicate.com/minimax/voice-cloning. |
| subtitle_enable | boolean | False | Return MiniMax subtitle metadata with sentence timestamps (non-streaming only). |
| english_normalization | boolean | False | Improve number/date reading for English text (adds a small amount of latency). |
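The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not an official client: `build_input`, `pause`, and `run_model` are hypothetical helper names, treating `text` as required and counting pause markers toward the 10,000-character limit are assumptions, and `run_model` assumes the third-party `replicate` Python package is installed with an API token configured. Only the fields you override are sent, so the defaults listed above apply to everything else.

```python
# Field names and limits are copied from the input schema table.
OPTIONAL_FIELDS = {
    "emotion", "sample_rate", "bitrate", "audio_format", "channel",
    "language_boost", "pitch", "speed", "volume", "voice_id",
    "subtitle_enable", "english_normalization",
}
MAX_TEXT_CHARS = 10_000
RANGES = {"pitch": (-12, 12), "speed": (0.5, 2), "volume": (0, 10)}


def pause(seconds):
    """Return a pause marker such as <#0.5#> for the given duration in seconds."""
    return f"<#{seconds}#>"


def build_input(text, **overrides):
    """Build the input payload, rejecting values outside the documented limits."""
    unknown = set(overrides) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    # Assumption: text is required and markers count toward the length limit.
    if not text or len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text must be 1 to {MAX_TEXT_CHARS} characters")
    for field, (low, high) in RANGES.items():
        if field in overrides and not low <= overrides[field] <= high:
            raise ValueError(f"{field} must be between {low} and {high}")
    pitch = overrides.get("pitch", 0)
    if isinstance(pitch, bool) or not isinstance(pitch, int):
        raise ValueError("pitch must be an integer")
    # Omitted fields fall back to the server-side defaults.
    return {"text": text, **overrides}


def run_model(model_ref, payload):
    """Hypothetical wrapper around the `replicate` client (needs network access)."""
    import replicate  # third-party; reads REPLICATE_API_TOKEN from the environment
    return replicate.run(model_ref, input=payload)
```

For example, `build_input("Hello" + pause(0.5) + "world", speed=1.25)` produces a payload with a half-second pause between the two words, while `build_input("hi", pitch=13)` raises `ValueError` before any request is made.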
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{"format": "uri", "title": "Output", "type": "string"}
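Because the output is a single URI string, consuming it usually means checking that it is an absolute URI and then fetching the audio. A rough sketch using only the Python standard library follows; `output_filename` and `download` are hypothetical helper names, and the fallback file name `speech.mp3` is an assumption based on the default `audio_format`.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def output_filename(output_uri, fallback="speech.mp3"):
    """Check the output is an absolute URI and derive a local file name from it."""
    parsed = urlparse(output_uri)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not an absolute URI: {output_uri!r}")
    # Use the last path segment, or the fallback when the path has none.
    return PurePosixPath(parsed.path).name or fallback


def download(output_uri, dest=None):
    """Fetch the generated audio and write it to disk; returns the path used."""
    dest = dest or output_filename(output_uri)
    with urlopen(output_uri) as response, open(dest, "wb") as fh:
        fh.write(response.read())
    return dest
```

Some versions of the `replicate` Python client wrap the result in a file-like object instead of returning the raw string; in that case the URL would need to be extracted before calling these helpers.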