Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| text | string | | Text to synthesize into speech |
| mode | None | custom_voice | TTS mode: 'custom_voice' uses preset speakers, 'voice_clone' clones from reference audio, 'voice_design' creates voice from description |
| language | None | auto | Language of the text (use 'auto' for automatic detection) |
| speaker | None | Serena | Preset speaker voice (only for 'custom_voice' mode) |
| voice_description | string | | Natural language description of desired voice (only for 'voice_design' mode). Example: 'A warm, friendly female voice with a slight British accent' |
| reference_audio | string | | Reference audio file for voice cloning (only for 'voice_clone' mode) |
| reference_text | string | | Transcript of the reference audio (recommended for 'voice_clone' mode) |
| style_instruction | string | | Optional style/emotion instruction (e.g., 'speak slowly and calmly', 'excited tone') |
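
The mode-specific fields above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model's API: `build_input` is a hypothetical helper, and treating `reference_audio` and `voice_description` as required for their modes is an assumption drawn from the field descriptions, as is rejecting empty `text`.

```python
from typing import Any, Dict

# Defaults copied from the table above; fields with no listed default are left out.
DEFAULTS: Dict[str, Any] = {
    "mode": "custom_voice",
    "language": "auto",
    "speaker": "Serena",
}

# The three modes named in the description of the `mode` field.
MODES = ("custom_voice", "voice_clone", "voice_design")


def build_input(text: str, **fields: Any) -> Dict[str, Any]:
    """Hypothetical helper: assemble an input dict and check mode-specific fields."""
    if not text:
        # Assumption: an empty `text` is never useful to send.
        raise ValueError("text must be a non-empty string")
    mode = fields.get("mode", DEFAULTS["mode"])
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # Assumption: each mode needs the field its description ties to it.
    if mode == "voice_clone" and not fields.get("reference_audio"):
        raise ValueError("voice_clone needs reference_audio")
    if mode == "voice_design" and not fields.get("voice_description"):
        raise ValueError("voice_design needs voice_description")
    # Only fields the caller set are included; omitted ones fall back to the defaults.
    return {"text": text, **fields}


payload = build_input(
    "Hello there.",
    mode="voice_design",
    voice_description="A warm, friendly female voice with a slight British accent",
)
```

The resulting dict is meant to be passed as the model input by whichever API client you use. The model identifier and version are not given in this section, so the request itself is left out.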
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{"format": "uri", "title": "Output", "type": "string"}
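
Since the output is a single URI string, a caller will usually want to check it and pick a local filename before downloading. A minimal sketch using only the standard library follows. `output_filename` is a hypothetical helper, and the fallback name is an assumption, because the audio format is not stated in this schema.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(output: str, fallback: str = "output.bin") -> str:
    """Hypothetical helper: check that `output` looks like a URI and suggest a filename."""
    parsed = urlparse(output)
    if not parsed.scheme:
        raise ValueError(f"not a URI: {output!r}")
    # Use the last path segment; fall back when the URI has no file-like segment.
    name = PurePosixPath(parsed.path).name
    return name or fallback
```

The example URL in a call such as `output_filename("https://example.com/files/out.wav")` is a placeholder, not a real output location.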