lucataco/indextts-2:b219b0f2
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.
Field | Type | Default value | Description |
---|---|---|---|
text | string | | Text to synthesize. |
speaker_audio | string | | Reference audio for the target speaker (16k-48kHz WAV). |
emotion_audio | string | | Optional emotion reference audio. Defaults to speaker audio when omitted. |
emotion_scale | number | 1 (Max: 1) | Blend ratio for the emotion reference when both speaker and emotion prompts are used. |
emotion_vector | string | | Optional comma separated or JSON list of 8 emotion weights to bypass the classifier. |
emotion_text | string | | Text prompt used to auto-detect emotions via Qwen when provided. |
randomize_emotion | boolean | False | Pick emotion embeddings randomly instead of nearest-neighbour selection when vectors are provided. |
interval_silence_ms | integer | 200 (Max: 2000) | Silence inserted between long segments in milliseconds. |
max_text_tokens_per_segment | integer | 120 (Min: 32, Max: 300) | Maximum BPE tokens per autoregressive segment. |
top_p | number | 0.8 (Max: 1) | Top-p nucleus sampling for GPT stage. |
top_k | integer | 30 (Min: 1, Max: 200) | Top-k sampling for GPT stage. |
temperature | number | 0.8 (Max: 2) | Sampling temperature for GPT stage. |
length_penalty | number | 0 (Max: 5) | Beam search length penalty. |
num_beams | integer | 3 (Min: 1, Max: 8) | Beam width for GPT stage. |
repetition_penalty | number | 10 (Min: 1, Max: 30) | Penalty for repeated tokens. |
max_mel_tokens | integer | 1500 (Min: 256, Max: 4096) | Maximum mel tokens to generate per segment. |
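
As a rough illustration of how the defaults and limits in the table above fit together, the following sketch merges caller overrides onto the documented defaults and rejects values outside the listed Min/Max bounds before anything is sent. The helper names `build_input` and `parse_emotion_vector` are hypothetical and not part of the model or any client library. The parsing of `emotion_vector` is an assumption based only on the description "comma separated or JSON list of 8 emotion weights"; the model's own parser may behave differently.

```python
import json

# Defaults and (min, max) bounds copied from the input schema table.
# None means the table lists no bound on that side.
DEFAULTS = {
    "emotion_scale": 1,
    "randomize_emotion": False,
    "interval_silence_ms": 200,
    "max_text_tokens_per_segment": 120,
    "top_p": 0.8,
    "top_k": 30,
    "temperature": 0.8,
    "length_penalty": 0,
    "num_beams": 3,
    "repetition_penalty": 10,
    "max_mel_tokens": 1500,
}
BOUNDS = {
    "emotion_scale": (None, 1),
    "interval_silence_ms": (None, 2000),
    "max_text_tokens_per_segment": (32, 300),
    "top_p": (None, 1),
    "top_k": (1, 200),
    "temperature": (None, 2),
    "length_penalty": (None, 5),
    "num_beams": (1, 8),
    "repetition_penalty": (1, 30),
    "max_mel_tokens": (256, 4096),
}
# Fields the table lists with type string and no default value.
STRING_FIELDS = {
    "text", "speaker_audio", "emotion_audio", "emotion_vector", "emotion_text",
}


def parse_emotion_vector(raw):
    """Parse a comma-separated or JSON list string into exactly 8 floats.

    Assumption: this mirrors the format described in the table; it is not
    the model's own parser.
    """
    raw = raw.strip()
    values = json.loads(raw) if raw.startswith("[") else raw.split(",")
    weights = [float(v) for v in values]
    if len(weights) != 8:
        raise ValueError(f"emotion_vector needs 8 weights, got {len(weights)}")
    return weights


def build_input(**overrides):
    """Merge overrides onto the documented defaults and check the bounds."""
    unknown = set(overrides) - set(DEFAULTS) - STRING_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}
    for name, (low, high) in BOUNDS.items():
        value = payload[name]
        if low is not None and value < low:
            raise ValueError(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            raise ValueError(f"{name}={value} is above the maximum {high}")
    if "emotion_vector" in payload:
        # Validate the string locally; the payload keeps the original string.
        parse_emotion_vector(payload["emotion_vector"])
    return payload
```

A call such as `build_input(text="Hello", speaker_audio="https://example.com/ref.wav", temperature=0.6)` returns a dict that could be passed as the input object to whichever API client you use; the URL here is a placeholder.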
Output schema
The shape of the response you’ll get when you run this model with an API.
{'format': 'uri', 'title': 'Output', 'type': 'string'}
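
Since the output schema above describes a single string in URI format, a client may want to sanity-check the response before trying to download it. The sketch below uses only the Python standard library; `check_output` and `output_filename` are hypothetical helper names, and the fallback file name is an arbitrary choice because the schema does not state the file type.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def check_output(output):
    """Return the parsed URI if output looks like an absolute URI string."""
    if not isinstance(output, str):
        raise TypeError(f"expected a string, got {type(output).__name__}")
    parsed = urlparse(output)
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not an absolute URI: {output!r}")
    return parsed


def output_filename(output, fallback="output"):
    """Derive a local file name from the last path segment of the URI."""
    name = PurePosixPath(check_output(output).path).name
    return name or fallback
```

For example, `output_filename("https://example.com/files/abc/out.wav")` yields `out.wav`; the URL is a placeholder, not a real model output.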