lucataco/indextts-2:b219b0f2

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
|---|---|---|---|
| text | string | | Text to synthesize. |
| speaker_audio | string | | Reference audio for the target speaker (16-48 kHz WAV). |
| emotion_audio | string | | Optional emotion reference audio. Defaults to speaker audio when omitted. |
| emotion_scale | number | 1 (max: 1) | Blend ratio for the emotion reference when both speaker and emotion prompts are used. |
| emotion_vector | string | | Optional comma-separated or JSON list of 8 emotion weights to bypass the classifier. |
| emotion_text | string | | Text prompt used to auto-detect emotions via Qwen when provided. |
| randomize_emotion | boolean | False | Pick emotion embeddings randomly instead of nearest-neighbour selection when vectors are provided. |
| interval_silence_ms | integer | 200 (max: 2000) | Silence inserted between long segments, in milliseconds. |
| max_text_tokens_per_segment | integer | 120 (min: 32, max: 300) | Maximum BPE tokens per autoregressive segment. |
| top_p | number | 0.8 (max: 1) | Top-p nucleus sampling for GPT stage. |
| top_k | integer | 30 (min: 1, max: 200) | Top-k sampling for GPT stage. |
| temperature | number | 0.8 (max: 2) | Sampling temperature for GPT stage. |
| length_penalty | number | 0 (max: 5) | Beam search length penalty. |
| num_beams | integer | 3 (min: 1, max: 8) | Beam width for GPT stage. |
| repetition_penalty | number | 10 (min: 1, max: 30) | Penalty for repeated tokens. |
| max_mel_tokens | integer | 1500 (min: 256, max: 4096) | Maximum mel tokens to generate per segment. |
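
As a rough illustration of how the fields above fit together, the following sketch assembles an input dict and checks numeric values against the minimum and maximum bounds listed in the schema. The helper name `build_input`, the `BOUNDS` and `OTHER_FIELDS` tables, and the example URL are hypothetical and not part of the model's API. The sketch also assumes that `speaker_audio` is passed as a URL string, and it does not assume any particular meaning for the order of the 8 emotion weights, since the schema does not state one.

```python
import json

# Numeric bounds copied from the input schema; None means no bound is listed.
BOUNDS = {
    "emotion_scale": (None, 1),
    "interval_silence_ms": (None, 2000),
    "max_text_tokens_per_segment": (32, 300),
    "top_p": (None, 1),
    "top_k": (1, 200),
    "temperature": (None, 2),
    "length_penalty": (None, 5),
    "num_beams": (1, 8),
    "repetition_penalty": (1, 30),
    "max_mel_tokens": (256, 4096),
}

# Schema fields without numeric bounds that may still be overridden.
OTHER_FIELDS = {"emotion_audio", "emotion_text", "randomize_emotion"}


def build_input(text, speaker_audio, emotion_weights=None, **overrides):
    """Hypothetical helper: build an input dict and range-check its values."""
    payload = {"text": text, "speaker_audio": speaker_audio}
    if emotion_weights is not None:
        if len(emotion_weights) != 8:
            raise ValueError("emotion_vector needs exactly 8 weights")
        # The schema accepts a comma-separated string or a JSON list;
        # this sketch serializes the weights as a JSON list.
        payload["emotion_vector"] = json.dumps(list(emotion_weights))
    for name, value in overrides.items():
        if name in BOUNDS:
            low, high = BOUNDS[name]
            if low is not None and value < low:
                raise ValueError(f"{name}={value} is below the minimum {low}")
            if high is not None and value > high:
                raise ValueError(f"{name}={value} is above the maximum {high}")
        elif name not in OTHER_FIELDS:
            raise KeyError(f"field not in the input schema: {name}")
        payload[name] = value
    return payload


example = build_input(
    "Hello there.",
    "https://example.com/speaker.wav",  # placeholder URL, not a real file
    emotion_weights=[0, 0, 0, 0, 0, 0, 0, 1],
    temperature=0.7,
    num_beams=3,
)
```

Fields that are left out of the dict fall back to the defaults in the table. With the Replicate Python client, a dict like `example` would typically be passed as the `input` argument of `replicate.run`; that call is omitted here because it needs network access and an API token.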

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
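
The schema says the output is a single string in URI format, presumably pointing at the generated audio. A minimal sketch of a client-side sanity check, using only the standard library, might look like this. The function name `is_uri_string` and the example URL are hypothetical, and the check is looser than full URI validation: it only confirms that the value is a string with a scheme and a host or path.

```python
from urllib.parse import urlparse

# Output schema as listed above.
OUTPUT_SCHEMA = {"format": "uri", "title": "Output", "type": "string"}


def is_uri_string(value):
    """Return True if value is a str that has a scheme and a host or path."""
    if not isinstance(value, str):
        return False
    parts = urlparse(value)
    return bool(parts.scheme) and bool(parts.netloc or parts.path)


ok = is_uri_string("https://example.com/output.wav")  # placeholder URL
```

A value that passes this check can then be downloaded with whatever HTTP client the surrounding application already uses.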