lucataco/indextts-2:b219b0f2 | Run with an API on Replicate

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

Field	Type	Default value	Constraints	Description
text	string			Text to synthesize.
speaker_audio	string			Reference audio for the target speaker (16-48 kHz WAV).
emotion_audio	string			Optional emotion reference audio. Defaults to the speaker audio when omitted.
emotion_scale	number	1	Max: 1	Blend ratio for the emotion reference when both speaker and emotion prompts are used.
emotion_vector	string			Optional comma-separated or JSON list of 8 emotion weights to bypass the classifier.
emotion_text	string			Text prompt used to auto-detect emotions via Qwen when provided.
randomize_emotion	boolean	False		Pick emotion embeddings randomly instead of by nearest-neighbour selection when vectors are provided.
interval_silence_ms	integer	200	Max: 2000	Silence inserted between long segments, in milliseconds.
max_text_tokens_per_segment	integer	120	Min: 32, Max: 300	Maximum BPE tokens per autoregressive segment.
top_p	number	0.8	Max: 1	Top-p (nucleus) sampling threshold for the GPT stage.
top_k	integer	30	Min: 1, Max: 200	Top-k sampling for the GPT stage.
temperature	number	0.8	Max: 2	Sampling temperature for the GPT stage.
length_penalty	number	0	Max: 5	Beam search length penalty.
num_beams	integer	3	Min: 1, Max: 8	Beam width for the GPT stage.
repetition_penalty	number	10	Min: 1, Max: 30	Penalty for repeated tokens.
max_mel_tokens	integer	1500	Min: 256, Max: 4096	Maximum mel tokens to generate per segment.
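
The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch, not part of the model or of any Replicate library: `LIMITS`, `OTHER_FIELDS`, `format_emotion_vector`, and `build_input` are hypothetical names, the example URL is a placeholder, and treating `text` and `speaker_audio` as required is an assumption based on their having no default value. Fields for which the table lists no minimum are left unbounded below.

```python
import json

# Numeric limits copied from the field table.
# None means the table states no bound on that side.
LIMITS = {
    "emotion_scale": (None, 1),
    "interval_silence_ms": (None, 2000),
    "max_text_tokens_per_segment": (32, 300),
    "top_p": (None, 1),
    "top_k": (1, 200),
    "temperature": (None, 2),
    "length_penalty": (None, 5),
    "num_beams": (1, 8),
    "repetition_penalty": (1, 30),
    "max_mel_tokens": (256, 4096),
}

# Optional fields from the table that carry no numeric limit.
OTHER_FIELDS = {"emotion_audio", "emotion_vector", "emotion_text", "randomize_emotion"}


def format_emotion_vector(weights):
    """Serialize exactly 8 emotion weights as a JSON list string."""
    if len(weights) != 8:
        raise ValueError("emotion_vector needs exactly 8 weights")
    return json.dumps([float(w) for w in weights])


def build_input(text, speaker_audio, **overrides):
    """Return an input dict, rejecting unknown fields and out-of-range values."""
    payload = {"text": text, "speaker_audio": speaker_audio}
    for name, value in overrides.items():
        if name in LIMITS:
            low, high = LIMITS[name]
            if low is not None and value < low:
                raise ValueError(f"{name}={value} is below the minimum {low}")
            if high is not None and value > high:
                raise ValueError(f"{name}={value} is above the maximum {high}")
        elif name not in OTHER_FIELDS:
            raise ValueError(f"unknown field: {name}")
        payload[name] = value
    return payload


# Placeholder URL; fields that are omitted fall back to the model's defaults.
example = build_input(
    "Hello there.",
    "https://example.com/speaker.wav",
    temperature=0.7,
    emotion_vector=format_emotion_vector([0, 0, 0, 0, 0, 0, 0, 1]),
)
```

The resulting dictionary is intended to be passed as the model input by whichever API client you use. Only overridden fields are included, so the defaults in the table apply to everything else.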

The shape of the response you’ll get when you run this model with an API.

Schema

{"type": "string", "title": "Output", "format": "uri"}
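
The schema describes the output as a single string holding a URI. A small check along these lines can confirm that a response matches that shape before the URI is fetched. This is a sketch only: `check_output` is a hypothetical helper, and the URL is a placeholder. A client library may wrap the value in its own type, in which case it would need converting to a string first.

```python
from urllib.parse import urlparse


def check_output(output):
    """Return the output if it is a string with a URI scheme, else raise."""
    if not isinstance(output, str):
        raise TypeError("expected a string output")
    if not urlparse(output).scheme:
        raise ValueError("output is not a URI")
    return output


# Placeholder value standing in for a real response.
audio_uri = check_output("https://example.com/output.wav")
```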