aodianyun/indextts2-thai:2bcf5366 | Run with an API on Replicate

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.

Field	Type	Default value	Description
prompt_audio	string		Speaker reference audio (Thai; wav/mp3)
text	string		Text to synthesize (mostly Thai is recommended; a small amount of English may be mixed in)
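temperature	number	0.5	Sampling temperature. Lower values are more stable and produce fewer off-target timbres; higher values are more expressive (recommended: 0.3 to 0.8)
top_p	number	0.7	Nucleus sampling cutoff probability. Lower values are more "conservative"; higher values are more "open" (recommended: 0.5 to 0.9)
top_k	integer	20	Number of candidate tokens per step. Smaller values are more stable; larger values are more creative but more prone to noise (recommended: 10 to 30)
do_sample	boolean	True	Whether to enable sampling. When False, beam search is used, which is generally more stable and less noisy
num_beams	integer	1	Number of beams for beam search. Takes effect only when >1 and do_sample=False. Larger values are slower but more stable (recommended: 3 to 5)
length_penalty	number	0	Length penalty coefficient for beam search. 0 means no excessive preference for long sentences; keeping it at 0 is generally fine
repetition_penalty	number	1.2	Repetition penalty coefficient. Values slightly above 1 reduce odd repetitions and stuttering (recommended: 1.1 to 1.3)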
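max_mel_tokens	integer	1500	Maximum number of mel tokens. A higher limit makes truncation less likely but slows generation (recommended: 1500 to 2200)
max_text_tokens_per_segment	integer	120	Maximum text length per segment. Lowering it moderately can improve stability on long sentences (recommended: 80 to 120)
interval_silence	integer	200	Duration of the silence inserted between segments, in milliseconds. Controls the pause between sentences (recommended: 200 to 400)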
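emo_alpha	number	0.8	Emotion intensity in [0, 1]. Smaller values stay closer to the original voice and are more stable; larger values exaggerate the emotion (recommended: 0.5 to 0.8)
use_emo_text	boolean	False	Whether to infer the emotion vector automatically from the text or emo_text, without relying on emotion reference audio
emo_text	string		Separate emotion prompt text. When left empty, the synthesis text itself is used by default
use_random	boolean	False	Whether to add randomness when sampling the emotion vector. Keeping it off is generally recommended for reproducibility and stability
verbose	boolean	False	Whether to print detailed debug information. Recommended only when troubleshooting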
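
The table above can be turned into a small payload builder. The following is a minimal sketch, not part of the model or of any Replicate library: `build_input`, `DEFAULTS`, and `RECOMMENDED` are hypothetical names, the reference-audio URL is a placeholder, and passing `prompt_audio` as a URL string is an assumption (the schema only says `string`). The defaults and recommended ranges are copied from the table. The ranges are treated as advisory, so values outside them produce notes rather than errors.

```python
# Defaults copied from the input schema table.
DEFAULTS = {
    "temperature": 0.5, "top_p": 0.7, "top_k": 20,
    "do_sample": True, "num_beams": 1, "length_penalty": 0,
    "repetition_penalty": 1.2, "max_mel_tokens": 1500,
    "max_text_tokens_per_segment": 120, "interval_silence": 200,
    "emo_alpha": 0.8, "use_emo_text": False,
    "use_random": False, "verbose": False,
}

# Recommended ranges from the field descriptions (advisory only).
# num_beams is left out: its 3 to 5 recommendation applies only to beam search.
RECOMMENDED = {
    "temperature": (0.3, 0.8), "top_p": (0.5, 0.9), "top_k": (10, 30),
    "repetition_penalty": (1.1, 1.3), "max_mel_tokens": (1500, 2200),
    "max_text_tokens_per_segment": (80, 120),
    "interval_silence": (200, 400), "emo_alpha": (0.5, 0.8),
}


def build_input(prompt_audio, text, **overrides):
    """Hypothetical helper: merge overrides with the table defaults.

    Returns (payload, notes), where notes lists advisory messages.
    """
    unknown = set(overrides) - set(DEFAULTS) - {"emo_text"}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {**DEFAULTS, **overrides,
               "prompt_audio": prompt_audio, "text": text}

    # The table states emo_alpha lies in [0, 1].
    if not 0 <= payload["emo_alpha"] <= 1:
        raise ValueError("emo_alpha must be in [0, 1]")

    notes = []
    for name, (low, high) in RECOMMENDED.items():
        if not low <= payload[name] <= high:
            notes.append(f"{name}={payload[name]} is outside the "
                         f"recommended range {low} to {high}")
    # Per the table, num_beams only matters when sampling is disabled.
    if payload["num_beams"] > 1 and payload["do_sample"]:
        notes.append("num_beams > 1 takes effect only when do_sample=False")
    return payload, notes


# Example: switch to beam search with 3 beams (placeholder audio URL).
payload, notes = build_input(
    "https://example.com/speaker.wav", "สวัสดีครับ",
    do_sample=False, num_beams=3,
)
print(payload["num_beams"], notes)
```

The resulting dictionary is shaped to be passed as the `input` of an API call for this model version. The full version identifier is not shown on this page, so it is left out here.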

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{'format': 'uri', 'title': 'Output', 'type': 'string'}
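
Since the output is a single string in URI format, a client usually needs to check it and fetch the file it points to. The sketch below uses only the Python standard library. `output_filename` and `save_output` are hypothetical helper names, and the `output.bin` fallback is an assumption, because the schema does not state the audio container format.

```python
import shutil
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def output_filename(output_uri, fallback="output.bin"):
    """Derive a local file name from the output URI (hypothetical helper)."""
    parsed = urlparse(output_uri)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URI: {output_uri!r}")
    # Use the last path component, or the fallback if the path is empty.
    return PurePosixPath(parsed.path).name or fallback


def save_output(output_uri, dest=None):
    """Download the file behind the output URI to dest and return the path."""
    dest = dest or output_filename(output_uri)
    with urlopen(output_uri) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return dest


# Placeholder URI for illustration only.
print(output_filename("https://example.com/files/abc/out.wav"))
```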