tmappdev/cosy_voice_cloner:51a8d8dd

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

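| Field | Type | Default value | Description |
|---|---|---|---|
| ref_audio | string | | Reference audio file (3-10 seconds) |
| prompt_text | string | | Text of the reference audio (optional) |
| prompt_language | string (enum) | 粤语 (Cantonese) | Language of the reference audio. Options: 中文 (Chinese), 英文 (English), 日文 (Japanese), 粤语 (Cantonese), 韩文 (Korean), 中英混合 (Chinese-English mix), 日英混合 (Japanese-English mix), 粤英混合 (Cantonese-English mix), 韩英混合 (Korean-English mix), 多语种混合 (multilingual mix), 多语种混合(粤语) (multilingual mix, Cantonese) |
| text | string | | Text to synthesize |
| text_language | string (enum) | 粤语 (Cantonese) | Language of the text to synthesize. Options: 中文 (Chinese), 英文 (English), 日文 (Japanese), 粤语 (Cantonese), 韩文 (Korean), 中英混合 (Chinese-English mix), 日英混合 (Japanese-English mix), 粤英混合 (Cantonese-English mix), 韩英混合 (Korean-English mix), 多语种混合 (multilingual mix), 多语种混合(粤语) (multilingual mix, Cantonese) |
| how_to_cut | string (enum) | 按标点符号切 (split at punctuation marks) | How to split the text. Options: 不切 (no splitting), 凑四句一切 (split every four sentences), 凑50字一切 (split every 50 characters), 按中文句号。切 (split at the Chinese full stop 。), 按英文句号.切 (split at the English period .), 按标点符号切 (split at punctuation marks) |
| top_k | integer | 15 | GPT top_k parameter. Min: 1, Max: 100 |
| top_p | number | 1 | GPT top_p parameter. Max: 1 |
| temperature | number | 1 | GPT temperature parameter. Max: 1 |
| ref_free | boolean | False | Enable reference-free mode |
| speed | number | 1 | Speech speed adjustment. Min: 0.6, Max: 1.65 |
| reference_files | array | | Optional additional reference files to blend |

The enum values are the literal strings the API accepts; the English glosses in parentheses are translations only and are not part of the values.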
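
The defaults and limits listed above can be checked on the client before a request is sent. The following is a minimal sketch in Python, not part of the model's own code: `build_input`, `DEFAULTS`, `LANGUAGES`, and `CUT_METHODS` are hypothetical names introduced here, and the example URL is a placeholder. It assumes that the listed Min and Max values are inclusive and that the enum strings must match exactly.

```python
from typing import Any, Dict

# Enum values copied verbatim from the input schema (assumed to be matched exactly).
LANGUAGES = ["中文", "英文", "日文", "粤语", "韩文", "中英混合", "日英混合",
             "粤英混合", "韩英混合", "多语种混合", "多语种混合(粤语)"]
CUT_METHODS = ["不切", "凑四句一切", "凑50字一切", "按中文句号。切",
               "按英文句号.切", "按标点符号切"]

# Default values as listed in the schema; fields without a default are omitted.
DEFAULTS: Dict[str, Any] = {
    "prompt_language": "粤语",
    "text_language": "粤语",
    "how_to_cut": "按标点符号切",
    "top_k": 15,
    "top_p": 1,
    "temperature": 1,
    "ref_free": False,
    "speed": 1,
}


def build_input(**fields: Any) -> Dict[str, Any]:
    """Merge caller-supplied fields over the schema defaults and validate them.

    Hypothetical helper: raises ValueError when a value falls outside the
    options or limits listed in the schema.
    """
    payload = {**DEFAULTS, **fields}

    if payload["prompt_language"] not in LANGUAGES:
        raise ValueError("prompt_language is not one of the listed options")
    if payload["text_language"] not in LANGUAGES:
        raise ValueError("text_language is not one of the listed options")
    if payload["how_to_cut"] not in CUT_METHODS:
        raise ValueError("how_to_cut is not one of the listed options")
    if not 1 <= payload["top_k"] <= 100:
        raise ValueError("top_k must be between 1 and 100")
    # The schema lists only a maximum for top_p and temperature.
    if payload["top_p"] > 1:
        raise ValueError("top_p must not exceed 1")
    if payload["temperature"] > 1:
        raise ValueError("temperature must not exceed 1")
    if not 0.6 <= payload["speed"] <= 1.65:
        raise ValueError("speed must be between 0.6 and 1.65")
    return payload


# Usage: synthesize English text, keeping every other default.
example = build_input(
    ref_audio="https://example.com/reference.wav",  # placeholder URL
    text="Hello there.",
    text_language="英文",
)
```

The resulting dictionary has the shape of the input schema, so it could be passed as the `input` argument when running the model through an API client.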

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
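
Because the output is a single string in URI format, a caller may want to sanity-check the response before trying to download it. This is a loose sketch using only the Python standard library. `is_output_uri` is a hypothetical helper, and the check (a string with a scheme plus a host or path) is an assumption about what counts as usable, not a full validation of the `uri` format.

```python
from urllib.parse import urlparse


def is_output_uri(value: object) -> bool:
    """Return True if value is a string that parses as an absolute URI."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # Require a scheme (e.g. https) and either a host or a path.
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)
```

For example, `is_output_uri("https://example.com/out.wav")` is true for this placeholder URL, while a bare filename such as `"out.wav"` or a non-string value is rejected.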