You're looking at a specific version of this model.

tmappdev/cosy_voice_cloner:5c6a1398

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
|---|---|---|---|
| reference_audio | string | | Path to reference audio (3-10s) |
| text | string | | Text to synthesize |
| language | string (enum) | English | Language mode. Options: Chinese, English, Japanese, Korean, Cantonese, Mixed |
| split_method | string (enum) | By Sentences (4 each) | Text splitting method. Options: None, By Sentences (4 each), By Length (~50 chars), By Chinese Full Stop (。), By English Full Stop (.), By Any Punctuation |
| speed | number | 1 | Speech speed (1.0 is normal speed) |
| top_k | integer | 20 | Top-K sampling |
| top_p | number | 0.6 | Top-P sampling |
| temperature | number | 0.6 | Sampling temperature |
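
The input fields above can be assembled and checked locally before any API call is made. The following is a minimal sketch, not part of the model or of any client library: `build_input` is a hypothetical helper, the file name `reference.wav` is a placeholder, and treating `reference_audio` and `text` as required is an assumption based on the absence of listed defaults.

```python
# Hypothetical helper that fills defaults and validates enum and numeric fields
# against the input schema listed above. It makes no network calls.
LANGUAGES = {"Chinese", "English", "Japanese", "Korean", "Cantonese", "Mixed"}
SPLIT_METHODS = {
    "None",
    "By Sentences (4 each)",
    "By Length (~50 chars)",
    "By Chinese Full Stop (。)",
    "By English Full Stop (.)",
    "By Any Punctuation",
}
DEFAULTS = {
    "language": "English",
    "split_method": "By Sentences (4 each)",
    "speed": 1,
    "top_k": 20,
    "top_p": 0.6,
    "temperature": 0.6,
}
# Assumption: fields without a listed default are treated as required.
REQUIRED = ("reference_audio", "text")


def build_input(**fields):
    """Return a complete input dict, raising ValueError on invalid fields."""
    unknown = set(fields) - set(DEFAULTS) - set(REQUIRED)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    missing = [name for name in REQUIRED if not fields.get(name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    # Caller-supplied values override the documented defaults.
    payload = {**DEFAULTS, **fields}

    if payload["language"] not in LANGUAGES:
        raise ValueError(f"language must be one of {sorted(LANGUAGES)}")
    if payload["split_method"] not in SPLIT_METHODS:
        raise ValueError(f"split_method must be one of {sorted(SPLIT_METHODS)}")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(payload["top_k"], bool) or not isinstance(payload["top_k"], int):
        raise ValueError("top_k must be an integer")
    for name in ("speed", "top_p", "temperature"):
        value = payload[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name} must be a number")
    return payload


# Placeholder values for illustration only.
payload = build_input(
    reference_audio="reference.wav",
    text="Hello there.",
    speed=1.2,
)
```

The resulting dictionary could then be passed as the input payload of whatever client is used to run the model. Validating locally first gives an immediate error for a misspelled enum option instead of a failed remote run.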

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"format": "uri", "title": "Output", "type": "string"}
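
Since the output is a single string in URI format, a caller may want to check the value before trying to fetch it. The sketch below uses only the Python standard library; `is_uri_output` is a hypothetical helper, and the example URL is a placeholder, not a real model output.

```python
from urllib.parse import urlparse


def is_uri_output(value):
    """Return True if value is a string that parses as a URI with a scheme."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # Require a scheme plus either a network location or a path.
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


# Placeholder output value for illustration only.
example_output = "https://example.com/output.wav"
if is_uri_output(example_output):
    print("output looks like a URI")
```

This is a loose structural check only. It does not confirm that the URI is reachable or that it points to audio.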