geopti/chatterbox-multilingual:a0459af8 | Run with an API on Replicate

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used.

Field	Type	Default value	Range	Description
text	string			The text you want spoken. Can be a single sentence or a long paragraph; long inputs are automatically split into chunks.
language	string	en		Language of the text. Use the two-letter code (en=English, fr=French, de=German, es=Spanish, ja=Japanese, zh=Chinese, ar=Arabic, el=Greek, etc.).
audio_prompt	string			Optional reference voice clip (.wav/.mp3). The output will mimic this voice. If left empty, a default voice is used.
cfg_weight	number	0.5	Max: 1	How closely the speech follows the text. Higher = sticks to the text more strictly. Lower = more freedom (but the model can hallucinate or get stuck).
exaggeration	number	0.5	Max: 1	How expressive the voice is. Higher = more emotional / dramatic. Lower = flatter / more neutral.
temperature	number	0.8	Max: 2	Randomness of the voice. Higher = more variation between runs. Lower = more consistent / robotic.
repetition_penalty	number	2	Min: 1, Max: 5	Penalty for repeating the same sounds. Higher = less repetition.
top_p	number	1	Max: 1	Top-p (nucleus) sampling. Restricts the model to the most likely tokens. 1.0 = no restriction.
pause_between_sentences	number	0.1	Max: 5	Length of the silence (in seconds) inserted between sentences.
max_words_per_chunk	integer	60	Min: 10, Max: 200	Long texts are split into chunks before generation. This is the maximum number of words per chunk. Smaller = safer for tricky languages, but slower.
repeated_token_threshold	integer	3	Min: 2, Max: 10	If the model repeats the same sound this many times in a row, the chunk is cut off (this prevents the model from getting stuck in a loop). Raise this if too much real speech is being cut.
garbage_trim_buffer	integer	25	Max: 200	Number of audio frames kept after the model finishes saying the sentence (each frame = ~40 ms). Lower = trims garbage tails more aggressively but may cut off the last syllable.
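
As a rough illustration of how the fields above combine into a request, the following Python sketch builds an input dictionary and checks values against the bounds listed in the table before anything is sent. The helpers `build_input` and `trim_buffer_seconds` are hypothetical names introduced here, not part of any Replicate library. The sketch also assumes that `text` is required (the table lists no default for it) and only enforces bounds the table actually states.

```python
# Bounds copied from the input schema table as (min, max).
# None means the table states no bound on that side.
BOUNDS = {
    "cfg_weight": (None, 1),
    "exaggeration": (None, 1),
    "temperature": (None, 2),
    "repetition_penalty": (1, 5),
    "top_p": (None, 1),
    "pause_between_sentences": (None, 5),
    "max_words_per_chunk": (10, 200),
    "repeated_token_threshold": (2, 10),
    "garbage_trim_buffer": (None, 200),
}
INTEGER_FIELDS = {"max_words_per_chunk", "repeated_token_threshold", "garbage_trim_buffer"}
STRING_FIELDS = {"language", "audio_prompt"}


def build_input(text, **overrides):
    """Hypothetical helper: return an input dict holding `text` plus any overrides.

    Fields that are not passed are simply left out, so the model's
    documented defaults apply to them.
    """
    if not isinstance(text, str) or not text.strip():
        # Assumption: `text` is required because the table gives it no default.
        raise ValueError("text must be a non-empty string")
    payload = {"text": text}
    for name, value in overrides.items():
        if name in STRING_FIELDS:
            payload[name] = value
            continue
        if name not in BOUNDS:
            raise KeyError(f"unknown field: {name}")
        if name in INTEGER_FIELDS and not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")
        low, high = BOUNDS[name]
        if low is not None and value < low:
            raise ValueError(f"{name} must be >= {low}")
        if high is not None and value > high:
            raise ValueError(f"{name} must be <= {high}")
        payload[name] = value
    return payload


def trim_buffer_seconds(frames, frame_ms=40):
    """Convert garbage_trim_buffer frames to seconds, using the table's ~40 ms per frame."""
    return frames * frame_ms / 1000


payload = build_input("Bonjour tout le monde.", language="fr", temperature=0.6)
```

With the approximate 40 ms frame length from the table, the default `garbage_trim_buffer` of 25 frames corresponds to about one second of audio kept after the sentence ends. A dictionary like `payload` could then be passed as the `input` argument of the Replicate Python client's `replicate.run`, together with the model's full version identifier, which is shown only in shortened form on this page and is therefore left out of the sketch.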

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{"type": "string", "title": "Output", "format": "uri"}
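
Since the output is a single string in URI format, a caller typically needs to fetch the generated audio from that URI. The sketch below shows one possible way to do this with the Python standard library. The helper names `output_filename` and `save_output` are hypothetical, and the sketch assumes the URI uses http or https and that you already hold it as a plain string; the schema does not state the audio file format, so the fallback filename carries no extension.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def output_filename(uri, fallback="output"):
    """Hypothetical helper: derive a local filename from the output URI."""
    parsed = urlparse(uri)
    # Assumption: the model output is served over http(s).
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) URI: {uri!r}")
    name = PurePosixPath(parsed.path).name
    # Fall back to a generic name when the URI path has no final component.
    return name or fallback


def save_output(uri, directory="."):
    """Download the generated audio to `directory` and return the local path."""
    destination = Path(directory) / output_filename(uri)
    urlretrieve(uri, str(destination))
    return str(destination)
```

For example, `save_output(result)` would store the file in the current directory, where `result` stands for the URI string returned by a finished run.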