You're looking at a specific version of this model.

usamaehsan/voices:6d23db56

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value will be used.

| Field | Type | Default value | Description |
|---|---|---|---|
| mode | string (enum) | zero_shot | Voice synthesis mode. Options: zero_shot, cross_lingual, voice_conversion |
| text | string | | Text to be synthesized (for zero_shot and cross_lingual modes) |
| prompt_text | string | | Prompt text corresponding to the prompt audio (for zero_shot mode only) |
| prompt_audio | string | | Prompt audio file (for zero_shot and cross_lingual modes) |
| source_audio | string | | Source audio file for voice conversion |
| target_audio | string | | Target audio file for voice conversion |
| speed | number | 1 | Speech speed factor. Min: 0.2 |
| max_chunk_time | integer | 30 | Maximum time in seconds for processing each chunk |
| use_cpu | boolean | False | Force CPU usage instead of GPU |
| use_half_precision | boolean | True | Enable FP16 precision for faster processing |
| optimize_memory | boolean | True | Enable memory optimizations |
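
Because each mode uses a different subset of the fields, it can help to assemble and check the input before sending it. The following is a minimal sketch, not part of the model's API: `build_input`, `REQUIRED_BY_MODE`, and `DEFAULTS` are hypothetical names, and the per-mode required fields are an assumption inferred from the field descriptions (the schema itself does not mark any field as required).

```python
# Hypothetical helper. The required-field mapping is inferred from the
# field descriptions; the schema does not state it explicitly.
REQUIRED_BY_MODE = {
    "zero_shot": ("text", "prompt_text", "prompt_audio"),
    "cross_lingual": ("text", "prompt_audio"),
    "voice_conversion": ("source_audio", "target_audio"),
}

# Default values copied from the input schema.
DEFAULTS = {
    "mode": "zero_shot",
    "speed": 1,
    "max_chunk_time": 30,
    "use_cpu": False,
    "use_half_precision": True,
    "optimize_memory": True,
}

# Minimum value listed for the speed field.
MIN_SPEED = 0.2


def build_input(**fields):
    """Merge caller-supplied fields over the defaults and check them per mode."""
    payload = {**DEFAULTS, **fields}
    mode = payload["mode"]
    if mode not in REQUIRED_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    # Collect the fields this mode appears to need but that are absent or empty.
    missing = [name for name in REQUIRED_BY_MODE[mode] if not payload.get(name)]
    if missing:
        raise ValueError(f"mode {mode!r} needs: {', '.join(missing)}")
    if payload["speed"] < MIN_SPEED:
        raise ValueError(f"speed must be at least {MIN_SPEED}")
    return payload


# Example: a voice_conversion request with placeholder audio URLs.
example = build_input(
    mode="voice_conversion",
    source_audio="https://example.com/source.wav",
    target_audio="https://example.com/target.wav",
)
```

The resulting dictionary has the same keys as the fields above, so it could be passed as the input object of an API request. The URLs in the example are placeholders.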

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'format': 'uri', 'title': 'Output', 'type': 'string'}
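
Since the output is a single URI string, a client typically needs to download the file it points to. The sketch below shows one way to do that with the Python standard library. `local_name` and `save_output` are hypothetical helpers, and the `output.wav` fallback name is an assumption, since the schema does not state the audio format.

```python
import os
import posixpath
from urllib.parse import urlparse
from urllib.request import urlretrieve


def local_name(output_uri, fallback="output.wav"):
    """Derive a local file name from the last path segment of the URI."""
    path = urlparse(output_uri).path
    name = posixpath.basename(path)
    # Fall back to a fixed name when the URI path has no file segment.
    return name or fallback


def save_output(output_uri, directory="."):
    """Download the file behind the output URI and return the local path."""
    target = os.path.join(directory, local_name(output_uri))
    urlretrieve(output_uri, target)
    return target
```

Calling `save_output(uri)` with the string returned by the model would write the file to the current directory. Some client libraries may wrap the output in their own object rather than returning a plain string, in which case the URI has to be extracted first.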