usamaehsan/voices:49b093f2
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
Field | Type | Default value | Description |
---|---|---|---|
mode | string (enum) | zero_shot | Voice synthesis mode. Options: zero_shot, cross_lingual, voice_conversion |
text | string | | Text to be synthesized (for zero_shot and cross_lingual modes) |
prompt_text | string | | Prompt text corresponding to the prompt audio (for zero_shot mode only) |
prompt_audio | string | | Prompt audio file (for zero_shot and cross_lingual modes) |
source_audio | string | | Source audio file for voice conversion |
target_audio | string | | Target audio file for voice conversion |
speed | number | 1 | Speech speed factor. Min: 0.2 |
max_chunk_time | integer | 30 | Maximum time in seconds for processing each chunk |
use_cpu | boolean | False | Force CPU usage instead of GPU |
use_half_precision | boolean | True | Enable FP16 precision for faster processing |
optimize_memory | boolean | True | Enable memory optimizations |
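The field descriptions imply that each mode uses a different subset of inputs. The sketch below shows one way to assemble an input dictionary and catch missing fields before sending a request. The helper `build_input` and the `FIELDS_BY_MODE` mapping are hypothetical: the per-mode requirements are inferred from the descriptions above, and the model itself may validate differently.

```python
from typing import Any

# Per-mode fields inferred from the field descriptions (an assumption,
# not a documented validation rule of the model).
FIELDS_BY_MODE = {
    "zero_shot": ("text", "prompt_text", "prompt_audio"),
    "cross_lingual": ("text", "prompt_audio"),
    "voice_conversion": ("source_audio", "target_audio"),
}

# Default values copied from the input schema table.
DEFAULTS = {
    "speed": 1,
    "max_chunk_time": 30,
    "use_cpu": False,
    "use_half_precision": True,
    "optimize_memory": True,
}

MIN_SPEED = 0.2  # "Min: 0.2" in the speed row


def build_input(mode: str = "zero_shot", **fields: Any) -> dict:
    """Hypothetical helper: assemble an input dict for the chosen mode."""
    if mode not in FIELDS_BY_MODE:
        raise ValueError(f"unknown mode: {mode!r}")
    # Report every field the chosen mode appears to need but did not get.
    missing = [name for name in FIELDS_BY_MODE[mode] if not fields.get(name)]
    if missing:
        raise ValueError(f"mode {mode!r} needs: {', '.join(missing)}")
    # Explicit fields override the schema defaults.
    payload = {"mode": mode, **DEFAULTS, **fields}
    if payload["speed"] < MIN_SPEED:
        raise ValueError(f"speed must be at least {MIN_SPEED}")
    return payload


# Example with a placeholder audio URL.
payload = build_input(
    "cross_lingual",
    text="Hello",
    prompt_audio="https://example.com/ref.wav",
)
```

The resulting dictionary is what you would pass as the model input with whichever API client you use.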
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{"type": "string", "title": "Output", "format": "uri"}
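Since the output is a single string holding a URI, a caller usually needs to check it and choose a local filename before downloading. The following is a minimal sketch under that assumption. The helper `output_filename` and its fallback name are hypothetical, and the schema does not state the audio file type.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(output: str, fallback: str = "voice_output") -> str:
    """Hypothetical helper: pick a local filename for the returned URI."""
    parsed = urlparse(output)
    if not parsed.scheme:
        # A bare path or empty string is not a usable URI.
        raise ValueError(f"not a URI: {output!r}")
    # Use the last path segment, or the fallback if the path has none.
    name = PurePosixPath(parsed.path).name
    return name or fallback
```

The file itself can then be fetched from the URI with any HTTP client.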