usamaehsan / voices

(Updated 7 months, 2 weeks ago)

  • Public
  • 755 runs
  • L40S

Input

  • Voice synthesis mode (string). Default: "zero_shot".

  • Text to be synthesized (string; used in zero_shot and cross_lingual modes).

  • Prompt text corresponding to the prompt audio (string; zero_shot mode only).

  • Prompt audio file (file; used in zero_shot and cross_lingual modes).

  • Source audio file for voice conversion (file).

  • Target audio file for voice conversion (file).

  • Speech speed factor (number, minimum 0.2). Default: 1.

  • Maximum processing time per chunk, in seconds (integer). Default: 30.

  • Force CPU usage instead of GPU (boolean). Default: false.

  • Enable FP16 precision for faster processing (boolean). Default: true.

  • Enable memory optimizations (boolean). Default: true.
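The page lists each input's type and purpose but not its actual key name, so the sketch below uses hypothetical field names (`mode`, `text`, `prompt_text`, `prompt_audio`, `source_audio`, `target_audio`, `speed`, and so on) and a hypothetical `"voice_conversion"` mode value, since the page never names the value for that mode. It is a minimal client-side consistency check, assuming that an input is required in every mode its description mentions. That is an interpretation of the descriptions, not documented model behavior.

```python
from dataclasses import dataclass
from typing import List, Optional

ZERO_SHOT = "zero_shot"
CROSS_LINGUAL = "cross_lingual"
VOICE_CONVERSION = "voice_conversion"  # hypothetical: the page does not name this mode's value


@dataclass
class VoicesInput:
    # Hypothetical field names; defaults mirror those listed on the page.
    mode: str = ZERO_SHOT
    text: Optional[str] = None
    prompt_text: Optional[str] = None
    prompt_audio: Optional[str] = None  # path or URL to an audio file
    source_audio: Optional[str] = None
    target_audio: Optional[str] = None
    speed: float = 1.0
    chunk_timeout: int = 30
    force_cpu: bool = False
    use_fp16: bool = True
    optimize_memory: bool = True


def validate(inp: VoicesInput) -> List[str]:
    """Return a list of problems; an empty list means the inputs look consistent."""
    errors: List[str] = []
    if inp.speed < 0.2:  # the page states a minimum of 0.2
        errors.append("speed must be at least 0.2")
    if inp.mode in (ZERO_SHOT, CROSS_LINGUAL):
        # Assumption: text and prompt audio are needed in both synthesis modes.
        if not inp.text:
            errors.append(f"text is required in {inp.mode} mode")
        if not inp.prompt_audio:
            errors.append(f"prompt_audio is required in {inp.mode} mode")
        # Assumption: prompt text is needed only in zero_shot mode.
        if inp.mode == ZERO_SHOT and not inp.prompt_text:
            errors.append("prompt_text is required in zero_shot mode")
    elif inp.mode == VOICE_CONVERSION:
        if not inp.source_audio or not inp.target_audio:
            errors.append("source_audio and target_audio are required for voice conversion")
    else:
        errors.append(f"unknown mode: {inp.mode!r}")
    return errors


if __name__ == "__main__":
    print(validate(VoicesInput(mode=CROSS_LINGUAL, text="Hello")))
```

Running the usage line reports the missing prompt audio for cross_lingual mode. Checking inputs locally like this catches a mismatched mode/input combination before a paid run is started.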

Output

Run time and cost

This model costs approximately $0.040 to run on Replicate, or 25 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
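As a quick check of the figures above, the sketch below converts a budget into an approximate run count, assuming the page's flat estimate of $0.040 per run. Real costs vary with the inputs, so treat the result as a rough guide. Amounts are handled in whole cents to avoid floating-point rounding surprises.

```python
# Assumption: a flat $0.040 per run, the approximate figure quoted on the page.
COST_PER_RUN_CENTS = 4


def runs_for_budget(budget_dollars: float) -> int:
    """Approximate number of whole runs a dollar budget covers."""
    budget_cents = round(budget_dollars * 100)
    return budget_cents // COST_PER_RUN_CENTS


def cost_for_runs(runs: int) -> float:
    """Approximate dollar cost of a given number of runs."""
    return runs * COST_PER_RUN_CENTS / 100


if __name__ == "__main__":
    print(runs_for_budget(1.0))  # matches the page's "25 runs per $1"
```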

This model runs on NVIDIA L40S GPU hardware. Predictions typically complete within 42 seconds, although prediction time varies significantly with the inputs.

Readme

This model doesn't have a readme.