cjwbw/voicecraft:0c2ef73b

Input

string

Choose a task. For zero-shot text-to-speech, you must also specify cut_off_sec, the length of the original audio to use for zero-shot generation, and the transcript of the audio up to cut_off_sec.

Default: "speech_editing-substitution"

*file

Original audio file

*string

Transcript of the original audio file. You can use models such as https://replicate.com/openai/whisper and https://replicate.com/vaibhavs10/incredibly-fast-whisper to get the transcript (and modify it if it's not accurate)

*string

Transcript of the target audio file

number

Used only by, and required for, the zero-shot text-to-speech (TTS) task. The number of seconds from the start of the original audio to use as the reference for zero-shot TTS. About 3 seconds of reference is generally enough for high-quality voice cloning, but longer is generally better; try 3 to 6 seconds.

Default: 3.01

string

Used only by, and required for, the zero-shot text-to-speech task. Transcript of the original audio file up to the cut_off_sec specified above. This step will be improved and automated later.
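Until that step is automated, the transcript up to cut_off_sec can be derived from word-level timestamps. The following is a minimal sketch, assuming you already have such timestamps (for example from a speech recognizer that reports them). The `(text, start_sec, end_sec)` tuple layout, the helper names, and the sample words and timings are all made up for illustration and are not part of this model's API.

```python
def snap_cut_off(words, target_sec):
    """Return the end time of the last word that finishes by target_sec.

    Snapping to a word boundary avoids cutting the reference audio mid-word.
    `words` is a hypothetical list of (text, start_sec, end_sec) tuples.
    """
    ends = [end for _text, _start, end in words if end <= target_sec]
    if not ends:
        raise ValueError("no word ends before target_sec")
    return max(ends)


def transcript_until(words, cut_off_sec):
    """Join the words that finish at or before cut_off_sec."""
    kept = [text for text, _start, end in words if end <= cut_off_sec]
    return " ".join(kept)


# Illustrative timings only; real values come from your own alignment.
words = [
    ("the", 0.00, 0.40),
    ("quick", 0.40, 1.10),
    ("brown", 1.10, 1.90),
    ("fox", 1.90, 2.80),
    ("jumps", 2.80, 3.60),
]

cut_off_sec = snap_cut_off(words, target_sec=3.01)
prompt_transcript = transcript_until(words, cut_off_sec)
```

Here the requested 3.01 seconds falls inside "jumps", so the cut-off snaps back to the end of "fox" and the prompt transcript contains only the four complete words.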

number
(minimum: 0.01, maximum: 5)

Adjusts the randomness of outputs: values greater than 1 are more random, and values approaching 0 are close to deterministic

Default: 1
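To illustrate how a temperature setting of this kind usually works, here is a minimal sketch of temperature-scaled softmax in plain Python. It shows the general technique only; the function name and the example logits are assumptions, and the model's actual sampling code may differ.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
sharp = softmax_with_temperature(logits, temperature=0.01)  # near-deterministic
flat = softmax_with_temperature(logits, temperature=5.0)    # much more random
```

At the minimum value of 0.01 almost all probability mass lands on the highest-scoring token, while at the maximum of 5 the distribution is close to uniform.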

number
(minimum: 0, maximum: 1)

When decoding text, samples from the top p percentage of most likely tokens; lower to ignore less likely tokens

Default: 0.8
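The top-p rule can be sketched as follows: sort tokens by probability, keep the smallest set whose cumulative probability reaches p, renormalize, and sample from that set. This is a generic illustration of nucleus sampling under those assumptions, with hypothetical function names and made-up probabilities, not the model's own implementation.

```python
import random


def top_p_filter(probs, top_p=0.8):
    """Keep the most likely tokens until their cumulative mass reaches top_p.

    Returns a dict mapping token index to renormalized probability.
    """
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_top_p(probs, top_p=0.8, rng=None):
    """Draw one token index from the top-p filtered distribution."""
    rng = rng or random.Random()
    filtered = top_p_filter(probs, top_p)
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


probs = [0.6, 0.25, 0.1, 0.05]  # made-up token probabilities
nucleus = top_p_filter(probs, top_p=0.8)
token = sample_top_p(probs, top_p=0.8, rng=random.Random(0))
```

With these probabilities and the default of 0.8, only the two most likely tokens survive the filter; lowering top_p further would leave just the first.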

integer

-1 means do not adjust the probability of silence tokens. If there are long silences or unnaturally stretched words, increase sample_batch_size to 2, 3, or even 4

Default: -1

integer

Specify the sampling rate of the audio codec

Default: 16000

integer

Random seed. Leave blank to randomize the seed
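The constraints listed above can be checked before submitting a prediction. The sketch below collects the inputs into a dictionary and validates the documented ranges. Apart from `cut_off_sec`, every key and argument name here (including `is_zero_shot_tts` and `prompt_transcript`) is a hypothetical placeholder, since the field names are not shown in this listing; substitute the names from the model's actual schema.

```python
def build_inputs(task, orig_audio, orig_transcript, target_transcript,
                 is_zero_shot_tts=False, cut_off_sec=3.01,
                 prompt_transcript=None, temperature=1.0, top_p=0.8,
                 seed=None):
    """Validate the documented ranges and return a dict of inputs.

    All key names except cut_off_sec are assumptions for illustration.
    """
    if not 0.01 <= temperature <= 5:
        raise ValueError("temperature must be between 0.01 and 5")
    if not 0 <= top_p <= 1:
        raise ValueError("top_p must be between 0 and 1")

    inputs = {
        "task": task,
        "orig_audio": orig_audio,
        "orig_transcript": orig_transcript,
        "target_transcript": target_transcript,
        "temperature": temperature,
        "top_p": top_p,
    }

    # Zero-shot TTS also needs the cut-off and the transcript up to it.
    if is_zero_shot_tts:
        if cut_off_sec <= 0:
            raise ValueError("cut_off_sec must be positive")
        if not prompt_transcript:
            raise ValueError("zero-shot TTS needs the transcript up to cut_off_sec")
        inputs["cut_off_sec"] = cut_off_sec
        inputs["prompt_transcript"] = prompt_transcript

    # Leaving the seed out lets the service randomize it.
    if seed is not None:
        inputs["seed"] = seed
    return inputs


example = build_inputs(
    task="speech_editing-substitution",
    orig_audio="original.wav",  # placeholder path
    orig_transcript="the quick brown fox jumps",
    target_transcript="the quick red fox jumps",
)
```

Keeping the validation local means an out-of-range temperature or a missing prompt transcript is caught before any request is made.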

Output
