cjwbw/voicecraft:6d8f23ab
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
Field | Type | Default value | Description |
---|---|---|---|
task | string (enum) | speech_editing-substitution (Options: speech_editing-substitution, speech_editing-insertion, speech_editing-sdeletion, zero-shot text-to-speech) | Task to run. For zero-shot text-to-speech, you must also set cut_off_sec (how much of the original audio to use for zero-shot generation) and orig_transcript_until_cutoff_time (the transcript up to cut_off_sec) |
orig_audio | string | | Original audio file |
orig_transcript | string | | Transcript of the original audio file. You can use models such as https://replicate.com/openai/whisper and https://replicate.com/vaibhavs10/incredibly-fast-whisper to get the transcript, then correct it if it is inaccurate |
target_transcript | string | | Transcript of the target audio file |
cut_off_sec | number | | Valid only for, and required by, the zero-shot text-to-speech task. Number of seconds from the start of the original audio to use as the reference for zero-shot text-to-speech (TTS). About 3 seconds of reference is generally enough for high-quality voice cloning, but longer is generally better; try 3 to 6 seconds |
orig_transcript_until_cutoff_time | string | | Valid only for, and required by, the zero-shot text-to-speech task. Transcript of the original audio file up to the cut_off_sec specified above. This step is expected to be automated in a later version |
temperature | number | 1 (Min: 0.01, Max: 5) | Adjusts the randomness of the output: values above 1 are more random, and values close to 0 are nearly deterministic |
top_p | number | 0.8 (Max: 1) | When decoding, samples only from the most likely tokens whose cumulative probability falls within top_p; lower the value to ignore less likely tokens |
stop_repetition | integer | -1 | -1 means the probability of silence tokens is not adjusted. If the output has long silences or unnaturally stretched words, increase sample_batch_size to 2, 3, or even 4 |
sampling_rate | integer | 16000 | Sampling rate of the audio codec, in Hz |
seed | integer | | Random seed. Leave blank to randomize the seed |
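The constraints in the table above (the task options, the temperature and top_p limits, and the two fields that only apply to zero-shot text-to-speech) can be checked before a request is sent. The following is a minimal sketch, not part of the model's API: `build_voicecraft_input` is a hypothetical helper, and it assumes that orig_audio, orig_transcript, and target_transcript are always supplied and that the zero-shot fields should be omitted for the editing tasks. The schema does not state either point explicitly.

```python
from typing import Any, Dict, Optional

# Task options copied verbatim from the input schema.
TASKS = (
    "speech_editing-substitution",
    "speech_editing-insertion",
    "speech_editing-sdeletion",
    "zero-shot text-to-speech",
)


def build_voicecraft_input(
    orig_audio: str,
    orig_transcript: str,
    target_transcript: str,
    task: str = "speech_editing-substitution",
    cut_off_sec: Optional[float] = None,
    orig_transcript_until_cutoff_time: Optional[str] = None,
    temperature: float = 1.0,
    top_p: float = 0.8,
    stop_repetition: int = -1,
    sampling_rate: int = 16000,
    seed: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: validate fields and return an input dict."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {TASKS}")
    if not 0.01 <= temperature <= 5:
        raise ValueError("temperature must be between 0.01 and 5")
    if top_p > 1:
        raise ValueError("top_p must be at most 1")

    payload: Dict[str, Any] = {
        "task": task,
        "orig_audio": orig_audio,
        "orig_transcript": orig_transcript,
        "target_transcript": target_transcript,
        "temperature": temperature,
        "top_p": top_p,
        "stop_repetition": stop_repetition,
        "sampling_rate": sampling_rate,
    }

    # The two cutoff fields are valid and required only for zero-shot TTS.
    if task == "zero-shot text-to-speech":
        if cut_off_sec is None or orig_transcript_until_cutoff_time is None:
            raise ValueError(
                "zero-shot text-to-speech needs cut_off_sec and "
                "orig_transcript_until_cutoff_time"
            )
        payload["cut_off_sec"] = cut_off_sec
        payload["orig_transcript_until_cutoff_time"] = (
            orig_transcript_until_cutoff_time
        )

    # Leaving seed out lets the service randomize it.
    if seed is not None:
        payload["seed"] = seed
    return payload
```

The returned dict has the same field names as the schema, so it could be passed as the input of an API request; any omitted optional field falls back to the default listed in the table.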
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'format': 'uri', 'title': 'Output', 'type': 'string'}
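Since the output is a single string in URI format, a caller will typically want to check it and pick a local file name before downloading. This is a minimal sketch using only the standard library; `output_filename` is a hypothetical helper, it assumes an http(s)-style URI with a host, and the `output.wav` fallback name is an assumption about the file type rather than something the schema states.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def output_filename(output: str, fallback: str = "output.wav") -> str:
    """Check that `output` looks like a URI and derive a local file name."""
    if not isinstance(output, str):
        raise TypeError("output must be a string")
    parsed = urlparse(output)
    # Require both a scheme (e.g. https) and a host.
    if not parsed.scheme or not parsed.netloc:
        raise ValueError(f"not a usable URI: {output!r}")
    # Use the last path segment as the file name, if there is one.
    name = PurePosixPath(parsed.path).name
    return name or fallback
```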