cjwbw/voicecraft:6e42571a – Run with an API on Replicate

Input

task

string

Choose a task

Default: "zero-shot text-to-speech"

voicecraft_model

string

Choose a model

Default: "giga330M_TTSEnhanced.pth"

orig_audio

*file

Original audio file

orig_transcript

string

Optionally provide the transcript of the input audio. Leave it blank to have the WhisperX model selected below generate the transcript. An inaccurate transcript may lead to errors in TTS or speech editing.

Default: ""

whisperx_model

string

If orig_transcript is not provided above, choose a WhisperX model to generate it. An inaccurate transcript may lead to errors in TTS or speech editing. You can modify the generated transcript and provide it directly to

Default: "base.en"

target_transcript

*string

Transcript of the target audio file

cut_off_sec

number

Only used for the zero-shot text-to-speech task. The number of seconds from the start of the original audio to use as the voice reference. 3 seconds of reference is generally enough for high-quality voice cloning, but longer is generally better; try, e.g., 3 to 6 seconds.

Default: 3.01

kvcache

integer

Set to 0 to use less VRAM, but with slower inference

Default: 1

left_margin

number

Margin to the left of the editing segment

Default: 0.08

right_margin

number

Margin to the right of the editing segment

Default: 0.08

temperature

number

Adjusts the randomness of outputs: values greater than 1 are more random, and 0 is deterministic. Changing this value is not recommended.

Default: 1

top_p

number

Default value for TTS is 0.9, and 0.8 for speech editing

Default: 0.9

stop_repetition

integer

Default value for TTS is 3, and -1 for speech editing. -1 means the probability of silence tokens is not adjusted. If there are long silences or unnaturally stretched words, increase sample_batch_size to 2, 3, or even 4.

Default: 3

sample_batch_size

integer

Default value for TTS is 4, and 1 for speech editing. The higher the number, the faster the output will be. Under the hood, the model will generate this many samples and choose the shortest one

Default: 4
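
The "generate this many samples and choose the shortest one" behavior can be illustrated with a small Python sketch. The model's actual selection code is not shown on this page, so the function name pick_shortest and the use of plain lists as stand-ins for waveforms are assumptions made for illustration only:

```python
def pick_shortest(samples):
    """Return the candidate with the fewest audio frames.

    Hypothetical illustration: each sample is a list of frames standing in
    for one generated waveform.
    """
    return min(samples, key=len)


# Four stand-in candidates of different lengths, as with sample_batch_size = 4.
candidates = [[0.0] * 48000, [0.0] * 40000, [0.0] * 52000, [0.0] * 45000]
best = pick_shortest(candidates)
print(len(best))
```

Choosing the shortest candidate tends to discard samples padded out by long silences or stretched words, which is why the stop_repetition description suggests raising sample_batch_size when those artifacts appear.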

seed

integer

Random seed. Leave blank to randomize the seed
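
The descriptions above recommend different values of top_p, stop_repetition, and sample_batch_size for TTS and for speech editing. A minimal Python sketch of how a caller might switch between them; the helper name sampling_defaults and the speech_editing flag are hypothetical and not part of the model's API, and only the numeric values come from the input descriptions:

```python
def sampling_defaults(speech_editing: bool) -> dict:
    """Return the sampling values the input descriptions recommend per task.

    Hypothetical helper for illustration only.
    """
    if speech_editing:
        # Speech editing: top_p 0.8, no silence-token adjustment, one sample.
        return {"top_p": 0.8, "stop_repetition": -1, "sample_batch_size": 1}
    # Zero-shot TTS: these match the defaults listed for each input.
    return {"top_p": 0.9, "stop_repetition": 3, "sample_batch_size": 4}


# Merge the recommended values into the inputs for a TTS request.
tts_inputs = {"task": "zero-shot text-to-speech", **sampling_defaults(False)}
print(tts_inputs)
```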

Run this model in Node.js with one line of code:

npx create-replicate --model=cjwbw/voicecraft

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run cjwbw/voicecraft using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080",
  {
    input: {
      task: "zero-shot text-to-speech",
      top_p: 0.9,
      kvcache: 1,
      cut_off_sec: 3.01,
      left_margin: 0.08,
      temperature: 1,
      right_margin: 0.08,
      whisperx_model: "base.en",
      orig_transcript: "",
      stop_repetition: 3,
      voicecraft_model: "giga330M_TTSEnhanced.pth",
      sample_batch_size: 4
    }
  }
);

console.log(output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run cjwbw/voicecraft using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080",
    input={
        "task": "zero-shot text-to-speech",
        "top_p": 0.9,
        "kvcache": 1,
        "cut_off_sec": 3.01,
        "left_margin": 0.08,
        "temperature": 1,
        "right_margin": 0.08,
        "whisperx_model": "base.en",
        "orig_transcript": "",
        "stop_repetition": 3,
        "voicecraft_model": "giga330M_TTSEnhanced.pth",
        "sample_batch_size": 4
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
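
The generated example sets only the inputs that have defaults, while orig_audio and target_transcript are marked as required in the input list. A hedged sketch of supplying them with the Python client follows. The file name reference.wav, the placeholder transcript text, and the helper build_input are assumptions for illustration, and passing an open file handle relies on the client's usual handling of file inputs:

```python
import os

MODEL = "cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080"


def build_input(orig_audio, target_transcript, orig_transcript=""):
    """Assemble the input dict, refusing to omit the two required fields."""
    if orig_audio is None or not target_transcript:
        raise ValueError("orig_audio and target_transcript are required")
    return {
        "task": "zero-shot text-to-speech",
        "orig_audio": orig_audio,
        "target_transcript": target_transcript,
        # Left blank, the selected WhisperX model generates the transcript.
        "orig_transcript": orig_transcript,
        "whisperx_model": "base.en",
        "cut_off_sec": 3.01,
    }


# Runs only when a local reference.wav (placeholder name) is present.
if __name__ == "__main__" and os.path.exists("reference.wav"):
    import replicate  # imported here so build_input works without the client

    with open("reference.wav", "rb") as audio:
        output = replicate.run(
            MODEL,
            input=build_input(audio, "Replace with the transcript of the target audio."),
        )
    print(output)
```

Inputs not listed in the dict fall back to the defaults shown in the input list.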

Run cjwbw/voicecraft using Replicate’s HTTP API with cURL. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080",
    "input": {
      "task": "zero-shot text-to-speech",
      "top_p": 0.9,
      "kvcache": 1,
      "cut_off_sec": 3.01,
      "left_margin": 0.08,
      "temperature": 1,
      "right_margin": 0.08,
      "whisperx_model": "base.en",
      "orig_transcript": "",
      "stop_repetition": 3,
      "voicecraft_model": "giga330M_TTSEnhanced.pth",
      "sample_batch_size": 4
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
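
For readers not using cURL, the same request can be assembled with Python's standard library. This sketch only builds the request object and does not send it. The function name build_prediction_request is hypothetical; the endpoint, headers, and body shape are copied from the cURL command above:

```python
import json
import os
import urllib.request

VERSION = "cjwbw/voicecraft:6e42571a17e0fbbb0d92baa8d73c2926329cf8c3be8eedcee79822f7187b3080"
API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token, model_input):
    """Build (but do not send) a POST request mirroring the cURL command."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            # Same header as the cURL example.
            "Prefer": "wait",
        },
    )


request = build_prediction_request(
    os.environ.get("REPLICATE_API_TOKEN", ""),
    {"task": "zero-shot text-to-speech", "top_p": 0.9},
)
print(request.get_method(), request.full_url)
# To send it, pass the request to urllib.request.urlopen and JSON-decode the body.
```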
