zsxkib / create-rvc-dataset

Create your own Realistic Voice Cloning (RVC v2) dataset using a YouTube link

Public
11.1K runs
L40S

Input

Run this model in Node.js with one line of code:

npx create-replicate --model=zsxkib/create-rvc-dataset

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run zsxkib/create-rvc-dataset using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "zsxkib/create-rvc-dataset:c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff",
  {
    input: {
      audio_name: "andrew_huberman",
      youtube_url: "https://www.youtube.com/watch?v=4b6bwcWK6GE"
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk:
await fs.promises.writeFile("dataset_andrew_huberman.zip", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run zsxkib/create-rvc-dataset using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "zsxkib/create-rvc-dataset:c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff",
    input={
        "audio_name": "andrew_huberman",
        "youtube_url": "https://www.youtube.com/watch?v=4b6bwcWK6GE"
    }
)

# To access the file URL:
print(output.url())
#=> "http://example.com"

# To write the file to disk:
with open("dataset_andrew_huberman.zip", "wb") as file:
    file.write(output.read())

To learn more, take a look at the guide on getting started with Python.
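
The model returns a ZIP archive (`dataset_andrew_huberman.zip` in the example run), so a natural next step is to inspect it before training. The sketch below uses only Python's standard `zipfile` module and assumes the archive was saved locally under that name. The helper name `list_dataset_files` is hypothetical, and because this page does not describe the archive's internal layout, the code only lists whatever entries it finds rather than assuming particular file names.

```python
import zipfile
from pathlib import Path


def list_dataset_files(zip_path):
    """Return (name, size_in_bytes) for every file entry in the archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            (info.filename, info.file_size)
            for info in archive.infolist()
            if not info.is_dir()  # skip directory entries
        ]


if __name__ == "__main__":
    # Assumed local path of the downloaded archive.
    path = Path("dataset_andrew_huberman.zip")
    if path.exists():
        for name, size in list_dataset_files(path):
            print(f"{size:>12,d}  {name}")
```

Listing the entries first makes it easy to confirm that the archive contains audio clips of the expected size before feeding it to a training pipeline.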

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run zsxkib/create-rvc-dataset using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "zsxkib/create-rvc-dataset:c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff",
    "input": {
      "audio_name": "andrew_huberman",
      "youtube_url": "https://www.youtube.com/watch?v=4b6bwcWK6GE"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
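
The `Prefer: wait` header asks the API to hold the connection open while the prediction runs, but only for a bounded time, and the example prediction on this page reports a `total_time` of roughly 320 seconds. If the response comes back before the prediction has finished, the prediction's `urls.get` endpoint can be polled. The following is a minimal sketch of that loop using only the standard library. The function names and the `fetch` parameter are hypothetical, and the set of terminal statuses is an assumption based on Replicate's documented prediction lifecycle.

```python
import json
import os
import time
import urllib.request

# Assumed terminal statuses of a Replicate prediction.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def fetch_prediction(get_url):
    """GET a prediction object from its `urls.get` endpoint."""
    request = urllib.request.Request(
        get_url,
        headers={"Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(get_url, fetch=fetch_prediction, interval=2.0, max_polls=300):
    """Poll until the prediction reaches a terminal status, then return it."""
    for _ in range(max_polls):
        prediction = fetch(get_url)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval)  # wait before asking again
    raise TimeoutError(f"prediction still running after {max_polls} polls")
```

Passing `fetch` as a parameter keeps the polling logic separate from the network call, so the loop can be exercised with a stub in tests.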

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/zsxkib/create-rvc-dataset@sha256:c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff \
  -i 'audio_name="andrew_huberman"' \
  -i 'youtube_url="https://www.youtube.com/watch?v=4b6bwcWK6GE"'

To learn more, take a look at the Cog documentation.

Run this to download the model and run it in your local environment:

docker run -d -p 5000:5000 --gpus=all r8.im/zsxkib/create-rvc-dataset@sha256:c445e27ff34574e92781c15c67db41835cedcdc27a19f527a7dcf37bd0ffe1ff
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "audio_name": "andrew_huberman",
      "youtube_url": "https://www.youtube.com/watch?v=4b6bwcWK6GE"
    }
  }' \
  http://localhost:5000/predictions

To learn more, take a look at the Cog documentation.
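
The same local request can be issued from Python instead of `curl`. This is a sketch under the assumption that the container started by the `docker run` command above is listening on `http://localhost:5000`; the helper names `build_local_request` and `send` are hypothetical. Building the request is kept separate from sending it so the payload can be checked without a running container.

```python
import json
import urllib.request


def build_local_request(audio_name, youtube_url, base_url="http://localhost:5000"):
    """Build the POST request for a locally running Cog container."""
    body = json.dumps(
        {"input": {"audio_name": audio_name, "youtube_url": youtube_url}}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the container running, `send(build_local_request("andrew_huberman", "https://www.youtube.com/watch?v=4b6bwcWK6GE"))` would issue the same call as the `curl` command.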

Output

dataset_andrew_huberman.zip

{
  "completed_at": "2023-11-20T16:24:10.888049Z",
  "created_at": "2023-11-20T16:18:50.389924Z",
  "data_removed": false,
  "error": null,
  "id": "34265zbb6kf7shmu3vuskz44oi",
  "input": {
    "audio_name": "andrew_huberman",
    "youtube_url": "https://www.youtube.com/watch?v=4b6bwcWK6GE"
  },
  "logs": "[youtube] Extracting URL: https://www.youtube.com/watch?v=4b6bwcWK6GE\n[youtube] 4b6bwcWK6GE: Downloading webpage\n...\nSeparated tracks will be stored in /src/separated/htdemucs\nSeparating track youtubeaudio/andrew_huberman.wav",
  "metrics": {
    "predict_time": 40.477898,
    "total_time": 320.498125
  },
  "output": "https://replicate.delivery/pbxt/AhBrdyHw22Z5LxeevsMkyBvcJrXdzDtuufKSku35CUVSRn0jA/dataset_andrew_huberman.zip",
  "started_at": "2023-11-20T16:23:30.410151Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/34265zbb6kf7shmu3vuskz44oi",
    "cancel": "https://api.replicate.com/v1/predictions/34265zbb6kf7shmu3vuskz44oi/cancel"
  },
  "version": "f6593d27ca2570319dc9cbdfda6b81011819e541e9ee1333eda63edc5884445d"
}
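
The two values under `metrics` can be reproduced from the three timestamps in the prediction object: `predict_time` is `completed_at - started_at`, and `total_time` is `completed_at - created_at`. The sketch below works this out for the example above; the function names are hypothetical, and labelling the remaining interval as "waiting" (presumably queueing and cold-start time) is an interpretation rather than something the page states.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction):
    """Split a prediction's lifetime into waiting time and run time (seconds)."""
    created = parse_timestamp(prediction["created_at"])
    started = parse_timestamp(prediction["started_at"])
    completed = parse_timestamp(prediction["completed_at"])
    return {
        "waiting": (started - created).total_seconds(),
        "running": (completed - started).total_seconds(),
        "total": (completed - created).total_seconds(),
    }


# Timestamps copied from the example prediction.
sample = {
    "created_at": "2023-11-20T16:18:50.389924Z",
    "started_at": "2023-11-20T16:23:30.410151Z",
    "completed_at": "2023-11-20T16:24:10.888049Z",
}
print(timing_breakdown(sample))
```

For this prediction the run time works out to 40.477898 seconds and the total to 320.498125 seconds, matching `predict_time` and `total_time`; the remaining 280.020227 seconds elapsed before the run started.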

Generated in 40.5 seconds

[youtube] Extracting URL: https://www.youtube.com/watch?v=4b6bwcWK6GE
[youtube] 4b6bwcWK6GE: Downloading webpage
[youtube] 4b6bwcWK6GE: Downloading ios player API JSON
[youtube] 4b6bwcWK6GE: Downloading android player API JSON
[youtube] 4b6bwcWK6GE: Downloading m3u8 information
[info] 4b6bwcWK6GE: Downloading 1 format(s): 251
[download] Destination: youtubeaudio/andrew_huberman
[download]   0.0% of    3.74MiB at  Unknown B/s ETA Unknown
[download]   0.1% of    3.74MiB at    1.82MiB/s ETA 00:02
[download]   0.2% of    3.74MiB at    2.77MiB/s ETA 00:01
[download]   0.4% of    3.74MiB at    4.12MiB/s ETA 00:00
[download]   0.8% of    3.74MiB at    3.27MiB/s ETA 00:01
[download]   1.6% of    3.74MiB at    3.82MiB/s ETA 00:00
[download]   3.3% of    3.74MiB at    4.15MiB/s ETA 00:00
[download]   6.7% of    3.74MiB at    6.05MiB/s ETA 00:00
[download]  13.3% of    3.74MiB at    9.41MiB/s ETA 00:00
[download]  26.7% of    3.74MiB at   13.75MiB/s ETA 00:00
[download]  53.4% of    3.74MiB at   25.34MiB/s ETA 00:00
[download] 100.0% of    3.74MiB at   38.68MiB/s ETA 00:00
[download] 100% of    3.74MiB in 00:00:00 at 17.54MiB/s
[ExtractAudio] Destination: youtubeaudio/andrew_huberman.wav
Deleting original file youtubeaudio/andrew_huberman (pass -k to keep)
Downloading: "https://dl.fbaipublicfiles.com/demucs/hybrid_transformer/955717e8-8726e21a.th" to /root/.cache/torch/hub/checkpoints/955717e8-8726e21a.th
  0%|          | 0.00/80.2M [00:00<?, ?B/s]
  8%|▊         | 6.13M/80.2M [00:00<00:01, 64.3MB/s]
 22%|██▏       | 17.8M/80.2M [00:00<00:00, 98.4MB/s]
 40%|████      | 32.4M/80.2M [00:00<00:00, 123MB/s] 
 60%|██████    | 48.2M/80.2M [00:00<00:00, 140MB/s]
 77%|███████▋  | 61.6M/80.2M [00:00<00:00, 140MB/s]
 95%|█████████▌| 76.6M/80.2M [00:00<00:00, 146MB/s]
100%|██████████| 80.2M/80.2M [00:00<00:00, 135MB/s]
  0%|                                                                                 | 0.0/263.25 [00:00<?, ?seconds/s]
  2%|█▌                                                                      | 5.85/263.25 [00:07<05:31,  1.29s/seconds]
  4%|███▏                                                                    | 11.7/263.25 [00:07<02:18,  1.81seconds/s]
  7%|███▊                                                      | 17.549999999999997/263.25 [00:07<01:17,  3.15seconds/s]
  9%|██████▍                                                                 | 23.4/263.25 [00:08<00:49,  4.83seconds/s]
 11%|███████▉                                                               | 29.25/263.25 [00:08<00:34,  6.84seconds/s]
 13%|███████▋                                                  | 35.099999999999994/263.25 [00:08<00:25,  9.12seconds/s]
 16%|█████████                                                 | 40.949999999999996/263.25 [00:08<00:19, 11.60seconds/s]
 18%|████████████▊                                                           | 46.8/263.25 [00:09<00:15, 14.07seconds/s]
 20%|██████████████▏                                                        | 52.65/263.25 [00:09<00:12, 16.34seconds/s]
 22%|████████████████                                                        | 58.5/263.25 [00:09<00:11, 18.53seconds/s]
 24%|█████████████████▎                                                     | 64.35/263.25 [00:09<00:09, 20.37seconds/s]
 27%|███████████████▋                                           | 70.19999999999999/263.25 [00:09<00:08, 21.78seconds/s]
 29%|████████████████████▌                                                  | 76.05/263.25 [00:10<00:08, 23.01seconds/s]
 31%|██████████████████▎                                        | 81.89999999999999/263.25 [00:10<00:07, 23.88seconds/s]
 33%|███████████████████████▋                                               | 87.75/263.25 [00:10<00:07, 24.65seconds/s]
 36%|█████████████████████████▌                                              | 93.6/263.25 [00:10<00:06, 25.14seconds/s]
 38%|██████████████████████▎                                    | 99.44999999999999/263.25 [00:11<00:06, 25.48seconds/s]
 40%|████████████████████████████▍                                          | 105.3/263.25 [00:11<00:06, 25.72seconds/s]
 42%|████████████████████████▍                                 | 111.14999999999999/263.25 [00:11<00:05, 25.89seconds/s]
 44%|███████████████████████████████▌                                       | 117.0/263.25 [00:11<00:05, 26.00seconds/s]
 47%|████████████████████████████████▋                                     | 122.85/263.25 [00:11<00:05, 26.01seconds/s]
 49%|██████████████████████████████████▋                                    | 128.7/263.25 [00:12<00:05, 26.10seconds/s]
 51%|█████████████████████████████▋                            | 134.54999999999998/263.25 [00:12<00:04, 25.99seconds/s]
 53%|██████████████████████████████▉                           | 140.39999999999998/263.25 [00:12<00:04, 26.06seconds/s]
 56%|██████████████████████████████████████▉                               | 146.25/263.25 [00:12<00:04, 26.18seconds/s]
 58%|█████████████████████████████████████████                              | 152.1/263.25 [00:13<00:04, 26.14seconds/s]
 60%|██████████████████████████████████████████                            | 157.95/263.25 [00:13<00:04, 26.20seconds/s]
 62%|████████████████████████████████████                      | 163.79999999999998/263.25 [00:13<00:03, 26.25seconds/s]
 64%|█████████████████████████████████████▍                    | 169.64999999999998/263.25 [00:13<00:03, 26.03seconds/s]
 67%|███████████████████████████████████████████████▎                       | 175.5/263.25 [00:14<00:03, 26.09seconds/s]
 69%|████████████████████████████████████████████████▏                     | 181.35/263.25 [00:14<00:03, 26.09seconds/s]
 71%|██████████████████████████████████████████████████▍                    | 187.2/263.25 [00:14<00:02, 26.24seconds/s]
 73%|██████████████████████████████████████████▌               | 193.04999999999998/263.25 [00:14<00:02, 26.35seconds/s]
 76%|███████████████████████████████████████████▊              | 198.89999999999998/263.25 [00:14<00:02, 26.32seconds/s]
 78%|██████████████████████████████████████████████████████▍               | 204.75/263.25 [00:15<00:02, 26.37seconds/s]
 80%|████████████████████████████████████████████████████████▊              | 210.6/263.25 [00:15<00:01, 26.42seconds/s]
 82%|█████████████████████████████████████████████████████████▌            | 216.45/263.25 [00:15<00:01, 26.38seconds/s]
 84%|████████████████████████████████████████████████▉         | 222.29999999999998/263.25 [00:15<00:01, 26.35seconds/s]
 87%|██████████████████████████████████████████████████▎       | 228.14999999999998/263.25 [00:16<00:01, 26.37seconds/s]
 89%|███████████████████████████████████████████████████████████████        | 234.0/263.25 [00:16<00:01, 26.20seconds/s]
 91%|███████████████████████████████████████████████████████████████▊      | 239.85/263.25 [00:16<00:00, 26.18seconds/s]
 93%|██████████████████████████████████████████████████████████████████▎    | 245.7/263.25 [00:16<00:00, 26.16seconds/s]
 96%|███████████████████████████████████████████████████████▍  | 251.54999999999998/263.25 [00:16<00:00, 26.18seconds/s]
 98%|█████████████████████████████████████████████████████████████████████▍ | 257.4/263.25 [00:17<00:00, 26.22seconds/s]
100%|██████████████████████████████████████████████████████████████████████| 263.25/263.25 [00:17<00:00, 26.22seconds/s]
100%|██████████████████████████████████████████████████████████████████████| 263.25/263.25 [00:17<00:00, 15.18seconds/s]
Important: the default model was recently changed to `htdemucs` the latest Hybrid Transformer Demucs model. In some cases, this model can actually perform worse than previous models. To get back the old default model use `-n mdx_extra_q`.
Selected model is a bag of 1 models. You will see that many progress bars per track.
Separated tracks will be stored in /src/separated/htdemucs
Separating track youtubeaudio/andrew_huberman.wav

This output was created using a different version of the model, zsxkib/create-rvc-dataset:f6593d27.

Run time and cost

This model costs approximately $0.061 to run on Replicate, or 16 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.
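
The "16 runs per $1" figure follows directly from the per-run price: 1 / 0.061 is about 16.4, which rounds down to 16 whole runs. A small sketch of that arithmetic, useful for budgeting a batch of runs, is shown here; the function names are hypothetical and the price is the approximate figure quoted above, which varies with inputs.

```python
import math

COST_PER_RUN_USD = 0.061  # approximate per-run price quoted on this page


def runs_per_dollar(cost_per_run):
    """Whole number of runs that fit in a one-dollar budget."""
    return math.floor(1 / cost_per_run)


def estimated_cost(runs, cost_per_run=COST_PER_RUN_USD):
    """Approximate total cost in USD for a number of runs."""
    return round(runs * cost_per_run, 2)


print(runs_per_dollar(COST_PER_RUN_USD))  # 16
print(estimated_cost(100))                # 6.1
```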

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 63 seconds. The predict time for this model varies significantly based on the inputs.

Readme

RVC v2 Dataset Creation Tool

Introduction

Create vocal datasets for Realistic Voice Cloning (RVC) v2 models with ease. Simply provide a YouTube video URL and let the tool handle the extraction and preparation of vocal data, ideal for training sophisticated voice cloning models. 🧠🎤