replicate / train-rvc-model

Train your own custom RVC model

Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 6 minutes. The predict time for this model varies significantly based on the inputs.

Readme

Realistic Voice Cloning v2 (RVC v2) is a voice-to-voice model that can transform any input voice into a target voice. You can try it out in the RVC v2 Web UI on Replicate.


Now that you’ve crafted your dataset, the next phase is training the RVC model to mimic your target voice. Replicate offers two pathways for training: the Train RVC Model Web UI or a Colab Notebook.

To initiate training, you’ll need to provide several parameters:

  • dataset_zip: The URL or direct upload of your dataset zip file.
  • sample_rate: The audio sampling rate, which should typically be set to the default of 48k.
  • version: Choose between RVC v1 and v2; v2 is recommended for superior quality.
  • f0method: The method used to extract the pitch (fundamental frequency, f0), with ‘rmvpe_gpu’ as the suggested default.
  • epoch: The number of complete passes over the dataset the model will perform.
  • batch_size: The number of data points processed in one optimization step; the default of 7 works well for most scenarios.

The training output will be a model packaged in a zip file if using the Web UI, or a URL to the trained model when using the API, similar to dataset creation.
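If you train via the API and want a local copy of the weights, the returned URL can be downloaded like any other file. Here is a minimal sketch, assuming the output is a single direct URL to the model zip; the URL and file names below are placeholders.

import zipfile
from urllib.request import urlretrieve

# Placeholder: substitute the URL returned by the training step
model_url = "your_trained_model_url"

# Download the archive and unpack it into a local folder
urlretrieve(model_url, "trained_model.zip")
with zipfile.ZipFile("trained_model.zip") as archive:
    archive.extractall("trained_model")

Keeping a local copy is optional: for inference you can pass the URL itself, as described in the inference section below.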

Training via the Web UI

  1. Visit the Train RVC Model page on Replicate.
  2. Upload your dataset zip file directly.
  3. Optionally, configure the training parameters. We recommend sticking to the default values for sample_rate (48k), version (v2 for higher quality), f0method (rmvpe_gpu), epoch (the number of full dataset passes), and batch_size (7 is a good starting point).
  4. Start the training. Once completed, you can download your trained model or copy the URL for inference.

Training via the API (Colab Notebook)

For the API method, ensure you have the URL of your dataset ready. This could be the output of the dataset creation step, or you can obtain it by re-uploading the dataset using the SERVING_URL command (a local-upload alternative is sketched after the steps below).

  1. Open the Colab Notebook.
  2. Run the training code cell, replacing "your_dataset_zip_url" with the actual URL of your dataset:
import replicate  # the client reads your REPLICATE_API_TOKEN environment variable

# Insert the URL of your dataset
training_output = replicate.run(
  "replicate/train-rvc-model:cf360587a27f67500c30fc31de1e0f0f9aa26dcd7b866e6ac937a07bd104bad9",
  input={
    "dataset_zip": "your_dataset_zip_url",
    "sample_rate": "48k",
    "version": "v2",
    "f0method": "rmvpe_gpu",
    "epoch": 50,
    "batch_size": 7
  }
)
print(training_output)
  3. After the script completes, you’ll get a URL to your newly trained model, all set for the next stage: running inference.
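Alternatively, if your dataset zip only exists on your machine and you don’t want to host it, the Replicate Python client can upload a local file when you pass an open file handle as the input value. A minimal sketch, assuming a local file named dataset.zip:

import replicate  # reads your REPLICATE_API_TOKEN environment variable

# Passing a file handle uploads dataset.zip directly instead of requiring a hosted URL
with open("dataset.zip", "rb") as dataset_file:
    training_output = replicate.run(
        "replicate/train-rvc-model:cf360587a27f67500c30fc31de1e0f0f9aa26dcd7b866e6ac937a07bd104bad9",
        input={
            "dataset_zip": dataset_file,
            "sample_rate": "48k",
            "version": "v2",
            "f0method": "rmvpe_gpu",
            "epoch": 50,
            "batch_size": 7
        }
    )
print(training_output)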

Running Inference with Your Trained Model

With your RVC model now finely tuned, the final step is to put it to work by running inference. This is where your model starts cloning voices. The process differs slightly between using the Web UI and the API.

Setting Up for Inference

For both the Web UI and API, you’ll set rvc_model to CUSTOM. This tells the system you’re using a unique model tailored by you. You’ll also need the URL to your trained model. This should be the output URL from your model training step. If you’ve lost this URL, refer to the earlier section on how to re-upload and retrieve your model.
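Before the Web UI walkthrough, here is roughly what the same setup looks like through the API. Treat this as a sketch: the zsxkib/realistic-voice-cloning slug and the song_input field name are assumptions, so check the model’s API page for the exact identifiers; rvc_model and custom_rvc_model_download_url are the fields described below for the Web UI.

import replicate  # reads your REPLICATE_API_TOKEN environment variable

# Sketch of inference with a custom model; pin a specific version with owner/name:version if preferred
output = replicate.run(
    "zsxkib/realistic-voice-cloning",
    input={
        "song_input": open("input_audio.wav", "rb"),  # the voice recording you want converted
        "rvc_model": "CUSTOM",  # tells the model to load your own trained weights
        "custom_rvc_model_download_url": "your_trained_model_url"  # URL from the training step
    }
)
print(output)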

Inference via the Web UI

  1. Go to the Realistic Voice Cloning UI.
  2. Upload your input audio file directly to the interface.
  3. In the rvc_model field, select CUSTOM.
  4. Paste the URL of your trained model into the custom_rvc_model_download_url field.
  5. Configure additional parameters as needed to fine-tune the output.
  6. Run the model and wait for the output. You’ll be able to play the cloned voice directly or download it.