zsxkib/realistic-voice-cloning:0a9c7c55 – Run with an API on Replicate


Input

song_input

file

Upload your audio file here.

rvc_model

string

RVC model for a specific voice. If using a custom model, this should match the name of the downloaded model. If a 'custom_rvc_model_download_url' is provided, this will be automatically set to the name of the downloaded model.

Default: "Squidward"

custom_rvc_model_download_url

string


https://huggingface.co/CxronaBxndit/Morgan-Freeman/resolve/main/Morgan-Freeman.zip

URL to download a custom RVC model. If provided, the model will be downloaded (if it doesn't already exist) and used for prediction, regardless of the 'rvc_model' value.
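The example logs further down report "The model will be downloaded as 'Morgan-Freeman'" for the URL above, which suggests the model name is taken from the archive's filename. The following is a minimal sketch of that naming rule. The rule itself is an assumption inferred from the logs, and `model_name_from_url` is a hypothetical helper, not part of the model's code.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def model_name_from_url(url: str) -> str:
    """Guess the downloaded model's name from the archive URL (assumed rule)."""
    # Take the last path segment of the URL, decoding any percent-escapes.
    filename = PurePosixPath(unquote(urlparse(url).path)).name
    # Strip a trailing ".zip" if present; otherwise keep the filename unchanged.
    return filename[:-4] if filename.lower().endswith(".zip") else filename


url = "https://huggingface.co/CxronaBxndit/Morgan-Freeman/resolve/main/Morgan-Freeman.zip"
print(model_name_from_url(url))  # Morgan-Freeman
```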

pitch_change

string

Adjust pitch of AI vocals. Options: `no-change`, `male-to-female`, `female-to-male`.

Default: "no-change"

index_rate

number

(minimum: 0, maximum: 1)

Control how much of the AI's accent to leave in the vocals.

Default: 0.5

filter_radius

integer

(minimum: 0, maximum: 7)

If >=3: apply median filtering to the harvested pitch results.

Default: 3
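To illustrate what median filtering does to a pitch track in general, here is a small sketch. It is not the model's implementation: the window size, the edge handling, and the `median_filter` helper are all assumptions made for the example.

```python
from statistics import median


def median_filter(values, kernel_size=3):
    """Replace each value with the median of a window centered on it."""
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    smoothed = []
    for i in range(len(values)):
        # Windows are truncated at the edges rather than padded (an assumption).
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(median(window))
    return smoothed


# A pitch contour in Hz with a single octave-jump outlier at index 2.
pitch_hz = [220.0, 221.0, 440.0, 222.0, 223.0]
smoothed = median_filter(pitch_hz)
print(smoothed)  # the 440.0 spike is replaced by a neighboring value
```

A median, unlike a moving average, discards an isolated spike entirely instead of smearing it across neighboring frames.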

rms_mix_rate

number

(minimum: 0, maximum: 1)

Control how much to use the original vocal's loudness (0) or a fixed loudness (1).

Default: 0.25

pitch_detection_algorithm

string

Best option is rmvpe (clarity in vocals), then mangio-crepe (smoother vocals).

Default: "rmvpe"

crepe_hop_length

integer

When `pitch_detection_algorithm` is set to `mangio-crepe`, this controls how often it checks for pitch changes in milliseconds. Lower values lead to longer conversions and higher risk of voice cracks, but better pitch accuracy.

Default: 128

protect

number

(minimum: 0, maximum: 0.5)

Control how much of the original vocals' breath and voiceless consonants to leave in the AI vocals. Set to 0.5 to disable.

Default: 0.33

main_vocals_volume_change

number

Control volume of main AI vocals. Use -3 to decrease the volume by 3 decibels, or 3 to increase the volume by 3 decibels.

Default: 0
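For a sense of scale, a decibel change can be converted to a linear amplitude multiplier. The sketch below assumes the usual amplitude convention (gain = 10^(dB/20)); how the model applies the change internally is not shown on this page, and `db_to_gain` is a hypothetical helper.

```python
def db_to_gain(decibels: float) -> float:
    """Convert a decibel change to a linear amplitude multiplier."""
    return 10 ** (decibels / 20)


for db in (-3, 0, 3, 10):
    print(f"{db:+} dB -> x{db_to_gain(db):.3f}")
```

Under this convention, the value of 10 used in the example request below roughly triples the amplitude of the main vocals.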

backup_vocals_volume_change

number

Control volume of backup AI vocals.

Default: 0

instrumental_volume_change

number

Control volume of the background music/instrumentals.

Default: 0

pitch_change_all

number

Change pitch/key of background music, backup vocals and AI vocals in semitones. Reduces sound quality slightly.

Default: 0
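A shift in semitones corresponds to a frequency ratio. As a rough illustration, assuming standard 12-tone equal temperament (the page does not say how the shift is implemented), the ratio can be computed as follows; `semitone_ratio` is a hypothetical helper.

```python
def semitone_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


# Shifting A4 (440 Hz) up and down by a few semitones.
for shift in (-12, -2, 0, 2, 12):
    print(f"{shift:+} semitones -> {440 * semitone_ratio(shift):.1f} Hz")
```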

reverb_size

number

(minimum: 0, maximum: 1)

The larger the room, the longer the reverb time.

Default: 0.15

reverb_wetness

number

(minimum: 0, maximum: 1)

Level of AI vocals with reverb.

Default: 0.2

reverb_dryness

number

(minimum: 0, maximum: 1)

Level of AI vocals without reverb.

Default: 0.8

reverb_damping

number

(minimum: 0, maximum: 1)

Absorption of high frequencies in the reverb.

Default: 0.7
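The wetness and dryness levels are conventionally applied as a weighted sum of the processed and unprocessed signals. The sketch below assumes that simple linear mix; the model's actual effects chain is not documented here, and `mix_reverb` and its sample lists are hypothetical.

```python
def mix_reverb(dry_samples, reverb_samples, dryness=0.8, wetness=0.2):
    """Blend unprocessed (dry) and reverberated (wet) samples linearly."""
    return [dryness * dry + wetness * wet
            for dry, wet in zip(dry_samples, reverb_samples)]


dry = [1.0, 0.5, 0.0]   # vocals without reverb
wet = [0.0, 0.5, 1.0]   # the same vocals after a reverb effect (made-up values)
print(mix_reverb(dry, wet))
```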

output_format

string

Use wav for the best quality at a larger file size, or mp3 for decent quality at a smaller file size.

Default: "mp3"
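Several of the inputs above have documented minimum and maximum values. A client-side check before submitting a prediction might look like the sketch below. The ranges are copied from the parameter list; the `out_of_range` helper is hypothetical and is not part of any Replicate client.

```python
# Ranges copied from the parameter list above: name -> (minimum, maximum).
RANGES = {
    "index_rate": (0, 1),
    "filter_radius": (0, 7),
    "rms_mix_rate": (0, 1),
    "protect": (0, 0.5),
    "reverb_size": (0, 1),
    "reverb_wetness": (0, 1),
    "reverb_dryness": (0, 1),
    "reverb_damping": (0, 1),
}


def out_of_range(inputs: dict) -> list[str]:
    """Return one message per input that falls outside its documented range."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in inputs and not low <= inputs[name] <= high:
            problems.append(f"{name}={inputs[name]} is outside [{low}, {high}]")
    return problems


print(out_of_range({"protect": 0.33, "index_rate": 0.5}))  # []
print(out_of_range({"protect": 0.6}))  # ['protect=0.6 is outside [0, 0.5]']
```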

Run this model in Node.js with one line of code:

npx create-replicate --model=zsxkib/realistic-voice-cloning

or set up a project from scratch

Install Replicate’s Node.js client library:

npm install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run zsxkib/realistic-voice-cloning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "zsxkib/realistic-voice-cloning:0a9c7c558af4c0f20667c1bd1260ce32a2879944a0b9e44e1398660c077b1550",
  {
    input: {
      protect: 0.33,
      rvc_model: "CUSTOM",
      index_rate: 0.5,
      song_input: "https://replicate.delivery/pbxt/JyMOAadCqhOZxgTZ8ZQUTCERdoh26oGM2nIJP67lSLWGZQnd/silence-lambs-trimmed.mp3",
      reverb_size: 0.15,
      pitch_change: "no-change",
      rms_mix_rate: 0.25,
      filter_radius: 3,
      output_format: "mp3",
      reverb_damping: 0.7,
      reverb_dryness: 0.8,
      reverb_wetness: 0.2,
      crepe_hop_length: 128,
      pitch_change_all: 0,
      main_vocals_volume_change: 10,
      pitch_detection_algorithm: "rmvpe",
      instrumental_volume_change: 0,
      backup_vocals_volume_change: 0,
      custom_rvc_model_download_url: "https://huggingface.co/CxronaBxndit/Morgan-Freeman/resolve/main/Morgan-Freeman.zip"
    }
  }
);

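// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (the output is audio, so name it to match output_format):
import { writeFile } from "node:fs/promises";
await writeFile("output.mp3", output);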

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Import the client:

import replicate

Run zsxkib/realistic-voice-cloning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "zsxkib/realistic-voice-cloning:0a9c7c558af4c0f20667c1bd1260ce32a2879944a0b9e44e1398660c077b1550",
    input={
        "protect": 0.33,
        "rvc_model": "CUSTOM",
        "index_rate": 0.5,
        "song_input": "https://replicate.delivery/pbxt/JyMOAadCqhOZxgTZ8ZQUTCERdoh26oGM2nIJP67lSLWGZQnd/silence-lambs-trimmed.mp3",
        "reverb_size": 0.15,
        "pitch_change": "no-change",
        "rms_mix_rate": 0.25,
        "filter_radius": 3,
        "output_format": "mp3",
        "reverb_damping": 0.7,
        "reverb_dryness": 0.8,
        "reverb_wetness": 0.2,
        "crepe_hop_length": 128,
        "pitch_change_all": 0,
        "main_vocals_volume_change": 10,
        "pitch_detection_algorithm": "rmvpe",
        "instrumental_volume_change": 0,
        "backup_vocals_volume_change": 0,
        "custom_rvc_model_download_url": "https://huggingface.co/CxronaBxndit/Morgan-Freeman/resolve/main/Morgan-Freeman.zip"
    }
)
print(output)
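To save the result instead of printing it, something like the following sketch may help. What `replicate.run` returns depends on the client version: this sketch assumes it is either a file-like object with a `read()` method or a URL string, and `save_output` is a hypothetical helper. The demo uses an in-memory stand-in so it runs without calling the API.

```python
import io
import os
import tempfile
from urllib.request import urlopen


def save_output(output, path: str) -> int:
    """Write a prediction output to `path` and return the number of bytes written."""
    if hasattr(output, "read"):
        data = output.read()  # file-like output (assumed for newer clients)
    else:
        with urlopen(str(output)) as response:  # plain URL string
            data = response.read()
    with open(path, "wb") as handle:
        return handle.write(data)


# Stand-in for a real prediction output, so the example needs no network access.
fake_output = io.BytesIO(b"fake audio bytes")
target = os.path.join(tempfile.gettempdir(), "cover-demo.mp3")
written = save_output(fake_output, target)
print(f"wrote {written} bytes to {target}")
```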

To learn more, take a look at the guide on getting started with Python.

Set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>

Find your API token in your account settings.

Run zsxkib/realistic-voice-cloning using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "0a9c7c558af4c0f20667c1bd1260ce32a2879944a0b9e44e1398660c077b1550",
    "input": {
      "protect": 0.33,
      "rvc_model": "CUSTOM",
      "index_rate": 0.5,
      "song_input": "https://replicate.delivery/pbxt/JyMOAadCqhOZxgTZ8ZQUTCERdoh26oGM2nIJP67lSLWGZQnd/silence-lambs-trimmed.mp3",
      "reverb_size": 0.15,
      "pitch_change": "no-change",
      "rms_mix_rate": 0.25,
      "filter_radius": 3,
      "output_format": "mp3",
      "reverb_damping": 0.7,
      "reverb_dryness": 0.8,
      "reverb_wetness": 0.2,
      "crepe_hop_length": 128,
      "pitch_change_all": 0,
      "main_vocals_volume_change": 10,
      "pitch_detection_algorithm": "rmvpe",
      "instrumental_volume_change": 0,
      "backup_vocals_volume_change": 0,
      "custom_rvc_model_download_url": "https://huggingface.co/CxronaBxndit/Morgan-Freeman/resolve/main/Morgan-Freeman.zip"
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
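If the prediction has not finished by the time the request returns, the response's `urls.get` endpoint can be polled until the status is terminal. The following is a minimal sketch of such a loop. The `fetch` callable is a hypothetical stand-in for an authenticated GET on that endpoint, and the canned responses are made up for the demo.

```python
import time

# Statuses after which a prediction will no longer change.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(fetch, interval_seconds=1.0, max_polls=60):
    """Call fetch() until the prediction reaches a terminal status."""
    for _ in range(max_polls):
        prediction = fetch()
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval_seconds)
    raise TimeoutError("prediction did not finish in time")


# Canned responses standing in for successive GET requests to "urls.get".
responses = iter([
    {"status": "starting"},
    {"status": "processing"},
    {"status": "succeeded", "output": "https://example.com/out.mp3"},
])
result = wait_for_prediction(lambda: next(responses), interval_seconds=0)
print(result["status"])  # succeeded
```

Passing the fetch step in as a callable keeps the loop independent of any particular HTTP library.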

Output


{
  "completed_at": "2023-12-01T11:22:55.013215Z",
  "created_at": "2023-12-01T11:22:47.145351Z",
  "data_removed": false,
  "error": null,
  "id": "jgdnwyzby3bobcctcnm5ifo5y4",
  "input": {
    "protect": 0.33,
    "rvc_model": "CUSTOM",
    "index_rate": 0.5,
    "song_input": "https://replicate.delivery/pbxt/JyMOAadCqhOZxgTZ8ZQUTCERdoh26oGM2nIJP67lSLWGZQnd/silence-lambs-trimmed.mp3",
    "reverb_size": 0.15,
    "pitch_change": "no-change",
    "rms_mix_rate": 0.25,
    "filter_radius": 3,
    "output_format": "mp3",
    "reverb_damping": 0.7,
    "reverb_dryness": 0.8,
    "reverb_wetness": 0.2,
    "crepe_hop_length": 128,
    "pitch_change_all": 0,
    "main_vocals_volume_change": 10,
    "pitch_detection_algorithm": "rmvpe",
    "instrumental_volume_change": 0,
    "backup_vocals_volume_change": 0,
    "custom_rvc_model_download_url": "https://huggingface.co/CxronaBxndit/Morgan-Freeman/resolve/main/Morgan-Freeman.zip"
  },
  "logs": "[!] The model will be downloaded as 'Morgan-Freeman'.\n[~] Downloading voice model with name Morgan-Freeman...\nVoice model directory Morgan-Freeman already exists! Skipping download.\n[~] Starting AI Cover Generation Pipeline...\n[~] Converting voice using RVC...\n2023-12-01 11:22:47 | INFO | fairseq.tasks.hubert_pretraining | current directory is /src\n2023-12-01 11:22:47 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}\n2023-12-01 11:22:47 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}\ngin_channels: 256 self.spk_embed_dim: 109\n<All keys matched successfully>\n[~] Applying audio effects to Vocals...\n[~] Combining AI Vocals and Instrumentals...\n[~] Removing intermediate audio files...\n[+] Cover generated at /src/song_output/e4c95cc6b96/tmp3klghzvksilence-lambs-trimmed (Morgan-Freeman Ver).mp3",
  "metrics": {
    "predict_time": 7.850824,
    "total_time": 7.867864
  },
  "output": "https://replicate.delivery/pbxt/ILHNSgdwBeyvVKhbNNflKjKNv1g7Cmk6CAdFGKuW98TOQ39RA/tmp3klghzvksilence-lambs-trimmed%20%28Morgan-Freeman%20Ver%29.mp3",
  "started_at": "2023-12-01T11:22:47.162391Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/jgdnwyzby3bobcctcnm5ifo5y4",
    "cancel": "https://api.replicate.com/v1/predictions/jgdnwyzby3bobcctcnm5ifo5y4/cancel"
  },
  "version": "0a9c7c558af4c0f20667c1bd1260ce32a2879944a0b9e44e1398660c077b1550"
}
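The timestamps in this response line up with the `metrics` block: `predict_time` is the gap between `started_at` and `completed_at`, and `total_time` is the gap between `created_at` and `completed_at`. The sketch below recomputes both from the values shown above and decodes the output filename. The variable and helper names are illustrative only.

```python
from datetime import datetime
from urllib.parse import unquote


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Fields copied from the example response above.
prediction = {
    "created_at": "2023-12-01T11:22:47.145351Z",
    "started_at": "2023-12-01T11:22:47.162391Z",
    "completed_at": "2023-12-01T11:22:55.013215Z",
    "output": "https://replicate.delivery/pbxt/ILHNSgdwBeyvVKhbNNflKjKNv1g7Cmk6CAdFGKuW98TOQ39RA/tmp3klghzvksilence-lambs-trimmed%20%28Morgan-Freeman%20Ver%29.mp3",
}

created = parse_timestamp(prediction["created_at"])
started = parse_timestamp(prediction["started_at"])
completed = parse_timestamp(prediction["completed_at"])

queue_seconds = (started - created).total_seconds()
run_seconds = (completed - started).total_seconds()
total_seconds = (completed - created).total_seconds()
print(f"queued {queue_seconds:.6f}s, ran {run_seconds:.6f}s, total {total_seconds:.6f}s")

# The output URL percent-encodes the spaces and parentheses in the filename.
filename = unquote(prediction["output"].rsplit("/", 1)[-1])
print(filename)
```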

Generated in 7.9 seconds

[!] The model will be downloaded as 'Morgan-Freeman'.
[~] Downloading voice model with name Morgan-Freeman...
Voice model directory Morgan-Freeman already exists! Skipping download.
[~] Starting AI Cover Generation Pipeline...
[~] Converting voice using RVC...
2023-12-01 11:22:47 | INFO | fairseq.tasks.hubert_pretraining | current directory is /src
2023-12-01 11:22:47 | INFO | fairseq.tasks.hubert_pretraining | HubertPretrainingTask Config {'_name': 'hubert_pretraining', 'data': 'metadata', 'fine_tuning': False, 'labels': ['km'], 'label_dir': 'label', 'label_rate': 50.0, 'sample_rate': 16000, 'normalize': False, 'enable_padding': False, 'max_keep_size': None, 'max_sample_size': 250000, 'min_sample_size': 32000, 'single_target': False, 'random_crop': True, 'pad_audio': False}
2023-12-01 11:22:47 | INFO | fairseq.models.hubert.hubert | HubertModel Config: {'_name': 'hubert', 'label_rate': 50.0, 'extractor_mode': default, 'encoder_layers': 12, 'encoder_embed_dim': 768, 'encoder_ffn_embed_dim': 3072, 'encoder_attention_heads': 12, 'activation_fn': gelu, 'layer_type': transformer, 'dropout': 0.1, 'attention_dropout': 0.1, 'activation_dropout': 0.0, 'encoder_layerdrop': 0.05, 'dropout_input': 0.1, 'dropout_features': 0.1, 'final_dim': 256, 'untie_final_proj': True, 'layer_norm_first': False, 'conv_feature_layers': '[(512,10,5)] + [(512,3,2)] * 4 + [(512,2,2)] * 2', 'conv_bias': False, 'logit_temp': 0.1, 'target_glu': False, 'feature_grad_mult': 0.1, 'mask_length': 10, 'mask_prob': 0.8, 'mask_selection': static, 'mask_other': 0.0, 'no_mask_overlap': False, 'mask_min_space': 1, 'mask_channel_length': 10, 'mask_channel_prob': 0.0, 'mask_channel_selection': static, 'mask_channel_other': 0.0, 'no_mask_channel_overlap': False, 'mask_channel_min_space': 1, 'conv_pos': 128, 'conv_pos_groups': 16, 'latent_temp': [2.0, 0.5, 0.999995], 'skip_masked': False, 'skip_nomask': False, 'checkpoint_activations': False, 'required_seq_len_multiple': 2, 'depthwise_conv_kernel_size': 31, 'attn_type': '', 'pos_enc_type': 'abs', 'fp16': False}
gin_channels: 256 self.spk_embed_dim: 109
<All keys matched successfully>
[~] Applying audio effects to Vocals...
[~] Combining AI Vocals and Instrumentals...
[~] Removing intermediate audio files...
[+] Cover generated at /src/song_output/e4c95cc6b96/tmp3klghzvksilence-lambs-trimmed (Morgan-Freeman Ver).mp3