lucataco / singing_voice_conversion

source_audio: https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav
target_singer: Taylor Swift
key_shift_mode: 0
pitch_shift_control: Auto Shift
diffusion_inference_steps: 1000

{
  "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
  "target_singer": "Taylor Swift",
  "key_shift_mode": 0,
  "pitch_shift_control": "Auto Shift",
  "diffusion_inference_steps": 1000
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
  {
    input: {
      source_audio: "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
      target_singer: "Taylor Swift",
      key_shift_mode: 0,
      pitch_shift_control: "Auto Shift",
      diffusion_inference_steps: 1000
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk:
await fs.promises.writeFile("output.wav", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    input={
        "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
        "target_singer": "Taylor Swift",
        "key_shift_mode": 0,
        "pitch_shift_control": "Auto Shift",
        "diffusion_inference_steps": 1000
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
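`print(output)` only displays a reference to the result. A minimal sketch of saving the converted audio to disk follows. It assumes a recent `replicate` Python client, where `replicate.run` returns a file-like object with a `read()` method, and falls back to treating the value as a URL string, as older clients returned. The helper name `save_output` and the filename `converted.wav` are illustrative and not part of the client library.

```python
import urllib.request
from pathlib import Path


def save_output(output, path):
    """Write a prediction output to `path` and return the number of bytes written.

    Hypothetical helper: accepts file-like outputs (anything with .read())
    and plain URL strings.
    """
    if hasattr(output, "read"):
        data = output.read()  # file-like output: read the bytes directly
    else:
        # Assumed fallback: the output is a URL pointing at the audio file
        with urllib.request.urlopen(str(output)) as response:
            data = response.read()
    return Path(path).write_bytes(data)
```

With the `output` value from the call above, `save_output(output, "converted.wav")` would write the WAV file to the working directory.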

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
      "target_singer": "Taylor Swift",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.
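The same request can be assembled in Python with only the standard library. This is a sketch that mirrors the headers and JSON body of the curl command above; the function name `build_prediction_request` and its parameters are illustrative, and the request is built but not sent.

```python
import json
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"


def build_prediction_request(token, version, inputs, wait=True):
    """Build (but do not send) the POST request shown in the curl example."""
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    if wait:
        headers["Prefer"] = "wait"  # same header as in the curl example
    body = json.dumps({"version": version, "input": inputs}).encode("utf-8")
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")
```

Passing the result to `urllib.request.urlopen` would send it; the token would normally come from the `REPLICATE_API_TOKEN` environment variable, as in the curl example.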

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b \
  -i 'source_audio="https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav"' \
  -i 'target_singer="Taylor Swift"' \
  -i 'key_shift_mode=0' \
  -i 'pitch_shift_control="Auto Shift"' \
  -i 'diffusion_inference_steps=1000'

To learn more, take a look at the Cog documentation.

Run this to download the model and run it in your local environment:

docker run -d -p 5000:5000 --gpus=all r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
      "target_singer": "Taylor Swift",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  http://localhost:5000/predictions

To learn more, take a look at the Cog documentation.
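Because `docker run -d` returns immediately, the first request may arrive before the server inside the container is listening. A minimal sketch of retrying the local request until it goes through follows; `retry`, `post_local_prediction`, and the attempt and delay values are hypothetical names and defaults chosen for illustration, not part of Cog.

```python
import json
import time
import urllib.request


def retry(call, attempts=30, delay=2.0, sleep=time.sleep):
    """Call `call()` until it stops raising OSError, or the attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except OSError:
            # Covers refused connections and urllib's URLError/HTTPError,
            # so HTTP error statuses are retried as well.
            if attempt == attempts:
                raise
            sleep(delay)


def post_local_prediction(inputs, url="http://localhost:5000/predictions"):
    """POST the same JSON body as the curl example to the local container."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call such as `retry(lambda: post_local_prediction({"target_singer": "Taylor Swift"}))` would keep trying for about a minute with these defaults; the input dictionary here is abbreviated.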

Output

{
  "completed_at": "2023-12-21T22:37:49.201677Z",
  "created_at": "2023-12-21T22:35:12.670504Z",
  "data_removed": false,
  "error": null,
  "id": "h37sr5dbojsspt56c34pvjanoe",
  "input": {
    "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
    "target_singer": "Taylor Swift",
    "key_shift_mode": 0,
    "pitch_shift_control": "Auto Shift",
    "diffusion_inference_steps": 1000
  },
  "logs": "/tmp/input_audio\nvocalist_l1_TaylorSwift\nautoshift\ngetopt: unrecognized option '--diffusion_inference_steps'\nExprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json\nThe following values were not passed to `accelerate launch` and had defaults used instead:\n`--num_processes` was set to a value of `1`\n`--num_machines` was set to a value of `1`\n`--mixed_precision` was set to a value of `'no'`\n`--dynamo_backend` was set to a value of `'no'`\nTo avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\nMonotonic align not found. Please make sure you have compiled it.\nThere are 1 source audios:\n**********\nConversion for source...\nPrepare for meta eval data: 0.0s\n  0%|          | 0/1 [00:00<?, ?it/s]\n  0%|          | 0/1 [00:00<?, ?it/s]\u001b[A\n100%|██████████| 1/1 [00:01<00:00,  1.98s/it]\u001b[A\n100%|██████████| 1/1 [00:01<00:00,  1.98s/it]\nPrepare for acoustic features: 2.0s\nPrepare for content features: 0.0s\n2023-12-21 22:37:31 | INFO | inference | ========================================================\n2023-12-21 22:37:31 | INFO | inference | ||\t\tNew inference process started.\t\t||\n2023-12-21 22:37:31 | INFO | inference | ========================================================\n2023-12-21 22:37:31 | INFO | inference |\n2023-12-21 22:37:31 | DEBUG | inference | Using DEBUG logging level.\n2023-12-21 22:37:31 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper\n2023-12-21 22:37:31 | DEBUG | inference | Vocoder dir: pretrained/bigvgan\n2023-12-21 22:37:31 | DEBUG | inference | Setting random seed done in 0.83ms\n2023-12-21 22:37:31 | DEBUG | inference | Random seed: 10086\n2023-12-21 22:37:31 | INFO | inference | Building dataset...\n2023-12-21 22:37:31 | INFO | inference | Building dataset done in 4.60ms\n2023-12-21 22:37:31 | INFO | inference | Building model...\n2023-12-21 22:37:31 | INFO | inference | Building model done in 
276.183ms\n2023-12-21 22:37:31 | INFO | inference | Initializing accelerate...\n2023-12-21 22:37:32 | INFO | inference | Initializing accelerate done in 1057.268ms\n2023-12-21 22:37:32 | INFO | inference | Loading checkpoint...\n2023-12-21 22:37:32 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All model weights loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All optimizer states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All scheduler states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All random states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.accelerator | Loading in 0 custom states\n2023-12-21 22:37:32 | INFO | inference | Loading checkpoint done in 106.015ms\n2023-12-21 22:37:32 | INFO | inference | Using PNDM scheduler.\nModel Init: 1.5s\nAuto transposing: source f0 median = 372.9, target f0 median = 286.9, factor = 0.77\n  0%|          | 0/1009 [00:00<?, ?it/s]\u001b[A\n  0%|          | 1/1009 [00:02<39:02,  2.32s/it]\u001b[A\n  2%|▏         | 20/1009 [00:02<01:26, 11.39it/s]\u001b[A\n  4%|▍         | 39/1009 [00:02<00:38, 25.02it/s]\u001b[A\n  6%|▌         | 58/1009 [00:02<00:23, 41.23it/s]\u001b[A\n  8%|▊         | 77/1009 [00:02<00:15, 59.36it/s]\u001b[A\n 10%|▉         | 96/1009 [00:02<00:11, 78.69it/s]\u001b[A\n 11%|█▏        | 115/1009 [00:02<00:09, 98.15it/s]\u001b[A\n 13%|█▎        | 134/1009 [00:03<00:07, 116.43it/s]\u001b[A\n 15%|█▌        | 153/1009 [00:03<00:06, 132.68it/s]\u001b[A\n 17%|█▋        | 172/1009 [00:03<00:05, 146.29it/s]\u001b[A\n 19%|█▉        | 191/1009 [00:03<00:05, 157.09it/s]\u001b[A\n 21%|██        | 210/1009 [00:03<00:04, 164.64it/s]\u001b[A\n 23%|██▎       | 
229/1009 [00:03<00:04, 170.24it/s]\u001b[A\n 25%|██▍       | 248/1009 [00:03<00:04, 174.54it/s]\u001b[A\n 26%|██▋       | 267/1009 [00:03<00:04, 176.83it/s]\u001b[A\n 28%|██▊       | 286/1009 [00:03<00:04, 178.76it/s]\u001b[A\n 30%|███       | 305/1009 [00:03<00:03, 180.21it/s]\u001b[A\n 32%|███▏      | 324/1009 [00:04<00:03, 179.83it/s]\u001b[A\n 34%|███▍      | 343/1009 [00:04<00:03, 179.98it/s]\u001b[A\n 36%|███▌      | 362/1009 [00:04<00:03, 181.63it/s]\u001b[A\n 38%|███▊      | 381/1009 [00:04<00:03, 181.04it/s]\u001b[A\n 40%|███▉      | 400/1009 [00:04<00:03, 182.13it/s]\u001b[A\n 42%|████▏     | 419/1009 [00:04<00:03, 182.37it/s]\u001b[A\n 43%|████▎     | 438/1009 [00:04<00:03, 183.85it/s]\u001b[A\n 45%|████▌     | 457/1009 [00:04<00:02, 184.98it/s]\u001b[A\n 47%|████▋     | 476/1009 [00:04<00:02, 185.91it/s]\u001b[A\n 49%|████▉     | 495/1009 [00:04<00:02, 186.25it/s]\u001b[A\n 51%|█████     | 514/1009 [00:05<00:02, 187.04it/s]\u001b[A\n 53%|█████▎    | 533/1009 [00:05<00:02, 187.69it/s]\u001b[A\n 55%|█████▍    | 552/1009 [00:05<00:02, 188.32it/s]\u001b[A\n 57%|█████▋    | 571/1009 [00:05<00:02, 188.13it/s]\u001b[A\n 58%|█████▊    | 590/1009 [00:05<00:02, 188.37it/s]\u001b[A\n 60%|██████    | 609/1009 [00:05<00:02, 188.66it/s]\u001b[A\n 62%|██████▏   | 628/1009 [00:05<00:02, 188.88it/s]\u001b[A\n 64%|██████▍   | 647/1009 [00:05<00:01, 188.97it/s]\u001b[A\n 66%|██████▌   | 666/1009 [00:05<00:01, 188.77it/s]\u001b[A\n 68%|██████▊   | 685/1009 [00:05<00:01, 188.38it/s]\u001b[A\n 70%|██████▉   | 704/1009 [00:06<00:01, 188.63it/s]\u001b[A\n 72%|███████▏  | 723/1009 [00:06<00:01, 188.84it/s]\u001b[A\n 74%|███████▎  | 742/1009 [00:06<00:01, 189.15it/s]\u001b[A\n 75%|███████▌  | 761/1009 [00:06<00:01, 188.98it/s]\u001b[A\n 77%|███████▋  | 780/1009 [00:06<00:01, 189.16it/s]\u001b[A\n 79%|███████▉  | 799/1009 [00:06<00:01, 186.62it/s]\u001b[A\n 81%|████████  | 819/1009 [00:06<00:01, 188.31it/s]\u001b[A\n 83%|████████▎ | 838/1009 [00:06<00:00, 185.22it/s]\u001b[A\n 
85%|████████▍ | 857/1009 [00:06<00:00, 186.46it/s]\u001b[A\n 87%|████████▋ | 877/1009 [00:07<00:00, 188.15it/s]\u001b[A\n 89%|████████▉ | 897/1009 [00:07<00:00, 188.85it/s]\u001b[A\n 91%|█████████ | 917/1009 [00:07<00:00, 189.80it/s]\u001b[A\n 93%|█████████▎| 937/1009 [00:07<00:00, 190.56it/s]\u001b[A\n 95%|█████████▍| 957/1009 [00:07<00:00, 190.87it/s]\u001b[A\n 97%|█████████▋| 977/1009 [00:07<00:00, 191.29it/s]\u001b[A\n 99%|█████████▉| 997/1009 [00:07<00:00, 190.46it/s]\u001b[A\n100%|██████████| 1009/1009 [00:07<00:00, 130.99it/s]\nSynthesis audios using bigvgan vocoder...\nLoading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt\nFor predicted mels, #sample = 1...\nModel inference: 14.1s\n100%|██████████| 1/1 [00:17<00:00, 17.56s/it]\n100%|██████████| 1/1 [00:17<00:00, 17.56s/it]\n/src/Amphion/result/source/source_vocalist_l1_TaylorSwift.wav",
  "metrics": {
    "predict_time": 31.512607,
    "total_time": 156.531173
  },
  "output": "https://replicate.delivery/pbxt/FoHobqVw0mrPOluLgRQGEW01GwDo5tvSefKNDyc79hY8AnESA/source_vocalist_l1_TaylorSwift.wav",
  "started_at": "2023-12-21T22:37:17.689070Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/h37sr5dbojsspt56c34pvjanoe",
    "cancel": "https://api.replicate.com/v1/predictions/h37sr5dbojsspt56c34pvjanoe/cancel"
  },
  "version": "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b"
}
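The `metrics` values can be cross-checked against the timestamps in the same record. The sketch below assumes `predict_time` spans `started_at` to `completed_at` and `total_time` spans `created_at` to `completed_at`; that reading is inferred from the numbers in this record rather than taken from API documentation, and the function names are illustrative.

```python
from datetime import datetime


def parse_ts(value):
    """Parse an ISO 8601 timestamp ending in 'Z' into an aware datetime."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def timing_breakdown(prediction):
    """Return queued, running, and total durations in seconds."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "queued": (started - created).total_seconds(),
        "running": (completed - started).total_seconds(),
        "total": (completed - created).total_seconds(),
    }


# Timestamps copied from the prediction record above
prediction = {
    "created_at": "2023-12-21T22:35:12.670504Z",
    "started_at": "2023-12-21T22:37:17.689070Z",
    "completed_at": "2023-12-21T22:37:49.201677Z",
}
print(timing_breakdown(prediction))
```

For this record the running time works out to 31.512607 s and the total to 156.531173 s, matching `predict_time` and `total_time`; the remaining 125 s or so elapsed before the run started.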

Generated in 31.5 seconds



Prediction

lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b

Model: lucataco/singing_voice_conversion:f29872ee
ID: ujqoollbrkifg2tmynr4rl2xuu
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Total duration: 29.2s
Created: over 1 year ago

Input

source_audio: https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav
target_singer: Beyonce
key_shift_mode: 0
pitch_shift_control: Auto Shift
diffusion_inference_steps: 1000

{
  "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
  "target_singer": "Beyonce",
  "key_shift_mode": 0,
  "pitch_shift_control": "Auto Shift",
  "diffusion_inference_steps": 1000
}
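The logs for the predictions on this page contain the line `getopt: unrecognized option '--diffusion_inference_steps'`, which suggests the `diffusion_inference_steps` input may not be accepted by the command line the model wraps. A small sketch of checking a prediction's logs for inputs flagged this way follows; the function name `unrecognized_inputs` is hypothetical, and the pattern only covers the exact `getopt` message format seen in these logs.

```python
import re

# Matches the getopt warning format that appears in this page's logs
UNRECOGNIZED = re.compile(r"getopt: unrecognized option '--([A-Za-z0-9_-]+)'")


def unrecognized_inputs(logs, inputs):
    """Return the input names that the logs report as unrecognized options."""
    flagged = set(UNRECOGNIZED.findall(logs))
    return sorted(name for name in inputs if name in flagged)
```

Given the `logs` string and `input` object of a prediction record, a non-empty result is a hint that changing those inputs might have no effect on the output.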

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
  {
    input: {
      source_audio: "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
      target_singer: "Beyonce",
      key_shift_mode: 0,
      pitch_shift_control: "Auto Shift",
      diffusion_inference_steps: 1000
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk:
await fs.promises.writeFile("output.wav", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    input={
        "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
        "target_singer": "Beyonce",
        "key_shift_mode": 0,
        "pitch_shift_control": "Auto Shift",
        "diffusion_inference_steps": 1000
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
      "target_singer": "Beyonce",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b \
  -i 'source_audio="https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav"' \
  -i 'target_singer="Beyonce"' \
  -i 'key_shift_mode=0' \
  -i 'pitch_shift_control="Auto Shift"' \
  -i 'diffusion_inference_steps=1000'

To learn more, take a look at the Cog documentation.

Run this to download the model and run it in your local environment:

docker run -d -p 5000:5000 --gpus=all r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
      "target_singer": "Beyonce",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  http://localhost:5000/predictions

To learn more, take a look at the Cog documentation.

Output

{
  "completed_at": "2023-12-21T22:38:31.138680Z",
  "created_at": "2023-12-21T22:38:01.978638Z",
  "data_removed": false,
  "error": null,
  "id": "ujqoollbrkifg2tmynr4rl2xuu",
  "input": {
    "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
    "target_singer": "Beyonce",
    "key_shift_mode": 0,
    "pitch_shift_control": "Auto Shift",
    "diffusion_inference_steps": 1000
  },
  "logs": "/tmp/input_audio\nvocalist_l1_Beyonce\nautoshift\ngetopt: unrecognized option '--diffusion_inference_steps'\nExprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json\nThe following values were not passed to `accelerate launch` and had defaults used instead:\n`--num_processes` was set to a value of `1`\n`--num_machines` was set to a value of `1`\n`--mixed_precision` was set to a value of `'no'`\n`--dynamo_backend` was set to a value of `'no'`\nTo avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\nMonotonic align not found. Please make sure you have compiled it.\nThere are 1 source audios:\n**********\nConversion for source...\nPrepare for meta eval data: 0.0s\n  0%|          | 0/1 [00:00<?, ?it/s]\n  0%|          | 0/1 [00:00<?, ?it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00,  1.93it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00,  1.93it/s]\nPrepare for acoustic features: 0.5s\nPrepare for content features: 0.0s\n2023-12-21 22:38:14 | INFO | inference | ========================================================\n2023-12-21 22:38:14 | INFO | inference | ||\t\tNew inference process started.\t\t||\n2023-12-21 22:38:14 | INFO | inference | ========================================================\n2023-12-21 22:38:14 | INFO | inference |\n2023-12-21 22:38:14 | DEBUG | inference | Using DEBUG logging level.\n2023-12-21 22:38:14 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper\n2023-12-21 22:38:14 | DEBUG | inference | Vocoder dir: pretrained/bigvgan\n2023-12-21 22:38:14 | DEBUG | inference | Setting random seed done in 0.82ms\n2023-12-21 22:38:14 | DEBUG | inference | Random seed: 10086\n2023-12-21 22:38:14 | INFO | inference | Building dataset...\n2023-12-21 22:38:14 | INFO | inference | Building dataset done in 4.43ms\n2023-12-21 22:38:14 | INFO | inference | Building model...\n2023-12-21 22:38:14 | INFO | inference | Building model done in 
275.277ms\n2023-12-21 22:38:14 | INFO | inference | Initializing accelerate...\n2023-12-21 22:38:15 | INFO | inference | Initializing accelerate done in 1093.010ms\n2023-12-21 22:38:15 | INFO | inference | Loading checkpoint...\n2023-12-21 22:38:15 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All model weights loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All optimizer states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All scheduler states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All random states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.accelerator | Loading in 0 custom states\n2023-12-21 22:38:15 | INFO | inference | Loading checkpoint done in 99.938ms\n2023-12-21 22:38:15 | INFO | inference | Using PNDM scheduler.\nModel Init: 1.5s\nAuto transposing: source f0 median = 372.9, target f0 median = 318.3, factor = 0.85\n  0%|          | 0/1009 [00:00<?, ?it/s]\u001b[A\n  0%|          | 1/1009 [00:00<10:21,  1.62it/s]\u001b[A\n  2%|▏         | 20/1009 [00:00<00:26, 36.95it/s]\u001b[A\n  4%|▍         | 39/1009 [00:00<00:14, 69.13it/s]\u001b[A\n  6%|▌         | 58/1009 [00:00<00:09, 96.78it/s]\u001b[A\n  8%|▊         | 77/1009 [00:01<00:07, 118.74it/s]\u001b[A\n 10%|▉         | 96/1009 [00:01<00:06, 136.25it/s]\u001b[A\n 11%|█▏        | 115/1009 [00:01<00:05, 150.28it/s]\u001b[A\n 13%|█▎        | 135/1009 [00:01<00:05, 161.83it/s]\u001b[A\n 15%|█▌        | 155/1009 [00:01<00:05, 170.42it/s]\u001b[A\n 17%|█▋        | 175/1009 [00:01<00:04, 176.38it/s]\u001b[A\n 19%|█▉        | 195/1009 [00:01<00:04, 180.68it/s]\u001b[A\n 21%|██        | 214/1009 [00:01<00:04, 183.11it/s]\u001b[A\n 23%|██▎       
| 233/1009 [00:01<00:04, 184.90it/s]\u001b[A\n 25%|██▌       | 253/1009 [00:01<00:04, 186.84it/s]\u001b[A\n 27%|██▋       | 272/1009 [00:02<00:03, 187.40it/s]\u001b[A\n 29%|██▉       | 292/1009 [00:02<00:03, 188.43it/s]\u001b[A\n 31%|███       | 311/1009 [00:02<00:03, 188.83it/s]\u001b[A\n 33%|███▎      | 331/1009 [00:02<00:03, 189.21it/s]\u001b[A\n 35%|███▍      | 351/1009 [00:02<00:03, 189.52it/s]\u001b[A\n 37%|███▋      | 371/1009 [00:02<00:03, 189.85it/s]\u001b[A\n 39%|███▉      | 391/1009 [00:02<00:03, 190.24it/s]\u001b[A\n 41%|████      | 411/1009 [00:02<00:03, 190.48it/s]\u001b[A\n 43%|████▎     | 431/1009 [00:02<00:03, 190.21it/s]\u001b[A\n 45%|████▍     | 451/1009 [00:02<00:02, 190.62it/s]\u001b[A\n 47%|████▋     | 471/1009 [00:03<00:02, 190.39it/s]\u001b[A\n 49%|████▊     | 491/1009 [00:03<00:02, 190.36it/s]\u001b[A\n 51%|█████     | 511/1009 [00:03<00:02, 189.68it/s]\u001b[A\n 53%|█████▎    | 530/1009 [00:03<00:02, 186.46it/s]\u001b[A\n 54%|█████▍    | 549/1009 [00:03<00:02, 184.33it/s]\u001b[A\n 56%|█████▋    | 568/1009 [00:03<00:02, 183.14it/s]\u001b[A\n 58%|█████▊    | 587/1009 [00:03<00:02, 181.72it/s]\u001b[A\n 60%|██████    | 606/1009 [00:03<00:02, 181.17it/s]\u001b[A\n 62%|██████▏   | 625/1009 [00:03<00:02, 181.36it/s]\u001b[A\n 64%|██████▍   | 644/1009 [00:04<00:02, 181.32it/s]\u001b[A\n 66%|██████▌   | 663/1009 [00:04<00:01, 181.26it/s]\u001b[A\n 68%|██████▊   | 682/1009 [00:04<00:01, 181.11it/s]\u001b[A\n 69%|██████▉   | 701/1009 [00:04<00:01, 181.30it/s]\u001b[A\n 71%|███████▏  | 720/1009 [00:04<00:01, 181.46it/s]\u001b[A\n 73%|███████▎  | 739/1009 [00:04<00:01, 181.68it/s]\u001b[A\n 75%|███████▌  | 759/1009 [00:04<00:01, 184.70it/s]\u001b[A\n 77%|███████▋  | 779/1009 [00:04<00:01, 186.43it/s]\u001b[A\n 79%|███████▉  | 799/1009 [00:04<00:01, 187.97it/s]\u001b[A\n 81%|████████  | 818/1009 [00:04<00:01, 185.71it/s]\u001b[A\n 83%|████████▎ | 837/1009 [00:05<00:00, 184.44it/s]\u001b[A\n 85%|████████▍ | 856/1009 [00:05<00:00, 183.59it/s]\u001b[A\n 
87%|████████▋ | 875/1009 [00:05<00:00, 182.91it/s]\u001b[A\n 89%|████████▊ | 894/1009 [00:05<00:00, 181.95it/s]\u001b[A\n 91%|█████████ | 914/1009 [00:05<00:00, 184.87it/s]\u001b[A\n 93%|█████████▎| 934/1009 [00:05<00:00, 186.66it/s]\u001b[A\n 94%|█████████▍| 953/1009 [00:05<00:00, 186.56it/s]\u001b[A\n 96%|█████████▋| 972/1009 [00:05<00:00, 183.50it/s]\u001b[A\n 98%|█████████▊| 991/1009 [00:05<00:00, 182.73it/s]\u001b[A\n100%|██████████| 1009/1009 [00:06<00:00, 167.18it/s]\nSynthesis audios using bigvgan vocoder...\nLoading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt\nFor predicted mels, #sample = 1...\nModel inference: 12.8s\n100%|██████████| 1/1 [00:14<00:00, 14.84s/it]\n100%|██████████| 1/1 [00:14<00:00, 14.84s/it]\n/src/Amphion/result/source/source_vocalist_l1_Beyonce.wav",
  "metrics": {
    "predict_time": 29.123041,
    "total_time": 29.160042
  },
  "output": "https://replicate.delivery/pbxt/Au7w7BH5kx4VA1j6zZLcGBllIre31A0uw80Cc0Bk0kczgTCJA/source_vocalist_l1_Beyonce.wav",
  "started_at": "2023-12-21T22:38:02.015639Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/ujqoollbrkifg2tmynr4rl2xuu",
    "cancel": "https://api.replicate.com/v1/predictions/ujqoollbrkifg2tmynr4rl2xuu/cancel"
  },
  "version": "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b"
}

Generated in 29.1 seconds


/tmp/input_audio
vocalist_l1_Beyonce
autoshift
getopt: unrecognized option '--diffusion_inference_steps'
Exprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Monotonic align not found. Please make sure you have compiled it.
There are 1 source audios:
**********
Conversion for source...
Prepare for meta eval data: 0.0s
100%|██████████| 1/1 [00:00<00:00,  1.93it/s]
Prepare for acoustic features: 0.5s
Prepare for content features: 0.0s
2023-12-21 22:38:14 | INFO | inference | ========================================================
2023-12-21 22:38:14 | INFO | inference | ||		New inference process started.		||
2023-12-21 22:38:14 | INFO | inference | ========================================================
2023-12-21 22:38:14 | INFO | inference |
2023-12-21 22:38:14 | DEBUG | inference | Using DEBUG logging level.
2023-12-21 22:38:14 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper
2023-12-21 22:38:14 | DEBUG | inference | Vocoder dir: pretrained/bigvgan
2023-12-21 22:38:14 | DEBUG | inference | Setting random seed done in 0.82ms
2023-12-21 22:38:14 | DEBUG | inference | Random seed: 10086
2023-12-21 22:38:14 | INFO | inference | Building dataset...
2023-12-21 22:38:14 | INFO | inference | Building dataset done in 4.43ms
2023-12-21 22:38:14 | INFO | inference | Building model...
2023-12-21 22:38:14 | INFO | inference | Building model done in 275.277ms
2023-12-21 22:38:14 | INFO | inference | Initializing accelerate...
2023-12-21 22:38:15 | INFO | inference | Initializing accelerate done in 1093.010ms
2023-12-21 22:38:15 | INFO | inference | Loading checkpoint...
2023-12-21 22:38:15 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773
2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All model weights loaded successfully
2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All optimizer states loaded successfully
2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All scheduler states loaded successfully
2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully
2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All random states loaded successfully
2023-12-21 22:38:15 | INFO | accelerate.accelerator | Loading in 0 custom states
2023-12-21 22:38:15 | INFO | inference | Loading checkpoint done in 99.938ms
2023-12-21 22:38:15 | INFO | inference | Using PNDM scheduler.
Model Init: 1.5s
Auto transposing: source f0 median = 372.9, target f0 median = 318.3, factor = 0.85
100%|██████████| 1009/1009 [00:06<00:00, 167.18it/s]
Synthesis audios using bigvgan vocoder...
Loading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt
For predicted mels, #sample = 1...
Model inference: 12.8s
100%|██████████| 1/1 [00:14<00:00, 14.84s/it]
/src/Amphion/result/source/source_vocalist_l1_Beyonce.wav
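
The log above reports several per-stage durations in the form "Name: 1.5s" (for example "Model Init" and "Model inference"). A minimal sketch of how those lines could be pulled out of the raw log text follows. The function name `stage_timings` and the regular expression are illustrative assumptions, not part of the model or of Replicate's tooling.

```python
import re

# Matches lines such as "Model inference: 12.8s".
# Lines containing "|" (timestamped logger lines, progress bars) are skipped.
STAGE_RE = re.compile(r"^([^|]+?): (\d+(?:\.\d+)?)s$")


def stage_timings(log_text):
    """Return a dict mapping stage name to duration in seconds."""
    timings = {}
    for line in log_text.splitlines():
        match = STAGE_RE.match(line.strip())
        if match:
            timings[match.group(1)] = float(match.group(2))
    return timings


sample_log = """Prepare for acoustic features: 0.5s
2023-12-21 22:38:14 | INFO | inference | Building model done in 275.277ms
Model Init: 1.5s
Model inference: 12.8s
100%|██████████| 1/1 [00:14<00:00, 14.84s/it]"""

for stage, seconds in stage_timings(sample_log).items():
    print(f"{stage}: {seconds}s")
```

Only the plain "Name: N.Ns" lines are captured; the millisecond figures on the timestamped logger lines are deliberately ignored in this sketch.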

Prediction

lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b

Model: lucataco/singing_voice_conversion:f29872ee
ID: sm5odfdb53siq7x4sduavb2qfu
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Total duration: 23.5s
Created: over 1 year ago

Input

source_audio: https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav
target_singer: Bruno Mars
key_shift_mode: 0
pitch_shift_control: Auto Shift
diffusion_inference_steps: 1000

{
  "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
  "target_singer": "Bruno Mars",
  "key_shift_mode": 0,
  "pitch_shift_control": "Auto Shift",
  "diffusion_inference_steps": 1000
}

Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
  {
    input: {
      source_audio: "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
      target_singer: "Bruno Mars",
      key_shift_mode: 0,
      pitch_shift_control: "Auto Shift",
      diffusion_inference_steps: 1000
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the file to disk (this model outputs a WAV file):
await fs.promises.writeFile("output.wav", output);

To learn more, take a look at the guide on getting started with Node.js.

Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    input={
        "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
        "target_singer": "Bruno Mars",
        "key_shift_mode": 0,
        "pitch_shift_control": "Auto Shift",
        "diffusion_inference_steps": 1000
    }
)
print(output)

To learn more, take a look at the guide on getting started with Python.
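
The Python snippet above only prints the output. A minimal sketch of saving the converted audio to disk follows. It assumes the value returned by `replicate.run` is either a file-like object with a `read()` method (as in newer client versions) or a plain URL string (as in older ones); the helper name `save_output` and the default filename are hypothetical.

```python
import urllib.request
from pathlib import Path


def save_output(output, path="converted.wav"):
    """Write a prediction output to disk and return the destination path.

    Assumption: `output` is either file-like (has .read()) or a URL string.
    """
    target = Path(path)
    if hasattr(output, "read"):
        # File-like output: read the bytes directly.
        target.write_bytes(output.read())
    else:
        # URL string: download the file over HTTP.
        urllib.request.urlretrieve(str(output), target)
    return target
```

With the `output` variable from the snippet above, this would be called as `save_output(output, "bruno_mars.wav")`.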

Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
      "target_singer": "Bruno Mars",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  https://api.replicate.com/v1/predictions

To learn more, take a look at Replicate’s HTTP API reference docs.

You can run this model locally using Cog. First, install Cog:

brew install cog

If you don’t have Homebrew, there are other installation options available.

Run this to download the model and run it in your local environment:

cog predict r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b \
  -i 'source_audio="https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav"' \
  -i 'target_singer="Bruno Mars"' \
  -i 'key_shift_mode=0' \
  -i 'pitch_shift_control="Auto Shift"' \
  -i 'diffusion_inference_steps=1000'

To learn more, take a look at the Cog documentation.

Run this to download the model and run it in your local environment:

docker run -d -p 5000:5000 --gpus=all r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b
curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
      "target_singer": "Bruno Mars",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  http://localhost:5000/predictions

To learn more, take a look at the Cog documentation.
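
The same request to the local container can be built from Python with the standard library. This is a sketch under the assumption that the container started above is listening on port 5000; the helper name `build_local_request` is hypothetical, and the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_local_request(inputs, base_url="http://localhost:5000"):
    """Build a POST request for the local /predictions endpoint."""
    body = json.dumps({"input": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predictions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_local_request({
    "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
    "target_singer": "Bruno Mars",
    "key_shift_mode": 0,
    "pitch_shift_control": "Auto Shift",
    "diffusion_inference_steps": 1000,
})

# To send it once the container is running:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```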

Output

{
  "completed_at": "2023-12-21T22:39:27.615465Z",
  "created_at": "2023-12-21T22:39:04.141957Z",
  "data_removed": false,
  "error": null,
  "id": "sm5odfdb53siq7x4sduavb2qfu",
  "input": {
    "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
    "target_singer": "Bruno Mars",
    "key_shift_mode": 0,
    "pitch_shift_control": "Auto Shift",
    "diffusion_inference_steps": 1000
  },
  "logs": "/tmp/input_audio\nvocalist_l1_BrunoMars\nautoshift\ngetopt: unrecognized option '--diffusion_inference_steps'\nExprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json\nThe following values were not passed to `accelerate launch` and had defaults used instead:\n`--num_processes` was set to a value of `1`\n`--num_machines` was set to a value of `1`\n`--mixed_precision` was set to a value of `'no'`\n`--dynamo_backend` was set to a value of `'no'`\nTo avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\nMonotonic align not found. Please make sure you have compiled it.\nThere are 1 source audios:\n**********\nConversion for source...\nPrepare for meta eval data: 0.0s\n  0%|          | 0/1 [00:00<?, ?it/s]\n  0%|          | 0/1 [00:00<?, ?it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00,  1.93it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00,  1.93it/s]\nPrepare for acoustic features: 0.5s\nPrepare for content features: 0.0s\n2023-12-21 22:39:11 | INFO | inference | ========================================================\n2023-12-21 22:39:11 | INFO | inference | ||\t\tNew inference process started.\t\t||\n2023-12-21 22:39:11 | INFO | inference | ========================================================\n2023-12-21 22:39:11 | INFO | inference |\n2023-12-21 22:39:11 | DEBUG | inference | Using DEBUG logging level.\n2023-12-21 22:39:11 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper\n2023-12-21 22:39:11 | DEBUG | inference | Vocoder dir: pretrained/bigvgan\n2023-12-21 22:39:11 | DEBUG | inference | Setting random seed done in 0.77ms\n2023-12-21 22:39:11 | DEBUG | inference | Random seed: 10086\n2023-12-21 22:39:11 | INFO | inference | Building dataset...\n2023-12-21 22:39:11 | INFO | inference | Building dataset done in 4.40ms\n2023-12-21 22:39:11 | INFO | inference | Building model...\n2023-12-21 22:39:11 | INFO | inference | Building model done in 
277.159ms\n2023-12-21 22:39:11 | INFO | inference | Initializing accelerate...\n2023-12-21 22:39:12 | INFO | inference | Initializing accelerate done in 1047.520ms\n2023-12-21 22:39:12 | INFO | inference | Loading checkpoint...\n2023-12-21 22:39:12 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All model weights loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All optimizer states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All scheduler states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All random states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.accelerator | Loading in 0 custom states\n2023-12-21 22:39:12 | INFO | inference | Loading checkpoint done in 121.116ms\n2023-12-21 22:39:12 | INFO | inference | Using PNDM scheduler.\nModel Init: 1.5s\nAuto transposing: source f0 median = 372.9, target f0 median = 324.7, factor = 0.87\n  0%|          | 0/1009 [00:00<?, ?it/s]\u001b[A\n  0%|          | 1/1009 [00:00<09:00,  1.87it/s]\u001b[A\n  2%|▏         | 20/1009 [00:00<00:23, 41.39it/s]\u001b[A\n  4%|▍         | 39/1009 [00:00<00:12, 74.81it/s]\u001b[A\n  6%|▌         | 58/1009 [00:00<00:09, 101.86it/s]\u001b[A\n  8%|▊         | 76/1009 [00:00<00:07, 121.91it/s]\u001b[A\n  9%|▉         | 95/1009 [00:01<00:06, 138.32it/s]\u001b[A\n 11%|█         | 113/1009 [00:01<00:05, 149.71it/s]\u001b[A\n 13%|█▎        | 132/1009 [00:01<00:05, 158.90it/s]\u001b[A\n 15%|█▍        | 151/1009 [00:01<00:05, 165.63it/s]\u001b[A\n 17%|█▋        | 169/1009 [00:01<00:04, 169.63it/s]\u001b[A\n 19%|█▊        | 188/1009 [00:01<00:04, 172.73it/s]\u001b[A\n 20%|██        | 206/1009 [00:01<00:04, 174.22it/s]\u001b[A\n 22%|██▏     
  | 224/1009 [00:01<00:04, 175.66it/s]\u001b[A\n 24%|██▍       | 242/1009 [00:01<00:04, 176.23it/s]\u001b[A\n 26%|██▌       | 261/1009 [00:01<00:04, 178.83it/s]\u001b[A\n 28%|██▊       | 281/1009 [00:02<00:03, 182.73it/s]\u001b[A\n 30%|██▉       | 301/1009 [00:02<00:03, 185.36it/s]\u001b[A\n 32%|███▏      | 321/1009 [00:02<00:03, 187.40it/s]\u001b[A\n 34%|███▍      | 341/1009 [00:02<00:03, 188.55it/s]\u001b[A\n 36%|███▌      | 360/1009 [00:02<00:03, 188.36it/s]\u001b[A\n 38%|███▊      | 379/1009 [00:02<00:03, 185.55it/s]\u001b[A\n 39%|███▉      | 398/1009 [00:02<00:03, 184.13it/s]\u001b[A\n 41%|████▏     | 417/1009 [00:02<00:03, 183.10it/s]\u001b[A\n 43%|████▎     | 436/1009 [00:02<00:03, 182.07it/s]\u001b[A\n 45%|████▌     | 455/1009 [00:03<00:03, 181.82it/s]\u001b[A\n 47%|████▋     | 474/1009 [00:03<00:02, 180.98it/s]\u001b[A\n 49%|████▉     | 493/1009 [00:03<00:02, 180.87it/s]\u001b[A\n 51%|█████     | 512/1009 [00:03<00:02, 180.94it/s]\u001b[A\n 53%|█████▎    | 531/1009 [00:03<00:02, 180.34it/s]\u001b[A\n 55%|█████▍    | 550/1009 [00:03<00:02, 180.29it/s]\u001b[A\n 56%|█████▋    | 569/1009 [00:03<00:02, 179.63it/s]\u001b[A\n 58%|█████▊    | 587/1009 [00:03<00:02, 179.69it/s]\u001b[A\n 60%|██████    | 606/1009 [00:03<00:02, 179.75it/s]\u001b[A\n 62%|██████▏   | 625/1009 [00:03<00:02, 179.88it/s]\u001b[A\n 64%|██████▍   | 644/1009 [00:04<00:02, 180.05it/s]\u001b[A\n 66%|██████▌   | 663/1009 [00:04<00:01, 180.42it/s]\u001b[A\n 68%|██████▊   | 682/1009 [00:04<00:01, 180.35it/s]\u001b[A\n 69%|██████▉   | 701/1009 [00:04<00:01, 180.59it/s]\u001b[A\n 71%|███████▏  | 720/1009 [00:04<00:01, 180.16it/s]\u001b[A\n 73%|███████▎  | 739/1009 [00:04<00:01, 180.36it/s]\u001b[A\n 75%|███████▌  | 758/1009 [00:04<00:01, 180.36it/s]\u001b[A\n 77%|███████▋  | 777/1009 [00:04<00:01, 180.32it/s]\u001b[A\n 79%|███████▉  | 796/1009 [00:04<00:01, 180.10it/s]\u001b[A\n 81%|████████  | 815/1009 [00:05<00:01, 179.96it/s]\u001b[A\n 83%|████████▎ | 834/1009 [00:05<00:00, 
180.12it/s]\u001b[A\n 85%|████████▍ | 853/1009 [00:05<00:00, 179.92it/s]\u001b[A\n 86%|████████▋ | 871/1009 [00:05<00:00, 179.49it/s]\u001b[A\n 88%|████████▊ | 891/1009 [00:05<00:00, 182.84it/s]\u001b[A\n 90%|█████████ | 910/1009 [00:05<00:00, 182.37it/s]\u001b[A\n 92%|█████████▏| 929/1009 [00:05<00:00, 182.12it/s]\u001b[A\n 94%|█████████▍| 948/1009 [00:05<00:00, 181.90it/s]\u001b[A\n 96%|█████████▌| 967/1009 [00:05<00:00, 181.53it/s]\u001b[A\n 98%|█████████▊| 986/1009 [00:05<00:00, 181.55it/s]\u001b[A\n100%|█████████▉| 1005/1009 [00:06<00:00, 181.89it/s]\u001b[A\n100%|██████████| 1009/1009 [00:06<00:00, 165.81it/s]\nSynthesis audios using bigvgan vocoder...\nLoading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt\nFor predicted mels, #sample = 1...\nModel inference: 12.6s\n100%|██████████| 1/1 [00:14<00:00, 14.64s/it]\n100%|██████████| 1/1 [00:14<00:00, 14.64s/it]\n/src/Amphion/result/source/source_vocalist_l1_BrunoMars.wav",
  "metrics": {
    "predict_time": 23.436477,
    "total_time": 23.473508
  },
  "output": "https://replicate.delivery/pbxt/rajESoNgjeV1KKUeuqErPMm7zxOQWBMWLpLXN5tiK9e9EOJkA/source_vocalist_l1_BrunoMars.wav",
  "started_at": "2023-12-21T22:39:04.178988Z",
  "status": "succeeded",
  "urls": {
    "get": "https://api.replicate.com/v1/predictions/sm5odfdb53siq7x4sduavb2qfu",
    "cancel": "https://api.replicate.com/v1/predictions/sm5odfdb53siq7x4sduavb2qfu/cancel"
  },
  "version": "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b"
}
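
The `metrics` values in the response above can be reproduced from its timestamps: `predict_time` matches `completed_at` minus `started_at`, and `total_time` matches `completed_at` minus `created_at`. A short sketch of that arithmetic follows; the function names and the `queue_time` key are illustrative and not fields of the API response.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def prediction_timings(prediction):
    """Derive durations in seconds from a prediction's timestamps."""
    created = parse_ts(prediction["created_at"])
    started = parse_ts(prediction["started_at"])
    completed = parse_ts(prediction["completed_at"])
    return {
        "queue_time": (started - created).total_seconds(),
        "predict_time": (completed - started).total_seconds(),
        "total_time": (completed - created).total_seconds(),
    }


timings = prediction_timings({
    "created_at": "2023-12-21T22:39:04.141957Z",
    "started_at": "2023-12-21T22:39:04.178988Z",
    "completed_at": "2023-12-21T22:39:27.615465Z",
})
print(timings)
```

For this prediction the derived `predict_time` and `total_time` agree with the reported 23.436477 and 23.473508 seconds.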

Generated in 23.4 seconds


/tmp/input_audio
vocalist_l1_BrunoMars
autoshift
getopt: unrecognized option '--diffusion_inference_steps'
Exprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json
The following values were not passed to `accelerate launch` and had defaults used instead:
`--num_processes` was set to a value of `1`
`--num_machines` was set to a value of `1`
`--mixed_precision` was set to a value of `'no'`
`--dynamo_backend` was set to a value of `'no'`
To avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.
Monotonic align not found. Please make sure you have compiled it.
There are 1 source audios:
**********
Conversion for source...
Prepare for meta eval data: 0.0s
100%|██████████| 1/1 [00:00<00:00,  1.93it/s]
Prepare for acoustic features: 0.5s
Prepare for content features: 0.0s
2023-12-21 22:39:11 | INFO | inference | ========================================================
2023-12-21 22:39:11 | INFO | inference | ||		New inference process started.		||
2023-12-21 22:39:11 | INFO | inference | ========================================================
2023-12-21 22:39:11 | INFO | inference |
2023-12-21 22:39:11 | DEBUG | inference | Using DEBUG logging level.
2023-12-21 22:39:11 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper
2023-12-21 22:39:11 | DEBUG | inference | Vocoder dir: pretrained/bigvgan
2023-12-21 22:39:11 | DEBUG | inference | Setting random seed done in 0.77ms
2023-12-21 22:39:11 | DEBUG | inference | Random seed: 10086
2023-12-21 22:39:11 | INFO | inference | Building dataset...
2023-12-21 22:39:11 | INFO | inference | Building dataset done in 4.40ms
2023-12-21 22:39:11 | INFO | inference | Building model...
2023-12-21 22:39:11 | INFO | inference | Building model done in 277.159ms
2023-12-21 22:39:11 | INFO | inference | Initializing accelerate...
2023-12-21 22:39:12 | INFO | inference | Initializing accelerate done in 1047.520ms
2023-12-21 22:39:12 | INFO | inference | Loading checkpoint...
2023-12-21 22:39:12 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773
2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All model weights loaded successfully
2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All optimizer states loaded successfully
2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All scheduler states loaded successfully
2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully
2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All random states loaded successfully
2023-12-21 22:39:12 | INFO | accelerate.accelerator | Loading in 0 custom states
2023-12-21 22:39:12 | INFO | inference | Loading checkpoint done in 121.116ms
2023-12-21 22:39:12 | INFO | inference | Using PNDM scheduler.
Model Init: 1.5s
Auto transposing: source f0 median = 372.9, target f0 median = 324.7, factor = 0.87
100%|██████████| 1009/1009 [00:06<00:00, 165.81it/s]
Synthesis audios using bigvgan vocoder...
Loading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt
For predicted mels, #sample = 1...
Model inference: 12.6s
100%|██████████| 1/1 [00:14<00:00, 14.64s/it]
/src/Amphion/result/source/source_vocalist_l1_BrunoMars.wav
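
The "Auto transposing" line in the log reports a source f0 median, a target f0 median, and a factor. The log does not state how the factor is computed, but the logged values are consistent with a simple ratio of target to source (324.7 / 372.9 ≈ 0.87 here, and 318.3 / 372.9 ≈ 0.85 in the earlier run). The sketch below works under that assumption and also converts the factor to semitones using the standard equal-temperament relation; both function names are hypothetical.

```python
import math


def transpose_factor(source_f0_median, target_f0_median):
    """Assumed definition: ratio of target to source median f0 (in Hz)."""
    return target_f0_median / source_f0_median


def factor_to_semitones(factor):
    """Convert a frequency ratio to a shift in semitones (12 per octave)."""
    return 12 * math.log2(factor)


factor = transpose_factor(372.9, 324.7)
semitones = factor_to_semitones(factor)
print(f"factor = {factor:.2f}")
print(f"shift = {semitones:.2f} semitones")
```

A factor below 1 corresponds to a downward shift, so under these assumptions the source vocal is transposed down by a little over two semitones.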