lucataco / singing_voice_conversion
Amphion Singing Voice Conversion: DiffWaveNetSVC
Prediction
lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b
- ID: h37sr5dbojsspt56c34pvjanoe
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
Input
- source_audio
- https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav
- target_singer
- Taylor Swift
- key_shift_mode
- 0
- pitch_shift_control
- Auto Shift
- diffusion_inference_steps
- 1000
{ "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav", "target_singer": "Taylor Swift", "key_shift_mode": 0, "pitch_shift_control": "Auto Shift", "diffusion_inference_steps": 1000 }
Install Replicate’s Node.js client library:
    npm install replicate
Import and set up the client:
    import Replicate from "replicate";
    import { writeFile } from "node:fs/promises";

    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    const output = await replicate.run(
      "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
      {
        input: {
          source_audio: "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
          target_singer: "Taylor Swift",
          key_shift_mode: 0,
          pitch_shift_control: "Auto Shift",
          diffusion_inference_steps: 1000
        }
      }
    );

    // To access the file URL:
    console.log(output.url());

    // To write the file to disk (this model returns a WAV file):
    await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
    pip install replicate
Import the client:
    import replicate
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    output = replicate.run(
        "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
        input={
            "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
            "target_singer": "Taylor Swift",
            "key_shift_mode": 0,
            "pitch_shift_control": "Auto Shift",
            "diffusion_inference_steps": 1000
        }
    )
    print(output)
To learn more, take a look at the guide on getting started with Python.
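The result of a run is a WAV file hosted on replicate.delivery; its URL appears in the `output` field of the prediction JSON further down. A minimal sketch of saving it locally, assuming the output is available as a plain URL string. The helper names `filename_from_url` and `download` are hypothetical and not part of the Replicate client:

```python
import os
import posixpath
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url: str) -> str:
    """Return the last path component of a URL, e.g. the .wav file name."""
    return posixpath.basename(urlparse(url).path)


def download(url: str, directory: str = ".") -> str:
    """Download `url` into `directory` and return the local path (needs network)."""
    path = os.path.join(directory, filename_from_url(url))
    urllib.request.urlretrieve(url, path)
    return path


# Output URL copied from the first prediction on this page.
url = (
    "https://replicate.delivery/pbxt/"
    "FoHobqVw0mrPOluLgRQGEW01GwDo5tvSefKNDyc79hY8AnESA/"
    "source_vocalist_l1_TaylorSwift.wav"
)
print(filename_from_url(url))
```

Calling `download(url)` would fetch the file; it is left out of the example so the snippet runs without network access.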
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
        "input": {
          "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
          "target_singer": "Taylor Swift",
          "key_shift_mode": 0,
          "pitch_shift_control": "Auto Shift",
          "diffusion_inference_steps": 1000
        }
      }' \
      https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
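The same HTTP request can be assembled with Python's standard library. This is a sketch that mirrors the curl call above (same endpoint, headers, and body), not an official client; `build_request` is a hypothetical helper name, and the request is only constructed here, not sent:

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "lucataco/singing_voice_conversion:"
    "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b"
)


def build_request(token: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",  # same header as in the curl call
        },
    )


request = build_request(
    os.environ.get("REPLICATE_API_TOKEN", "<your-token>"),
    {
        "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
        "target_singer": "Taylor Swift",
        "key_shift_mode": 0,
        "pitch_shift_control": "Auto Shift",
        "diffusion_inference_steps": 1000,
    },
)

# To send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```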
You can run this model locally using Cog. First, install Cog:
    brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
    cog predict r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b \
      -i 'source_audio="https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav"' \
      -i 'target_singer="Taylor Swift"' \
      -i 'key_shift_mode=0' \
      -i 'pitch_shift_control="Auto Shift"' \
      -i 'diffusion_inference_steps=1000'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
    docker run -d -p 5000:5000 --gpus=all r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b

    curl -s -X POST \
      -H "Content-Type: application/json" \
      -d $'{
        "input": {
          "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav",
          "target_singer": "Taylor Swift",
          "key_shift_mode": 0,
          "pitch_shift_control": "Auto Shift",
          "diffusion_inference_steps": 1000
        }
      }' \
      http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{ "completed_at": "2023-12-21T22:37:49.201677Z", "created_at": "2023-12-21T22:35:12.670504Z", "data_removed": false, "error": null, "id": "h37sr5dbojsspt56c34pvjanoe", "input": { "source_audio": "https://replicate.delivery/pbxt/K5coMzCs7mnhljhRVhdhN29I3RlHPkneVxrbPtyArzxvAVtI/adele.wav", "target_singer": "Taylor Swift", "key_shift_mode": 0, "pitch_shift_control": "Auto Shift", "diffusion_inference_steps": 1000 }, "logs": "/tmp/input_audio\nvocalist_l1_TaylorSwift\nautoshift\ngetopt: unrecognized option '--diffusion_inference_steps'\nExprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json\nThe following values were not passed to `accelerate launch` and had defaults used instead:\n`--num_processes` was set to a value of `1`\n`--num_machines` was set to a value of `1`\n`--mixed_precision` was set to a value of `'no'`\n`--dynamo_backend` was set to a value of `'no'`\nTo avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\nMonotonic align not found. 
Please make sure you have compiled it.\nThere are 1 source audios:\n**********\nConversion for source...\nPrepare for meta eval data: 0.0s\n 0%| | 0/1 [00:00<?, ?it/s]\n 0%| | 0/1 [00:00<?, ?it/s]\u001b[A\n100%|██████████| 1/1 [00:01<00:00, 1.98s/it]\u001b[A\n100%|██████████| 1/1 [00:01<00:00, 1.98s/it]\nPrepare for acoustic features: 2.0s\nPrepare for content features: 0.0s\n2023-12-21 22:37:31 | INFO | inference | ========================================================\n2023-12-21 22:37:31 | INFO | inference | ||\t\tNew inference process started.\t\t||\n2023-12-21 22:37:31 | INFO | inference | ========================================================\n2023-12-21 22:37:31 | INFO | inference |\n2023-12-21 22:37:31 | DEBUG | inference | Using DEBUG logging level.\n2023-12-21 22:37:31 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper\n2023-12-21 22:37:31 | DEBUG | inference | Vocoder dir: pretrained/bigvgan\n2023-12-21 22:37:31 | DEBUG | inference | Setting random seed done in 0.83ms\n2023-12-21 22:37:31 | DEBUG | inference | Random seed: 10086\n2023-12-21 22:37:31 | INFO | inference | Building dataset...\n2023-12-21 22:37:31 | INFO | inference | Building dataset done in 4.60ms\n2023-12-21 22:37:31 | INFO | inference | Building model...\n2023-12-21 22:37:31 | INFO | inference | Building model done in 276.183ms\n2023-12-21 22:37:31 | INFO | inference | Initializing accelerate...\n2023-12-21 22:37:32 | INFO | inference | Initializing accelerate done in 1057.268ms\n2023-12-21 22:37:32 | INFO | inference | Loading checkpoint...\n2023-12-21 22:37:32 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All model weights loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All optimizer states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All 
scheduler states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.checkpointing | All random states loaded successfully\n2023-12-21 22:37:32 | INFO | accelerate.accelerator | Loading in 0 custom states\n2023-12-21 22:37:32 | INFO | inference | Loading checkpoint done in 106.015ms\n2023-12-21 22:37:32 | INFO | inference | Using PNDM scheduler.\nModel Init: 1.5s\nAuto transposing: source f0 median = 372.9, target f0 median = 286.9, factor = 0.77\n 0%| | 0/1009 [00:00<?, ?it/s]\u001b[A\n 0%| | 1/1009 [00:02<39:02, 2.32s/it]\u001b[A\n 2%|▏ | 20/1009 [00:02<01:26, 11.39it/s]\u001b[A\n 4%|▍ | 39/1009 [00:02<00:38, 25.02it/s]\u001b[A\n 6%|▌ | 58/1009 [00:02<00:23, 41.23it/s]\u001b[A\n 8%|▊ | 77/1009 [00:02<00:15, 59.36it/s]\u001b[A\n 10%|▉ | 96/1009 [00:02<00:11, 78.69it/s]\u001b[A\n 11%|█▏ | 115/1009 [00:02<00:09, 98.15it/s]\u001b[A\n 13%|█▎ | 134/1009 [00:03<00:07, 116.43it/s]\u001b[A\n 15%|█▌ | 153/1009 [00:03<00:06, 132.68it/s]\u001b[A\n 17%|█▋ | 172/1009 [00:03<00:05, 146.29it/s]\u001b[A\n 19%|█▉ | 191/1009 [00:03<00:05, 157.09it/s]\u001b[A\n 21%|██ | 210/1009 [00:03<00:04, 164.64it/s]\u001b[A\n 23%|██▎ | 229/1009 [00:03<00:04, 170.24it/s]\u001b[A\n 25%|██▍ | 248/1009 [00:03<00:04, 174.54it/s]\u001b[A\n 26%|██▋ | 267/1009 [00:03<00:04, 176.83it/s]\u001b[A\n 28%|██▊ | 286/1009 [00:03<00:04, 178.76it/s]\u001b[A\n 30%|███ | 305/1009 [00:03<00:03, 180.21it/s]\u001b[A\n 32%|███▏ | 324/1009 [00:04<00:03, 179.83it/s]\u001b[A\n 34%|███▍ | 343/1009 [00:04<00:03, 179.98it/s]\u001b[A\n 36%|███▌ | 362/1009 [00:04<00:03, 181.63it/s]\u001b[A\n 38%|███▊ | 381/1009 [00:04<00:03, 181.04it/s]\u001b[A\n 40%|███▉ | 400/1009 [00:04<00:03, 182.13it/s]\u001b[A\n 42%|████▏ | 419/1009 [00:04<00:03, 182.37it/s]\u001b[A\n 43%|████▎ | 438/1009 [00:04<00:03, 183.85it/s]\u001b[A\n 45%|████▌ | 457/1009 [00:04<00:02, 184.98it/s]\u001b[A\n 47%|████▋ | 476/1009 [00:04<00:02, 
185.91it/s]\u001b[A\n 49%|████▉ | 495/1009 [00:04<00:02, 186.25it/s]\u001b[A\n 51%|█████ | 514/1009 [00:05<00:02, 187.04it/s]\u001b[A\n 53%|█████▎ | 533/1009 [00:05<00:02, 187.69it/s]\u001b[A\n 55%|█████▍ | 552/1009 [00:05<00:02, 188.32it/s]\u001b[A\n 57%|█████▋ | 571/1009 [00:05<00:02, 188.13it/s]\u001b[A\n 58%|█████▊ | 590/1009 [00:05<00:02, 188.37it/s]\u001b[A\n 60%|██████ | 609/1009 [00:05<00:02, 188.66it/s]\u001b[A\n 62%|██████▏ | 628/1009 [00:05<00:02, 188.88it/s]\u001b[A\n 64%|██████▍ | 647/1009 [00:05<00:01, 188.97it/s]\u001b[A\n 66%|██████▌ | 666/1009 [00:05<00:01, 188.77it/s]\u001b[A\n 68%|██████▊ | 685/1009 [00:05<00:01, 188.38it/s]\u001b[A\n 70%|██████▉ | 704/1009 [00:06<00:01, 188.63it/s]\u001b[A\n 72%|███████▏ | 723/1009 [00:06<00:01, 188.84it/s]\u001b[A\n 74%|███████▎ | 742/1009 [00:06<00:01, 189.15it/s]\u001b[A\n 75%|███████▌ | 761/1009 [00:06<00:01, 188.98it/s]\u001b[A\n 77%|███████▋ | 780/1009 [00:06<00:01, 189.16it/s]\u001b[A\n 79%|███████▉ | 799/1009 [00:06<00:01, 186.62it/s]\u001b[A\n 81%|████████ | 819/1009 [00:06<00:01, 188.31it/s]\u001b[A\n 83%|████████▎ | 838/1009 [00:06<00:00, 185.22it/s]\u001b[A\n 85%|████████▍ | 857/1009 [00:06<00:00, 186.46it/s]\u001b[A\n 87%|████████▋ | 877/1009 [00:07<00:00, 188.15it/s]\u001b[A\n 89%|████████▉ | 897/1009 [00:07<00:00, 188.85it/s]\u001b[A\n 91%|█████████ | 917/1009 [00:07<00:00, 189.80it/s]\u001b[A\n 93%|█████████▎| 937/1009 [00:07<00:00, 190.56it/s]\u001b[A\n 95%|█████████▍| 957/1009 [00:07<00:00, 190.87it/s]\u001b[A\n 97%|█████████▋| 977/1009 [00:07<00:00, 191.29it/s]\u001b[A\n 99%|█████████▉| 997/1009 [00:07<00:00, 190.46it/s]\u001b[A\n100%|██████████| 1009/1009 [00:07<00:00, 130.99it/s]\nSynthesis audios using bigvgan vocoder...\nLoading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt\nFor predicted mels, #sample = 1...\nModel inference: 14.1s\n100%|██████████| 1/1 [00:17<00:00, 17.56s/it]\n100%|██████████| 1/1 [00:17<00:00, 
17.56s/it]\n/src/Amphion/result/source/source_vocalist_l1_TaylorSwift.wav", "metrics": { "predict_time": 31.512607, "total_time": 156.531173 }, "output": "https://replicate.delivery/pbxt/FoHobqVw0mrPOluLgRQGEW01GwDo5tvSefKNDyc79hY8AnESA/source_vocalist_l1_TaylorSwift.wav", "started_at": "2023-12-21T22:37:17.689070Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/h37sr5dbojsspt56c34pvjanoe", "cancel": "https://api.replicate.com/v1/predictions/h37sr5dbojsspt56c34pvjanoe/cancel" }, "version": "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b" }
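The `metrics` values in this JSON can be reproduced from its three timestamps. The sketch below does that arithmetic for the first prediction. Reading the gap between `created_at` and `started_at` as time spent waiting (queueing or model startup) is an assumption; the page does not say what happened in that interval.

```python
from datetime import datetime


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Timestamps copied from the prediction JSON above.
created = parse_ts("2023-12-21T22:35:12.670504Z")
started = parse_ts("2023-12-21T22:37:17.689070Z")
completed = parse_ts("2023-12-21T22:37:49.201677Z")

wait_time = (started - created).total_seconds()       # before the run began
predict_time = (completed - started).total_seconds()  # matches metrics.predict_time
total_time = (completed - created).total_seconds()    # matches metrics.total_time

print(f"wait: {wait_time:.6f}s")
print(f"predict: {predict_time:.6f}s")
print(f"total: {total_time:.6f}s")
```

With these inputs, `predict_time` and `total_time` agree with the reported 31.512607 and 156.531173 seconds, which leaves roughly 125 seconds before the run started.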
Prediction
lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b
- ID: ujqoollbrkifg2tmynr4rl2xuu
- Status: Succeeded
- Source: Web
- Hardware: A40 (Large)
Input
- source_audio
- https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav
- target_singer
- Beyonce
- key_shift_mode
- 0
- pitch_shift_control
- Auto Shift
- diffusion_inference_steps
- 1000
{ "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav", "target_singer": "Beyonce", "key_shift_mode": 0, "pitch_shift_control": "Auto Shift", "diffusion_inference_steps": 1000 }
Install Replicate’s Node.js client library:
    npm install replicate
Import and set up the client:
    import Replicate from "replicate";
    import { writeFile } from "node:fs/promises";

    const replicate = new Replicate({
      auth: process.env.REPLICATE_API_TOKEN,
    });
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    const output = await replicate.run(
      "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
      {
        input: {
          source_audio: "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
          target_singer: "Beyonce",
          key_shift_mode: 0,
          pitch_shift_control: "Auto Shift",
          diffusion_inference_steps: 1000
        }
      }
    );

    // To access the file URL:
    console.log(output.url());

    // To write the file to disk (this model returns a WAV file):
    await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
    pip install replicate
Import the client:
    import replicate
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    output = replicate.run(
        "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
        input={
            "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
            "target_singer": "Beyonce",
            "key_shift_mode": 0,
            "pitch_shift_control": "Auto Shift",
            "diffusion_inference_steps": 1000
        }
    )
    print(output)
To learn more, take a look at the guide on getting started with Python.
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
    curl -s -X POST \
      -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
      -H "Content-Type: application/json" \
      -H "Prefer: wait" \
      -d $'{
        "version": "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
        "input": {
          "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
          "target_singer": "Beyonce",
          "key_shift_mode": 0,
          "pitch_shift_control": "Auto Shift",
          "diffusion_inference_steps": 1000
        }
      }' \
      https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
You can run this model locally using Cog. First, install Cog:
    brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
    cog predict r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b \
      -i 'source_audio="https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav"' \
      -i 'target_singer="Beyonce"' \
      -i 'key_shift_mode=0' \
      -i 'pitch_shift_control="Auto Shift"' \
      -i 'diffusion_inference_steps=1000'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
    docker run -d -p 5000:5000 --gpus=all r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b

    curl -s -X POST \
      -H "Content-Type: application/json" \
      -d $'{
        "input": {
          "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav",
          "target_singer": "Beyonce",
          "key_shift_mode": 0,
          "pitch_shift_control": "Auto Shift",
          "diffusion_inference_steps": 1000
        }
      }' \
      http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{ "completed_at": "2023-12-21T22:38:31.138680Z", "created_at": "2023-12-21T22:38:01.978638Z", "data_removed": false, "error": null, "id": "ujqoollbrkifg2tmynr4rl2xuu", "input": { "source_audio": "https://replicate.delivery/pbxt/K5cr2RsQxEgBXEYincFUWhma8aGy0qWvh3vSFXjv9iqKh0wM/adele.wav", "target_singer": "Beyonce", "key_shift_mode": 0, "pitch_shift_control": "Auto Shift", "diffusion_inference_steps": 1000 }, "logs": "/tmp/input_audio\nvocalist_l1_Beyonce\nautoshift\ngetopt: unrecognized option '--diffusion_inference_steps'\nExprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json\nThe following values were not passed to `accelerate launch` and had defaults used instead:\n`--num_processes` was set to a value of `1`\n`--num_machines` was set to a value of `1`\n`--mixed_precision` was set to a value of `'no'`\n`--dynamo_backend` was set to a value of `'no'`\nTo avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\nMonotonic align not found. 
Please make sure you have compiled it.\nThere are 1 source audios:\n**********\nConversion for source...\nPrepare for meta eval data: 0.0s\n 0%| | 0/1 [00:00<?, ?it/s]\n 0%| | 0/1 [00:00<?, ?it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00, 1.93it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00, 1.93it/s]\nPrepare for acoustic features: 0.5s\nPrepare for content features: 0.0s\n2023-12-21 22:38:14 | INFO | inference | ========================================================\n2023-12-21 22:38:14 | INFO | inference | ||\t\tNew inference process started.\t\t||\n2023-12-21 22:38:14 | INFO | inference | ========================================================\n2023-12-21 22:38:14 | INFO | inference |\n2023-12-21 22:38:14 | DEBUG | inference | Using DEBUG logging level.\n2023-12-21 22:38:14 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper\n2023-12-21 22:38:14 | DEBUG | inference | Vocoder dir: pretrained/bigvgan\n2023-12-21 22:38:14 | DEBUG | inference | Setting random seed done in 0.82ms\n2023-12-21 22:38:14 | DEBUG | inference | Random seed: 10086\n2023-12-21 22:38:14 | INFO | inference | Building dataset...\n2023-12-21 22:38:14 | INFO | inference | Building dataset done in 4.43ms\n2023-12-21 22:38:14 | INFO | inference | Building model...\n2023-12-21 22:38:14 | INFO | inference | Building model done in 275.277ms\n2023-12-21 22:38:14 | INFO | inference | Initializing accelerate...\n2023-12-21 22:38:15 | INFO | inference | Initializing accelerate done in 1093.010ms\n2023-12-21 22:38:15 | INFO | inference | Loading checkpoint...\n2023-12-21 22:38:15 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All model weights loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All optimizer states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All 
scheduler states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.checkpointing | All random states loaded successfully\n2023-12-21 22:38:15 | INFO | accelerate.accelerator | Loading in 0 custom states\n2023-12-21 22:38:15 | INFO | inference | Loading checkpoint done in 99.938ms\n2023-12-21 22:38:15 | INFO | inference | Using PNDM scheduler.\nModel Init: 1.5s\nAuto transposing: source f0 median = 372.9, target f0 median = 318.3, factor = 0.85\n 0%| | 0/1009 [00:00<?, ?it/s]\u001b[A\n 0%| | 1/1009 [00:00<10:21, 1.62it/s]\u001b[A\n 2%|▏ | 20/1009 [00:00<00:26, 36.95it/s]\u001b[A\n 4%|▍ | 39/1009 [00:00<00:14, 69.13it/s]\u001b[A\n 6%|▌ | 58/1009 [00:00<00:09, 96.78it/s]\u001b[A\n 8%|▊ | 77/1009 [00:01<00:07, 118.74it/s]\u001b[A\n 10%|▉ | 96/1009 [00:01<00:06, 136.25it/s]\u001b[A\n 11%|█▏ | 115/1009 [00:01<00:05, 150.28it/s]\u001b[A\n 13%|█▎ | 135/1009 [00:01<00:05, 161.83it/s]\u001b[A\n 15%|█▌ | 155/1009 [00:01<00:05, 170.42it/s]\u001b[A\n 17%|█▋ | 175/1009 [00:01<00:04, 176.38it/s]\u001b[A\n 19%|█▉ | 195/1009 [00:01<00:04, 180.68it/s]\u001b[A\n 21%|██ | 214/1009 [00:01<00:04, 183.11it/s]\u001b[A\n 23%|██▎ | 233/1009 [00:01<00:04, 184.90it/s]\u001b[A\n 25%|██▌ | 253/1009 [00:01<00:04, 186.84it/s]\u001b[A\n 27%|██▋ | 272/1009 [00:02<00:03, 187.40it/s]\u001b[A\n 29%|██▉ | 292/1009 [00:02<00:03, 188.43it/s]\u001b[A\n 31%|███ | 311/1009 [00:02<00:03, 188.83it/s]\u001b[A\n 33%|███▎ | 331/1009 [00:02<00:03, 189.21it/s]\u001b[A\n 35%|███▍ | 351/1009 [00:02<00:03, 189.52it/s]\u001b[A\n 37%|███▋ | 371/1009 [00:02<00:03, 189.85it/s]\u001b[A\n 39%|███▉ | 391/1009 [00:02<00:03, 190.24it/s]\u001b[A\n 41%|████ | 411/1009 [00:02<00:03, 190.48it/s]\u001b[A\n 43%|████▎ | 431/1009 [00:02<00:03, 190.21it/s]\u001b[A\n 45%|████▍ | 451/1009 [00:02<00:02, 190.62it/s]\u001b[A\n 47%|████▋ | 471/1009 [00:03<00:02, 190.39it/s]\u001b[A\n 49%|████▊ | 491/1009 [00:03<00:02, 
190.36it/s]\u001b[A\n 51%|█████ | 511/1009 [00:03<00:02, 189.68it/s]\u001b[A\n 53%|█████▎ | 530/1009 [00:03<00:02, 186.46it/s]\u001b[A\n 54%|█████▍ | 549/1009 [00:03<00:02, 184.33it/s]\u001b[A\n 56%|█████▋ | 568/1009 [00:03<00:02, 183.14it/s]\u001b[A\n 58%|█████▊ | 587/1009 [00:03<00:02, 181.72it/s]\u001b[A\n 60%|██████ | 606/1009 [00:03<00:02, 181.17it/s]\u001b[A\n 62%|██████▏ | 625/1009 [00:03<00:02, 181.36it/s]\u001b[A\n 64%|██████▍ | 644/1009 [00:04<00:02, 181.32it/s]\u001b[A\n 66%|██████▌ | 663/1009 [00:04<00:01, 181.26it/s]\u001b[A\n 68%|██████▊ | 682/1009 [00:04<00:01, 181.11it/s]\u001b[A\n 69%|██████▉ | 701/1009 [00:04<00:01, 181.30it/s]\u001b[A\n 71%|███████▏ | 720/1009 [00:04<00:01, 181.46it/s]\u001b[A\n 73%|███████▎ | 739/1009 [00:04<00:01, 181.68it/s]\u001b[A\n 75%|███████▌ | 759/1009 [00:04<00:01, 184.70it/s]\u001b[A\n 77%|███████▋ | 779/1009 [00:04<00:01, 186.43it/s]\u001b[A\n 79%|███████▉ | 799/1009 [00:04<00:01, 187.97it/s]\u001b[A\n 81%|████████ | 818/1009 [00:04<00:01, 185.71it/s]\u001b[A\n 83%|████████▎ | 837/1009 [00:05<00:00, 184.44it/s]\u001b[A\n 85%|████████▍ | 856/1009 [00:05<00:00, 183.59it/s]\u001b[A\n 87%|████████▋ | 875/1009 [00:05<00:00, 182.91it/s]\u001b[A\n 89%|████████▊ | 894/1009 [00:05<00:00, 181.95it/s]\u001b[A\n 91%|█████████ | 914/1009 [00:05<00:00, 184.87it/s]\u001b[A\n 93%|█████████▎| 934/1009 [00:05<00:00, 186.66it/s]\u001b[A\n 94%|█████████▍| 953/1009 [00:05<00:00, 186.56it/s]\u001b[A\n 96%|█████████▋| 972/1009 [00:05<00:00, 183.50it/s]\u001b[A\n 98%|█████████▊| 991/1009 [00:05<00:00, 182.73it/s]\u001b[A\n100%|██████████| 1009/1009 [00:06<00:00, 167.18it/s]\nSynthesis audios using bigvgan vocoder...\nLoading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt\nFor predicted mels, #sample = 1...\nModel inference: 12.8s\n100%|██████████| 1/1 [00:14<00:00, 14.84s/it]\n100%|██████████| 1/1 [00:14<00:00, 14.84s/it]\n/src/Amphion/result/source/source_vocalist_l1_Beyonce.wav", "metrics": { "predict_time": 
29.123041, "total_time": 29.160042 }, "output": "https://replicate.delivery/pbxt/Au7w7BH5kx4VA1j6zZLcGBllIre31A0uw80Cc0Bk0kczgTCJA/source_vocalist_l1_Beyonce.wav", "started_at": "2023-12-21T22:38:02.015639Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/ujqoollbrkifg2tmynr4rl2xuu", "cancel": "https://api.replicate.com/v1/predictions/ujqoollbrkifg2tmynr4rl2xuu/cancel" }, "version": "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b" }
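Both logs contain an "Auto transposing" line that reports a source f0 median, a target f0 median, and a factor. The factor appears to be the target median divided by the source median. The sketch below checks that arithmetic and converts the ratio to semitones with the standard 12 · log2(ratio) formula. This only reproduces the logged numbers; it makes no claim about how Amphion applies the shift internally, and `transpose_check` is a hypothetical helper name.

```python
import math
from typing import Tuple


def transpose_check(source_f0: float, target_f0: float) -> Tuple[float, float]:
    """Return (frequency ratio, equivalent shift in semitones)."""
    factor = target_f0 / source_f0
    semitones = 12 * math.log2(factor)
    return factor, semitones


# f0 medians copied from the two prediction logs on this page.
SOURCE_F0 = 372.9
for singer, target_f0 in [("Taylor Swift", 286.9), ("Beyonce", 318.3)]:
    factor, semitones = transpose_check(SOURCE_F0, target_f0)
    print(f"{singer}: factor = {factor:.2f}, shift = {semitones:+.2f} semitones")
```

The rounded factors come out as 0.77 and 0.85, matching the logs; both correspond to a downward shift of a few semitones.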
Prediction
lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b
ID: sm5odfdb53siq7x4sduavb2qfu
Status: Succeeded
Source: Web
Hardware: A40 (Large)
Total duration: 23.5 seconds
Created: 2023-12-21
Input
- source_audio
- https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav
- target_singer
- Bruno Mars
- key_shift_mode
- 0
- pitch_shift_control
- Auto Shift
- diffusion_inference_steps
- 1000
{ "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav", "target_singer": "Bruno Mars", "key_shift_mode": 0, "pitch_shift_control": "Auto Shift", "diffusion_inference_steps": 1000 }
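With pitch_shift_control set to "Auto Shift", the logs for this run report `Auto transposing: source f0 median = 372.9, target f0 median = 324.7, factor = 0.87`. The sketch below reproduces that number under the assumption that the factor is simply the ratio of the two median f0 values; the logged figures are consistent with that, but the page does not show the model's actual implementation. The function names and the semitone conversion are illustrative additions, not part of the model's API.

```python
import math


def transpose_factor(source_f0_median: float, target_f0_median: float) -> float:
    """Ratio that maps the source singer's median pitch onto the target's.

    Assumption: the logged "factor" is target / source.
    """
    return target_f0_median / source_f0_median


def factor_to_semitones(factor: float) -> float:
    """Convert a frequency ratio to a shift in equal-tempered semitones."""
    return 12 * math.log2(factor)


# Values taken from the "Auto transposing" log line of this prediction.
factor = transpose_factor(372.9, 324.7)
semitones = factor_to_semitones(factor)

print(f"factor = {factor:.2f}")
print(f"shift  = {semitones:+.2f} semitones")
```

A factor below 1 means the source is pitched down toward the target singer's range; here the shift works out to a little under two and a half semitones downward.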
Install Replicate’s Node.js client library:
npm install replicate
Import and set up the client:
import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
  {
    input: {
      source_audio: "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
      target_singer: "Bruno Mars",
      key_shift_mode: 0,
      pitch_shift_control: "Auto Shift",
      diffusion_inference_steps: 1000
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the file to disk (this model returns a .wav file):
await fs.promises.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:
pip install replicate
Import the client:
import replicate
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    input={
        "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
        "target_singer": "Bruno Mars",
        "key_shift_mode": 0,
        "pitch_shift_control": "Auto Shift",
        "diffusion_inference_steps": 1000
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
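The Python snippet above only prints the result. A minimal sketch for saving the converted audio to disk follows. It assumes that, depending on the installed client version, `replicate.run` returns either a file-like object with a `read()` method or a plain URL string, and handles both. `save_output` and the file name `converted.wav` are hypothetical names chosen for this example.

```python
import urllib.request
from pathlib import Path


def save_output(output, path):
    """Write the result of replicate.run() to `path` and return the Path.

    Assumption: `output` is either a file-like object exposing read()
    or something whose str() is a downloadable URL.
    """
    if hasattr(output, "read"):
        # File-like result: read the bytes directly.
        data = output.read()
    else:
        # URL result: download the file over HTTP(S).
        with urllib.request.urlopen(str(output)) as response:
            data = response.read()
    target = Path(path)
    target.write_bytes(data)
    return target


# Usage, after the replicate.run(...) call shown above:
#   save_output(output, "converted.wav")
```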
Run lucataco/singing_voice_conversion using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/singing_voice_conversion:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b",
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
      "target_singer": "Bruno Mars",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
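The curl request above sends `Prefer: wait`, which asks the API to hold the connection until the prediction finishes. If the response comes back while the prediction is still running, its `urls.get` field can be polled. The following is a rough sketch of such a loop using only the standard library. It assumes the terminal status values are `succeeded`, `failed` and `canceled`; the function names, polling interval and poll limit are illustrative choices, not part of Replicate's client.

```python
import json
import time
import urllib.request

# Assumed terminal states of a prediction.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def fetch_prediction(get_url, token):
    """GET the prediction JSON from its `urls.get` endpoint."""
    request = urllib.request.Request(
        get_url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_prediction(get_url, token, fetch=fetch_prediction,
                        interval=1.0, max_polls=120):
    """Poll until the prediction reaches a terminal status, then return it.

    `fetch` is injectable so the loop can be exercised without network access.
    """
    for _ in range(max_polls):
        prediction = fetch(get_url, token)
        if prediction["status"] in TERMINAL_STATUSES:
            return prediction
        time.sleep(interval)
    raise TimeoutError(f"prediction not finished after {max_polls} polls")


# Usage (requires a real token and a real prediction URL):
#   import os
#   result = wait_for_prediction(
#       "https://api.replicate.com/v1/predictions/sm5odfdb53siq7x4sduavb2qfu",
#       os.environ["REPLICATE_API_TOKEN"],
#   )
#   print(result["output"])
```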
You can run this model locally using Cog. First, install Cog:
brew install cog
If you don’t have Homebrew, there are other installation options available.
Run this to download the model and run it in your local environment:
cog predict r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b \
  -i 'source_audio="https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav"' \
  -i 'target_singer="Bruno Mars"' \
  -i 'key_shift_mode=0' \
  -i 'pitch_shift_control="Auto Shift"' \
  -i 'diffusion_inference_steps=1000'
To learn more, take a look at the Cog documentation.
Run this to download the model and run it in your local environment:
docker run -d -p 5000:5000 --gpus=all r8.im/lucataco/singing_voice_conversion@sha256:f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b

curl -s -X POST \
  -H "Content-Type: application/json" \
  -d $'{
    "input": {
      "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav",
      "target_singer": "Bruno Mars",
      "key_shift_mode": 0,
      "pitch_shift_control": "Auto Shift",
      "diffusion_inference_steps": 1000
    }
  }' \
  http://localhost:5000/predictions
To learn more, take a look at the Cog documentation.
Output
{ "completed_at": "2023-12-21T22:39:27.615465Z", "created_at": "2023-12-21T22:39:04.141957Z", "data_removed": false, "error": null, "id": "sm5odfdb53siq7x4sduavb2qfu", "input": { "source_audio": "https://replicate.delivery/pbxt/K5cs0E3Ap10wLf1vuZKTwxx7kbw7Z2OJsmUJQRRsZb01Nzys/adele.wav", "target_singer": "Bruno Mars", "key_shift_mode": 0, "pitch_shift_control": "Auto Shift", "diffusion_inference_steps": 1000 }, "logs": "/tmp/input_audio\nvocalist_l1_BrunoMars\nautoshift\ngetopt: unrecognized option '--diffusion_inference_steps'\nExprimental Configuration File: ckpts/svc/vocalist_l1_contentvec+whisper/args.json\nThe following values were not passed to `accelerate launch` and had defaults used instead:\n`--num_processes` was set to a value of `1`\n`--num_machines` was set to a value of `1`\n`--mixed_precision` was set to a value of `'no'`\n`--dynamo_backend` was set to a value of `'no'`\nTo avoid this warning pass in values for each of the problematic parameters or run `accelerate config`.\nMonotonic align not found. 
Please make sure you have compiled it.\nThere are 1 source audios:\n**********\nConversion for source...\nPrepare for meta eval data: 0.0s\n 0%| | 0/1 [00:00<?, ?it/s]\n 0%| | 0/1 [00:00<?, ?it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00, 1.93it/s]\u001b[A\n100%|██████████| 1/1 [00:00<00:00, 1.93it/s]\nPrepare for acoustic features: 0.5s\nPrepare for content features: 0.0s\n2023-12-21 22:39:11 | INFO | inference | ========================================================\n2023-12-21 22:39:11 | INFO | inference | ||\t\tNew inference process started.\t\t||\n2023-12-21 22:39:11 | INFO | inference | ========================================================\n2023-12-21 22:39:11 | INFO | inference |\n2023-12-21 22:39:11 | DEBUG | inference | Using DEBUG logging level.\n2023-12-21 22:39:11 | DEBUG | inference | Acoustic dir: ckpts/svc/vocalist_l1_contentvec+whisper\n2023-12-21 22:39:11 | DEBUG | inference | Vocoder dir: pretrained/bigvgan\n2023-12-21 22:39:11 | DEBUG | inference | Setting random seed done in 0.77ms\n2023-12-21 22:39:11 | DEBUG | inference | Random seed: 10086\n2023-12-21 22:39:11 | INFO | inference | Building dataset...\n2023-12-21 22:39:11 | INFO | inference | Building dataset done in 4.40ms\n2023-12-21 22:39:11 | INFO | inference | Building model...\n2023-12-21 22:39:11 | INFO | inference | Building model done in 277.159ms\n2023-12-21 22:39:11 | INFO | inference | Initializing accelerate...\n2023-12-21 22:39:12 | INFO | inference | Initializing accelerate done in 1047.520ms\n2023-12-21 22:39:12 | INFO | inference | Loading checkpoint...\n2023-12-21 22:39:12 | INFO | accelerate.accelerator | Loading states from ckpts/svc/vocalist_l1_contentvec+whisper/checkpoint/epoch-6852_step-0678447_loss-1.946773\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All model weights loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All optimizer states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All 
scheduler states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All dataloader sampler states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.checkpointing | All random states loaded successfully\n2023-12-21 22:39:12 | INFO | accelerate.accelerator | Loading in 0 custom states\n2023-12-21 22:39:12 | INFO | inference | Loading checkpoint done in 121.116ms\n2023-12-21 22:39:12 | INFO | inference | Using PNDM scheduler.\nModel Init: 1.5s\nAuto transposing: source f0 median = 372.9, target f0 median = 324.7, factor = 0.87\n 0%| | 0/1009 [00:00<?, ?it/s]\u001b[A\n 0%| | 1/1009 [00:00<09:00, 1.87it/s]\u001b[A\n 2%|▏ | 20/1009 [00:00<00:23, 41.39it/s]\u001b[A\n 4%|▍ | 39/1009 [00:00<00:12, 74.81it/s]\u001b[A\n 6%|▌ | 58/1009 [00:00<00:09, 101.86it/s]\u001b[A\n 8%|▊ | 76/1009 [00:00<00:07, 121.91it/s]\u001b[A\n 9%|▉ | 95/1009 [00:01<00:06, 138.32it/s]\u001b[A\n 11%|█ | 113/1009 [00:01<00:05, 149.71it/s]\u001b[A\n 13%|█▎ | 132/1009 [00:01<00:05, 158.90it/s]\u001b[A\n 15%|█▍ | 151/1009 [00:01<00:05, 165.63it/s]\u001b[A\n 17%|█▋ | 169/1009 [00:01<00:04, 169.63it/s]\u001b[A\n 19%|█▊ | 188/1009 [00:01<00:04, 172.73it/s]\u001b[A\n 20%|██ | 206/1009 [00:01<00:04, 174.22it/s]\u001b[A\n 22%|██▏ | 224/1009 [00:01<00:04, 175.66it/s]\u001b[A\n 24%|██▍ | 242/1009 [00:01<00:04, 176.23it/s]\u001b[A\n 26%|██▌ | 261/1009 [00:01<00:04, 178.83it/s]\u001b[A\n 28%|██▊ | 281/1009 [00:02<00:03, 182.73it/s]\u001b[A\n 30%|██▉ | 301/1009 [00:02<00:03, 185.36it/s]\u001b[A\n 32%|███▏ | 321/1009 [00:02<00:03, 187.40it/s]\u001b[A\n 34%|███▍ | 341/1009 [00:02<00:03, 188.55it/s]\u001b[A\n 36%|███▌ | 360/1009 [00:02<00:03, 188.36it/s]\u001b[A\n 38%|███▊ | 379/1009 [00:02<00:03, 185.55it/s]\u001b[A\n 39%|███▉ | 398/1009 [00:02<00:03, 184.13it/s]\u001b[A\n 41%|████▏ | 417/1009 [00:02<00:03, 183.10it/s]\u001b[A\n 43%|████▎ | 436/1009 [00:02<00:03, 182.07it/s]\u001b[A\n 45%|████▌ | 455/1009 [00:03<00:03, 181.82it/s]\u001b[A\n 47%|████▋ | 474/1009 [00:03<00:02, 
180.98it/s]\u001b[A\n 49%|████▉ | 493/1009 [00:03<00:02, 180.87it/s]\u001b[A\n 51%|█████ | 512/1009 [00:03<00:02, 180.94it/s]\u001b[A\n 53%|█████▎ | 531/1009 [00:03<00:02, 180.34it/s]\u001b[A\n 55%|█████▍ | 550/1009 [00:03<00:02, 180.29it/s]\u001b[A\n 56%|█████▋ | 569/1009 [00:03<00:02, 179.63it/s]\u001b[A\n 58%|█████▊ | 587/1009 [00:03<00:02, 179.69it/s]\u001b[A\n 60%|██████ | 606/1009 [00:03<00:02, 179.75it/s]\u001b[A\n 62%|██████▏ | 625/1009 [00:03<00:02, 179.88it/s]\u001b[A\n 64%|██████▍ | 644/1009 [00:04<00:02, 180.05it/s]\u001b[A\n 66%|██████▌ | 663/1009 [00:04<00:01, 180.42it/s]\u001b[A\n 68%|██████▊ | 682/1009 [00:04<00:01, 180.35it/s]\u001b[A\n 69%|██████▉ | 701/1009 [00:04<00:01, 180.59it/s]\u001b[A\n 71%|███████▏ | 720/1009 [00:04<00:01, 180.16it/s]\u001b[A\n 73%|███████▎ | 739/1009 [00:04<00:01, 180.36it/s]\u001b[A\n 75%|███████▌ | 758/1009 [00:04<00:01, 180.36it/s]\u001b[A\n 77%|███████▋ | 777/1009 [00:04<00:01, 180.32it/s]\u001b[A\n 79%|███████▉ | 796/1009 [00:04<00:01, 180.10it/s]\u001b[A\n 81%|████████ | 815/1009 [00:05<00:01, 179.96it/s]\u001b[A\n 83%|████████▎ | 834/1009 [00:05<00:00, 180.12it/s]\u001b[A\n 85%|████████▍ | 853/1009 [00:05<00:00, 179.92it/s]\u001b[A\n 86%|████████▋ | 871/1009 [00:05<00:00, 179.49it/s]\u001b[A\n 88%|████████▊ | 891/1009 [00:05<00:00, 182.84it/s]\u001b[A\n 90%|█████████ | 910/1009 [00:05<00:00, 182.37it/s]\u001b[A\n 92%|█████████▏| 929/1009 [00:05<00:00, 182.12it/s]\u001b[A\n 94%|█████████▍| 948/1009 [00:05<00:00, 181.90it/s]\u001b[A\n 96%|█████████▌| 967/1009 [00:05<00:00, 181.53it/s]\u001b[A\n 98%|█████████▊| 986/1009 [00:05<00:00, 181.55it/s]\u001b[A\n100%|█████████▉| 1005/1009 [00:06<00:00, 181.89it/s]\u001b[A\n100%|██████████| 1009/1009 [00:06<00:00, 165.81it/s]\nSynthesis audios using bigvgan vocoder...\nLoading Vocoder from Weights file: /src/Amphion/pretrained/bigvgan/400000.pt\nFor predicted mels, #sample = 1...\nModel inference: 12.6s\n100%|██████████| 1/1 [00:14<00:00, 14.64s/it]\n100%|██████████| 1/1 
[00:14<00:00, 14.64s/it]\n/src/Amphion/result/source/source_vocalist_l1_BrunoMars.wav", "metrics": { "predict_time": 23.436477, "total_time": 23.473508 }, "output": "https://replicate.delivery/pbxt/rajESoNgjeV1KKUeuqErPMm7zxOQWBMWLpLXN5tiK9e9EOJkA/source_vocalist_l1_BrunoMars.wav", "started_at": "2023-12-21T22:39:04.178988Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/sm5odfdb53siq7x4sduavb2qfu", "cancel": "https://api.replicate.com/v1/predictions/sm5odfdb53siq7x4sduavb2qfu/cancel" }, "version": "f29872ee3557e0186735048f1d6de98a52518ae5c49e19453b3fdaad710bdc2b" }
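The timestamps and `metrics` in the response above are related by simple subtraction. As a small sketch, the snippet below recomputes the durations from `created_at`, `started_at` and `completed_at` and compares them with the reported `predict_time` and `total_time`. The interpretation of the gap between creation and start as queue time is an assumption; the variable names are illustrative.

```python
from datetime import datetime


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    # Older Python versions' fromisoformat() does not accept 'Z' directly.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


# Fields copied from the prediction JSON above.
prediction = {
    "created_at": "2023-12-21T22:39:04.141957Z",
    "started_at": "2023-12-21T22:39:04.178988Z",
    "completed_at": "2023-12-21T22:39:27.615465Z",
    "metrics": {"predict_time": 23.436477, "total_time": 23.473508},
}

created = parse_timestamp(prediction["created_at"])
started = parse_timestamp(prediction["started_at"])
completed = parse_timestamp(prediction["completed_at"])

queue_time = (started - created).total_seconds()      # waiting before start (assumed)
predict_time = (completed - started).total_seconds()  # time spent running
total_time = (completed - created).total_seconds()    # creation to completion

print(f"queue:   {queue_time:.6f} s")
print(f"predict: {predict_time:.6f} s (reported {prediction['metrics']['predict_time']})")
print(f"total:   {total_time:.6f} s (reported {prediction['metrics']['total_time']})")
```

For this run the recomputed values agree with the reported metrics, and almost all of the total time is prediction time rather than waiting.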