lucataco / singing_voice_conversion

Amphion Singing Voice Conversion: DiffWaveNetSVC

  • Public
  • 822 runs
  • L40S
  • GitHub
  • Paper
  • License

Input

  • file (required): Input source audio file
  • string: Target singer to convert audio to. Default: "Taylor Swift"
  • string: Pitch shift control. Default: "Auto Shift"
  • integer (minimum: -6, maximum: 6): Key shift values. Default: 0
  • integer (minimum: 0, maximum: 1000): Diffusion inference steps. Default: 1000
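
The defaults and limits listed for the inputs can be checked before a request is submitted. The following is a minimal sketch under stated assumptions: the field names (`source_audio`, `target_singer`, `pitch_shift_control`, `key_shift`, `diffusion_steps`) and the file name `song.wav` are hypothetical placeholders, since this page does not show the real parameter names. Only the defaults and the numeric ranges are taken from the form.

```python
# Limits copied from the input form. The dictionary keys are hypothetical
# placeholders, not confirmed field names of the hosted model.
KEY_SHIFT_RANGE = (-6, 6)
DIFFUSION_STEPS_RANGE = (0, 1000)


def _check_int(name, value, bounds):
    """Raise if value is not an int inside the inclusive bounds."""
    low, high = bounds
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be an integer, got {type(value).__name__}")
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")


def build_input(source_audio, target_singer="Taylor Swift",
                pitch_shift_control="Auto Shift", key_shift=0,
                diffusion_steps=1000):
    """Return an input dict that uses the documented defaults and limits."""
    if not source_audio:
        raise ValueError("a source audio file is required")
    _check_int("key_shift", key_shift, KEY_SHIFT_RANGE)
    _check_int("diffusion_steps", diffusion_steps, DIFFUSION_STEPS_RANGE)
    return {
        "source_audio": source_audio,
        "target_singer": target_singer,
        "pitch_shift_control": pitch_shift_control,
        "key_shift": key_shift,
        "diffusion_steps": diffusion_steps,
    }


if __name__ == "__main__":
    # Hypothetical file name; any out-of-range value raises before submission.
    print(build_input("song.wav", target_singer="Adele", key_shift=-2))
```

Rejecting bad values locally avoids paying for a prediction that would fail on the server side.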

Output


Run time and cost

This model costs approximately $0.11 to run on Replicate, or 9 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on Nvidia L40S GPU hardware. Predictions typically complete within 113 seconds. The predict time for this model varies significantly based on the inputs.
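
The figures above are related by simple arithmetic, which can be sketched as follows. The only inputs are the two numbers quoted on this page ($0.11 per run and 113 seconds per prediction). The per-second rate is derived from them and is an estimate, not a published price, and the function names are illustrative.

```python
import math

# Figures quoted on this page. Both are approximate and vary with the inputs.
COST_PER_RUN_USD = 0.11
TYPICAL_RUN_SECONDS = 113


def runs_per_dollar(cost_per_run=COST_PER_RUN_USD):
    """Whole number of runs that fit in one dollar."""
    return math.floor(1.0 / cost_per_run)


def implied_rate_per_second(cost_per_run=COST_PER_RUN_USD,
                            seconds=TYPICAL_RUN_SECONDS):
    """Per-second rate implied by the typical cost and duration (an estimate)."""
    return cost_per_run / seconds


def estimate_cost(num_runs, cost_per_run=COST_PER_RUN_USD):
    """Rough total cost in USD for a batch of typical runs."""
    return round(num_runs * cost_per_run, 2)


if __name__ == "__main__":
    print(runs_per_dollar())                    # 9, matching "9 runs per $1"
    print(f"{implied_rate_per_second():.6f}")   # implied USD per second
    print(estimate_cost(100))                   # rough cost of 100 typical runs
```

Because predict time varies significantly with the inputs, treat these numbers as rough budgeting aids only.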

Readme

Implementation of the Hugging Face Space: amphion/singing_voice_conversion

Amphion Singing Voice Conversion Pretrained Models

We provide a pretrained DiffWaveNetSVC checkpoint for you to try. Specifically, it is trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:

Singers:

  • Adele
  • John Mayer
  • Bruno Mars
  • Beyonce
  • Michael Jackson
  • Taylor Swift
  • David Tao 陶喆
  • Eason Chan 陈奕迅
  • Feng Wang 汪峰
  • Jian Li 李健
  • Ying Na 那英
  • Yijie Shi 石倚洁
  • Jacky Cheung 张学友
  • Faye Wong 王菲
  • Tsai Chin 蔡琴
@article{zhang2023leveraging,
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},
  year={2023}
}