lucataco / singing_voice_conversion

Amphion Singing Voice Conversion: DiffWaveNetSVC




Run time and cost

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 113 seconds, but prediction time varies significantly with the inputs.


Implementation of the Hugging Face Space: amphion/singing_voice_conversion

Amphion Singing Voice Conversion Pretrained Models

We provide a pretrained DiffWaveNetSVC checkpoint for you to try. Specifically, it was trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:


  • Adele
  • John Mayer
  • Bruno Mars
  • Beyonce
  • Michael Jackson
  • Taylor Swift
  • David Tao 陶喆
  • Eason Chan 陈奕迅
  • Feng Wang 汪峰
  • Jian Li 李健
  • Ying Na 那英
  • Yijie Shi 石倚洁
  • Jacky Cheung 张学友
  • Faye Wong 王菲
  • Tsai Chin 蔡琴
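
Since the checkpoint only covers the 15 singers listed above, it can help to validate the target singer before submitting a conversion request. The following is a minimal sketch under stated assumptions: the singer names are copied from the list above (romanized form only), while `resolve_singer`, `build_input`, and the payload field names `source_audio` and `target_singer` are hypothetical and are not taken from this model's actual input schema.

```python
# Singer names copied from the list above (romanized form only).
SUPPORTED_SINGERS = [
    "Adele", "John Mayer", "Bruno Mars", "Beyonce", "Michael Jackson",
    "Taylor Swift", "David Tao", "Eason Chan", "Feng Wang", "Jian Li",
    "Ying Na", "Yijie Shi", "Jacky Cheung", "Faye Wong", "Tsai Chin",
]


def resolve_singer(name: str) -> str:
    """Return the canonical spelling of a supported singer.

    Matching ignores case and surrounding whitespace. Raises ValueError
    if the name is not one of the 15 singers in the checkpoint.
    """
    lookup = {singer.casefold(): singer for singer in SUPPORTED_SINGERS}
    key = name.strip().casefold()
    if key not in lookup:
        raise ValueError(
            f"Unsupported target singer: {name!r}. "
            f"Choose one of: {', '.join(SUPPORTED_SINGERS)}"
        )
    return lookup[key]


def build_input(source_audio: str, target_singer: str) -> dict:
    """Build a request payload for a conversion run.

    The field names here are hypothetical placeholders; check the
    model's published input schema for the real parameter names.
    """
    return {
        "source_audio": source_audio,
        "target_singer": resolve_singer(target_singer),
    }


if __name__ == "__main__":
    # "my_vocals.wav" is an illustrative file name.
    print(build_input("my_vocals.wav", "taylor swift"))
```

Validating locally avoids spending a GPU run, which typically takes over a minute, on a request that names a singer the checkpoint was never trained on.
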
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},