lucataco / singing_voice_conversion

Amphion Singing Voice Conversion: DiffWaveNetSVC

Run time and cost

This model costs approximately $0.081 per run on Replicate (about 12 runs per $1), though the cost varies with your inputs. It is also open source, so you can run it on your own computer with Docker.

This model runs on Nvidia A40 (Large) GPU hardware. Predictions typically complete within 113 seconds. The predict time for this model varies significantly based on the inputs.
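The figures above can be turned into a rough budget estimate. The following is a minimal sketch that uses only the two numbers quoted on this page (about $0.081 per run and a typical 113-second prediction). The function names are illustrative, and the per-second rate it derives is an inference from those two figures, not a published price.

```python
# Approximate figures quoted on this page; both vary with the inputs.
COST_PER_RUN_USD = 0.081
TYPICAL_RUN_SECONDS = 113


def runs_per_dollar(cost_per_run: float = COST_PER_RUN_USD) -> int:
    """Whole number of runs that fit into one US dollar."""
    return int(1 / cost_per_run)


def implied_rate_per_second(
    cost_per_run: float = COST_PER_RUN_USD,
    seconds: float = TYPICAL_RUN_SECONDS,
) -> float:
    """Per-second cost implied by the typical run (an inference, not a quoted price)."""
    return cost_per_run / seconds


def estimate_cost(n_runs: int, cost_per_run: float = COST_PER_RUN_USD) -> float:
    """Rough total cost in USD for n_runs typical predictions."""
    return n_runs * cost_per_run


if __name__ == "__main__":
    print("runs per dollar:", runs_per_dollar())
    print("implied USD per second:", implied_rate_per_second())
    print("estimated cost of 100 runs (USD):", estimate_cost(100))
```

Because predict time varies significantly with the inputs, treat any total from `estimate_cost` as an order-of-magnitude guide rather than a quote.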

Readme

Implementation of the Hugging Face Space: amphion/singing_voice_conversion

Amphion Singing Voice Conversion Pretrained Models

We provide a pretrained DiffWaveNetSVC checkpoint for you to try. Specifically, it was trained on real-world vocalist data (total duration: 6.16 hours) from the following 15 professional singers:

Singers:

  • Adele
  • John Mayer
  • Bruno Mars
  • Beyonce
  • Michael Jackson
  • Taylor Swift
  • David Tao 陶喆
  • Eason Chan 陈奕迅
  • Feng Wang 汪峰
  • Jian Li 李健
  • Ying Na 那英
  • Yijie Shi 石倚洁
  • Jacky Cheung 张学友
  • Faye Wong 王菲
  • Tsai Chin 蔡琴

Citation:

@article{zhang2023leveraging,
  title={Leveraging Content-based Features from Multiple Acoustic Models for Singing Voice Conversion},
  author={Zhang, Xueyao and Gu, Yicheng and Chen, Haopeng and Fang, Zihao and Zou, Lexiao and Xue, Liumeng and Wu, Zhizheng},
  journal={Machine Learning for Audio Workshop, NeurIPS 2023},
  year={2023}
}