suminhthanh / vixtts

viⓍTTS (vixTTS) is a voice generation model that lets you clone voices into different languages using just a quick 6-second audio clip.

  • Public
  • 445 runs
  • T4
  • GitHub
  • License

Input

  • string: Text to synthesize. Default: "Xin chào các bạn"
  • file (required): Original speaker audio (wav, mp3, m4a, ogg, or flv). Duration should be at least 6 seconds.
  • string: Output language for the synthesised speech. Default: "vi"
  • boolean: Whether to apply denoising to the speaker audio (microphone recordings). Default: true
  • boolean: Whether to use deepfilter. Default: true
  • boolean: Whether to normalize the text. Default: true
  • string: AWS ACCESS KEY ID
  • string: AWS SECRET ACCESS KEY
  • string: AWS S3 Bucket Name
  • string: CDN Download URL
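
The inputs above could be assembled into a request payload along the following lines. This is only a sketch: this page lists types, descriptions, and defaults but not field names, so every dictionary key and parameter name below (`text`, `speaker_wav`, `language`, `denoise`, `use_deepfilter`, `normalize_text`) is a hypothetical placeholder, not the model's confirmed schema. The optional AWS and CDN fields are left out.

```python
from typing import Any, Dict


def build_input(
    speaker_audio: str,
    text: str = "Xin chào các bạn",
    language: str = "vi",
    denoise: bool = True,
    use_deepfilter: bool = True,
    normalize_text: bool = True,
) -> Dict[str, Any]:
    """Assemble an input payload using the defaults listed on this page.

    All key names are assumptions made for illustration only.
    """
    if not speaker_audio:
        # The speaker audio is the only input marked as required.
        raise ValueError("speaker audio is required")
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "text": text,
        "speaker_wav": speaker_audio,      # hypothetical key
        "language": language,              # hypothetical key
        "denoise": denoise,                # hypothetical key
        "use_deepfilter": use_deepfilter,  # hypothetical key
        "normalize_text": normalize_text,  # hypothetical key
    }
```

Before using a payload like this with any client, the key names would need to be checked against the model's actual input schema.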

Output


This output was created using a different version of the model, suminhthanh/vixtts:29b957e2.

Run time and cost

This model runs on Nvidia T4 GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

viⓍTTS - [Replicate support]

viⓍTTS is a voice generation model that lets you clone voices into different languages by using just a quick 6-second audio clip. This model is fine-tuned from the XTTS-v2.0.3 model by expanding the tokenizer to Vietnamese and fine-tuning on the viVoice dataset.

Languages

viⓍTTS supports 18 languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Japanese (ja), Hungarian (hu), Korean (ko), Hindi (hi), Vietnamese (vi).
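
As a small illustration, the language codes listed above can be kept in a lookup table and checked before a request is sent. The helper name `check_language` is hypothetical and not part of the model or any library; only the codes themselves come from this README.

```python
# Language codes exactly as listed in the README.
SUPPORTED_LANGUAGES = {
    "en": "English", "es": "Spanish", "fr": "French", "de": "German",
    "it": "Italian", "pt": "Portuguese", "pl": "Polish", "tr": "Turkish",
    "ru": "Russian", "nl": "Dutch", "cs": "Czech", "ar": "Arabic",
    "zh-cn": "Chinese", "ja": "Japanese", "hu": "Hungarian",
    "ko": "Korean", "hi": "Hindi", "vi": "Vietnamese",
}


def check_language(code: str) -> str:
    """Return the normalized code, or raise if it is not in the list."""
    normalized = code.strip().lower()
    if normalized not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language code: {code!r}")
    return normalized
```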

Known Limitations

  • Incompatibility with the original TTS library (a pull request will be made later).
  • Subpar performance on Vietnamese input sentences of fewer than 10 words (inconsistent output and odd trailing sounds).
  • This model was fine-tuned only on Vietnamese. Its effectiveness with other languages has not been tested, so quality in those languages may be lower.
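
One way to guard against the short-sentence limitation is to flag inputs that fall under the 10-word threshold before synthesis. The sketch below assumes that splitting on whitespace is an acceptable word count; in Vietnamese, whitespace separates syllables rather than words, so this is only an approximation. The function names are hypothetical.

```python
MIN_WORDS = 10  # threshold taken from the limitation stated above


def count_tokens(sentence: str) -> int:
    """Count whitespace-separated tokens (an approximation of word count)."""
    return len(sentence.split())


def is_short_sentence(sentence: str, min_words: int = MIN_WORDS) -> bool:
    """Return True if the sentence has fewer than `min_words` tokens."""
    return count_tokens(sentence) < min_words
```

A caller could use this to warn the user, or to merge a short sentence with a neighbouring one before sending it to the model.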

Demo

Please check out this repo.

Usage

For quick usage, please check out this notebook.
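
Before running the model, it may help to confirm that the reference clip meets the 6-second minimum stated for the speaker audio input. The sketch below uses Python's standard `wave` module, so it only handles uncompressed WAV files; the other accepted formats (mp3, m4a, ogg, flv) would need a different reader. The function names are hypothetical.

```python
import wave

MIN_SECONDS = 6.0  # minimum speaker audio duration stated on this page


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Return True if the WAV file lasts at least `min_seconds`."""
    return wav_duration_seconds(path) >= min_seconds
```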

License

This model is licensed under the Coqui Public Model License.

Contact

Fine-tuned by Thinh Le at FPT University HCMC, as a component of Non La’s graduation thesis. Contact: