lucataco / xtts-v2

Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning

  • Public
  • 158.4K runs



Run time and cost

This model runs on Nvidia A100 (40GB) GPU hardware. Predictions typically complete within 7 seconds.


This model expects at least 6 seconds of input audio.

Note: Don't include spaces in your input audio file name.
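
Both requirements above can be checked locally before uploading a file. The following is a minimal sketch, assuming the input is an uncompressed WAV file that Python's standard `wave` module can read; the helper names (`wav_duration_seconds`, `safe_audio_name`, `check_input_audio`) are hypothetical and are not part of the model or of any Coqui or Replicate API.

```python
import wave
from pathlib import Path

# Minimum input length stated on this page (assumed to mean seconds of audio).
MIN_SECONDS = 6.0


def wav_duration_seconds(path):
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def safe_audio_name(path):
    """Return the file name with runs of whitespace replaced by underscores."""
    return "_".join(Path(path).name.split())


def check_input_audio(path):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    name = Path(path).name
    if any(ch.isspace() for ch in name):
        problems.append(
            f"file name contains spaces; consider {safe_audio_name(path)!r}"
        )
    duration = wav_duration_seconds(path)
    if duration < MIN_SECONDS:
        problems.append(
            f"audio is {duration:.1f}s long; at least {MIN_SECONDS:.0f}s expected"
        )
    return problems
```

Calling `check_input_audio("my voice.wav")` on a long enough recording would report only the file name problem and suggest `my_voice.wav`. Other audio formats would need a different duration check.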


XTTS-v2, the Open Foundation Speech Model by Coqui 🐸

Language Settings:

  • English: en 🇺🇸
  • French: fr 🇫🇷
  • German: de 🇩🇪
  • Spanish: es 🇪🇸
  • Italian: it 🇮🇹
  • Portuguese: pt 🇵🇹
  • Czech: cs 🇨🇿
  • Polish: pl 🇵🇱
  • Russian: ru 🇷🇺
  • Dutch: nl 🇳🇱
  • Turkish: tr 🇹🇷
  • Arabic: ar 🇦🇪
  • Mandarin Chinese: zh-cn 🇨🇳
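
For scripted use, the language settings listed above can be kept in a small lookup table so that a mistyped language name fails early. This is only an illustrative sketch: `LANGUAGE_CODES` and `language_code` are hypothetical names, and the table contains only the codes listed on this page. Hindi, mentioned in the 11/28/23 note, is left out because its code is not given here.

```python
# Language name -> code, copied from the language settings on this page.
LANGUAGE_CODES = {
    "english": "en",
    "french": "fr",
    "german": "de",
    "spanish": "es",
    "italian": "it",
    "portuguese": "pt",
    "czech": "cs",
    "polish": "pl",
    "russian": "ru",
    "dutch": "nl",
    "turkish": "tr",
    "arabic": "ar",
    "mandarin chinese": "zh-cn",
}


def language_code(name):
    """Look up a language code by name, ignoring case and outer whitespace."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        known = ", ".join(sorted(LANGUAGE_CODES))
        raise ValueError(f"unknown language {name!r}; known languages: {known}")
    return LANGUAGE_CODES[key]
```

For example, `language_code("Turkish")` returns `"tr"`, while an unlisted name raises a `ValueError` that names the known languages.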


11/28/23 - Added Hindi support