lucataco / xtts-v2
Coqui XTTS-v2: Multilingual Text To Speech Voice Cloning (Updated 1 year, 6 months ago)
Prediction
lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e
ID: 7qfm5cbbsdivydzlc35abwxjqq
Status: Succeeded
Source: Web
Hardware: A100 (40GB)
Input
- text: Cuando tenía seis años, vi una vez una imagen magnífica
- language: es
- speaker_wav: https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav
{ "text": "Cuando tenía seis años, vi una vez una imagen magnífica", "language": "es", "speaker_wav": "https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav" }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e",
  {
    input: {
      text: "Cuando tenía seis años, vi una vez una imagen magnífica",
      language: "es",
      speaker_wav: "https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav"
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the audio file to disk (the model returns a WAV file):
await fs.promises.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e",
    input={
        "text": "Cuando tenía seis años, vi una vez una imagen magnífica",
        "language": "es",
        "speaker_wav": "https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e",
    "input": {
      "text": "Cuando tenía seis años, vi una vez una imagen magnífica",
      "language": "es",
      "speaker_wav": "https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
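For readers who prefer to issue the same HTTP request without the client library, the curl command above can be sketched with Python's standard library. This is a minimal sketch: the helper name `build_prediction_request` is hypothetical, the request is only constructed (not sent), and the placeholder token is an assumption for illustration.

```python
import json
import os
import urllib.request

API_URL = "https://api.replicate.com/v1/predictions"
VERSION = (
    "lucataco/xtts-v2:"
    "e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e"
)


def build_prediction_request(token: str, model_input: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that the curl command issues.

    Hypothetical helper for illustration; it mirrors the headers and body
    shown in the curl example.
    """
    body = json.dumps({"version": VERSION, "input": model_input}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
            "Prefer": "wait",
        },
    )


request = build_prediction_request(
    os.environ.get("REPLICATE_API_TOKEN", "<your-token>"),  # placeholder if unset
    {
        "text": "Cuando tenía seis años, vi una vez una imagen magnífica",
        "language": "es",
        "speaker_wav": "https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav",
    },
)

# To actually send it (requires a valid token and network access):
# with urllib.request.urlopen(request) as response:
#     prediction = json.load(response)
```

Keeping construction separate from sending makes it easy to inspect the payload before spending any compute.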
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2023-11-10T17:36:22.655818Z", "created_at": "2023-11-10T17:36:20.020038Z", "data_removed": false, "error": null, "id": "7qfm5cbbsdivydzlc35abwxjqq", "input": { "text": "Cuando tenía seis años, vi una vez una imagen magnífica", "language": "es", "speaker_wav": "https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav" }, "logs": "> Text splitted to sentences.\n['Cuando tenía seis años, vi una vez una imagen magnífica']\n> Processing time: 2.210158109664917\n> Real-time factor: 0.5469337663641523", "metrics": { "predict_time": 2.664174, "total_time": 2.63578 }, "output": "https://replicate.delivery/pbxt/wwqNm5bNhGaoERcKWOe3tN7v2xrKhBxL2srKyzCCTLJL4g7IA/output.wav", "started_at": "2023-11-10T17:36:19.991644Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/7qfm5cbbsdivydzlc35abwxjqq", "cancel": "https://api.replicate.com/v1/predictions/7qfm5cbbsdivydzlc35abwxjqq/cancel" }, "version": "e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e" }
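The `logs` field in the response above carries two timing lines that can be pulled out programmatically. The following is a minimal sketch; the function name `parse_tts_logs` is hypothetical, and the derived audio length assumes that the real-time factor is processing time divided by generated audio duration, which the page itself does not state.

```python
import re

# Log text copied from the "logs" field of the prediction above.
LOGS = (
    "> Text splitted to sentences.\n"
    "['Cuando tenía seis años, vi una vez una imagen magnífica']\n"
    "> Processing time: 2.210158109664917\n"
    "> Real-time factor: 0.5469337663641523"
)


def parse_tts_logs(logs: str) -> dict:
    """Extract the timing figures from the model's log output."""
    processing = re.search(r"Processing time:\s*([0-9.]+)", logs)
    rtf = re.search(r"Real-time factor:\s*([0-9.]+)", logs)
    if processing is None or rtf is None:
        raise ValueError("logs do not contain the expected timing lines")
    processing_s = float(processing.group(1))
    rtf_value = float(rtf.group(1))
    return {
        "processing_s": processing_s,
        "real_time_factor": rtf_value,
        # Assumption: RTF = processing time / audio duration.
        "estimated_audio_s": processing_s / rtf_value,
    }


stats = parse_tts_logs(LOGS)
print(f"processing: {stats['processing_s']:.2f} s")
print(f"estimated audio length: {stats['estimated_audio_s']:.2f} s")
```

Under that assumption, a real-time factor below 1 means the audio was generated faster than it takes to play back.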
Prediction
lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e
ID: jbzmikzby74g65lnjmtqhljopm
Status: Succeeded
Source: Web
Hardware: A100 (40GB)
Input
- text: Quando eu tinha seis anos eu vi, uma vez, uma imagem magnífica
- language: pt
- speaker_wav: https://replicate.delivery/pbxt/JqzxMWScZ4O44XwIwWveDoeAE2Ga7gYdnXKb8l18Fv7D3QEx/female.wav
{ "text": "Quando eu tinha seis anos eu vi, uma vez, uma imagem magnífica", "language": "pt", "speaker_wav": "https://replicate.delivery/pbxt/JqzxMWScZ4O44XwIwWveDoeAE2Ga7gYdnXKb8l18Fv7D3QEx/female.wav" }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e",
  {
    input: {
      text: "Quando eu tinha seis anos eu vi, uma vez, uma imagem magnífica",
      language: "pt",
      speaker_wav: "https://replicate.delivery/pbxt/JqzxMWScZ4O44XwIwWveDoeAE2Ga7gYdnXKb8l18Fv7D3QEx/female.wav"
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the audio file to disk (the model returns a WAV file):
await fs.promises.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e",
    input={
        "text": "Quando eu tinha seis anos eu vi, uma vez, uma imagem magnífica",
        "language": "pt",
        "speaker_wav": "https://replicate.delivery/pbxt/JqzxMWScZ4O44XwIwWveDoeAE2Ga7gYdnXKb8l18Fv7D3QEx/female.wav"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/xtts-v2:e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e",
    "input": {
      "text": "Quando eu tinha seis anos eu vi, uma vez, uma imagem magnífica",
      "language": "pt",
      "speaker_wav": "https://replicate.delivery/pbxt/JqzxMWScZ4O44XwIwWveDoeAE2Ga7gYdnXKb8l18Fv7D3QEx/female.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2023-11-10T17:37:09.854915Z", "created_at": "2023-11-10T17:37:06.564173Z", "data_removed": false, "error": null, "id": "jbzmikzby74g65lnjmtqhljopm", "input": { "text": "Quando eu tinha seis anos eu vi, uma vez, uma imagem magnífica", "language": "pt", "speaker_wav": "https://replicate.delivery/pbxt/JqzxMWScZ4O44XwIwWveDoeAE2Ga7gYdnXKb8l18Fv7D3QEx/female.wav" }, "logs": "> Text splitted to sentences.\n['Quando eu tinha seis anos eu vi, uma vez, uma imagem magnífica']\n> Processing time: 2.843817949295044\n> Real-time factor: 0.5418410910233973", "metrics": { "predict_time": 3.319713, "total_time": 3.290742 }, "output": "https://replicate.delivery/pbxt/e5H9tBUVKMS2AiGmOTkYpyc0OefOrVxsYL7qMgJywurLiDujA/output.wav", "started_at": "2023-11-10T17:37:06.535202Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/jbzmikzby74g65lnjmtqhljopm", "cancel": "https://api.replicate.com/v1/predictions/jbzmikzby74g65lnjmtqhljopm/cancel" }, "version": "e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e" }
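The timestamps and `metrics` in the response above can be cross-checked against each other. The sketch below uses only values copied from that response; the helper names `parse_ts` and `elapsed_seconds` are hypothetical. In this particular sample, `predict_time` equals `completed_at` minus `started_at`, and `total_time` equals `completed_at` minus `created_at`; whether that relationship holds for every prediction is not something this page confirms.

```python
from datetime import datetime

# Values copied from the prediction response above.
prediction = {
    "created_at": "2023-11-10T17:37:06.564173Z",
    "started_at": "2023-11-10T17:37:06.535202Z",
    "completed_at": "2023-11-10T17:37:09.854915Z",
    "metrics": {"predict_time": 3.319713, "total_time": 3.290742},
}


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp with a trailing 'Z' (UTC)."""
    # Replacing 'Z' keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two ISO 8601 timestamps."""
    return (parse_ts(end) - parse_ts(start)).total_seconds()


run_time = elapsed_seconds(prediction["started_at"], prediction["completed_at"])
wall_time = elapsed_seconds(prediction["created_at"], prediction["completed_at"])

print(f"started -> completed: {run_time:.6f} s")
print(f"created -> completed: {wall_time:.6f} s")
```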
Prediction
lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009
ID: k63yqrjb6heu4otsny7ruseoym
Status: Succeeded
Source: Web
Hardware: A100 (40GB)
Input
- text: Hi there, I'm your new voice clone. Try your best to upload quality audio
- speaker: https://replicate.delivery/pbxt/Js5g5HmkREMVkB8jgGCgwKYfphH8oQfOQP8vqMNCChGf9gvX/female.wav
- language: en
- cleanup_voice: false
{ "text": "Hi there, I'm your new voice clone. Try your best to upload quality audio", "speaker": "https://replicate.delivery/pbxt/Js5g5HmkREMVkB8jgGCgwKYfphH8oQfOQP8vqMNCChGf9gvX/female.wav", "language": "en", "cleanup_voice": false }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009",
  {
    input: {
      text: "Hi there, I'm your new voice clone. Try your best to upload quality audio",
      speaker: "https://replicate.delivery/pbxt/Js5g5HmkREMVkB8jgGCgwKYfphH8oQfOQP8vqMNCChGf9gvX/female.wav",
      language: "en",
      cleanup_voice: false
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the audio file to disk (the model returns a WAV file):
await fs.promises.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009",
    input={
        "text": "Hi there, I'm your new voice clone. Try your best to upload quality audio",
        "speaker": "https://replicate.delivery/pbxt/Js5g5HmkREMVkB8jgGCgwKYfphH8oQfOQP8vqMNCChGf9gvX/female.wav",
        "language": "en",
        "cleanup_voice": False
    }
)
print(output)
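The examples on this page use two different input shapes: the `e876df56…` version takes `speaker_wav`, while the `49ff6cfa…` and `684bc385…` versions take `speaker` plus `cleanup_voice`. A small helper can keep calling code from mixing them up. This is a sketch inferred only from the example inputs shown here, not from the model's schema; `build_input` and the constant names are hypothetical.

```python
# Version hashes copied from the example predictions on this page.
LEGACY_VERSION = "e876df565d4d629da440ce5820d1d2c8c2adb963f52e526efc064911f841f85e"
NEWER_VERSION = "49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009"


def build_input(
    version: str,
    text: str,
    language: str,
    speaker_url: str,
    cleanup_voice: bool = False,
) -> dict:
    """Return an input dict shaped like the examples for the given version.

    Assumption: any version other than LEGACY_VERSION uses the newer
    `speaker` / `cleanup_voice` field names, as in the later examples.
    """
    if version == LEGACY_VERSION:
        return {"text": text, "language": language, "speaker_wav": speaker_url}
    return {
        "text": text,
        "speaker": speaker_url,
        "language": language,
        "cleanup_voice": cleanup_voice,
    }


legacy_input = build_input(
    LEGACY_VERSION,
    "Cuando tenía seis años, vi una vez una imagen magnífica",
    "es",
    "https://replicate.delivery/pbxt/JqzvJMqmYeWjdUSULrjJbEYjsYUnd335Keufr2QyMCGKJtY4/male.wav",
)
newer_input = build_input(
    NEWER_VERSION,
    "Hi there, I'm your new voice clone. Try your best to upload quality audio",
    "en",
    "https://replicate.delivery/pbxt/Js5g5HmkREMVkB8jgGCgwKYfphH8oQfOQP8vqMNCChGf9gvX/female.wav",
)
print(sorted(legacy_input))
print(sorted(newer_input))
```

The resulting dict can be passed as the `input` argument of `replicate.run` alongside the matching version string.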
To learn more, take a look at the guide on getting started with Python.
Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009",
    "input": {
      "text": "Hi there, I\'m your new voice clone. Try your best to upload quality audio",
      "speaker": "https://replicate.delivery/pbxt/Js5g5HmkREMVkB8jgGCgwKYfphH8oQfOQP8vqMNCChGf9gvX/female.wav",
      "language": "en",
      "cleanup_voice": false
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2023-11-16T16:41:39.819791Z", "created_at": "2023-11-16T16:41:35.419051Z", "data_removed": false, "error": null, "id": "k63yqrjb6heu4otsny7ruseoym", "input": { "text": "Hi there, I'm your new voice clone. Try your best to upload quality audio", "speaker": "https://replicate.delivery/pbxt/Js5g5HmkREMVkB8jgGCgwKYfphH8oQfOQP8vqMNCChGf9gvX/female.wav", "language": "en", "cleanup_voice": false }, "logs": "ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil 56. 70.100 / 56. 70.100\nlibavcodec 58.134.100 / 58.134.100\nlibavformat 58. 76.100 / 58. 76.100\nlibavdevice 58. 13.100 / 58. 13.100\nlibavfilter 7.110.100 / 7.110.100\nlibswscale 5. 
9.100 / 5. 9.100\nlibswresample 3. 9.100 / 3. 9.100\nlibpostproc 55. 9.100 / 55. 9.100\nGuessed Channel Layout for Input Stream #0.0 : mono\nInput #0, wav, from '/tmp/tmpnefby6ykfemale.wav':\nMetadata:\nencoder : Lavf58.29.100\nDuration: 00:00:11.36, bitrate: 705 kb/s\nStream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s\nStream mapping:\nStream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))\nPress [q] to stop, [?] for help\nOutput #0, wav, to '/tmp/speaker.wav':\nMetadata:\nISFT : Lavf58.76.100\nStream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s\nMetadata:\nencoder : Lavc58.134.100 pcm_s16le\nsize= 4kB time=00:00:00.00 bitrate=N/A speed=N/A\nsize= 979kB time=00:00:11.33 bitrate= 707.4kbits/s speed=2.18e+03x\nvideo:0kB audio:978kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.007785%\n> Text splitted to sentences.\n[\"Hi there, I'm your new voice clone.\", 'Try your best to upload quality audio']\n> Processing time: 3.787696361541748\n> Real-time factor: 0.6062623749418956", "metrics": { "predict_time": 4.383172, "total_time": 4.40074 }, "output": "https://replicate.delivery/pbxt/okJV7pM2ZA5rHxq1GPKqo7WSQp1e8XBPY7QUtPqNcF5hwf4RA/output.wav", "started_at": "2023-11-16T16:41:35.436619Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/k63yqrjb6heu4otsny7ruseoym", "cancel": "https://api.replicate.com/v1/predictions/k63yqrjb6heu4otsny7ruseoym/cancel" }, "version": "49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009" }
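In this version the logs also include the ffmpeg output from converting the speaker file, which reports the length of the uploaded reference audio. A minimal sketch for extracting it follows; the function name `input_duration_seconds` is hypothetical and the sample string is an excerpt of the log above.

```python
import re

# Excerpt of the ffmpeg portion of the "logs" field above.
FFMPEG_LOG_EXCERPT = (
    "Input #0, wav, from '/tmp/tmpnefby6ykfemale.wav':\n"
    "Metadata:\n"
    "encoder : Lavf58.29.100\n"
    "Duration: 00:00:11.36, bitrate: 705 kb/s\n"
    "Stream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s\n"
)


def input_duration_seconds(logs: str) -> float:
    """Return the first 'Duration: HH:MM:SS.ss' value in the logs, in seconds."""
    match = re.search(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)", logs)
    if match is None:
        raise ValueError("no ffmpeg Duration line found in logs")
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


reference_seconds = input_duration_seconds(FFMPEG_LOG_EXCERPT)
print(f"reference audio: {reference_seconds:.2f} s")
```

A check like this can flag very short reference clips before they are used for cloning; what counts as too short is left to the caller.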
Prediction
lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009
ID: gnlu2zbbq4hgdubx5kkzn6lbzy
Status: Succeeded
Source: Web
Hardware: A100 (40GB)
Created by @lucataco
Input
- text: Hi there, this is a test with an mp3 file
- speaker: https://replicate.delivery/pbxt/JuUpjfrVonmSBwOFzMsLR6uKiSLXN4zI12HfAhlAvdj7oc7g/male_audio.mp3
- language: en
- cleanup_voice: false
{ "text": "Hi there, this is a test with an mp3 file", "speaker": "https://replicate.delivery/pbxt/JuUpjfrVonmSBwOFzMsLR6uKiSLXN4zI12HfAhlAvdj7oc7g/male_audio.mp3", "language": "en", "cleanup_voice": false }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009",
  {
    input: {
      text: "Hi there, this is a test with an mp3 file",
      speaker: "https://replicate.delivery/pbxt/JuUpjfrVonmSBwOFzMsLR6uKiSLXN4zI12HfAhlAvdj7oc7g/male_audio.mp3",
      language: "en",
      cleanup_voice: false
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the audio file to disk (the model returns a WAV file):
await fs.promises.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009",
    input={
        "text": "Hi there, this is a test with an mp3 file",
        "speaker": "https://replicate.delivery/pbxt/JuUpjfrVonmSBwOFzMsLR6uKiSLXN4zI12HfAhlAvdj7oc7g/male_audio.mp3",
        "language": "en",
        "cleanup_voice": False
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/xtts-v2:49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009",
    "input": {
      "text": "Hi there, this is a test with an mp3 file",
      "speaker": "https://replicate.delivery/pbxt/JuUpjfrVonmSBwOFzMsLR6uKiSLXN4zI12HfAhlAvdj7oc7g/male_audio.mp3",
      "language": "en",
      "cleanup_voice": false
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
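When a request returns before the prediction has finished, the prediction can be polled until its `status` reaches a final value. The sketch below keeps the HTTP call out of the loop by accepting any callable that returns the current prediction as a dict, so it runs here against a stub. The function name `wait_for_prediction`, the stubbed responses, and the set of terminal statuses (`succeeded`, `failed`, `canceled`) are assumptions for illustration; only `succeeded` appears on this page.

```python
import time
from typing import Callable

# Assumed terminal statuses; only "succeeded" is shown in the examples here.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_for_prediction(
    fetch: Callable[[], dict],
    interval_s: float = 1.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `fetch` until the prediction reaches a terminal status.

    `fetch` is any function returning the prediction as a dict, for example
    a GET on the prediction's `urls.get` endpoint.
    """
    for _ in range(max_attempts):
        prediction = fetch()
        if prediction.get("status") in TERMINAL_STATUSES:
            return prediction
        sleep(interval_s)
    raise TimeoutError(f"prediction not finished after {max_attempts} attempts")


# Stubbed responses standing in for real API calls.
responses = iter(
    [
        {"status": "starting"},
        {"status": "processing"},
        {
            "status": "succeeded",
            "output": "https://replicate.delivery/pbxt/RdFEQCNX6EbTL9Z6fPm4VK4qIQjX4RrTxhsfwfmnzsPESj0jA/output.wav",
        },
    ]
)
result = wait_for_prediction(lambda: next(responses), interval_s=0.0)
print(result["status"], result["output"])
```

Injecting `fetch` and `sleep` keeps the loop testable without network access or real delays.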
Output
{ "completed_at": "2023-11-20T14:08:02.397201Z", "created_at": "2023-11-20T14:07:59.703056Z", "data_removed": false, "error": null, "id": "gnlu2zbbq4hgdubx5kkzn6lbzy", "input": { "text": "Hi there, this is a test with an mp3 file", "speaker": "https://replicate.delivery/pbxt/JuUpjfrVonmSBwOFzMsLR6uKiSLXN4zI12HfAhlAvdj7oc7g/male_audio.mp3", "language": "en", "cleanup_voice": false }, "logs": "ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil 56. 70.100 / 56. 70.100\nlibavcodec 58.134.100 / 58.134.100\nlibavformat 58. 76.100 / 58. 76.100\nlibavdevice 58. 13.100 / 58. 13.100\nlibavfilter 7.110.100 / 7.110.100\nlibswscale 5. 9.100 / 5. 
9.100\nlibswresample 3. 9.100 / 3. 9.100\nlibpostproc 55. 9.100 / 55. 9.100\nInput #0, mp3, from '/tmp/tmpo6tiloubmale_audio.mp3':\nMetadata:\ntitle : EVCapture4.2.2软件录制\ncomment : audio-extractor.net\nDuration: 00:00:06.14, start: 0.025057, bitrate: 133 kb/s\nStream #0:0: Audio: mp3, 44100 Hz, stereo, fltp, 128 kb/s\nMetadata:\nencoder : Lavc59.37\nStream mapping:\nStream #0:0 -> #0:0 (mp3 (mp3float) -> pcm_s16le (native))\nPress [q] to stop, [?] for help\nOutput #0, wav, to '/tmp/speaker.wav':\nMetadata:\nINAM : EVCapture4.2.2软件录制\nICMT : audio-extractor.net\nISFT : Lavf58.76.100\nStream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, stereo, s16, 1411 kb/s\nMetadata:\nencoder : Lavc58.134.100 pcm_s16le\nsize= 0kB time=00:00:00.00 bitrate=N/A speed= 0x\nsize= 1052kB time=00:00:06.08 bitrate=1415.8kbits/s speed= 520x\nvideo:0kB audio:1052kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.013182%\n> Text splitted to sentences.\n['Hi there, this is a test with an mp3 file']\n> Processing time: 2.1343066692352295\n> Real-time factor: 0.5281632929681812", "metrics": { "predict_time": 2.678693, "total_time": 2.694145 }, "output": "https://replicate.delivery/pbxt/RdFEQCNX6EbTL9Z6fPm4VK4qIQjX4RrTxhsfwfmnzsPESj0jA/output.wav", "started_at": "2023-11-20T14:07:59.718508Z", "status": "succeeded", "urls": { "get": "https://api.replicate.com/v1/predictions/gnlu2zbbq4hgdubx5kkzn6lbzy", "cancel": "https://api.replicate.com/v1/predictions/gnlu2zbbq4hgdubx5kkzn6lbzy/cancel" }, "version": "49ff6cfa14bd4e7f80f62e2279f82f23dfc2e7970f825f8db5599f8a6213c009" }
Prediction
lucataco/xtts-v2:684bc3855b37866c0c65add2ff39c78f3dea3f4ff103a436465326e0f438d55e
ID: psj32bc8chrj40ckkq7sc2nz54
Status: Succeeded
Source: Web
Hardware: A100 (80GB)
Created by @lucataco
Input
- text: Hi there, I'm your new voice clone. Try your best to upload quality audio
- speaker: https://replicate.delivery/pbxt/Jt79w0xsT64R1JsiJ0LQRL8UcWspg5J4RFrU6YwEKpOT1ukS/male.wav
- language: en
- cleanup_voice: false
{ "text": "Hi there, I'm your new voice clone. Try your best to upload quality audio", "speaker": "https://replicate.delivery/pbxt/Jt79w0xsT64R1JsiJ0LQRL8UcWspg5J4RFrU6YwEKpOT1ukS/male.wav", "language": "en", "cleanup_voice": false }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

const output = await replicate.run(
  "lucataco/xtts-v2:684bc3855b37866c0c65add2ff39c78f3dea3f4ff103a436465326e0f438d55e",
  {
    input: {
      text: "Hi there, I'm your new voice clone. Try your best to upload quality audio",
      speaker: "https://replicate.delivery/pbxt/Jt79w0xsT64R1JsiJ0LQRL8UcWspg5J4RFrU6YwEKpOT1ukS/male.wav",
      language: "en",
      cleanup_voice: false
    }
  }
);

// To access the file URL:
console.log(output.url());

// To write the audio file to disk (the model returns a WAV file):
await fs.promises.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate

Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.

output = replicate.run(
    "lucataco/xtts-v2:684bc3855b37866c0c65add2ff39c78f3dea3f4ff103a436465326e0f438d55e",
    input={
        "text": "Hi there, I'm your new voice clone. Try your best to upload quality audio",
        "speaker": "https://replicate.delivery/pbxt/Jt79w0xsT64R1JsiJ0LQRL8UcWspg5J4RFrU6YwEKpOT1ukS/male.wav",
        "language": "en",
        "cleanup_voice": False
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
Run lucataco/xtts-v2 using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "lucataco/xtts-v2:684bc3855b37866c0c65add2ff39c78f3dea3f4ff103a436465326e0f438d55e",
    "input": {
      "text": "Hi there, I\'m your new voice clone. Try your best to upload quality audio",
      "speaker": "https://replicate.delivery/pbxt/Jt79w0xsT64R1JsiJ0LQRL8UcWspg5J4RFrU6YwEKpOT1ukS/male.wav",
      "language": "en",
      "cleanup_voice": false
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{
  "completed_at": "2024-12-06T21:10:46.572686Z",
  "created_at": "2024-12-06T21:10:44.068000Z",
  "data_removed": false,
  "error": null,
  "id": "psj32bc8chrj40ckkq7sc2nz54",
  "input": {
    "text": "Hi there, I'm your new voice clone. Try your best to upload quality audio",
    "speaker": "https://replicate.delivery/pbxt/Jt79w0xsT64R1JsiJ0LQRL8UcWspg5J4RFrU6YwEKpOT1ukS/male.wav",
    "language": "en",
    "cleanup_voice": false
  },
  "logs": "ffmpeg version 4.4.2-0ubuntu0.22.04.1 Copyright (c) 2000-2021 the FFmpeg developers\nbuilt with gcc 11 (Ubuntu 11.2.0-19ubuntu1)\nconfiguration: --prefix=/usr --extra-version=0ubuntu0.22.04.1 --toolchain=hardened --libdir=/usr/lib/x86_64-linux-gnu --incdir=/usr/include/x86_64-linux-gnu --arch=amd64 --enable-gpl --disable-stripping --enable-gnutls --enable-ladspa --enable-libaom --enable-libass --enable-libbluray --enable-libbs2b --enable-libcaca --enable-libcdio --enable-libcodec2 --enable-libdav1d --enable-libflite --enable-libfontconfig --enable-libfreetype --enable-libfribidi --enable-libgme --enable-libgsm --enable-libjack --enable-libmp3lame --enable-libmysofa --enable-libopenjpeg --enable-libopenmpt --enable-libopus --enable-libpulse --enable-librabbitmq --enable-librubberband --enable-libshine --enable-libsnappy --enable-libsoxr --enable-libspeex --enable-libsrt --enable-libssh --enable-libtheora --enable-libtwolame --enable-libvidstab --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx265 --enable-libxml2 --enable-libxvid --enable-libzimg --enable-libzmq --enable-libzvbi --enable-lv2 --enable-omx --enable-openal --enable-opencl --enable-opengl --enable-sdl2 --enable-pocketsphinx --enable-librsvg --enable-libmfx --enable-libdc1394 --enable-libdrm --enable-libiec61883 --enable-chromaprint --enable-frei0r --enable-libx264 --enable-shared\nlibavutil 56. 70.100 / 56. 70.100\nlibavcodec 58.134.100 / 58.134.100\nlibavformat 58. 76.100 / 58. 76.100\nlibavdevice 58. 13.100 / 58. 13.100\nlibavfilter 7.110.100 / 7.110.100\nlibswscale 5. 9.100 / 5. 9.100\nlibswresample 3. 9.100 / 3. 9.100\nlibpostproc 55. 9.100 / 55. 9.100\nGuessed Channel Layout for Input Stream #0.0 : mono\nInput #0, wav, from '/tmp/tmp4fsqhtw9male.wav':\nMetadata:\nencoder : Lavf58.29.100\nDuration: 00:00:08.64, bitrate: 705 kb/s\nStream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s\nStream mapping:\nStream #0:0 -> #0:0 (pcm_s16le (native) -> pcm_s16le (native))\nPress [q] to stop, [?] for help\nOutput #0, wav, to '/tmp/speaker.wav':\nMetadata:\nISFT : Lavf58.76.100\nStream #0:0: Audio: pcm_s16le ([1][0][0][0] / 0x0001), 44100 Hz, mono, s16, 705 kb/s\nMetadata:\nencoder : Lavc58.134.100 pcm_s16le\nsize= 4kB time=00:00:00.00 bitrate=N/A speed=N/A\nsize= 744kB time=00:00:08.63 bitrate= 705.9kbits/s speed=1.33e+03x\nvideo:0kB audio:744kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.010236%\n> Text splitted to sentences.\n[\"Hi there, I'm your new voice clone.\", 'Try your best to upload quality audio']\n> Processing time: 2.0034399032592773\n> Real-time factor: 0.34572885257690855",
  "metrics": {
    "predict_time": 2.49477223,
    "total_time": 2.504686
  },
  "output": "https://replicate.delivery/yhqm/OzeMUws45JW7ZC3PQIvp3Byi7GiBHOShONCSE4YnMoYr0I8JA/output.wav",
  "started_at": "2024-12-06T21:10:44.077914Z",
  "status": "succeeded",
  "urls": {
    "stream": "https://stream.replicate.com/v1/files/qoxq-uqorctmtu5tafceexottprehm5hkqbeapf4gztkwzv2cybm7bp5a",
    "get": "https://api.replicate.com/v1/predictions/psj32bc8chrj40ckkq7sc2nz54",
    "cancel": "https://api.replicate.com/v1/predictions/psj32bc8chrj40ckkq7sc2nz54/cancel"
  },
  "version": "684bc3855b37866c0c65add2ff39c78f3dea3f4ff103a436465326e0f438d55e"
}
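The response above carries the output URL, the timing metrics, and the model's own log lines. The sketch below shows one way to pull numbers out of such a response. It uses a trimmed copy of the prediction (the ffmpeg banner is omitted from the logs for brevity), and log_value is a hypothetical helper. The audio-length estimate assumes the real-time factor is processing time divided by audio duration, which is an assumption about how the model computes it, not something the page states.

```python
import re

# Trimmed copy of the prediction response above; the ffmpeg banner is omitted.
prediction = {
    "status": "succeeded",
    "output": "https://replicate.delivery/yhqm/OzeMUws45JW7ZC3PQIvp3Byi7GiBHOShONCSE4YnMoYr0I8JA/output.wav",
    "metrics": {"predict_time": 2.49477223, "total_time": 2.504686},
    "logs": "> Processing time: 2.0034399032592773\n> Real-time factor: 0.34572885257690855",
}


def log_value(logs, label):
    """Return the float that follows 'label:' in the logs, or None if absent."""
    match = re.search(re.escape(label) + r":\s*([0-9.]+)", logs)
    return float(match.group(1)) if match else None


processing_time = log_value(prediction["logs"], "Processing time")
real_time_factor = log_value(prediction["logs"], "Real-time factor")

# Assumption: real-time factor = processing time / audio duration,
# so the generated audio length is roughly processing_time / real_time_factor.
audio_seconds = processing_time / real_time_factor

# Time reported outside the predict call itself.
overhead_seconds = prediction["metrics"]["total_time"] - prediction["metrics"]["predict_time"]

print(f"output: {prediction['output']}")
print(f"estimated audio length: {audio_seconds:.2f} s")
print(f"time outside predict: {overhead_seconds * 1000:.1f} ms")
```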