ttsds/fishspeech_1_2_sft
The Fish Speech V1.2 SFT model.
Prediction
ttsds/fishspeech_1_2_sft:34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20
ID: 50t54px1jxrme0cmnrxaz05sp4
Status: Succeeded
Source: Web
Hardware: L40S
Total duration: 78.2 s
Created: 2025-01-28T18:44:25Z
Input
- text: With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.
- text_reference: and keeping eternity before the eyes, though much
- speaker_reference: https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav (the reference audio clip used for voice cloning; text_reference is presumably its transcript)
{ "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.", "text_reference": "and keeping eternity before the eyes, though much", "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav" }
Install Replicate’s Node.js client library:

npm install replicate
Import and set up the client:

import Replicate from "replicate";
import fs from "node:fs/promises";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run ttsds/fishspeech_1_2_sft using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "ttsds/fishspeech_1_2_sft:34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20",
  {
    input: {
      text: "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
      text_reference: "and keeping eternity before the eyes, though much",
      speaker_reference: "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
    }
  }
);

// To access the file URL:
console.log(output.url());
//=> "http://example.com"

// To write the file to disk (the model outputs a WAV file):
await fs.writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate
Import the client:

import replicate
Run ttsds/fishspeech_1_2_sft using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "ttsds/fishspeech_1_2_sft:34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20",
    input={
        "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
        "text_reference": "and keeping eternity before the eyes, though much",
        "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
    }
)

# To access the file URL:
print(output.url)
#=> "http://example.com"

# To write the file to disk (the model outputs a WAV file):
with open("output.wav", "wb") as file:
    file.write(output.read())
To learn more, take a look at the guide on getting started with Python.
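The Python client can also upload a local reference clip instead of pointing at a URL: file handles passed as input values are uploaded for you. A minimal sketch, assuming a local recording named ref.wav (hypothetical filename) whose transcript you supply as text_reference:

import replicate

# Assumption: ref.wav is a local clip of the target speaker, and
# text_reference below is its transcript. Passing an open file handle
# as an input value makes the client upload the file before running.
with open("ref.wav", "rb") as speaker_clip:
    output = replicate.run(
        "ttsds/fishspeech_1_2_sft:34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20",
        input={
            "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
            "text_reference": "and keeping eternity before the eyes, though much",
            "speaker_reference": speaker_clip,
        },
    )

# Save the synthesized speech locally.
with open("output.wav", "wb") as f:
    f.write(output.read())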
Run ttsds/fishspeech_1_2_sft using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "ttsds/fishspeech_1_2_sft:34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20",
    "input": {
      "text": "With tenure, Suzie\'d have all the more leisure for yachting, but her publications are no good.",
      "text_reference": "and keeping eternity before the eyes, though much",
      "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
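Without the "Prefer: wait" header, the API returns immediately and you poll the prediction until it finishes. A hedged sketch of that create-and-poll flow in Python with requests, assuming REPLICATE_API_TOKEN is set in the environment; the urls.get endpoint it polls is the same one shown in the prediction JSON below:

import os
import time
import requests

headers = {
    "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
    "Content-Type": "application/json",
}

# Create the prediction with the same payload as the curl example above.
resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers=headers,
    json={
        "version": "ttsds/fishspeech_1_2_sft:34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20",
        "input": {
            "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.",
            "text_reference": "and keeping eternity before the eyes, though much",
            "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav",
        },
    },
)
resp.raise_for_status()
prediction = resp.json()

# Poll the prediction's "get" URL until it reaches a terminal state.
while prediction["status"] not in ("succeeded", "failed", "canceled"):
    time.sleep(2)
    prediction = requests.get(prediction["urls"]["get"], headers=headers).json()

print(prediction["status"], prediction.get("output"))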
Output
Generated audio (generated.wav): https://replicate.delivery/xezq/rXAanJ47XKbWJ9gTbnNPibxbNBX3CmzjCKhUb1wyrL81XbCF/generated.wav
{ "completed_at": "2025-01-28T18:45:43.867597Z", "created_at": "2025-01-28T18:44:25.623000Z", "data_removed": false, "error": null, "id": "50t54px1jxrme0cmnrxaz05sp4", "input": { "text": "With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.", "text_reference": "and keeping eternity before the eyes, though much", "speaker_reference": "https://replicate.delivery/pbxt/MNFXdPaUPOwYCZjZM4azsymbzE2TCV2WJXfGpeV2DrFWaSq8/example_en.wav" }, "logs": "2025-01-28 18:45:39.050 | INFO | tools.llama.generate:generate_long:432 - Encoded text: With tenure, Suzie'd have all the more leisure for yachting, but her publications are no good.\n2025-01-28 18:45:39.050 | INFO | tools.llama.generate:generate_long:450 - Generating sentence 1/1 of sample 1/1\n 0%| | 0/3892 [00:00<?, ?it/s]/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.\nwarnings.warn(\n 0%| | 6/3892 [00:00<01:10, 54.92it/s]\n 0%| | 12/3892 [00:00<01:08, 56.30it/s]\n 0%| | 18/3892 [00:00<01:08, 56.74it/s]\n 1%| | 24/3892 [00:00<01:07, 56.99it/s]\n 1%| | 30/3892 [00:00<01:07, 57.13it/s]\n 1%| | 36/3892 [00:00<01:07, 57.23it/s]\n 1%| | 42/3892 [00:00<01:07, 57.29it/s]\n 1%| | 48/3892 [00:00<01:07, 57.26it/s]\n 1%|▏ | 54/3892 [00:00<01:07, 57.27it/s]\n 2%|▏ | 60/3892 [00:01<01:06, 57.21it/s]\n 2%|▏ | 66/3892 [00:01<01:06, 57.23it/s]\n 2%|▏ | 72/3892 [00:01<01:06, 57.22it/s]\n 2%|▏ | 78/3892 [00:01<01:06, 57.22it/s]\n 2%|▏ | 84/3892 [00:01<01:07, 56.64it/s]\n 2%|▏ | 90/3892 [00:01<01:07, 56.65it/s]\n 2%|▏ | 96/3892 [00:01<01:06, 56.84it/s]\n 3%|▎ | 102/3892 [00:01<01:07, 56.24it/s]\n 3%|▎ | 108/3892 [00:01<01:06, 56.51it/s]\n 3%|▎ | 114/3892 [00:02<01:06, 56.73it/s]\n 3%|▎ | 120/3892 [00:02<01:06, 56.94it/s]\n 3%|▎ | 126/3892 [00:02<01:06, 57.04it/s]\n 3%|▎ | 132/3892 [00:02<01:05, 57.14it/s]\n 4%|▎ | 138/3892 [00:02<01:06, 56.88it/s]\n 4%|▎ | 144/3892 [00:02<01:05, 56.98it/s]\n 4%|▍ | 150/3892 [00:02<01:05, 57.13it/s]\n 4%|▍ | 156/3892 [00:02<01:05, 57.09it/s]\n 4%|▍ | 162/3892 [00:02<01:05, 57.00it/s]\n 4%|▍ | 168/3892 [00:02<01:05, 57.04it/s]\n 4%|▍ | 174/3892 [00:03<01:05, 57.15it/s]\n 5%|▍ | 180/3892 [00:03<01:04, 57.20it/s]\n 5%|▍ | 186/3892 [00:03<01:04, 57.18it/s]\n 5%|▍ | 192/3892 [00:03<01:04, 57.21it/s]\n 5%|▌ | 198/3892 [00:03<01:04, 57.24it/s]\n 5%|▌ | 204/3892 [00:03<01:04, 57.32it/s]\n 5%|▌ | 210/3892 [00:03<01:04, 57.38it/s]\n 6%|▌ | 216/3892 [00:03<01:04, 57.34it/s]\n 6%|▌ | 222/3892 [00:03<01:04, 57.27it/s]\n 6%|▌ | 228/3892 [00:03<01:03, 57.36it/s]\n 6%|▌ | 234/3892 [00:04<01:04, 56.65it/s]\n 6%|▌ | 240/3892 [00:04<01:04, 56.89it/s]\n6%|▌ | 243/3892 [00:04<01:04, 56.78it/s]\n2025-01-28 18:45:43.499 | INFO | tools.llama.generate:generate_long:505 - Generated 245 tokens in 4.45 seconds, 55.07 tokens/sec\n2025-01-28 18:45:43.499 | INFO | tools.llama.generate:generate_long:508 - Bandwidth achieved: 27.00 GB/s\n2025-01-28 18:45:43.499 | INFO | tools.llama.generate:generate_long:513 - GPU Memory used: 1.56 GB\n/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)\nreturn 
F.conv1d(input, weight, bias, self.stride,\nNext sample", "metrics": { "predict_time": 5.09241007, "total_time": 78.244597 }, "output": "https://replicate.delivery/xezq/rXAanJ47XKbWJ9gTbnNPibxbNBX3CmzjCKhUb1wyrL81XbCF/generated.wav", "started_at": "2025-01-28T18:45:38.775187Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bsvm-peyax7uzinhbwwofrvbyjmtxaocejnhcttq6jp3mrgy5jvwh5xbq", "get": "https://api.replicate.com/v1/predictions/50t54px1jxrme0cmnrxaz05sp4", "cancel": "https://api.replicate.com/v1/predictions/50t54px1jxrme0cmnrxaz05sp4/cancel" }, "version": "34a7e498d81e49e7200ee9aaa52f18b2529f30162ba6afd163b43037aa5a5d20" }
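Once status is "succeeded", the output field is a plain URL, so the generated audio can be fetched with any HTTP client. A short sketch, reusing the prediction dict from the polling example above:

import requests

# Download the generated audio from the prediction's output URL.
audio = requests.get(prediction["output"])
audio.raise_for_status()
with open("generated.wav", "wb") as f:
    f.write(audio.content)

# metrics separates model time from end-to-end time: predict_time here is
# about 5.1 s while total_time is about 78.2 s, so most of the wall-clock
# time went to queueing and cold start rather than inference.
print(prediction["metrics"])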