jichengdu/fish-speech
Fish Speech V1.5: SOTA Open-Source TTS
Prediction
jichengdu/fish-speech:37f0e0183ffc300f3720c13981b66fd76e0adfbea0d7933e4d18a885c0772ba9

Input
- text: 我的猫猫就是全世界最好的猫 ("My kitty is the best cat in the whole world")
- use_compile: true

{
  "text": "我的猫猫就是全世界最好的猫",
  "use_compile": true
}
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
import { writeFile } from "node:fs/promises";

const output = await replicate.run(
  "jichengdu/fish-speech:37f0e0183ffc300f3720c13981b66fd76e0adfbea0d7933e4d18a885c0772ba9",
  {
    input: {
      text: "我的猫猫就是全世界最好的猫",
      use_compile: true
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the generated audio to disk:
await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "jichengdu/fish-speech:37f0e0183ffc300f3720c13981b66fd76e0adfbea0d7933e4d18a885c0772ba9",
    input={
        "text": "我的猫猫就是全世界最好的猫",
        "use_compile": True
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
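The Python example above only prints a reference to the generated audio. If you want the file on disk, one option is to download it from the returned URL; a minimal sketch using only the standard library, assuming str(output) yields the file URL (which the print(output) line above suggests; the output filename here is arbitrary):

import urllib.request

import replicate

# Run the model; the client returns a URL (or URL-like FileOutput) for the .wav file.
output = replicate.run(
    "jichengdu/fish-speech:37f0e0183ffc300f3720c13981b66fd76e0adfbea0d7933e4d18a885c0772ba9",
    input={"text": "我的猫猫就是全世界最好的猫", "use_compile": True},
)

# Download the generated audio to a local file.
urllib.request.urlretrieve(str(output), "output.wav")

Newer versions of the replicate client also expose the raw bytes on the output object directly, which avoids the second HTTP round trip.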
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "jichengdu/fish-speech:37f0e0183ffc300f3720c13981b66fd76e0adfbea0d7933e4d18a885c0772ba9",
    "input": {
      "text": "我的猫猫就是全世界最好的猫",
      "use_compile": true
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
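The same HTTP call works from any language. As a rough Python equivalent of the cURL request above (a sketch using the requests library; the Prefer: wait header asks the API to hold the response until the prediction finishes, up to a server-side timeout):

import os

import requests

resp = requests.post(
    "https://api.replicate.com/v1/predictions",
    headers={
        "Authorization": f"Bearer {os.environ['REPLICATE_API_TOKEN']}",
        "Content-Type": "application/json",
        "Prefer": "wait",
    },
    json={
        "version": "jichengdu/fish-speech:37f0e0183ffc300f3720c13981b66fd76e0adfbea0d7933e4d18a885c0772ba9",
        "input": {"text": "我的猫猫就是全世界最好的猫", "use_compile": True},
    },
)

prediction = resp.json()
# If the prediction outlives the wait window, poll prediction["urls"]["get"] instead.
print(prediction["status"], prediction.get("output"))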
Output
{ "completed_at": "2025-01-15T15:41:12.446544Z", "created_at": "2025-01-15T15:34:56.312000Z", "data_removed": false, "error": null, "id": "8efbxxkt71rge0cmdad8ftfn1c", "input": { "text": "我的猫猫就是全世界最好的猫", "use_compile": true }, "logs": "2025-01-15 15:37:58.652 | INFO | fish_speech.models.text2semantic.inference:main:1054 - Loading model ...\n2025-01-15 15:38:10.653 | INFO | fish_speech.models.text2semantic.inference:load_model:681 - Restored model from checkpoint\n2025-01-15 15:38:10.653 | INFO | fish_speech.models.text2semantic.inference:load_model:687 - Using DualARTransformer\n2025-01-15 15:38:10.653 | INFO | fish_speech.models.text2semantic.inference:load_model:695 - Compiling function...\n2025-01-15 15:38:13.083 | INFO | fish_speech.models.text2semantic.inference:main:1068 - Time to load model: 14.43 seconds\n2025-01-15 15:38:13.101 | INFO | fish_speech.models.text2semantic.inference:generate_long:788 - Encoded text: 我的猫猫就是全世界最好的猫\n2025-01-15 15:38:13.102 | INFO | fish_speech.models.text2semantic.inference:generate_long:806 - Generating sentence 1/1 of sample 1/1\n 0%| | 0/8165 [00:00<?, ?it/s]/root/.pyenv/versions/3.10.15/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.\nself.gen = func(*args, **kwds)\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, 
skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping\nwarnings.warn(\n0%| | 1/8165 [02:30<340:29:37, 150.14s/it]/root/.pyenv/versions/3.10.15/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.\nself.gen = func(*args, **kwds)\n0%| | 2/8165 [02:46<162:33:57, 71.69s/it] /root/.pyenv/versions/3.10.15/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature.\nself.gen = func(*args, **kwds)\n 0%| | 3/8165 [02:47<88:26:28, 39.01s/it] \n 0%| | 4/8165 [02:47<53:38:07, 23.66s/it]\n 0%| | 5/8165 [02:47<34:23:02, 15.17s/it]\n 0%| | 6/8165 [02:47<22:47:35, 10.06s/it]\n 0%| | 7/8165 [02:47<15:26:36, 6.81s/it]\n 0%| | 8/8165 [02:47<10:36:40, 4.68s/it]\n 0%| | 10/8165 [02:47<5:39:39, 2.50s/it]\n 0%| | 11/8165 [02:47<4:17:13, 1.89s/it]\n 0%| | 12/8165 [02:48<3:13:04, 1.42s/it]\n 0%| | 13/8165 [02:48<2:24:34, 1.06s/it]\n 0%| | 14/8165 [02:48<1:49:03, 1.25it/s]\n 0%| | 15/8165 [02:48<1:22:29, 1.65it/s]\n 0%| | 16/8165 [02:48<1:03:17, 2.15it/s]\n 0%| | 17/8165 [02:48<49:26, 2.75it/s] \n 0%| | 18/8165 [02:48<39:26, 3.44it/s]\n 0%| | 19/8165 [02:48<32:21, 4.20it/s]\n 0%| | 20/8165 [02:49<27:58, 4.85it/s]\n 0%| | 21/8165 [02:49<24:51, 5.46it/s]\n 0%| | 22/8165 [02:49<23:00, 5.90it/s]\n 0%| | 23/8165 [02:49<21:36, 6.28it/s]\n 0%| | 24/8165 [02:49<20:15, 6.70it/s]\n 0%| | 25/8165 [02:49<19:15, 7.05it/s]\n 0%| | 26/8165 [02:49<18:38, 7.28it/s]\n 0%| | 27/8165 [02:50<18:57, 7.15it/s]\n 0%| | 28/8165 [02:50<18:25, 7.36it/s]\n 0%| | 29/8165 [02:50<18:04, 7.50it/s]\n 0%| | 30/8165 [02:50<17:32, 7.73it/s]\n 0%| | 31/8165 [02:50<16:20, 8.29it/s]\n 0%| | 33/8165 [02:50<14:56, 9.07it/s]\n 0%| | 34/8165 [02:50<15:22, 8.81it/s]\n 0%| | 35/8165 [02:50<15:32, 8.72it/s]\n 0%| | 36/8165 [02:51<15:29, 8.74it/s]\n 0%| | 37/8165 [02:51<16:19, 8.29it/s]\n 0%| | 38/8165 [02:51<16:08, 8.39it/s]\n 0%| | 39/8165 [02:51<15:38, 8.66it/s]\n 0%| | 40/8165 [02:51<15:33, 8.70it/s]\n 1%| | 41/8165 [02:51<15:00, 9.02it/s]\n 1%| | 42/8165 [02:51<14:39, 9.24it/s]\n 1%| | 43/8165 [02:51<15:26, 8.77it/s]\n 1%| | 44/8165 [02:51<16:03, 8.43it/s]\n 1%| | 45/8165 [02:52<16:32, 8.18it/s]\n 1%| | 46/8165 [02:52<16:17, 8.31it/s]\n 1%| | 47/8165 [02:52<15:49, 8.55it/s]\n 1%| | 48/8165 [02:52<15:08, 8.93it/s]\n 1%| | 49/8165 [02:52<14:48, 9.13it/s]\n 1%| | 50/8165 [02:52<14:48, 
9.14it/s]\n 1%| | 51/8165 [02:52<14:33, 9.29it/s]\n 1%| | 52/8165 [02:52<14:27, 9.36it/s]\n 1%| | 53/8165 [02:52<14:20, 9.43it/s]\n 1%| | 54/8165 [02:53<15:11, 8.90it/s]\n 1%| | 55/8165 [02:53<15:13, 8.88it/s]\n 1%| | 56/8165 [02:53<16:17, 8.29it/s]\n 1%| | 57/8165 [02:53<15:36, 8.66it/s]\n 1%| | 58/8165 [02:53<15:08, 8.92it/s]\n 1%| | 59/8165 [02:53<14:56, 9.04it/s]\n 1%| | 60/8165 [02:53<14:37, 9.23it/s]\n 1%| | 61/8165 [02:53<14:23, 9.38it/s]\n 1%| | 62/8165 [02:53<14:18, 9.44it/s]\n 1%| | 63/8165 [02:54<14:08, 9.55it/s]\n 1%| | 64/8165 [02:54<15:13, 8.86it/s]\n 1%| | 65/8165 [02:54<15:42, 8.59it/s]\n 1%| | 66/8165 [02:54<16:03, 8.41it/s]\n 1%| | 68/8165 [02:54<14:44, 9.15it/s]\n 1%| | 70/8165 [02:54<14:03, 9.59it/s]\n 1%| | 72/8165 [02:55<13:48, 9.76it/s]\n 1%| | 73/8165 [02:55<14:27, 9.33it/s]\n 1%| | 74/8165 [02:55<14:51, 9.07it/s]\n 1%| | 75/8165 [02:55<14:56, 9.03it/s]\n 1%| | 76/8165 [02:55<14:34, 9.25it/s]\n 1%| | 78/8165 [02:55<14:04, 9.58it/s]\n1%| | 78/8165 [02:55<5:03:46, 2.25s/it]\n2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:851 - Compilation time: 176.50 seconds\n2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:860 - Generated 80 tokens in 176.50 seconds, 0.45 tokens/sec\n2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:863 - Bandwidth achieved: 0.29 GB/s\n2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:868 - GPU Memory used: 1.83 GB\n2025-01-15 15:41:09.608 | INFO | fish_speech.models.text2semantic.inference:main:1101 - Sampled text: 我的猫猫就是全世界最好的猫\n2025-01-15 15:41:09.609 | INFO | fish_speech.models.text2semantic.inference:main:1105 - Saved codes to codes_0.npy\n2025-01-15 15:41:09.609 | INFO | fish_speech.models.text2semantic.inference:main:1106 - Next sample\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.\n@autocast(enabled = False)\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.\n@autocast(enabled = False)\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.\n@autocast(enabled = False)\n/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. 
Please use `torch.amp.autocast('cuda', args...)` instead.\n@autocast(enabled = False)\n2025-01-15 15:41:11.879 | INFO | fish_speech.models.vqgan.inference:load_model:46 - Loaded model: <All keys matched successfully>\n2025-01-15 15:41:11.880 | INFO | fish_speech.models.vqgan.inference:main:99 - Processing precomputed indices from codes_0.npy\n2025-01-15 15:41:12.385 | INFO | fish_speech.models.vqgan.inference:main:113 - Generated audio of shape torch.Size([1, 1, 161792]), equivalent to 3.67 seconds from 79 features, features/second: 21.53\n2025-01-15 15:41:12.389 | INFO | fish_speech.models.vqgan.inference:main:120 - Saved audio to fake.wav", "metrics": { "predict_time": 193.792021083, "total_time": 376.134544 }, "output": "https://replicate.delivery/czjl/Viab45MqZ9LIPVTJaD1DbKv7XmGs5p1KoHiuyP1SFPDGJWBF/fake.wav", "started_at": "2025-01-15T15:37:58.654523Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/fddq-dhkn2oxk2yvw5sizzydeg7vwvrqa67sdeu4blmpe5yzcdoesdhra", "get": "https://api.replicate.com/v1/predictions/8efbxxkt71rge0cmdad8ftfn1c", "cancel": "https://api.replicate.com/v1/predictions/8efbxxkt71rge0cmdad8ftfn1c/cancel" }, "version": "37f0e0183ffc300f3720c13981b66fd76e0adfbea0d7933e4d18a885c0772ba9" }
Generated in2025-01-15 15:37:58.652 | INFO | fish_speech.models.text2semantic.inference:main:1054 - Loading model ... 2025-01-15 15:38:10.653 | INFO | fish_speech.models.text2semantic.inference:load_model:681 - Restored model from checkpoint 2025-01-15 15:38:10.653 | INFO | fish_speech.models.text2semantic.inference:load_model:687 - Using DualARTransformer 2025-01-15 15:38:10.653 | INFO | fish_speech.models.text2semantic.inference:load_model:695 - Compiling function... 2025-01-15 15:38:13.083 | INFO | fish_speech.models.text2semantic.inference:main:1068 - Time to load model: 14.43 seconds 2025-01-15 15:38:13.101 | INFO | fish_speech.models.text2semantic.inference:generate_long:788 - Encoded text: 我的猫猫就是全世界最好的猫 2025-01-15 15:38:13.102 | INFO | fish_speech.models.text2semantic.inference:generate_long:806 - Generating sentence 1/1 of sample 1/1 0%| | 0/8165 [00:00<?, ?it/s]/root/.pyenv/versions/3.10.15/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature. self.gen = func(*args, **kwds) /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( 
/root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/torch/_inductor/compile_fx.py:1607: UserWarning: Tesla T4 does not support bfloat16 compilation natively, skipping warnings.warn( 0%| | 1/8165 [02:30<340:29:37, 150.14s/it]/root/.pyenv/versions/3.10.15/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature. self.gen = func(*args, **kwds) 0%| | 2/8165 [02:46<162:33:57, 71.69s/it] /root/.pyenv/versions/3.10.15/lib/python3.10/contextlib.py:103: FutureWarning: `torch.backends.cuda.sdp_kernel()` is deprecated. In the future, this context manager will be removed. Please see `torch.nn.attention.sdpa_kernel()` for the new context manager, with updated signature. self.gen = func(*args, **kwds) 0%| | 3/8165 [02:47<88:26:28, 39.01s/it] 0%| | 4/8165 [02:47<53:38:07, 23.66s/it] 0%| | 5/8165 [02:47<34:23:02, 15.17s/it] 0%| | 6/8165 [02:47<22:47:35, 10.06s/it] 0%| | 7/8165 [02:47<15:26:36, 6.81s/it] 0%| | 8/8165 [02:47<10:36:40, 4.68s/it] 0%| | 10/8165 [02:47<5:39:39, 2.50s/it] 0%| | 11/8165 [02:47<4:17:13, 1.89s/it] 0%| | 12/8165 [02:48<3:13:04, 1.42s/it] 0%| | 13/8165 [02:48<2:24:34, 1.06s/it] 0%| | 14/8165 [02:48<1:49:03, 1.25it/s] 0%| | 15/8165 [02:48<1:22:29, 1.65it/s] 0%| | 16/8165 [02:48<1:03:17, 2.15it/s] 0%| | 17/8165 [02:48<49:26, 2.75it/s] 0%| | 18/8165 [02:48<39:26, 3.44it/s] 0%| | 19/8165 [02:48<32:21, 4.20it/s] 0%| | 20/8165 [02:49<27:58, 4.85it/s] 0%| | 21/8165 [02:49<24:51, 5.46it/s] 0%| | 22/8165 [02:49<23:00, 5.90it/s] 0%| | 23/8165 [02:49<21:36, 6.28it/s] 0%| | 24/8165 [02:49<20:15, 6.70it/s] 0%| | 25/8165 [02:49<19:15, 7.05it/s] 0%| | 26/8165 [02:49<18:38, 7.28it/s] 0%| | 27/8165 [02:50<18:57, 7.15it/s] 0%| | 28/8165 [02:50<18:25, 7.36it/s] 0%| | 29/8165 [02:50<18:04, 7.50it/s] 0%| | 30/8165 [02:50<17:32, 7.73it/s] 0%| | 31/8165 [02:50<16:20, 8.29it/s] 0%| | 33/8165 [02:50<14:56, 9.07it/s] 0%| | 34/8165 [02:50<15:22, 8.81it/s] 0%| | 35/8165 [02:50<15:32, 8.72it/s] 0%| | 36/8165 [02:51<15:29, 8.74it/s] 0%| | 37/8165 [02:51<16:19, 8.29it/s] 0%| | 38/8165 [02:51<16:08, 8.39it/s] 0%| | 39/8165 [02:51<15:38, 8.66it/s] 0%| | 40/8165 [02:51<15:33, 8.70it/s] 1%| | 41/8165 [02:51<15:00, 9.02it/s] 1%| | 42/8165 [02:51<14:39, 9.24it/s] 1%| | 43/8165 [02:51<15:26, 8.77it/s] 1%| | 44/8165 [02:51<16:03, 8.43it/s] 1%| | 45/8165 [02:52<16:32, 8.18it/s] 1%| | 46/8165 [02:52<16:17, 8.31it/s] 1%| | 47/8165 [02:52<15:49, 8.55it/s] 1%| | 48/8165 [02:52<15:08, 8.93it/s] 1%| | 49/8165 [02:52<14:48, 9.13it/s] 1%| | 50/8165 [02:52<14:48, 9.14it/s] 1%| | 51/8165 [02:52<14:33, 9.29it/s] 1%| | 52/8165 [02:52<14:27, 9.36it/s] 1%| | 53/8165 [02:52<14:20, 9.43it/s] 1%| | 54/8165 [02:53<15:11, 8.90it/s] 1%| | 55/8165 [02:53<15:13, 8.88it/s] 1%| | 56/8165 [02:53<16:17, 8.29it/s] 1%| | 57/8165 [02:53<15:36, 8.66it/s] 1%| | 58/8165 [02:53<15:08, 8.92it/s] 1%| | 
59/8165 [02:53<14:56, 9.04it/s] 1%| | 60/8165 [02:53<14:37, 9.23it/s] 1%| | 61/8165 [02:53<14:23, 9.38it/s] 1%| | 62/8165 [02:53<14:18, 9.44it/s] 1%| | 63/8165 [02:54<14:08, 9.55it/s] 1%| | 64/8165 [02:54<15:13, 8.86it/s] 1%| | 65/8165 [02:54<15:42, 8.59it/s] 1%| | 66/8165 [02:54<16:03, 8.41it/s] 1%| | 68/8165 [02:54<14:44, 9.15it/s] 1%| | 70/8165 [02:54<14:03, 9.59it/s] 1%| | 72/8165 [02:55<13:48, 9.76it/s] 1%| | 73/8165 [02:55<14:27, 9.33it/s] 1%| | 74/8165 [02:55<14:51, 9.07it/s] 1%| | 75/8165 [02:55<14:56, 9.03it/s] 1%| | 76/8165 [02:55<14:34, 9.25it/s] 1%| | 78/8165 [02:55<14:04, 9.58it/s] 1%| | 78/8165 [02:55<5:03:46, 2.25s/it] 2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:851 - Compilation time: 176.50 seconds 2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:860 - Generated 80 tokens in 176.50 seconds, 0.45 tokens/sec 2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:863 - Bandwidth achieved: 0.29 GB/s 2025-01-15 15:41:09.607 | INFO | fish_speech.models.text2semantic.inference:generate_long:868 - GPU Memory used: 1.83 GB 2025-01-15 15:41:09.608 | INFO | fish_speech.models.text2semantic.inference:main:1101 - Sampled text: 我的猫猫就是全世界最好的猫 2025-01-15 15:41:09.609 | INFO | fish_speech.models.text2semantic.inference:main:1105 - Saved codes to codes_0.npy 2025-01-15 15:41:09.609 | INFO | fish_speech.models.text2semantic.inference:main:1106 - Next sample /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:445: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. @autocast(enabled = False) /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/vector_quantize_pytorch.py:630: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. @autocast(enabled = False) /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/finite_scalar_quantization.py:147: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. @autocast(enabled = False) /root/.pyenv/versions/3.10.15/lib/python3.10/site-packages/vector_quantize_pytorch/lookup_free_quantization.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead. @autocast(enabled = False) 2025-01-15 15:41:11.879 | INFO | fish_speech.models.vqgan.inference:load_model:46 - Loaded model: <All keys matched successfully> 2025-01-15 15:41:11.880 | INFO | fish_speech.models.vqgan.inference:main:99 - Processing precomputed indices from codes_0.npy 2025-01-15 15:41:12.385 | INFO | fish_speech.models.vqgan.inference:main:113 - Generated audio of shape torch.Size([1, 1, 161792]), equivalent to 3.67 seconds from 79 features, features/second: 21.53 2025-01-15 15:41:12.389 | INFO | fish_speech.models.vqgan.inference:main:120 - Saved audio to fake.wav
Prediction
jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc

ID: nh5y2bvd2nrme0cnpy5s8yd5c4
Status: Succeeded
Source: Web
Hardware: L40S

Input
- text: 我的猫,就是全世界最好的猫! ("My cat is simply the best cat in the whole world!")
- text_reference: 希望你以后能够做得比我还好哟! ("I hope you'll manage to do even better than me!")
- speaker_reference: zero_shot_prompt.wav (reference audio clip; full URL in the JSON below)
{ "text": "我的猫,就是全世界最好的猫!", "text_reference": "希望你以后能够做得比我还好哟!", "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav" }
Install Replicate’s Node.js client library:

npm install replicate

Import and set up the client:

import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
import { writeFile } from "node:fs/promises";

const output = await replicate.run(
  "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
  {
    input: {
      text: "我的猫,就是全世界最好的猫!",
      text_reference: "希望你以后能够做得比我还好哟!",
      speaker_reference: "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
    }
  }
);

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the generated audio to disk:
await writeFile("output.wav", output);
To learn more, take a look at the guide on getting started with Node.js.
Install Replicate’s Python client library:

pip install replicate

Import the client:

import replicate
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
    input={
        "text": "我的猫,就是全世界最好的猫!",
        "text_reference": "希望你以后能够做得比我还好哟!",
        "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
    }
)
print(output)
To learn more, take a look at the guide on getting started with Python.
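The speaker_reference does not have to be a hosted URL: the replicate Python client uploads local files passed as open file handles. A minimal sketch of that pattern (the local path my_voice_sample.wav is hypothetical; text_reference should transcribe whatever the reference clip actually says):

import replicate

# Passing an open file handle uploads the local clip as the voice reference.
with open("my_voice_sample.wav", "rb") as speaker_clip:  # hypothetical local file
    output = replicate.run(
        "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
        input={
            "text": "我的猫,就是全世界最好的猫!",
            # Must match the speech in the reference clip for good cloning quality.
            "text_reference": "希望你以后能够做得比我还好哟!",
            "speaker_reference": speaker_clip,
        },
    )

print(output)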
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
    "input": {
      "text": "我的猫,就是全世界最好的猫!",
      "text_reference": "希望你以后能够做得比我还好哟!",
      "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{ "completed_at": "2025-03-21T07:14:00.675884Z", "created_at": "2025-03-21T07:12:02.837000Z", "data_removed": false, "error": null, "id": "nh5y2bvd2nrme0cnpy5s8yd5c4", "input": { "text": "我的猫,就是全世界最好的猫!", "text_reference": "希望你以后能够做得比我还好哟!", "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav" }, "logs": "2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:789 - Encoded text: 我的猫,就是全世界最好的猫!\n2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1\n 0%| | 0/8070 [00:00<?, ?it/s]/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.\nwarnings.warn(\n 0%| | 4/8070 [00:00<03:49, 35.18it/s]\n 0%| | 8/8070 [00:00<03:47, 35.46it/s]\n 0%| | 12/8070 [00:00<03:46, 35.55it/s]\n 0%| | 16/8070 [00:00<03:46, 35.59it/s]\n 0%| | 20/8070 [00:00<03:46, 35.52it/s]\n 0%| | 24/8070 [00:00<03:47, 35.30it/s]\n 0%| | 28/8070 [00:00<03:48, 35.14it/s]\n 0%| | 32/8070 [00:00<03:48, 35.21it/s]\n 0%| | 36/8070 [00:01<03:47, 35.30it/s]\n 0%| | 40/8070 [00:01<03:46, 35.42it/s]\n 1%| | 44/8070 [00:01<03:45, 35.52it/s]\n 1%| | 48/8070 [00:01<03:45, 35.59it/s]\n 1%| | 52/8070 [00:01<03:44, 35.64it/s]\n 1%| | 56/8070 [00:01<03:44, 35.67it/s]\n1%| | 56/8070 [00:01<03:49, 34.85it/s]\n2025-03-21 07:14:00.300 | INFO | tools.llama.generate:generate_long:861 - Generated 58 tokens in 1.86 seconds, 31.24 tokens/sec\n2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:864 - Bandwidth achieved: 19.93 GB/s\n2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:869 - GPU Memory used: 2.03 GB\n/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)\nreturn F.conv1d(input, weight, bias, self.stride,\nNext sample", "metrics": { "predict_time": 2.742632232, "total_time": 117.838884 }, "output": "https://replicate.delivery/xezq/A3oXUsmefIjU8E5b7pUhXzirnbSQBKN3fhj2YzYMSxcxdY1oA/generated.wav", "started_at": "2025-03-21T07:13:57.933251Z", "status": "succeeded", "urls": { "stream": "https://stream.replicate.com/v1/files/bcwr-3yw4fygktiy33njadjegpnrndayxos2yz7fm6snd5s46k7d5x4qa", "get": "https://api.replicate.com/v1/predictions/nh5y2bvd2nrme0cnpy5s8yd5c4", "cancel": "https://api.replicate.com/v1/predictions/nh5y2bvd2nrme0cnpy5s8yd5c4/cancel" }, "version": "11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc" }
Generated in2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:789 - Encoded text: 我的猫,就是全世界最好的猫! 2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1 0%| | 0/8070 [00:00<?, ?it/s]/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature. warnings.warn( 0%| | 4/8070 [00:00<03:49, 35.18it/s] 0%| | 8/8070 [00:00<03:47, 35.46it/s] 0%| | 12/8070 [00:00<03:46, 35.55it/s] 0%| | 16/8070 [00:00<03:46, 35.59it/s] 0%| | 20/8070 [00:00<03:46, 35.52it/s] 0%| | 24/8070 [00:00<03:47, 35.30it/s] 0%| | 28/8070 [00:00<03:48, 35.14it/s] 0%| | 32/8070 [00:00<03:48, 35.21it/s] 0%| | 36/8070 [00:01<03:47, 35.30it/s] 0%| | 40/8070 [00:01<03:46, 35.42it/s] 1%| | 44/8070 [00:01<03:45, 35.52it/s] 1%| | 48/8070 [00:01<03:45, 35.59it/s] 1%| | 52/8070 [00:01<03:44, 35.64it/s] 1%| | 56/8070 [00:01<03:44, 35.67it/s] 1%| | 56/8070 [00:01<03:49, 34.85it/s] 2025-03-21 07:14:00.300 | INFO | tools.llama.generate:generate_long:861 - Generated 58 tokens in 1.86 seconds, 31.24 tokens/sec 2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:864 - Bandwidth achieved: 19.93 GB/s 2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:869 - GPU Memory used: 2.03 GB /root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.) return F.conv1d(input, weight, bias, self.stride, Next sample
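Comparing the two predictions is instructive: the first run (T4, use_compile) spent almost all of its predict time on torch.compile warm-up and logged 0.45 tokens/sec, while this run (L40S, no compile step in the logs) generated 58 tokens in under two seconds. A quick sanity check of the logged throughput, using only numbers from the two log excerpts above:

# Numbers from the two prediction logs above.
t4_tokens, t4_seconds = 80, 176.50    # compile-dominated cold run on T4
l40s_tokens, l40s_seconds = 58, 1.86  # warm run on L40S

print(t4_tokens / t4_seconds)      # ~0.45 tokens/sec, matching the first log
print(l40s_tokens / l40s_seconds)  # ~31 tokens/sec (the log reports 31.24)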
Want to make some of these yourself?
Run this model