jichengdu/fish-speech:11f3e039
Input
Run this model in Node.js with one line of code:
npm install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import Replicate from "replicate";

const replicate = new Replicate({
  auth: process.env.REPLICATE_API_TOKEN,
});
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
const output = await replicate.run(
  "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
  {
    input: {
      // Text to synthesize: "My cat is the best cat in the whole world!"
      text: "我的猫,就是全世界最好的猫!",
      // Transcript of the reference clip: "I hope you'll do even better than me!"
      text_reference: "希望你以后能够做得比我还好哟!",
      // Reference audio whose voice is cloned (zero-shot prompt)
      speaker_reference: "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
    }
  }
);
import { writeFile } from "node:fs/promises";

// To access the file URL:
console.log(output.url()); //=> "http://example.com"

// To write the generated WAV to disk:
await writeFile("output.wav", output);
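If you'd rather not block on replicate.run, a minimal sketch using the client's predictions API (this assumes replicate.predictions.create and the replicate.wait helper from the official JavaScript client; the bare version hash is passed, and the finished prediction's output is the audio URL):

// Create the prediction without blocking, then wait for it to finish.
const prediction = await replicate.predictions.create({
  version: "11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
  input: {
    text: "我的猫,就是全世界最好的猫!",
    text_reference: "希望你以后能够做得比我还好哟!",
    speaker_reference: "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
  }
});
const finished = await replicate.wait(prediction);
console.log(finished.output); // URL of the generated WAV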
To learn more, take a look at the guide on getting started with Node.js.
Run this model in Python:
pip install replicate
Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
import replicate
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
output = replicate.run(
    "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
    input={
        # Text to synthesize: "My cat is the best cat in the whole world!"
        "text": "我的猫,就是全世界最好的猫!",
        # Transcript of the reference clip: "I hope you'll do even better than me!"
        "text_reference": "希望你以后能够做得比我还好哟!",
        # Reference audio whose voice is cloned (zero-shot prompt)
        "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
    }
)
print(output)
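Recent versions of the Python client return the output as a file object; a minimal sketch for saving the audio locally (assuming replicate >= 1.0, where file outputs expose .read() and .url):

# Save the generated audio to disk, then print its hosted URL.
with open("output.wav", "wb") as f:
    f.write(output.read())
print(output.url)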
To learn more, take a look at the guide on getting started with Python.
Or call the HTTP API directly. Set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
Find your API token in your account settings.
Run jichengdu/fish-speech using Replicate’s API. Check out the model's schema for an overview of inputs and outputs.
curl -s -X POST \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Prefer: wait" \
  -d $'{
    "version": "jichengdu/fish-speech:11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc",
    "input": {
      "text": "我的猫,就是全世界最好的猫!",
      "text_reference": "希望你以后能够做得比我还好哟!",
      "speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
    }
  }' \
  https://api.replicate.com/v1/predictions
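The Prefer: wait header keeps the connection open while the model runs. If it returns before the prediction finishes, you can poll the "get" URL from the response until "status" is "succeeded"; a sketch, where PREDICTION_ID is a placeholder for the "id" field of the response:

# Poll a prediction by ID until its status is "succeeded".
curl -s \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  https://api.replicate.com/v1/predictions/PREDICTION_ID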
To learn more, take a look at Replicate’s HTTP API reference docs.
Output
{
"completed_at": "2025-03-21T07:14:00.675884Z",
"created_at": "2025-03-21T07:12:02.837000Z",
"data_removed": false,
"error": null,
"id": "nh5y2bvd2nrme0cnpy5s8yd5c4",
"input": {
"text": "我的猫,就是全世界最好的猫!",
"text_reference": "希望你以后能够做得比我还好哟!",
"speaker_reference": "https://replicate.delivery/pbxt/MhG1jpArOiucMqSja15lT6c1oEddigVDkJdx7VYa7fTB6Du8/zero_shot_prompt.wav"
},
"logs": "2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:789 - Encoded text: 我的猫,就是全世界最好的猫!\n2025-03-21 07:13:58.443 | INFO | tools.llama.generate:generate_long:807 - Generating sentence 1/1 of sample 1/1\n 0%| | 0/8070 [00:00<?, ?it/s]/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/backends/cuda/__init__.py:342: FutureWarning: torch.backends.cuda.sdp_kernel() is deprecated. In the future, this context manager will be removed. Please see, torch.nn.attention.sdpa_kernel() for the new context manager, with updated signature.\nwarnings.warn(\n 0%| | 4/8070 [00:00<03:49, 35.18it/s]\n 0%| | 8/8070 [00:00<03:47, 35.46it/s]\n 0%| | 12/8070 [00:00<03:46, 35.55it/s]\n 0%| | 16/8070 [00:00<03:46, 35.59it/s]\n 0%| | 20/8070 [00:00<03:46, 35.52it/s]\n 0%| | 24/8070 [00:00<03:47, 35.30it/s]\n 0%| | 28/8070 [00:00<03:48, 35.14it/s]\n 0%| | 32/8070 [00:00<03:48, 35.21it/s]\n 0%| | 36/8070 [00:01<03:47, 35.30it/s]\n 0%| | 40/8070 [00:01<03:46, 35.42it/s]\n 1%| | 44/8070 [00:01<03:45, 35.52it/s]\n 1%| | 48/8070 [00:01<03:45, 35.59it/s]\n 1%| | 52/8070 [00:01<03:44, 35.64it/s]\n 1%| | 56/8070 [00:01<03:44, 35.67it/s]\n1%| | 56/8070 [00:01<03:49, 34.85it/s]\n2025-03-21 07:14:00.300 | INFO | tools.llama.generate:generate_long:861 - Generated 58 tokens in 1.86 seconds, 31.24 tokens/sec\n2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:864 - Bandwidth achieved: 19.93 GB/s\n2025-03-21 07:14:00.301 | INFO | tools.llama.generate:generate_long:869 - GPU Memory used: 2.03 GB\n/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/torch/nn/modules/conv.py:306: UserWarning: Plan failed with a cudnnException: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_NOT_SUPPORTED (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:919.)\nreturn F.conv1d(input, weight, bias, self.stride,\nNext sample",
"metrics": {
"predict_time": 2.742632232,
"total_time": 117.838884
},
"output": "https://replicate.delivery/xezq/A3oXUsmefIjU8E5b7pUhXzirnbSQBKN3fhj2YzYMSxcxdY1oA/generated.wav",
"started_at": "2025-03-21T07:13:57.933251Z",
"status": "succeeded",
"urls": {
"stream": "https://stream.replicate.com/v1/files/bcwr-3yw4fygktiy33njadjegpnrndayxos2yz7fm6snd5s46k7d5x4qa",
"get": "https://api.replicate.com/v1/predictions/nh5y2bvd2nrme0cnpy5s8yd5c4",
"cancel": "https://api.replicate.com/v1/predictions/nh5y2bvd2nrme0cnpy5s8yd5c4/cancel"
},
"version": "11f3e0394c06dcc099c0cbaf75f4a6e7da84cb4aaa5d53bedfc3234b5c8aaefc"
}