bzikst/s2-pro | API reference

Fish Audio S2 Pro is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion. It was trained on more than 10 million hours of audio data across 80+ languages.

Run bzikst/s2-pro with an API

Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
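As a rough illustration of what a client-library call might look like, here is a minimal Python sketch. It assumes the model is hosted on Replicate and that the third-party `replicate` package is installed with a `REPLICATE_API_TOKEN` environment variable set; none of that is stated on this page. The helper names `build_input` and `synthesize` are hypothetical, and the model reference may need a `:<version>` suffix depending on the client version.

```python
import os

# Model reference taken from this page. Assumption: some client versions
# require an explicit ":<version>" suffix for community models.
MODEL_REF = "bzikst/s2-pro"


def build_input(text, **options):
    """Build the input payload, omitting options left as None.

    Omitted fields fall back to the defaults listed in the input schema.
    """
    if not text:
        raise ValueError("text is required")
    payload = {"text": text}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload


def synthesize(text, **options):
    """Run the model and return its output, which the output schema describes as a URI."""
    # Imported lazily so the payload helper works without the package.
    import replicate  # assumption: installed via `pip install replicate`

    if not os.environ.get("REPLICATE_API_TOKEN"):
        raise RuntimeError("set REPLICATE_API_TOKEN before calling the API")
    return replicate.run(MODEL_REF, input=build_input(text, **options))
```

A call such as `synthesize("Hello there", temperature=0.7, seed=42)` would then return the generated audio location. Depending on the client version, the return value may be a plain URL string or a file-like wrapper around it, so converting it with `str()` before use is a cautious choice.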

Input schema

The fields you can use to run this model with an API. If you don't give a value for a field, its default value is used. Only text is required.

Field	Type	Default	Min	Max	Description
text	string				Text to synthesize (required)
reference_audio	string (URI)				Reference audio for voice cloning
reference_text	string	""			Transcript of the reference audio
chunk_length	integer	200	100	300	Chunk length for iterative prompting
max_new_tokens	integer	1024	0	4096	Maximum number of new tokens to generate; 0 means no limit
top_p	number	0.8	0.1	1	Top-p (nucleus) sampling threshold
repetition_penalty	number	1.1	0.9	2	Repetition penalty
temperature	number	0.8	0.1	1	Sampling temperature
seed	integer				Seed for deterministic output; omit for random generation

{
  "type": "object",
  "title": "Input",
  "required": [
    "text"
  ],
  "properties": {
    "seed": {
      "type": "integer",
      "title": "Seed",
      "x-order": 8,
      "nullable": true,
      "description": "Deterministic seed, omit for random generation"
    },
    "text": {
      "type": "string",
      "title": "Text",
      "x-order": 0,
      "description": "Text to synthesize"
    },
    "top_p": {
      "type": "number",
      "title": "Top P",
      "default": 0.8,
      "maximum": 1,
      "minimum": 0.1,
      "x-order": 5,
      "description": "Top-p"
    },
    "temperature": {
      "type": "number",
      "title": "Temperature",
      "default": 0.8,
      "maximum": 1,
      "minimum": 0.1,
      "x-order": 7,
      "description": "Sampling temperature"
    },
    "chunk_length": {
      "type": "integer",
      "title": "Chunk Length",
      "default": 200,
      "maximum": 300,
      "minimum": 100,
      "x-order": 3,
      "description": "Chunk length for iterative prompting"
    },
    "max_new_tokens": {
      "type": "integer",
      "title": "Max New Tokens",
      "default": 1024,
      "maximum": 4096,
      "minimum": 0,
      "x-order": 4,
      "description": "Maximum new tokens, 0 means no limit"
    },
    "reference_text": {
      "type": "string",
      "title": "Reference Text",
      "default": "",
      "x-order": 2,
      "description": "Transcript of the reference audio"
    },
    "reference_audio": {
      "type": "string",
      "title": "Reference Audio",
      "format": "uri",
      "x-order": 1,
      "nullable": true,
      "description": "Reference audio for voice cloning"
    },
    "repetition_penalty": {
      "type": "number",
      "title": "Repetition Penalty",
      "default": 1.1,
      "maximum": 2,
      "minimum": 0.9,
      "x-order": 6,
      "description": "Repetition penalty"
    }
  }
}
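The bounds in the schema above can be checked on the client side before a request is sent. The following is a minimal sketch using only the Python standard library; the `validate_input` name and the error-message format are hypothetical, and rejecting unknown field names is a choice of this sketch (the schema itself does not forbid extra fields).

```python
# Bounds copied from the input schema: name -> (kind, minimum, maximum)
BOUNDS = {
    "chunk_length": ("integer", 100, 300),
    "max_new_tokens": ("integer", 0, 4096),
    "top_p": ("number", 0.1, 1),
    "repetition_penalty": ("number", 0.9, 2),
    "temperature": ("number", 0.1, 1),
}
NULLABLE = {"reference_audio", "seed"}  # marked "nullable": true in the schema


def _is_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return _is_int(value) or isinstance(value, float)


def validate_input(payload):
    """Return a list of problems found in payload; an empty list means it looks valid."""
    errors = []
    if not isinstance(payload.get("text"), str):
        errors.append("text: required, must be a string")
    for name, value in payload.items():
        if name == "text":
            continue
        if name in NULLABLE and value is None:
            continue
        if name in ("reference_audio", "reference_text"):
            if not isinstance(value, str):
                errors.append(f"{name}: must be a string")
        elif name == "seed":
            if not _is_int(value):
                errors.append("seed: must be an integer")
        elif name in BOUNDS:
            kind, lo, hi = BOUNDS[name]
            type_ok = _is_int(value) if kind == "integer" else _is_number(value)
            if not type_ok:
                errors.append(f"{name}: must be of type {kind}")
            elif not lo <= value <= hi:
                errors.append(f"{name}: {value} is outside [{lo}, {hi}]")
        else:
            # Sketch-level choice: flag unknown names to catch typos.
            errors.append(f"{name}: unknown field")
    return errors
```

For example, `validate_input({"text": "hi", "temperature": 1.5})` reports one problem, because 1.5 exceeds the documented maximum of 1 for temperature.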

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema

{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
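Since the output is a single URI string, a caller typically needs to fetch the file it points to. The sketch below shows one way to do that with the Python standard library. The helper names `output_filename` and `download` are hypothetical, and the fallback file name `output.bin` is an assumption, since this page does not state which audio format the model returns.

```python
import shutil
from pathlib import Path, PurePosixPath
from urllib.parse import urlparse
from urllib.request import urlopen


def output_filename(output_uri, default="output.bin"):
    """Derive a local file name from the output URI, rejecting non-HTTP(S) values."""
    parsed = urlparse(str(output_uri))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an HTTP(S) URI: {output_uri!r}")
    # Use the last path segment; fall back to the default if the path is empty.
    return PurePosixPath(parsed.path).name or default


def download(output_uri, dest_dir="."):
    """Stream the file at output_uri into dest_dir and return the local path."""
    target = Path(dest_dir) / output_filename(output_uri)
    with urlopen(str(output_uri)) as response, open(target, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return target
```

Passing the value through `str()` keeps the helpers usable whether a client library hands back a plain string or an object that wraps the URL.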