bzikst/s2-pro
Fish Audio S2 Pro is a leading text-to-speech (TTS) model with fine-grained inline control of prosody and emotion. It was trained on more than 10M hours of audio data across 80+ languages.
Run bzikst/s2-pro with an API
Use one of our client libraries to get started quickly. Clicking on a library will take you to the Playground tab where you can tweak different inputs, see the results, and copy the corresponding code to use in your own project.
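As a rough illustration of what a client-library call might look like, here is a minimal Python sketch. It assumes the third-party `replicate` package is installed and that `REPLICATE_API_TOKEN` is set in the environment. The helper name `synthesize` and the injectable `run` parameter are hypothetical and exist only to keep the example easy to try offline. Depending on the client version, the model reference may need an explicit `:<version>` suffix.

```python
MODEL = "bzikst/s2-pro"  # assumption: may need an explicit ":<version>" suffix


def synthesize(text, run=None, **options):
    """Run the model once and return its output as a string.

    `run` is a hypothetical injection point: any callable with the shape
    run(model_ref, input=dict). When omitted, `replicate.run` is used.
    """
    if run is None:
        # Imported lazily so the sketch can be loaded without the package.
        import replicate  # third-party: pip install replicate

        run = replicate.run
    # Only `text` is required; other fields fall back to the model defaults.
    output = run(MODEL, input={"text": text, **options})
    # The output schema describes a URI string; str() also covers clients
    # that wrap the result in a file-like object.
    return str(output)
```

A call such as `synthesize("Hello there.", temperature=0.7)` would then return the URI of the generated audio, provided the token and network access are in place.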
Input schema
The fields you can use to run this model with an API. If you don't give a value for a field, its default value will be used.
| Field | Type | Default value | Description |
|---|---|---|---|
| text | string | | Text to synthesize |
| reference_audio | string | | Reference audio for voice cloning |
| reference_text | string | | Transcript of the reference audio |
| chunk_length | integer | 200 (Min: 100, Max: 300) | Chunk length for iterative prompting |
| max_new_tokens | integer | 1024 (Max: 4096) | Maximum new tokens, 0 means no limit |
| top_p | number | 0.8 (Min: 0.1, Max: 1) | Top-p |
| repetition_penalty | number | 1.1 (Min: 0.9, Max: 2) | Repetition penalty |
| temperature | number | 0.8 (Min: 0.1, Max: 1) | Sampling temperature |
| seed | integer | | Deterministic seed, omit for random generation |
{
  "type": "object",
  "title": "Input",
  "required": [
    "text"
  ],
  "properties": {
    "seed": {
      "type": "integer",
      "title": "Seed",
      "x-order": 8,
      "nullable": true,
      "description": "Deterministic seed, omit for random generation"
    },
    "text": {
      "type": "string",
      "title": "Text",
      "x-order": 0,
      "description": "Text to synthesize"
    },
    "top_p": {
      "type": "number",
      "title": "Top P",
      "default": 0.8,
      "maximum": 1,
      "minimum": 0.1,
      "x-order": 5,
      "description": "Top-p"
    },
    "temperature": {
      "type": "number",
      "title": "Temperature",
      "default": 0.8,
      "maximum": 1,
      "minimum": 0.1,
      "x-order": 7,
      "description": "Sampling temperature"
    },
    "chunk_length": {
      "type": "integer",
      "title": "Chunk Length",
      "default": 200,
      "maximum": 300,
      "minimum": 100,
      "x-order": 3,
      "description": "Chunk length for iterative prompting"
    },
    "max_new_tokens": {
      "type": "integer",
      "title": "Max New Tokens",
      "default": 1024,
      "maximum": 4096,
      "minimum": 0,
      "x-order": 4,
      "description": "Maximum new tokens, 0 means no limit"
    },
    "reference_text": {
      "type": "string",
      "title": "Reference Text",
      "default": "",
      "x-order": 2,
      "description": "Transcript of the reference audio"
    },
    "reference_audio": {
      "type": "string",
      "title": "Reference Audio",
      "format": "uri",
      "x-order": 1,
      "nullable": true,
      "description": "Reference audio for voice cloning"
    },
    "repetition_penalty": {
      "type": "number",
      "title": "Repetition Penalty",
      "default": 1.1,
      "maximum": 2,
      "minimum": 0.9,
      "x-order": 6,
      "description": "Repetition penalty"
    }
  }
}
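The defaults and bounds in the input schema can be checked on the client side before a request is sent. The following is a minimal sketch of that idea, not part of any official client: the helper `build_input` and the constant names are hypothetical, and the numeric values are transcribed from the schema above.

```python
# Defaults and bounds transcribed from the input schema above.
DEFAULTS = {
    "reference_text": "",
    "chunk_length": 200,
    "max_new_tokens": 1024,
    "top_p": 0.8,
    "repetition_penalty": 1.1,
    "temperature": 0.8,
}
BOUNDS = {
    "chunk_length": (100, 300),
    "max_new_tokens": (0, 4096),
    "top_p": (0.1, 1),
    "repetition_penalty": (0.9, 2),
    "temperature": (0.1, 1),
}
INTEGER_FIELDS = {"chunk_length", "max_new_tokens", "seed"}
NULLABLE_FIELDS = {"seed", "reference_audio"}
OPTIONAL_FIELDS = set(DEFAULTS) | NULLABLE_FIELDS


def build_input(text, **options):
    """Return an input payload with defaults applied and bounds checked."""
    if not isinstance(text, str):
        raise TypeError("text must be a string")
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")

    payload = {"text": text, **DEFAULTS, **options}

    # Nullable fields set to None are simply omitted from the request.
    for name in NULLABLE_FIELDS:
        if payload.get(name) is None:
            payload.pop(name, None)

    for name in INTEGER_FIELDS & set(payload):
        value = payload[name]
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError(f"{name} must be an integer")

    for name, (low, high) in BOUNDS.items():
        value = payload[name]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number")
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    return payload
```

For example, `build_input("Hello", temperature=0.5)` fills in the remaining defaults, while `build_input("Hello", top_p=1.5)` raises a `ValueError` because 1.5 exceeds the schema maximum of 1.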
Output schema
The shape of the response you’ll get when you run this model with an API.
{
  "type": "string",
  "title": "Output",
  "format": "uri"
}
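Because the output is a single URI string, the generated audio has to be fetched in a separate step. The sketch below uses only the Python standard library and assumes the URI can be retrieved without extra authentication headers. The helper name `save_output` is hypothetical, and the file name is taken from the URI path, since the schema does not state the audio format.

```python
import os
import shutil
from urllib.parse import unquote, urlparse
from urllib.request import urlopen


def save_output(uri, directory="."):
    """Download the file behind `uri` into `directory` and return its path."""
    # Use the last path segment as the file name; fall back to a generic one.
    name = os.path.basename(unquote(urlparse(uri).path)) or "output"
    destination = os.path.join(directory, name)
    # Stream the response to disk instead of loading it all into memory.
    with urlopen(uri) as response, open(destination, "wb") as handle:
        shutil.copyfileobj(response, handle)
    return destination
```

Passing the string returned by the model to `save_output` writes the audio next to the script, or into another directory when `directory` is given.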