fishaudio /ace-step-1.5:74e3a7d3
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| prompt | string | upbeat electronic dance music with heavy bass and synth leads | Short text describing the desired music: genre, mood, instruments, style. Max 512 characters. |
| lyrics | string | [Instrumental] | Lyrics for the song. Use '[Instrumental]' for instrumental tracks. Max 4096 characters. |
| duration | number | 30 (Min: -1, Max: 600) | Target audio length in seconds. Set to -1 for auto. |
| bpm | integer | (Min: 30, Max: 300) | Beats per minute (30-300). Leave unset for auto-detection by the LM. |
| key_scale | string | | Musical key and scale (e.g. 'C major', 'F# minor', 'Bb major'). Leave empty for auto. |
| time_signature | None | auto | Time signature: 2 for 2/4, 3 for 3/4, 4 for 4/4, 6 for 6/8. Use 'auto' for auto-detection. |
| inference_steps | integer | 8 (Min: 1, Max: 200) | Number of diffusion steps. Turbo model: 4-8 recommended. Base/SFT: 32-100. |
| guidance_scale | number | 7 (Min: 1, Max: 15) | CFG strength. Only used by base/SFT models; ignored by turbo. Higher = follows prompt more strictly. |
| shift | number | 3 (Min: 1, Max: 5) | Timestep shift factor. Default 1.0, use 3.0 for turbo model. |
| seed | integer | -1 | Random seed for reproducibility. -1 for random. |
| thinking | boolean | True | Enable LM chain-of-thought reasoning for metadata, caption, and language detection. |
| batch_size | integer | 1 (Min: 1, Max: 4) | Number of songs to generate in parallel. |
| audio_format | None | mp3 | Output audio format. |
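
As an illustration of how the limits in the table might be checked before a request is sent, here is a minimal sketch in Python. The helper `build_input` and the example prompt are hypothetical and not part of the model's API; only the field names, ranges, and character limits come from the table. The returned dict is the kind of object you would pass as the model input in whichever client you use.

```python
from typing import Any

# (min, max) per numeric field, copied from the input schema table.
RANGES = {
    "duration": (-1, 600),
    "bpm": (30, 300),
    "inference_steps": (1, 200),
    "guidance_scale": (1, 15),
    "shift": (1, 5),
    "batch_size": (1, 4),
}
# Maximum character counts for the text fields.
MAX_CHARS = {"prompt": 512, "lyrics": 4096}


def build_input(**fields: Any) -> dict[str, Any]:
    """Hypothetical helper: reject values outside the documented limits."""
    for name, value in fields.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")
        if name in MAX_CHARS and len(value) > MAX_CHARS[name]:
            raise ValueError(f"{name} exceeds {MAX_CHARS[name]} characters")
    # Fields left out are simply omitted, so the model's defaults apply.
    return dict(fields)


payload = build_input(
    prompt="slow lo-fi hip hop with warm piano and vinyl crackle",
    lyrics="[Instrumental]",
    duration=60,
    inference_steps=8,  # within the 4-8 range recommended for the turbo model
    seed=42,            # fixed seed for reproducibility
)
```

Omitted fields such as `bpm` and `key_scale` are left for the model to choose, which matches the "leave unset for auto" behaviour described in the table.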
Output schema
The shape of the response you’ll get when you run this model with an API.
{'items': {'format': 'uri', 'type': 'string'},
'title': 'Output',
'type': 'array'}
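
Since the output is an array of URI strings, a caller typically needs to map each URI to a local file name before saving it. The sketch below shows one way to do that in Python. The helper `local_names` and the `example.com` URLs are hypothetical, and the idea that the array holds one URI per generated song is an assumption based on the `batch_size` field, not something the schema states.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def local_names(output: list[str], fallback_ext: str = ".mp3") -> list[str]:
    """Hypothetical helper: derive one local file name per output URI."""
    names = []
    for index, uri in enumerate(output):
        # Reuse the extension found in the URI path, if there is one.
        suffix = PurePosixPath(urlparse(uri).path).suffix or fallback_ext
        names.append(f"track_{index}{suffix}")
    return names


# Made-up URIs standing in for a real response.
example_output = [
    "https://example.com/files/abc/out_0.mp3",
    "https://example.com/files/abc/out_1.mp3",
]
print(local_names(example_output))  # prints ['track_0.mp3', 'track_1.mp3']
```

Each URI could then be fetched with any HTTP client and written to the matching name.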