jimothyjohn/stable-audio-3-medium:8a54f4dd

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

| Field | Type | Default value | Description |
|---|---|---|---|
| prompt | string | | Text description of the audio to generate. |
| negative_prompt | string | | Qualities to avoid in the output. Only affects -base models. |
| duration | number | 30 (Min: 1, Max: 380) | Length of the generated audio in seconds. |
| steps | integer | 8 (Min: 4, Max: 50) | Number of diffusion sampling steps. 8 is the post-trained default; lower is faster, higher rarely helps. |
| cfg_scale | number | 1 (Min: 0.5, Max: 15) | Classifier-free guidance scale. Only affects -base models. |
| seed | integer | -1 | Random seed. -1 selects a new random seed for each run. |
| init_audio | string | | Optional source audio for audio-to-audio editing. |
| init_noise_level | number | 0.9 (Max: 1) | How much the init audio influences the output. 1.0 = pure generation, lower keeps more of the original. |
| inpaint_audio | string | | Optional source audio for inpainting or continuation. Set inpaint_start_seconds and inpaint_end_seconds to mark the region to regenerate; set start to the file length to extend it. |
| inpaint_start_seconds | number | 0 | Start of the inpaint region in seconds (used with inpaint_audio). |
| inpaint_end_seconds | number | 0 | End of the inpaint region in seconds (used with inpaint_audio). |
| output_format | None | wav | Output file format. |
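As a rough illustration of how the fields above fit together, the sketch below builds an input dictionary and rejects values outside the listed Min/Max bounds before any request is sent. The helper name `build_input` and the `RANGES` table are hypothetical and not part of the model's API; the bounds are copied from the schema above, and fields left as `None` are omitted so that the model's own defaults apply.

```python
# Bounds copied from the input schema above: (min, max); None means no bound is listed.
RANGES = {
    "duration": (1, 380),
    "steps": (4, 50),
    "cfg_scale": (0.5, 15),
    "init_noise_level": (None, 1),
}


def build_input(prompt, **fields):
    """Return an input dict, rejecting values outside the documented ranges.

    Hypothetical helper for illustration only.
    """
    payload = {"prompt": prompt}
    for name, value in fields.items():
        if value is None:
            continue  # leave the field unset so the model's default is used
        low, high = RANGES.get(name, (None, None))
        if low is not None and value < low:
            raise ValueError(f"{name}={value} is below the minimum {low}")
        if high is not None and value > high:
            raise ValueError(f"{name}={value} is above the maximum {high}")
        payload[name] = value
    return payload


example = build_input("gentle rain on a tin roof", duration=10, steps=None, seed=42)
print(example)
```

The resulting dictionary is what you would pass as the model input through whichever API client you use; how the request itself is made is outside the scope of this sketch.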

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{"type": "string", "title": "Output", "format": "uri"}
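The schema describes the output as a single string in URI format, presumably pointing at the generated audio file. A minimal sketch of checking a response against that shape is shown here; the function name `is_output_uri` is hypothetical, and the check is deliberately loose (any absolute URI passes). Some client libraries may wrap the value in a file-like object instead of returning the raw string, which this sketch does not cover.

```python
from urllib.parse import urlparse


def is_output_uri(value):
    """Return True if value is a string that parses as an absolute URI."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    # An absolute URI needs a scheme plus either a host or a path.
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


print(is_output_uri("https://example.com/output.wav"))
print(is_output_uri("not a uri"))
```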