
geopti/sam-audio-large:2e136416

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.

| Field | Type | Default value | Description |
| --- | --- | --- | --- |
| audio | string | | Input audio or video file (WAV, MP3, MP4, etc.) |
| description | string | speech | Text description of the sound to isolate. Use simple noun phrases like 'speech', 'man speaking', 'dog barking', 'piano', 'guitar playing', 'birds chirping' |
| use_span_prompting | boolean | False | Enable span prompting to specify time ranges where the target sound occurs. More precise but requires knowing timestamps. |
| span_anchors | string | [] | [Only if use_span_prompting=True] Time ranges as JSON array. Format: [['+', start_sec, end_sec], ...]. '+' means sound present, '-' means absent. Example: [['+', 2.0, 4.0]] or [['+', 1.0, 3.0], ['-', 5.0, 6.0]] |
| predict_spans | boolean | False | Auto-detect time spans where target sound occurs. Improves quality for non-ambient sounds but slower. |
| output_residual | boolean | False | Also output the residual audio (everything except the target sound) |
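
As a minimal sketch of how these fields fit together, the Python helper below assembles an input payload and checks the span anchors before they are serialized. The function name `build_input` and the `example.com` URL are hypothetical and used only for illustration. The sketch also assumes that `span_anchors` is accepted as a standard double-quoted JSON string; the examples in the field description use single quotes, so the exact format the model parses may differ.

```python
import json


def build_input(audio_url, description="speech", span_anchors=None,
                predict_spans=False, output_residual=False):
    """Assemble an input payload using the field names from the input schema.

    Hypothetical helper: it only builds and checks a dict, it does not
    call any API.
    """
    payload = {
        "audio": audio_url,
        "description": description,
        "use_span_prompting": False,
        "predict_spans": predict_spans,
        "output_residual": output_residual,
    }
    if span_anchors:
        for sign, start, end in span_anchors:
            # '+' marks a range where the sound is present, '-' where it is absent.
            if sign not in ("+", "-"):
                raise ValueError(f"anchor sign must be '+' or '-', got {sign!r}")
            if not 0 <= start < end:
                raise ValueError(f"invalid time range: {start}..{end}")
        # span_anchors only applies when use_span_prompting is True.
        payload["use_span_prompting"] = True
        # Assumption: the string field holds standard (double-quoted) JSON.
        payload["span_anchors"] = json.dumps([list(a) for a in span_anchors])
    return payload


example = build_input(
    "https://example.com/clip.wav",  # placeholder URL
    description="dog barking",
    span_anchors=[("+", 1.0, 3.0), ("-", 5.0, 6.0)],
)
print(example["span_anchors"])
```

Fields left out of the call keep the defaults listed in the table, so `build_input(url)` alone isolates 'speech' without span prompting or residual output.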

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{'items': {'format': 'uri', 'type': 'string'},
 'title': 'Output',
 'type': 'array'}
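
The schema describes a plain array of URI strings, so a response can be checked with a few lines of Python before any files are downloaded. The sketch below is illustrative only: `check_output` and `output_filenames` are hypothetical helper names, and the sample URIs are placeholders. The schema does not say how many items are returned or in what order (for example, where the residual appears when `output_residual` is enabled), so the code makes no assumption about either.

```python
import posixpath
from urllib.parse import urlparse


def check_output(output):
    """Verify that a response matches the schema: a list of URI strings."""
    if not isinstance(output, list):
        raise TypeError("output must be an array")
    for item in output:
        if not isinstance(item, str):
            raise TypeError(f"output item must be a string, got {type(item).__name__}")
        if not urlparse(item).scheme:
            raise ValueError(f"output item is not a URI: {item!r}")
    return output


def output_filenames(output):
    """Return the last path segment of each URI, e.g. for naming local copies."""
    return [posixpath.basename(urlparse(uri).path) for uri in check_output(output)]


# Placeholder response, shaped like the schema above.
sample = [
    "https://example.com/out/target.wav",
    "https://example.com/out/residual.wav",
]
print(output_filenames(sample))
```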