geopti/sam-audio-base:b84861ae
Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
| Field | Type | Default value | Description |
|---|---|---|---|
| audio | string | | Input audio or video file (WAV, MP3, MP4, etc.) |
| description | string | speech | Text description of the sound to isolate. Use simple noun phrases like 'speech', 'man speaking', 'dog barking', 'piano', 'guitar playing', 'birds chirping' |
| use_span_prompting | boolean | False | Enable span prompting to specify time ranges where the target sound occurs. More precise but requires knowing timestamps. |
| span_anchors | string | [] | [Only if use_span_prompting=True] Time ranges as JSON array. Format: [['+', start_sec, end_sec], ...]. '+' means sound present, '-' means absent. Example: [['+', 2.0, 4.0]] or [['+', 1.0, 3.0], ['-', 5.0, 6.0]] |
| predict_spans | boolean | False | Auto-detect time spans where target sound occurs. Improves quality for non-ambient sounds but slower. |
| output_residual | boolean | False | Also output the residual audio (everything except the target sound) |
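The fields above can be assembled into an input dictionary before calling the API. The following is a minimal sketch, not the model's own client code: `build_input` is a hypothetical helper, the example URL is a placeholder, and the checks on each span (sign is '+' or '-', start before end) are assumptions based on the field descriptions. It also assumes the model accepts standard double-quoted JSON for `span_anchors`, which the page does not confirm.

```python
import json


def build_input(audio, description="speech", span_anchors=None,
                predict_spans=False, output_residual=False):
    """Assemble an input dict using the field names from the input schema.

    Hypothetical helper: span_anchors is a list of (sign, start_sec, end_sec)
    tuples and is serialized to a string, since the schema types it as string.
    """
    payload = {
        "audio": audio,
        "description": description,
        "use_span_prompting": False,
        "span_anchors": "[]",
        "predict_spans": predict_spans,
        "output_residual": output_residual,
    }
    if span_anchors:
        for sign, start, end in span_anchors:
            # Assumed sanity checks; the page does not state how bad spans are handled.
            if sign not in ("+", "-"):
                raise ValueError(f"span sign must be '+' or '-', got {sign!r}")
            if not 0 <= start < end:
                raise ValueError(f"invalid span: {start} to {end}")
        # span_anchors only applies when use_span_prompting is True.
        payload["use_span_prompting"] = True
        payload["span_anchors"] = json.dumps([list(a) for a in span_anchors])
    return payload


example = build_input(
    "https://example.com/clip.wav",  # placeholder URL
    description="dog barking",
    span_anchors=[("+", 2.0, 4.0)],
)
```

The resulting dictionary would be passed as the input to whichever API client you use, together with the full model version identifier.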
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'items': {'format': 'uri', 'type': 'string'},
'title': 'Output',
'type': 'array'}
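The schema describes an array of URI strings. As a rough sketch, a decoded response could be checked against that shape before downloading anything. `validate_output` is a hypothetical helper and the URLs are placeholders; the schema does not say how many items are returned or in what order, so the code assumes neither.

```python
from urllib.parse import urlparse


def validate_output(output):
    """Check that output is a list of strings that each parse as a URI with a scheme."""
    if not isinstance(output, list):
        raise TypeError("output must be an array")
    for item in output:
        if not isinstance(item, str):
            raise TypeError(f"expected a string item, got {type(item).__name__}")
        if not urlparse(item).scheme:
            raise ValueError(f"item is not a URI: {item!r}")
    return output


# Placeholder response for illustration only.
urls = validate_output([
    "https://example.com/target.wav",
    "https://example.com/residual.wav",
])
```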