
lucataco/interactiveomni-8b:6d19412f

Input schema

The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used. Fields listed without a default are optional.

Field (type, default; allowed range): description

seed (integer, default 0): Random seed for reproducible sampling. Set to -1 to disable seeding.
audio (string, no default): Optional audio clip (WAV/MP3/FLAC). Resampled to 24 kHz automatically.
top_p (number, default 0.8; min 0.01, max 1): Top-p (nucleus) sampling threshold. Tokens are drawn only from the smallest set of most likely tokens whose cumulative probability reaches this value.
video (string, no default): Optional video clip (MP4/MOV/WebM) for video-grounded conversation.
images (array, no default): Optional list of images (PNG/JPG/WebP) to provide visual context.
prompt (string, no default): User text prompt. Leave blank when providing only media inputs.
max_tiles (integer, default 12; min 1, max 48): Maximum number of temporal tiles to sample when a video is provided.
temperature (number, default 0.7; max 2): Sampling temperature. Set to 0 for greedy decoding.
system_prompt (string, no default): Optional system prompt. When enable_audio_output is true and this is left blank, a recommended prompt is injected automatically.
max_new_tokens (integer, default 512; min 32, max 2048): Maximum number of tokens to generate.
frames_per_tile (integer, default 8; min 1, max 32): Number of frames sampled per tile when processing video.
enable_audio_output (boolean, default False): If true, return generated speech (the audio output field) along with the text output.
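The defaults and bounds above can be checked on the client before a request is sent. The sketch below is a hypothetical helper (`build_input` and `max_video_frames` are illustrative names, not part of any Replicate library) that merges caller overrides onto the documented defaults, rejects unknown field names, and enforces the listed min/max values. It does not replace the service's own validation. `max_video_frames` rests on an assumption: that a video contributes at most `max_tiles × frames_per_tile` frames, which the field descriptions suggest but do not state outright.

```python
from __future__ import annotations

from typing import Any

# Documented defaults from the input schema.
DEFAULTS: dict[str, Any] = {
    "seed": 0,
    "top_p": 0.8,
    "max_tiles": 12,
    "temperature": 0.7,
    "max_new_tokens": 512,
    "frames_per_tile": 8,
    "enable_audio_output": False,
}

# Fields with no default: optional, sent only when the caller supplies them.
OPTIONAL_FIELDS = {"audio", "video", "images", "prompt", "system_prompt"}

# (min, max) bounds; None means the schema lists no bound on that side.
RANGES: dict[str, tuple[float | None, float | None]] = {
    "top_p": (0.01, 1),
    "max_tiles": (1, 48),
    "temperature": (None, 2),
    "max_new_tokens": (32, 2048),
    "frames_per_tile": (1, 32),
}


def build_input(**overrides: Any) -> dict[str, Any]:
    """Hypothetical helper: merge overrides onto defaults and check ranges."""
    unknown = set(overrides) - set(DEFAULTS) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    payload = {**DEFAULTS, **overrides}
    for name, (lo, hi) in RANGES.items():
        value = payload[name]
        if lo is not None and value < lo:
            raise ValueError(f"{name}={value} is below the minimum {lo}")
        if hi is not None and value > hi:
            raise ValueError(f"{name}={value} is above the maximum {hi}")
    return payload


def max_video_frames(payload: dict[str, Any]) -> int:
    # Assumption: at most max_tiles tiles, each contributing frames_per_tile frames.
    return payload["max_tiles"] * payload["frames_per_tile"]
```

For example, `build_input(prompt="Describe the scene.", temperature=0)` returns the full input dictionary with greedy decoding, and under the stated assumption the default settings cap a video at 12 × 8 = 96 sampled frames.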
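To make the `temperature` and `top_p` fields concrete, here is a generic sketch of how temperature scaling and nucleus (top-p) sampling usually select one token from a vector of logits. This illustrates the standard technique only; it is not the model's actual decoder, and the function name is hypothetical. The `random.Random` instance stands in for the role the `seed` field plays: the same seed gives the same draws.

```python
from __future__ import annotations

import math
import random


def sample_next_token(logits: list[float], temperature: float, top_p: float,
                      rng: random.Random) -> int:
    """Generic temperature + nucleus sampling over raw logits (illustrative only)."""
    if temperature == 0:
        # Greedy decoding: always take the highest-scoring token.
        return max(range(len(logits)), key=lambda i: logits[i])
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Keep the smallest set of most likely tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus: list[int] = []
    mass = 0.0
    for i in order:
        nucleus.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    # Draw from the nucleus; random.choices treats the weights as relative.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]
```

Lower temperatures sharpen the distribution before the nucleus is cut, and smaller `top_p` values shrink the nucleus; with `top_p` near its 0.01 minimum, only the single most likely token usually survives.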

Output schema

The shape of the response you’ll get when you run this model with an API.

Schema
{
  "title": "OmniOutput",
  "type": "object",
  "properties": {
    "text": {"title": "Text", "type": "string"},
    "audio": {"title": "Audio", "type": "string", "format": "uri", "nullable": true}
  },
  "required": ["text"]
}
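A client can mirror this schema directly: `text` is always present, and `audio` is a URI string that may be null. Presumably it is populated only when `enable_audio_output` is true; this is an assumption based on that field's description. The dataclass below is a hypothetical client-side sketch, not an official type.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class OmniOutput:
    """Client-side mirror of the OmniOutput schema (hypothetical)."""
    text: str                    # required by the schema
    audio: Optional[str] = None  # nullable URI to generated speech

    @classmethod
    def from_dict(cls, data: dict[str, Any]) -> OmniOutput:
        if "text" not in data:
            raise ValueError("response is missing the required 'text' field")
        # A missing or null "audio" key maps to None.
        return cls(text=data["text"], audio=data.get("audio"))
```

Keeping `audio` optional means the same parsing code handles text-only and text-plus-speech responses.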