Input schema
The fields you can use to run this model with an API. If you don’t give a value for a field, its default value is used.
Field | Type | Default value | Description |
---|---|---|---|
input_image | string | | Input image. |
input_audio | string | | Input audio. |
input_video | string | | Input video. |
task_type | string (enum) | Image Captioning. Options: Image Captioning, Video Captioning, Audio Captioning, Visual Grounding, General, General Video | Choose a task. |
instruction | string | | Provide a question for the VQA task, a region for the Visual Grounding task, or an instruction for General tasks. The default instruction for Captioning tasks is ‘What does the image/video/audio describe?’ |
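As a rough sketch of how these fields fit together, the snippet below assembles an input payload as a plain Python dict. It rejects `task_type` values that are not among the listed options and leaves out unset fields so that the defaults above apply. The helper name `build_input`, the example URL, and the example instruction are assumptions made for illustration; they are not part of the model's API.

```python
import json

# Allowed task_type values, copied from the input schema table.
TASK_TYPES = (
    "Image Captioning",
    "Video Captioning",
    "Audio Captioning",
    "Visual Grounding",
    "General",
    "General Video",
)


def build_input(task_type="Image Captioning", instruction=None,
                input_image=None, input_audio=None, input_video=None):
    """Assemble an input payload (hypothetical helper, not part of the API)."""
    if task_type not in TASK_TYPES:
        raise ValueError(f"unknown task_type: {task_type!r}")
    payload = {
        "task_type": task_type,
        "instruction": instruction,
        "input_image": input_image,
        "input_audio": input_audio,
        "input_video": input_video,
    }
    # Drop unset fields so the service falls back to its documented defaults.
    return {key: value for key, value in payload.items() if value is not None}


example = build_input(
    task_type="Visual Grounding",
    instruction="the dog on the left",           # placeholder region description
    input_image="https://example.com/dogs.jpg",  # placeholder URL
)
print(json.dumps(example, indent=2))
```

The resulting dict is what you would pass as the input when calling the model through whichever API client you use.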
Output schema
The shape of the response you’ll get when you run this model with an API.
Schema
{'properties': {'answer': {'title': 'Answer', 'type': 'string'},
                'output': {'format': 'uri',
                           'title': 'Output',
                           'type': 'string'}},
 'type': 'object'}
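One way to read a response of this shape is sketched below. Because the schema lists no required properties, the sketch assumes that either `answer` or `output` may be absent. The function name `parse_output` and the sample response are invented for illustration.

```python
from urllib.parse import urlparse


def parse_output(response):
    """Return (answer, output_uri) from a response dict; either may be None."""
    if not isinstance(response, dict):
        raise TypeError("response must be a JSON object (dict)")
    answer = response.get("answer")
    output = response.get("output")
    if answer is not None and not isinstance(answer, str):
        raise TypeError("'answer' must be a string")
    if output is not None:
        if not isinstance(output, str):
            raise TypeError("'output' must be a string")
        # The schema declares format 'uri', so require at least a scheme.
        if not urlparse(output).scheme:
            raise ValueError("'output' does not look like a URI")
    return answer, output


# Sample response invented for illustration only.
sample = {"answer": "A dog running on a beach."}
answer, output_uri = parse_output(sample)
print(answer)
print(output_uri)
```

Checking both fields separately keeps the caller from assuming that every task type fills in the same property.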