cjwbw/unival:00a9af2b

Input

file

Input image.

file

Input audio.

file

Input video.

string

Choose a task.

Default: "Image Captioning"

string

Provide a question for the VQA task, a region for the Visual Grounding task, or an instruction for General tasks. The default instruction for Captioning tasks is ‘What does the image/video/audio describe?’
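
The rule above (captioning tasks fall back to a default instruction, while the other tasks expect explicit text) can be sketched as a small helper that assembles an input payload. This is only an illustration: the page does not show the real field names, so the keys `task`, `instruction`, and `image` are assumptions, as is the task label "VQA". Raising an error when the instruction is missing is a choice made for this sketch, not documented behavior of the model.

```python
from typing import Optional

# Default instruction quoted from the field description above.
DEFAULT_CAPTION_INSTRUCTION = "What does the image/video/audio describe?"


def build_input(task: str = "Image Captioning",
                instruction: Optional[str] = None,
                **media: str) -> dict:
    """Assemble a hypothetical input payload for the form above.

    The key names ("task", "instruction", and any media keys) are
    assumptions; the page does not list the actual field names.
    """
    if instruction is None:
        if "captioning" in task.lower():
            # Captioning tasks fall back to the documented default.
            instruction = DEFAULT_CAPTION_INSTRUCTION
        else:
            # Sketch-only choice: other tasks must supply their own text.
            raise ValueError(f"task {task!r} needs an instruction")
    payload = {"task": task, "instruction": instruction}
    payload.update(media)  # e.g. image="photo.png" (hypothetical key)
    return payload
```

For example, `build_input(image="photo.png")` yields a payload for the default "Image Captioning" task with the default instruction filled in, while `build_input(task="VQA")` raises until a question is passed as `instruction`.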

Output
