pku-yuangroup/llava-cot:40c17578

Input

image
string

Text prompt

Default: "If I had to write a haiku for this one, it would be: "

*file

Grayscale input image

integer

Max number of generated tokens

Default: 1024

number
(minimum: 0, maximum: 5)

Adjusts the randomness of outputs: values greater than 1 are increasingly random, and 0 is deterministic. 0.75 is a good starting value.

Default: 0.9

number
(minimum: 0, maximum: 1)

When decoding text, samples from the most likely tokens whose cumulative probability reaches top p; lower this value to ignore less likely tokens. Only used when temperature > 0.

Default: 0.95
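
The two sampling settings above can be illustrated with a minimal sketch of how temperature scaling and top-p (nucleus) filtering are commonly applied to a vector of token scores. This is not the model's actual decoding code, which the page does not show; the function names `softmax_with_temperature` and `top_p_filter` and the example logits are hypothetical, chosen only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; temperature 0 is treated as greedy."""
    if temperature == 0:
        # Deterministic case: all probability mass goes to the best-scoring token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their cumulative probability
    reaches top_p, then renormalize over the kept tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Hypothetical scores for a three-token vocabulary.
example_logits = [2.0, 1.0, 0.0]
probs = softmax_with_temperature(example_logits, temperature=0.9)
candidates = top_p_filter(probs, top_p=0.95)
```

A sampler would then draw the next token from `candidates`. Raising the temperature flattens `probs`, so more tokens survive a given `top_p` cutoff; lowering `top_p` trims the unlikely tail.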

Output

<SUMMARY> I will identify and count all the objects in the image, then subtract the tiny shiny balls and red objects to determine how many remain. </SUMMARY>
<CAPTION> The image contains various 3D shapes, including spheres, cubes, and cylinders. The objects are of different colors: red, gold, blue, green, and teal. There are multiple shiny spheres and a red cylinder. </CAPTION>
<REASONING> First, I will count all the objects in the image. There are several spheres, cubes, and cylinders. Next, I will identify and subtract the tiny shiny balls. There are two shiny gold spheres that fit this description. Then, I will subtract all red objects. There is one red sphere and one red cylinder. After subtracting these objects, I will count the remaining objects. </REASONING>
<CONCLUSION> 5 </CONCLUSION>
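
The sample output wraps each stage of the answer in its own tag (SUMMARY, CAPTION, REASONING, CONCLUSION). The sketch below shows one way to split such a string into its stages, assuming the tags always appear as matched pairs as they do in the sample; the helper name `parse_stages` and the shortened `sample` string are hypothetical.

```python
import re

# Stage tags as they appear in the sample output.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def parse_stages(text):
    """Return a dict mapping each stage tag found in text to its stripped contents."""
    stages = {}
    for tag in STAGES:
        match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
        if match:
            stages[tag] = match.group(1).strip()
    return stages


# Shortened, illustrative string in the same tagged format.
sample = "<SUMMARY> Count the objects. </SUMMARY> <CONCLUSION> 5 </CONCLUSION>"
print(parse_stages(sample)["CONCLUSION"])
```

Stages missing from the text are simply absent from the returned dict, so a caller can check for `"CONCLUSION"` before reading the final answer.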