pku-yuangroup / llava-cot

Let Vision Language Models Reason Step-by-Step


Input

image
string

Text prompt

Default: "If I had to write a haiku for this one, it would be: "

file (required)

Grayscale input image

integer

Max number of generated tokens

Default: 1024

number
(minimum: 0, maximum: 5)

Adjusts the randomness of outputs: 0 is deterministic, and values greater than 1 are increasingly random. 0.75 is a good starting value.

Default: 0.9

number
(minimum: 0, maximum: 1)

When decoding text, samples only from the most likely tokens whose cumulative probability reaches top p; lower the value to ignore less likely tokens. Used only when temperature > 0.

Default: 0.95
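
The two sampling controls above can be illustrated with a small, dependency-free sketch. This is not the model's decoding code. The function names (`apply_temperature`, `top_p_filter`, `sample_token`) and the toy scores are hypothetical, and the sketch assumes the common convention that scores are divided by the temperature before the softmax, that a temperature of 0 means greedy decoding, and that top p keeps the smallest set of tokens whose cumulative probability reaches the threshold.

```python
import math
import random


def apply_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature.

    Assumption: temperature 0 is treated as greedy decoding,
    so all probability mass goes to the highest-scoring token.
    """
    if temperature <= 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the most likely tokens until their mass reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=0.9, top_p=0.95, rng=random):
    """Sample one token index using the defaults listed on this page."""
    probs = top_p_filter(apply_temperature(logits, temperature), top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

In this sketch, raising the temperature flattens the distribution and lowering top p trims the tail of unlikely tokens, which matches the descriptions of the two parameters above.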

Output

<SUMMARY> To solve the problem, I will analyze the image to identify the key characteristics of the pastry, then describe how to make it by following standard steps for preparation. </SUMMARY>

<CAPTION> The image shows a tray of baklava, a pastry made with layers of phyllo dough, filled with chopped nuts, and topped with a sweet syrup. The pastry is golden brown, indicating it's been baked. </CAPTION>

<REASONING> To make baklava, start by preparing the nuts by roasting them. Then, assemble the layers by placing phyllo dough sheets in a tray, brushing them with butter. Spread the nut mixture evenly over the phyllo layers. Repeat this process until the layers are complete. Cut the baklava into diamond shapes and bake until golden. Finally, drizzle with syrup. </REASONING>

<CONCLUSION> To make this delicious baklava, follow these steps:
1. Begin by preparing the nuts. You can use either walnuts, pistachios, or a combination of both. Roast the nuts in a 350°F oven for about 10 minutes until they are fragrant and slightly toasted. Let them cool down.
2. In a large bowl, combine the cooled nuts with the sugar and cinnamon. Mix well until the nuts are evenly coated with the sugar and cinnamon mixture.
3. To assemble the baklava, preheat your oven to 350°F. Take a rectangular tray or baking dish and line the bottom with a sheet of phyllo dough. Brush the phyllo with melted butter, then repeat this process until you have about 8-10 layers of phyllo dough.
4. Once you've assembled the phyllo dough layers, spread the nut mixture evenly over the top layer of phyllo.
5. Carefully roll the baklava tightly from one end of the tray to the other, applying gentle pressure to ensure it's compact. Use a knife or sharp object to cut the baklava into diamond shapes.
6. Place the baklava in the preheated oven and bake for about 45 minutes, or until the phyllo is golden brown and crisp.
7. Remove the baklava from the oven and let it cool for a few minutes. While it's still warm, drizzle the honey syrup evenly over the top of the baklava.
8. Let the baklava cool completely before serving. This will allow the flavors to meld together and the pastry to absorb the honey syrup.
Enjoy your delicious homemade baklava! </CONCLUSION>
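
The sample output wraps each reasoning stage in its own tag, so a caller who only wants the final answer can split the text by stage. The following is a minimal sketch under stated assumptions: the tag names are taken from the sample above, the helper `split_stages` and the shortened `example` string are hypothetical, and nothing here guarantees that every response contains all four stages.

```python
import re

# Stage tags as they appear in the sample output above.
STAGES = ("SUMMARY", "CAPTION", "REASONING", "CONCLUSION")


def split_stages(output):
    """Return a dict mapping each stage tag found in `output` to its stripped text."""
    stages = {}
    for tag in STAGES:
        # DOTALL lets a stage span several lines, as CONCLUSION does above.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", output, flags=re.DOTALL)
        if match:
            stages[tag] = match.group(1).strip()
    return stages


# Shortened, made-up example in the same shape as the sample output.
example = (
    "<SUMMARY> I will identify the pastry. </SUMMARY> "
    "<CAPTION> The image shows a tray of baklava. </CAPTION> "
    "<REASONING> Layer phyllo, add nuts, bake, add syrup. </REASONING> "
    "<CONCLUSION> Follow the steps to make baklava. </CONCLUSION>"
)
final_answer = split_stages(example).get("CONCLUSION")
```

Using `.get("CONCLUSION")` rather than indexing means a response with a missing stage yields `None` instead of raising an error.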

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

🔥 Highlights

LLaVA-CoT is the first visual language model capable of spontaneous, systematic reasoning, similar to GPT-o1!

Our 11B model outperforms Gemini-1.5-pro, GPT-4o-mini, and Llama-3.2-90B-Vision-Instruct on six challenging multimodal benchmarks.

📝 Citation

If you find this paper useful, please consider starring 🌟 this repo and citing 📑 our paper:

@misc{xu2024llavacot,
      title={LLaVA-CoT: Let Vision Language Models Reason Step-by-Step}, 
      author={Guowei Xu and Peng Jin and Hao Li and Yibing Song and Lichao Sun and Li Yuan},
      year={2024},
      eprint={2411.10440},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2411.10440}, 
}

🙏 Acknowledgement

  • The majority of this project is released under the Apache 2.0 license as found in the LICENSE file.
  • The service is a research preview intended for non-commercial use only, subject to LLAMA 3.2 COMMUNITY LICENSE AGREEMENT, and Terms of Use of the data generated by OpenAI. Please contact us if you find any potential violations.
  • The template is modified from Chat-UniVi and LLaVA.