
Chat with images

Vision models process and interpret visual information from images and videos. You can use vision models to answer questions about the content of an image, identify and locate objects, and more.

Here's an example using the yorickvp/llava-13b vision model to generate recipe ideas from an image of your fridge:

<a href="https://replicate.com/p/c4jewm3bmyqz4og3y2itvrvc5u"> <img alt="fridge" src="https://github.com/replicate/cog/assets/2289/55bd8de4-43cf-4a16-ad87-d2bb2ea5e42f"> </a>

And here’s how you can run the model from your JavaScript code:

```javascript
import Replicate from "replicate";

const replicate = new Replicate();

const output = await replicate.run(
  "yorickvp/llava-13b:01359160a4cff57c6b7d4dc625d0019d390c7c46f553714069f114b392f4a726",
  {
    input: {
      image: "https://replicate.delivery/pbxt/KZOUXoMy3OxnyOeIA0LNzhtWDjBZLm9T6IPm5lbKcFT8lybo/fridge.png",
      prompt: "Here's a photo of my fridge today. Please give me some simple recipe ideas based on its contents.",
    },
  }
);

console.log(output);
```

If you don't need reasoning abilities and just want to get descriptions of images, check out our image captioning collection →