salesforce / blip

Generate image captions

  • Public
  • 161.9M runs
  • T4
  • GitHub
  • Paper
  • License

Input

image (file, required)

Input image.

task (string)

Choose a task. Default: "image_captioning"

question (string)

Type a question about the input image for the visual question answering task.

caption (string)

Type a caption for the input image for the image text matching task.
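As a rough sketch, these inputs map directly onto a call through the Replicate Python client. The example below is only an illustration under assumptions: the input names follow the field descriptions above, and <version> stands in for the concrete version hash you actually want to run.

    # Minimal sketch using the Replicate Python client (pip install replicate).
    # Assumes REPLICATE_API_TOKEN is set in the environment and that <version>
    # is replaced with a real version hash of salesforce/blip.
    import replicate

    output = replicate.run(
        "salesforce/blip:<version>",
        input={
            "image": open("beach.jpg", "rb"),  # local file handle; an image URL string also works
            "task": "image_captioning",        # default task; question/caption feed the other tasks above
        },
    )
    print(output)  # e.g. "Caption: a woman sitting on the beach with a dog"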

Output

Caption: a woman sitting on the beach with a dog

This example was created by a different version, salesforce/blip:5a977fcb.

Run time and cost

This model costs approximately $0.00022 to run on Replicate, or about 4,545 runs per $1, but this varies depending on your inputs. It is also open source, and you can run it on your own computer with Docker.

This model runs on Nvidia T4 GPU hardware. Predictions typically complete within 1 second.
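For the Docker route, the sketch below is one hedged way to exercise a locally running copy. It assumes you have pulled the model's published container image (for example from r8.im/salesforce/blip) and started it so that Cog's standard HTTP prediction endpoint is listening on port 5000; none of these names come from this page, so adjust them to your setup.

    # Hedged local-run sketch: call a BLIP container through Cog's HTTP API.
    # Assumes the container was started with something like:
    #   docker run -p 5000:5000 r8.im/salesforce/blip
    import base64
    import requests

    with open("beach.jpg", "rb") as f:
        image_uri = "data:image/jpeg;base64," + base64.b64encode(f.read()).decode()

    resp = requests.post(
        "http://localhost:5000/predictions",
        json={"input": {"image": image_uri, "task": "image_captioning"}},
    )
    resp.raise_for_status()
    print(resp.json()["output"])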

Readme

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

This is the PyTorch code of the BLIP paper.
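If you only need captions and would rather not clone the repository, one hedged alternative is the Hugging Face port of the BLIP weights. The sketch below assumes the Salesforce/blip-image-captioning-base checkpoint on the Hugging Face Hub and the transformers library; it is not part of this repository's own scripts.

    # Hedged sketch: image captioning with the Hugging Face port of BLIP.
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    image = Image.open("beach.jpg").convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    print(processor.decode(out[0], skip_special_tokens=True))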

Citation

If you find this code useful for your research, please consider citing:

@misc{li2022blip,
      title={BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation}, 
      author={Junnan Li and Dongxu Li and Caiming Xiong and Steven Hoi},
      year={2022},
      eprint={2201.12086},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement

The implementation of BLIP relies on resources from ALBEF, Huggingface Transformers, and timm. We thank the original authors for open-sourcing their work.