Readme
Unofficial BLIP-3 (xgen-mm-phi3-mini-instruct-r-v1) demo and API
Note that this is an unofficial implementation of BLIP-3 (since renamed XGen-MM; this demo uses the xgen-mm-phi3-mini-instruct-r-v1 checkpoint) and is not associated with Salesforce.
Usage
BLIP-3 is a model that answers questions about images. To use it, provide an image, and then ask a question about that image. For example, you can provide the following image:
and then pose the following question:
What is this a picture of?
and get the output:
Marina Bay Sands, Singapore.
BLIP-3 is also capable of captioning images. This works by sending the model a blank prompt, though we have an explicit toggle for image captioning in the UI & API.
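As a rough illustration of the two modes, the sketch below builds a request payload that falls back to captioning when no question is supplied. The field names `image`, `question`, and `caption` are assumptions made for this example and are not taken from the actual API schema; check the API documentation for the real input names.

```python
from typing import Optional


def build_inputs(image: str, question: Optional[str] = None) -> dict:
    """Build a request payload for either captioning or question answering.

    NOTE: the keys "image", "question", and "caption" are hypothetical
    names chosen for illustration, not the confirmed API schema.
    """
    payload = {"image": image}
    if question is None or not question.strip():
        # No question given: ask for a caption via the explicit toggle
        # instead of relying on a blank prompt.
        payload["caption"] = True
    else:
        payload["caption"] = False
        payload["question"] = question.strip()
    return payload
```

Calling `build_inputs("photo.jpg")` would request a caption, while `build_inputs("photo.jpg", "What is this a picture of?")` would request an answer to the question.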
You can also provide BLIP-3 with more context when asking a question. For example, given the following image:
you can provide the output of a previous Q&A as context, in the format "question: … answer: …", like so:
question: What animal is this? answer: A panda
and then pose an additional question:
What country is this animal native to?
and get the output:
China
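The context string shown above can be assembled from earlier question/answer pairs with a small helper. This is a minimal sketch: the function name `format_context` is hypothetical, and joining several pairs with a single space is an assumption, since the example above only shows one pair.

```python
from typing import Iterable, Tuple


def format_context(history: Iterable[Tuple[str, str]]) -> str:
    """Render (question, answer) pairs in 'question: ... answer: ...' form.

    Assumption: multiple pairs are separated by a single space; the
    README only demonstrates the single-pair case.
    """
    return " ".join(
        f"question: {question.strip()} answer: {answer.strip()}"
        for question, answer in history
    )


context = format_context([("What animal is this?", "A panda")])
print(context)  # question: What animal is this? answer: A panda
```

The resulting string would be passed as the context alongside the follow-up question ("What country is this animal native to?").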
Model description
XGen-MM (previously known as BLIP-3) is a series of the latest foundational Large Multimodal Models (LMMs) developed by Salesforce AI Research. This series advances upon the successful designs of the BLIP series, incorporating fundamental enhancements that ensure a more robust and superior foundation.
Key features of XGen-MM:
- The pretrained foundation model, xgen-mm-phi3-mini-base-r-v1, achieves state-of-the-art performance under 5b parameters and demonstrates strong in-context learning capabilities.
- The instruct fine-tuned model, xgen-mm-phi3-mini-instruct-r-v1, achieves state-of-the-art performance among open-source and closed-source VLMs under 5b parameters.
- xgen-mm-phi3-mini-instruct-r-v1 supports flexible high-resolution image encoding with efficient visual token sampling.
These models have been trained at scale on high-quality image caption datasets and interleaved image-text data.
Citation
@misc{xgen_mm_phi3_mini,
  title={xgen-mm-phi3-mini-instruct Model Card},
  url={https://huggingface.co/Salesforce/xgen-mm-phi3-mini-instruct-r-v1},
  author={Salesforce AI Research},
  month={May},
  year={2024}
}